How to get a list of dictionaries in a single column of a pandas DataFrame
Question:
Hi, I have a dataframe in 1NF that I want to change to a different format where I can access those values.
This is what my dataframe looks like:
Code | ProvType | Alias | spec_code |
---|---|---|---|
1A12 | A | Hi | 1 |
1A12 | B | Hi | 2 |
1A12 | B | Hola | 2 |
1A12 | A | Pola | 3 |
1b32 | C | Cola | 7 |
1b32 | D | Cola | 6 |
1b32 | A | Mola | 1 |
Code | aliasList |
---|---|
1A12 | [{alias:Hi,provtypelist:[A,B],specCodeList:[1,2]},{alias:Hola,provtypelist:[B],specCodeList:[2]},{alias:Pola,provtypelist:[A],specCodeList:[3]}] |
1b32 | [{alias:Cola,provtypelist:[C,D],specCodeList:[7,6]},{alias:Mola,provtypelist:[A],specCodeList:[1]}] |
I want my dataframe to look like the second table above. I don't know what the code/groupby would look like, so any help is appreciated.
The reason I want my dataframe in this shape is so that I can insert the aliasList column into an OpenSearch index with the nested datatype.
Other approaches are also welcome.
Answers:
You can use groupby.apply with to_dict:
out = (
    # first aggregate ProvType/spec_code to lists
    df.groupby(['Code', 'Alias'], as_index=False).agg(list)
    # then convert to a list of dictionaries per Code
    .groupby('Code').apply(lambda g: g.drop(columns='Code').to_dict('records'))
    .reset_index(name='aliasList')
)
Output:
Code aliasList
0 1A12 [{'Alias': 'Hi', 'ProvType': ['A', 'B'], 'spec_code': [1, 2]}, {'Alias': 'Hola', 'ProvType': ['B'], 'spec_code': [2]}, {'Alias': 'Pola', 'ProvType': ['A'], 'spec_code': [3]}]
1 1b32 [{'Alias': 'Cola', 'ProvType': ['C', 'D'], 'spec_code': [7, 6]}, {'Alias': 'Mola', 'ProvType': ['A'], 'spec_code': [1]}]
First list at index 0 for clarity:
[{'Alias': 'Hi', 'ProvType': ['A', 'B'], 'spec_code': [1, 2]},
{'Alias': 'Hola', 'ProvType': ['B'], 'spec_code': [2]},
{'Alias': 'Pola', 'ProvType': ['A'], 'spec_code': [3]}]
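The records above keep the original column names (Alias, ProvType, spec_code), while the question asks for alias, provtypelist and specCodeList. A minimal sketch of one way to get those keys follows. The KEY_NAMES mapping is a hypothetical helper introduced here, not part of the answer, and the second step iterates over the groups instead of calling groupby.apply, on the assumption that you would rather not depend on how your pandas version treats the grouping column inside apply.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Code": ["1A12", "1A12", "1A12", "1A12", "1b32", "1b32", "1b32"],
    "ProvType": ["A", "B", "B", "A", "C", "D", "A"],
    "Alias": ["Hi", "Hi", "Hola", "Pola", "Cola", "Cola", "Mola"],
    "spec_code": [1, 2, 2, 3, 7, 6, 1],
})

# Hypothetical mapping from source column names to the requested key names.
KEY_NAMES = {"Alias": "alias", "ProvType": "provtypelist", "spec_code": "specCodeList"}

# Step 1: one row per (Code, Alias), with ProvType/spec_code collected into lists.
per_alias = (
    df.groupby(["Code", "Alias"], as_index=False)
      .agg(list)
      .rename(columns=KEY_NAMES)
)

# Step 2: build one record list per Code by iterating over the groups.
record_columns = list(KEY_NAMES.values())
rows = [
    {"Code": code, "aliasList": group[record_columns].to_dict("records")}
    for code, group in per_alias.groupby("Code")
]
out = pd.DataFrame(rows)
```

Here out has one row per Code, and each aliasList cell holds a list of dictionaries keyed by the renamed columns.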
You can use groupby together with apply for this task.
import pandas as pd

# your data
data = {'Code': ['1A12', '1A12', '1A12', '1A12', '1b32', '1b32', '1b32'],
        'ProvType': ['A', 'B', 'B', 'A', 'C', 'D', 'A'],
        'Alias': ['Hi', 'Hi', 'Hola', 'Pola', 'Cola', 'Cola', 'Mola'],
        'spec_code': [1, 2, 2, 3, 7, 6, 1]}
df = pd.DataFrame(data)

# group by Code and Alias, aggregate ProvType and spec_code into lists
grouped = df.groupby(['Code', 'Alias']).agg({'ProvType': list, 'spec_code': list}).reset_index()

# create new column 'aliasList' holding one dictionary per (Code, Alias) row
grouped['aliasList'] = grouped.apply(lambda x: {'alias': x['Alias'],
                                                'provtypelist': x['ProvType'],
                                                'specCodeList': x['spec_code']}, axis=1)

# group by Code again and aggregate aliasList into a list
final_df = grouped.groupby('Code').agg({'aliasList': list}).reset_index()
print(final_df)
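Since the stated goal is to index aliasList as a nested field, it may help to turn each row into a plain, JSON-serializable document first. The sketch below is only an illustration under assumptions: the to_documents helper is hypothetical, the mapping dictionary assumes you declare aliasList with the nested type, and final_df is rebuilt in miniature so the example runs on its own. It stops at producing JSON and does not call any OpenSearch client.

```python
import json

import pandas as pd

# A small stand-in for final_df so this sketch is self-contained.
final_df = pd.DataFrame({
    "Code": ["1A12", "1b32"],
    "aliasList": [
        [{"alias": "Hi", "provtypelist": ["A", "B"], "specCodeList": [1, 2]}],
        [{"alias": "Mola", "provtypelist": ["A"], "specCodeList": [1]}],
    ],
})

# Assumed index mapping fragment: aliasList declared as a nested field.
mapping = {"mappings": {"properties": {"aliasList": {"type": "nested"}}}}


def to_documents(frame):
    """Hypothetical helper: turn each row into a dict of JSON-native values."""
    docs = []
    for code, alias_list in zip(frame["Code"], frame["aliasList"]):
        cleaned = [
            {
                "alias": str(item["alias"]),
                "provtypelist": [str(p) for p in item["provtypelist"]],
                # int() guards against NumPy integers, which json cannot serialize.
                "specCodeList": [int(s) for s in item["specCodeList"]],
            }
            for item in alias_list
        ]
        docs.append({"Code": str(code), "aliasList": cleaned})
    return docs


docs = to_documents(final_df)

# One JSON document per line, ready to hand to whatever indexing client you use.
payload = "\n".join(json.dumps(doc) for doc in docs)
```

Converting to built-in str and int up front keeps the serialization step independent of whether pandas handed back Python or NumPy scalars.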