How to get a list of dictionaries in a single column of a pandas DataFrame

Question:

Hi, I have a dataframe in first normal form (1NF) that I want to reshape into a nested format, with one row per Code, so that I can access those values.

This is what my dataframe looks like:

Code ProvType Alias spec_code
1A12 A Hi 1
1A12 B Hi 2
1A12 B Hola 2
1A12 A Pola 3
1b32 C Cola 7
1b32 D Cola 6
1b32 A Mola 1

I want my dataframe to look like this:

Code aliasList
1A12 [{alias:Hi,provtypelist:[A,B],specCodeList:[1,2]},{alias:Hola,provtypelist:[B],specCodeList:[2]},{alias:Pola,provtypelist:[A],specCodeList:[3]}]
1b32 [{alias:Cola,provtypelist:[C,D],specCodeList:[7,6]},{alias:Mola,provtypelist:[A],specCodeList:[1]}]

I don't know what the code/groupby should look like, so any help is appreciated.

The reason I want my dataframe in this format is so that I can insert the aliasList column into an OpenSearch index as a field with the nested data type.

Other approaches are also welcome.

Asked By: BrownBatman

Answers:

You can use groupby.apply with to_dict:

out = (
    # first aggregate ProvType/spec_code to lists
    df.groupby(['Code', 'Alias'], as_index=False).agg(list)
    # then convert each Code group to a list of dictionaries
      .groupby('Code').apply(lambda g: g.drop(columns='Code').to_dict('records'))
      .reset_index(name='aliasList')
)
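A self-contained sketch of the same two-step groupby, with the question's data built inline so it runs on its own. It goes beyond the answer in two assumed ways: the columns are renamed to the key names used in the question's target output (alias, provtypelist, specCodeList), and Code is moved into the index before the second groupby, so the groups no longer contain the grouping column and nothing needs to be dropped inside the lambda. Recent pandas releases (2.2 and later) may emit a DeprecationWarning when groupby(...).apply operates on the grouping columns; grouping on the index level is one way to sidestep that, not the only one.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Code': ['1A12', '1A12', '1A12', '1A12', '1b32', '1b32', '1b32'],
    'ProvType': ['A', 'B', 'B', 'A', 'C', 'D', 'A'],
    'Alias': ['Hi', 'Hi', 'Hola', 'Pola', 'Cola', 'Cola', 'Mola'],
    'spec_code': [1, 2, 2, 3, 7, 6, 1],
})

out = (
    # rename to the key names used in the question's target output
    df.rename(columns={'Alias': 'alias',
                       'ProvType': 'provtypelist',
                       'spec_code': 'specCodeList'})
    # one row per (Code, alias); the remaining columns are collected into lists
    .groupby(['Code', 'alias'], as_index=False).agg(list)
    # move Code into the index so it is neither a column inside each group
    # nor a key in the resulting dictionaries
    .set_index('Code')
    .groupby(level=0)
    .apply(lambda g: g.to_dict('records'))
    .reset_index(name='aliasList')
)

print(out)
```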

Output:

   Code                                                                                                                                                                       aliasList
0  1A12  [{'Alias': 'Hi', 'ProvType': ['A', 'B'], 'spec_code': [1, 2]}, {'Alias': 'Hola', 'ProvType': ['B'], 'spec_code': [2]}, {'Alias': 'Pola', 'ProvType': ['A'], 'spec_code': [3]}]
1  1b32                                                        [{'Alias': 'Cola', 'ProvType': ['C', 'D'], 'spec_code': [7, 6]}, {'Alias': 'Mola', 'ProvType': ['A'], 'spec_code': [1]}]

The first list (index 0), formatted for readability:

[{'Alias': 'Hi', 'ProvType': ['A', 'B'], 'spec_code': [1, 2]},
 {'Alias': 'Hola', 'ProvType': ['B'], 'spec_code': [2]},
 {'Alias': 'Pola', 'ProvType': ['A'], 'spec_code': [3]}]
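Since the stated goal is to load aliasList into an OpenSearch index with a nested field, here is a rough sketch of turning a frame shaped like the output above into a newline-delimited bulk request body, plus a matching index mapping. The index name codes, the use of Code as the document _id, and the field types in the mapping are assumptions for illustration; the frame is rebuilt literally so the snippet runs on its own, and actually sending the request is left out.

```python
import json

import pandas as pd

# A frame shaped like the output above, built literally for this sketch
out = pd.DataFrame({
    'Code': ['1A12', '1b32'],
    'aliasList': [
        [{'Alias': 'Hi', 'ProvType': ['A', 'B'], 'spec_code': [1, 2]},
         {'Alias': 'Hola', 'ProvType': ['B'], 'spec_code': [2]},
         {'Alias': 'Pola', 'ProvType': ['A'], 'spec_code': [3]}],
        [{'Alias': 'Cola', 'ProvType': ['C', 'D'], 'spec_code': [7, 6]},
         {'Alias': 'Mola', 'ProvType': ['A'], 'spec_code': [1]}],
    ],
})

INDEX_NAME = 'codes'  # hypothetical index name

# Assumed mapping: aliasList as a nested field so each alias object is indexed separately
mapping = {
    'mappings': {
        'properties': {
            'Code': {'type': 'keyword'},
            'aliasList': {
                'type': 'nested',
                'properties': {
                    'Alias': {'type': 'keyword'},
                    'ProvType': {'type': 'keyword'},
                    'spec_code': {'type': 'integer'},
                },
            },
        }
    }
}

lines = []
for doc in out.to_dict('records'):
    # action/metadata line, using Code as the document id
    lines.append(json.dumps({'index': {'_index': INDEX_NAME, '_id': doc['Code']}}))
    # source line: the whole row, with aliasList as an array of objects
    lines.append(json.dumps(doc))

# bulk bodies are newline-delimited JSON and must end with a newline
bulk_body = '\n'.join(lines) + '\n'
print(bulk_body)
```

Each row becomes one document; because aliasList is mapped as nested, each alias object can be matched on its own rather than having its fields flattened together with the other aliases.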

Answered By: mozway

You can use groupby and apply for this task:

import pandas as pd

# data from the question
data = {'Code': ['1A12', '1A12', '1A12', '1A12', '1b32', '1b32', '1b32'],
        'ProvType': ['A', 'B', 'B', 'A', 'C', 'D', 'A'],
        'Alias': ['Hi', 'Hi', 'Hola', 'Pola', 'Cola', 'Cola', 'Mola'],
        'spec_code': [1, 2, 2, 3, 7, 6, 1]}
df = pd.DataFrame(data)

# group by Code and Alias, aggregate ProvType and spec_code into lists
grouped = df.groupby(['Code', 'Alias']).agg({'ProvType': list, 'spec_code': list}).reset_index()

# create new column 'aliasList'
grouped['aliasList'] = grouped.apply(lambda x: {'alias': x['Alias'],
                                                 'provtypelist': x['ProvType'],
                                                 'specCodeList': x['spec_code']}, axis=1)

# group by Code again and aggregate aliasList into a list
final_df = grouped.groupby('Code').agg({'aliasList': list}).reset_index()

print(final_df)

Answered By: Beatdown
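Because the question also asks for other approaches, here is a sketch that skips the pandas grouping and builds the nested structure with plain Python dictionaries in a single pass over the rows. It uses the key names from the question's target output; the intermediate grouped dictionary is only an illustration. Unlike groupby, whose default sort=True orders groups by key, this keeps codes and aliases in the order they first appear, which happens to match the sorted order for this data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Code': ['1A12', '1A12', '1A12', '1A12', '1b32', '1b32', '1b32'],
    'ProvType': ['A', 'B', 'B', 'A', 'C', 'D', 'A'],
    'Alias': ['Hi', 'Hi', 'Hola', 'Pola', 'Cola', 'Cola', 'Mola'],
    'spec_code': [1, 2, 2, 3, 7, 6, 1],
})

# Code -> Alias -> {'alias': ..., 'provtypelist': [...], 'specCodeList': [...]}
# plain dicts preserve insertion order (Python 3.7+), so groups stay in first-seen order
grouped = {}
for row in df.itertuples(index=False):
    entry = grouped.setdefault(row.Code, {}).setdefault(
        row.Alias, {'alias': row.Alias, 'provtypelist': [], 'specCodeList': []})
    entry['provtypelist'].append(row.ProvType)
    # int() ensures a plain Python int rather than a numpy scalar
    entry['specCodeList'].append(int(row.spec_code))

# one row per Code, with the alias dictionaries collected into a list
final_df = pd.DataFrame({
    'Code': list(grouped),
    'aliasList': [list(aliases.values()) for aliases in grouped.values()],
})
print(final_df)
```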