Converting dictionary of lists of dictionaries to a dataframe

Question:

Say I have a dict defined as:

dict = {'1': [{'name': 'Hospital 0',
               'students': 5,
               'grad': 71},
                    
              {'name': 'Hospital 1',
               'students': 8,
               'grad': 74}],
        
        '2': [{'name': 'Hospital 0',
               'students': 11,
               'grad': 72},
                    
               {'name': 'Hospital 1',
               'students': 10,
               'grad': 78}]}

Suppose I want to make a dataframe from this formatted as follows:

step name students grad
1 Hospital 0 5 71
1 Hospital 1 8 74
2 Hospital 0 11 72
2 Hospital 1 10 78

Do you guys have any ideas?

Asked By: PurpleSky

||

Answers:

Try building a list of rows and passing it to pandas.DataFrame.
The headers you want are step, name, students, grad.

import pandas as pd

data = []

for key, value in dict.items():
    for elem in value:
        row = {
            'step': key,
            'name': elem['name'],
            'students': elem['students'],
            'grad': elem['grad']
        }
        data.append(row)

df = pd.DataFrame(data)
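
The same loop can be written as a single comprehension. This is only a minimal sketch, assuming every inner dictionary has the same keys; the names `data` and `rows` are illustrative, and the cast to int is an optional extra for when numeric steps are wanted.

```python
import pandas as pd

# Sample data copied from the question; called `data` here (an assumed name)
# so the built-in dict is not shadowed.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# One flat record per inner dict, with the outer key stored under 'step'.
rows = [
    {'step': key, **elem}
    for key, elem_list in data.items()
    for elem in elem_list
]

df = pd.DataFrame(rows)

# The outer keys are strings; cast them if numeric steps are wanted.
df['step'] = df['step'].astype(int)

print(df.to_string(index=False))
```

Because 'step' is the first key in each record, it also ends up as the first column.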
Answered By: Hope

Here is an approach using json_normalize().
Note: I am using data as the variable name instead of dict, because dict is a Python built-in type.

from pandas import json_normalize
import pandas as pd 

dfs = [json_normalize(data[key]).assign(step=key) for key in data if "name" in data[key][0]]
df = pd.concat(dfs, ignore_index=True)
df = df[["step", "name", "students", "grad"]]
print(df)

  step        name  students  grad
0    1  Hospital 0         5    71
1    1  Hospital 1         8    74
2    2  Hospital 0        11    72
3    2  Hospital 1        10    78
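
A related variant, sketched here as an alternative rather than as part of the approach above, passes a dictionary of frames to pd.concat so that the outer keys become an index level. It assumes each inner list converts cleanly with pd.DataFrame; `data` holds the question's values.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# Passing a dict to pd.concat uses its keys as the outer index level,
# which is named 'step' here.
df = (
    pd.concat(
        {key: pd.DataFrame(records) for key, records in data.items()},
        names=['step'],
    )
    .reset_index(level=0)    # move 'step' from the index into a column
    .reset_index(drop=True)  # discard the per-list row positions
)

print(df)
```

The step values stay as strings in this version, since they come straight from the dictionary keys.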
Answered By: Jamiu S.

Here is some documentation on Pandas DataFrames:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

You can also get documentation from the Python shell:

import pandas as pd
help(pd.DataFrame)

The documentation gives this example:

 |  Examples
 |  --------
 |  Constructing DataFrame from a dictionary.
 |  
 |  >>> d = {'col1': [1, 2], 'col2': [3, 4]}
 |  >>> df = pd.DataFrame(data=d)
 |  >>> df
 |     col1  col2
 |  0     1     3
 |  1     2     4

We can format your data in a slightly different way to make it easier.

% python
>>> import pandas as pd
>>> d = {}
>>> d['step'] = [1, 1, 2, 2]
>>> d['name'] = ['Hospital 0', 'Hospital 1', 'Hospital 0', 'Hospital 1']
>>> d['students'] = [5, 8, 11, 10]
>>> d['grad'] = [71, 74, 72, 78]
>>> df = pd.DataFrame(d)
>>> print(df.to_string(index=False))
 step        name  students  grad
    1  Hospital 0         5    71
    1  Hospital 1         8    74
    2  Hospital 0        11    72
    2  Hospital 1        10    78

One solution is to structure the dictionary so that it meets the requirements of the DataFrame constructor. The code above is based on the example from the Pandas documentation.
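
The restructuring does not have to be typed by hand. A minimal sketch, assuming the original nested dictionary is available as `data` (a name chosen here for illustration) and that every record has the name, students and grad keys:

```python
import pandas as pd

# Sample data copied from the question.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# Column-oriented dict: one list per column, matching the shape used in
# the DataFrame constructor example.
d = {'step': [], 'name': [], 'students': [], 'grad': []}
for step, records in data.items():
    for record in records:
        d['step'].append(int(step))  # outer keys are strings such as '1'
        for column in ('name', 'students', 'grad'):
            d[column].append(record[column])

df = pd.DataFrame(d)
print(df.to_string(index=False))
```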

Answered By: ktm5124

Using the pandas library seems the best option for your issue. I hope the code below is helpful.

import pandas as pd
df = pd.DataFrame(columns=['step', 'name', 'students', 'grad'])
# dicta is the dictionary from the question
keys_values = list(dicta.keys())
ind = 0
for key in keys_values:
    rows = dicta[key]
    for row in rows:
        df.loc[ind] = [key, row['name'], row['students'], row['grad']]
        ind += 1
print(df)
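
Assigning one row at a time with df.loc grows the frame on every iteration, which can become slow for large inputs. A possible variation, shown only as a sketch, keeps the same column list and the `dicta` name but collects the rows in a plain list and builds the frame once; the `columns` and `rows` names are illustrative.

```python
import pandas as pd

# `dicta` holds the dictionary from the question.
dicta = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

columns = ['step', 'name', 'students', 'grad']

# Each row is [outer key, name, students, grad], in the same order as `columns`.
rows = [
    [key] + [row[col] for col in columns[1:]]
    for key, inner in dicta.items()
    for row in inner
]

# Build the frame in a single call instead of enlarging it row by row.
df = pd.DataFrame(rows, columns=columns)
print(df)
```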