Converting dictionary of lists of dictionaries to a dataframe
Question:
Say I have a dict defined as:
dict = {'1': [{'name': 'Hospital 0',
               'students': 5,
               'grad': 71},
              {'name': 'Hospital 1',
               'students': 8,
               'grad': 74}],
        '2': [{'name': 'Hospital 0',
               'students': 11,
               'grad': 72},
              {'name': 'Hospital 1',
               'students': 10,
               'grad': 78}]}
Suppose I want to make a dataframe from this formatted as follows:
step | name | students | grad |
---|---|---|---|
1 | Hospital 0 | 5 | 71 |
1 | Hospital 1 | 8 | 74 |
2 | Hospital 0 | 11 | 72 |
2 | Hospital 1 | 10 | 78 |
Do you guys have any ideas?
Answers:
Try building a list of row dictionaries and passing it to pandas.DataFrame, using the headers [step, name, students, grad]:
import pandas as pd
data = []
for key, value in dict.items():
    for elem in value:
        # One output row per inner dictionary, tagged with its outer key.
        row = {
            'step': key,
            'name': elem['name'],
            'students': elem['students'],
            'grad': elem['grad']
        }
        data.append(row)
df = pd.DataFrame(data)
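The same loop can be written more compactly as a list comprehension. The following is a minimal sketch rather than a definitive version: it copies the sample data from the question into a variable named `data` (an assumption made here to avoid shadowing the built-in `dict`) and assumes every inner dictionary has the same keys.

```python
import pandas as pd

# Sample data copied from the question; `data` is a name chosen for this sketch.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# Flatten: one row per inner dictionary, with the outer key stored under 'step'.
rows = [
    {'step': step, **record}
    for step, records in data.items()
    for record in records
]

df = pd.DataFrame(rows)
print(df)
```

Because 'step' is the first key of each row dictionary, it ends up as the first column. The step values stay strings here, since the outer keys are strings.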
Here is an approach using json_normalize().
Note: I am using data as the variable name instead of dict, because dict is a Python built-in type.
from pandas import json_normalize
import pandas as pd
dfs = [json_normalize(data[key]).assign(step=key) for key in data if "name" in data[key][0]]
df = pd.concat(dfs, ignore_index=True)
df = df[["step", "name", "students", "grad"]]
print(df)
  step        name  students  grad
0    1  Hospital 0         5    71
1    1  Hospital 1         8    74
2    2  Hospital 0        11    72
3    2  Hospital 1        10    78
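A related variant, sketched here under the assumption that each outer key maps to a list of flat dictionaries, skips json_normalize() and lets pd.concat() label each sub-frame with its key. This is a different technique from the one above, not a restatement of it, and the variable names (`data`, `frames`) are chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question; `data` is a name chosen for this sketch.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# One small DataFrame per outer key.
frames = {step: pd.DataFrame(records) for step, records in data.items()}

# Concatenating a dict uses its keys as the outer index level, named 'step' here.
stacked = pd.concat(frames, names=['step'])

# Move 'step' out of the index into a column, then discard the leftover inner index.
df = stacked.reset_index(level=0).reset_index(drop=True)
print(df)
```

Unlike the json_normalize() version, this needs no final column reordering, because reset_index() inserts the 'step' column at the front.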
Here is some documentation on Pandas DataFrames:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
You can also get documentation from the Python shell:
import pandas as pd
help(pd.DataFrame)
The documentation gives this example:
| Examples
| --------
| Constructing DataFrame from a dictionary.
|
| >>> d = {'col1': [1, 2], 'col2': [3, 4]}
| >>> df = pd.DataFrame(data=d)
| >>> df
|    col1  col2
| 0     1     3
| 1     2     4
We can format your data in a slightly different way to make it easier.
% python
>>> import pandas as pd
>>> d = {}
>>> d['step'] = [1, 1, 2, 2]
>>> d['name'] = ['Hospital 0', 'Hospital 1', 'Hospital 0', 'Hospital 1']
>>> d['students'] = [5, 8, 11, 10]
>>> d['grad'] = [71, 74, 72, 78]
>>> df = pd.DataFrame(d)
>>> print(df.to_string(index=False))
 step        name  students  grad
    1  Hospital 0         5    71
    1  Hospital 1         8    74
    2  Hospital 0        11    72
    2  Hospital 1        10    78
One solution is to structure the dictionary so that it meets the requirements of the DataFrame constructor. The code above is based on the example from the Pandas documentation.
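Rather than typing the column lists by hand, the restructuring can be done programmatically. The sketch below is one possible way to do it, assuming every inner dictionary has the same keys (otherwise the column lists would end up with different lengths). The names `data` and `columns` are chosen for illustration, and converting the step keys to int is an assumption made to match the integer steps shown above.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the question; `data` is a name chosen for this sketch.
data = {
    '1': [{'name': 'Hospital 0', 'students': 5, 'grad': 71},
          {'name': 'Hospital 1', 'students': 8, 'grad': 74}],
    '2': [{'name': 'Hospital 0', 'students': 11, 'grad': 72},
          {'name': 'Hospital 1', 'students': 10, 'grad': 78}],
}

# Build a column-oriented mapping: column name -> list of values.
columns = defaultdict(list)
for step, records in data.items():
    for record in records:
        columns['step'].append(int(step))  # assumption: step keys are numeric strings
        for field, value in record.items():
            columns[field].append(value)

# Convert to a plain dict, the shape used in the documentation example.
df = pd.DataFrame(dict(columns))
print(df.to_string(index=False))
```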
Using the pandas library seems like the best option for your issue. I hope the code below helps.
import pandas as pd

# `dicta` is the dictionary from the question.
df = pd.DataFrame(columns=['step', 'name', 'students', 'grad'])
keys_values = list(dicta.keys())
ind = 0
for key in keys_values:
    rows = dicta[key]
    for row in rows:
        # Add one row at a time at the next integer index label.
        df.loc[ind] = [key, row['name'], row['students'], row['grad']]
        ind += 1
print(df)