pandas json dictionary to dataframe, reducing columns by creating new columns
Question:
I have the following JSON file (the raw data as I get it back from an API call):
{
"code": "200000",
"data": {
"A": "0.43221600",
"B": "0.02311155",
"C": "0.55057515",
"D": "2.15957924",
"E": "0.03818908",
"F": "0.26853420",
"G": "0.15007500",
"H": "0.00685843",
"I": "0.08500848"
}
}
The following code creates this output in pandas (one column per entry in "data"). The result is a dataframe with many columns:
import json
import pandas as pd

# the context manager closes the file after loading
with open('file.json', 'r') as f:
    j1 = json.load(f)

pd.json_normalize(j1)
     code      data.A      data.B      data.C      data.D      data.E      data.F      data.G      data.H      data.I
0  200000  0.43221600  0.02311155  0.55057515  2.15957924  0.03818908  0.26853420  0.15007500  0.00685843  0.08500848
I guess that pandas provides a built-in function so that the data set in the "data" attribute can be split into two new columns named "name" and "value", with a new index. But I cannot figure out how that works.
I would need this output:
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
Answers:
The simplest approach is to use the DataFrame constructor:
df = pd.DataFrame({'name': j1['data'].keys(),
'value': j1['data'].values()})
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
Or:
df = pd.DataFrame(j1['data'].items(), columns=['name','value'])
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
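Note that the values in the JSON are strings, so the value column holds strings in both variants above. Here is a minimal sketch, assuming you want numeric values. The inline JSON string is only a shortened stand-in for file.json, used for illustration:

```python
import json
import pandas as pd

# Shortened stand-in for the API response; in the question this comes from file.json.
raw = '{"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155", "C": "0.55057515"}}'
j1 = json.loads(raw)

# One row per key/value pair of the nested "data" dictionary.
df = pd.DataFrame(list(j1['data'].items()), columns=['name', 'value'])

# The JSON values are strings; convert them to floats.
df['value'] = pd.to_numeric(df['value'])
print(df.dtypes)
```

After the conversion the trailing zeros are no longer displayed, because the column holds floats instead of the original strings.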
If you need json_normalize, the solution is:
df = pd.json_normalize(j1['data'])
df = df.T.rename_axis('name')[0].reset_index(name='value')
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
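If json_normalize is not a hard requirement, the transpose step can be avoided by going through a Series. This is only a sketch of an alternative, not part of the answer above. The dictionary is a shortened, hypothetical copy of j1['data']:

```python
import pandas as pd

# Shortened stand-in for j1['data'] from the question.
data = {"A": "0.43221600", "B": "0.02311155", "C": "0.55057515"}

# The dictionary keys become the Series index; name that index 'name',
# then move it into a regular column next to the values.
df = pd.Series(data).rename_axis('name').reset_index(name='value')
print(df)
```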
EDIT: Added a solution that keeps the code column:
df = pd.json_normalize(j1)
df = df.melt('code', var_name='name')
# escape the dot so it matches the literal '.' in 'data.A'
df['name'] = df['name'].str.extract(r'\.(.*)$', expand=False)
print (df)
     code name       value
0  200000    A  0.43221600
1  200000    B  0.02311155
2  200000    C  0.55057515
3  200000    D  2.15957924
4  200000    E  0.03818908
5  200000    F  0.26853420
6  200000    G  0.15007500
7  200000    H  0.00685843
8  200000    I  0.08500848
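The same table can probably also be built without melt or a regular expression, by creating the name/value rows first and then adding code as a constant column. A minimal sketch, using a shortened copy of the JSON from the question as a plain dictionary:

```python
import pandas as pd

# Shortened stand-in for the parsed JSON (j1 in the question).
j1 = {"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155"}}

# One row per key/value pair of the nested "data" dictionary.
df = pd.DataFrame(list(j1['data'].items()), columns=['name', 'value'])

# Insert 'code' as the first column; the scalar is repeated for every row.
df.insert(0, 'code', j1['code'])
print(df)
```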
You can reuse j1:
df = pd.DataFrame(j1['data'].items(), columns=['name', 'value'])
print(df)
# Output
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
pd.DataFrame.from_dict(j1) should get you close to the result you need. Note that it puts the keys of "data" in the index and keeps code as a repeated column.
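For reference, here is a sketch of what from_dict produces for this structure and how it could be reshaped into the requested layout. The dictionary is a shortened copy of the JSON from the question, and the rename steps are an assumption about the desired column names:

```python
import pandas as pd

# Shortened stand-in for the parsed JSON (j1 in the question).
j1 = {"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155"}}

# The keys of the nested dict become the index; 'code' is repeated on each row.
df = pd.DataFrame.from_dict(j1)

# Move the index into a 'name' column and rename 'data' to 'value'.
df = df.rename_axis('name').reset_index().rename(columns={'data': 'value'})
print(df)
```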