pandas json dictionary to dataframe, reducing columns by creating new columns

Question:

I have the following JSON file (the raw data as returned by an API call):

{
    "code": "200000",
    "data": {
        "A": "0.43221600",
        "B": "0.02311155",
        "C": "0.55057515",
        "D": "2.15957924",
        "E": "0.03818908",
        "F": "0.26853420",
        "G": "0.15007500",
        "H": "0.00685843",
        "I": "0.08500848"
    }
}

With this code, pandas creates one column per entry in "data", so the result is a DataFrame with many columns:

import json

import pandas as pd

# Load the API response; the with-block closes the file afterwards.
with open('file.json', 'r') as f:
    j1 = json.load(f)

pd.json_normalize(j1)

    code    data.A  data.B  data.C  data.D  data.E  data.F  data.G  data.H  data.I
0   200000  0.43221600  0.02311155  0.55057515  2.15957924  0.03818908  0.26853420  0.15007500  0.00685843  0.08500848

I assume pandas provides a built-in way to split the entries under "data" into two new columns, "name" and "value", with a fresh index, but I cannot figure out how it works.

I would need this output:

    name    value
0   A       0.43221600
1   B       0.02311155
2   C       0.55057515
3   D       2.15957924
4   E       0.03818908
5   F       0.26853420
6   G       0.15007500
7   H       0.00685843
8   I       0.08500848
Asked By: josoko

||

Answers:

The simplest approach is to use the DataFrame constructor:

df = pd.DataFrame({'name': j1['data'].keys(),
                  'value': j1['data'].values()})
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848

Or:

df = pd.DataFrame(j1['data'].items(), columns=['name','value'])
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
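
Note that "value" still holds strings, because the API returns the numbers quoted. A minimal sketch of converting them with pd.to_numeric, using a trimmed inline copy of the sample (the variable raw is only for illustration):

```python
import json

import pandas as pd

# Trimmed inline copy of the API response from the question (illustrative only).
raw = '{"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155", "C": "0.55057515"}}'
j1 = json.loads(raw)

# Build the two-column frame from the key/value pairs under "data".
df = pd.DataFrame(list(j1["data"].items()), columns=["name", "value"])

# The values arrive as quoted strings; convert them to floats.
df["value"] = pd.to_numeric(df["value"])
```

After the conversion, numeric operations such as df["value"].sum() work as expected.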

If you need a json_normalize solution:

df = pd.json_normalize(j1['data'])

df = df.T.rename_axis('name')[0].reset_index(name='value')
print (df)
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
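
The same reshaping can also be sketched without json_normalize, by going through a Series whose index holds the keys. This is an alternative shown for comparison, on a trimmed copy of the sample data:

```python
import pandas as pd

# Trimmed copy of the "data" mapping from the question (illustrative only).
data = {"A": "0.43221600", "B": "0.02311155", "C": "0.55057515"}

df = (
    pd.Series(data)             # keys become the index, values the data
    .rename_axis("name")        # name the index so it becomes the "name" column
    .reset_index(name="value")  # move the index into a column, name the values
)
```

This skips the transpose and the column selection, since a Series is already one-dimensional.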

EDIT: Added a solution that keeps the code column:

df = pd.json_normalize(j1)

df = df.melt('code', var_name='name')
# Escape the dot so that only the text after "data." is kept.
df['name'] = df['name'].str.extract(r'\.(.*)$', expand=False)
print (df)
     code name       value
0  200000    A  0.43221600
1  200000    B  0.02311155
2  200000    C  0.55057515
3  200000    D  2.15957924
4  200000    E  0.03818908
5  200000    F  0.26853420
6  200000    G  0.15007500
7  200000    H  0.00685843
8  200000    I  0.08500848
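
If json_normalize is not a requirement, here is a sketch that attaches the code column directly and so avoids cleaning up the "data." prefix (trimmed sample data, built inline for illustration):

```python
import pandas as pd

# Trimmed copy of the parsed API response from the question (illustrative only).
j1 = {"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155"}}

# One row per key/value pair under "data".
df = pd.DataFrame(list(j1["data"].items()), columns=["name", "value"])

# Broadcast the scalar "code" to every row and place it first.
df.insert(0, "code", j1["code"])
```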
Answered By: jezrael

You can reuse j1:

df = pd.DataFrame(j1['data'].items(), columns=['name', 'value'])
print(df)

# Output
  name       value
0    A  0.43221600
1    B  0.02311155
2    C  0.55057515
3    D  2.15957924
4    E  0.03818908
5    F  0.26853420
6    G  0.15007500
7    H  0.00685843
8    I  0.08500848
Answered By: Corralien

pd.DataFrame.from_dict(j1)

should get you close to the result you need, though the frame still has to be reshaped into the name/value layout.
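
With this input, from_dict(j1) places the keys of "data" in the index and keeps "code" and "data" as columns. A minimal sketch of the reshaping into name/value, on a trimmed copy of the sample (the name wide is only for illustration):

```python
import pandas as pd

# Trimmed copy of the parsed API response from the question (illustrative only).
j1 = {"code": "200000", "data": {"A": "0.43221600", "B": "0.02311155"}}

# Index: A, B; columns: code, data (the scalar "code" is repeated per row).
wide = pd.DataFrame.from_dict(j1)

df = (
    wide.rename_axis("name")            # name the index
    .reset_index()                      # turn it into a regular column
    .rename(columns={"data": "value"})  # match the requested column name
    [["name", "value"]]                 # drop "code" to match the requested output
)
```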

Answered By: geert_the_engineer