How to make Pandas unpack JSON data into a proper DataFrame instead of a list of dicts

Question:

I’m trying to parse the data at http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z into a Pandas DataFrame. Using read_json gives me a list of dicts instead of a proper DataFrame with the variable names as columns:

In [1]:

data = pd.read_json("http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z")
print(data)

Out[1]:

                                                result
0    {'name': '001 Kaivopuisto', 'coordinates': '60...
1    {'name': '002 Laivasillankatu', 'coordinates':...
..                                                 ...
149  {'name': '160 Nokkala', 'coordinates': '60.147...
150  {'name': '997 Workshop Helsinki', 'coordinates...

[151 rows x 1 columns]

This happens with every orient option. I’ve also tried json_normalize() and a few other things I found here, to no avail. How can I turn this into a sensible DataFrame? Thanks!

Asked By: basse

||

Answers:

Option 1
Use pd.DataFrame on the list of dictionaries

pd.DataFrame(data['result'].values.tolist())

   avl_bikes          coordinates  free_slots                    name  operative style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True    CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                 12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                 16
3          0  60.160944,24.941859          14           004 Viiskulma       True                 14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                 32
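
To try Option 1 without network access, here is a minimal sketch that rebuilds the one-column shape read_json returned. The two records are copied from the output above; the names records, raw and df are only illustrative.

```python
import pandas as pd

# Two records copied from the output above.
records = [
    {"avl_bikes": 12, "coordinates": "60.155411,24.950391", "free_slots": 18,
     "name": "001 Kaivopuisto", "operative": True, "style": "CB", "total_slots": 30},
    {"avl_bikes": 3, "coordinates": "60.159715,24.955212", "free_slots": 9,
     "name": "002 Laivasillankatu", "operative": True, "style": "", "total_slots": 12},
]

# Mimic what read_json produced: a single "result" column whose cells are dicts.
raw = pd.DataFrame({"result": records})

# Option 1: turn the column back into a list of dicts and let the
# DataFrame constructor expand the keys into columns.
df = pd.DataFrame(raw["result"].tolist())
print(df.columns.tolist())
```

Each dict key becomes a column, so the frame ends up with one row per station.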

Option 2
Use apply

data.result.apply(pd.Series)

   avl_bikes          coordinates  free_slots                    name  operative style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True    CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                 12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                 16
3          0  60.160944,24.941859          14           004 Viiskulma       True                 14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                 32
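
Options 1 and 2 should give the same table. On large frames apply(pd.Series) tends to be slower, because it builds one Series per row. The sketch below checks the equivalence on a small hypothetical frame (field values taken from the output above, trimmed to three keys); raw, via_constructor and via_apply are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by read_json.
raw = pd.DataFrame({"result": [
    {"name": "003 Kapteeninpuistikko", "avl_bikes": 0, "total_slots": 16},
    {"name": "004 Viiskulma", "avl_bikes": 0, "total_slots": 14},
]})

via_constructor = pd.DataFrame(raw["result"].tolist())  # Option 1
via_apply = raw["result"].apply(pd.Series)              # Option 2

# Compare row by row as plain dicts, which ignores any dtype differences.
same = via_constructor.to_dict("records") == via_apply.to_dict("records")
print(same)
```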

Option 3
Or you could fetch the JSON yourself and pull out the 'result' list

import json
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
import pandas as pd

url = "http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z"
response = urllib.request.urlopen(url)
data = json.loads(response.read())

df = pd.DataFrame(data['result'])
df

   avl_bikes          coordinates  free_slots                    name  operative style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True    CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                 12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                 16
3          0  60.160944,24.941859          14           004 Viiskulma       True                 14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                 32
Answered By: piRSquared

The approaches in the accepted answer work great, so this is just a more recent (2022) FYI:

In Pandas 1.0 and later, you can also use pd.json_normalize, which flattens nested dicts into columns.

json_obj = {
    'key': 123,
    'field1': 'blah',
    'info': {
        'contacts': {
            'email': {
                'foo': '[email protected]',
                'bar': '[email protected]'
            },
            'tel': '123456789',
        }
    }
}


pd.json_normalize(json_obj)
Answered By: Peter
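
For data shaped like the question's feed, a possible way to apply json_normalize is to point record_path at the key holding the list of rows. This is a sketch under assumptions: payload is a hypothetical, trimmed stand-in for the downloaded JSON (a top-level 'result' list), and nested is a shortened version of the example above.

```python
import pandas as pd

# Hypothetical payload mimicking the feed's shape: a top-level "result" list.
payload = {
    "result": [
        {"name": "001 Kaivopuisto", "avl_bikes": 12, "total_slots": 30},
        {"name": "002 Laivasillankatu", "avl_bikes": 3, "total_slots": 12},
    ]
}

# record_path names the key that holds the list of records (one row each).
stations = pd.json_normalize(payload, record_path="result")

# Nested dicts are flattened into dot-separated column names.
nested = {"key": 123, "info": {"contacts": {"tel": "123456789"}}}
flat = pd.json_normalize(nested)

print(stations.columns.tolist())
print(flat.columns.tolist())
```

Passing payload['result'] directly to pd.json_normalize should behave the same way here; record_path is mainly useful when the list sits deeper in the structure.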