How to make Pandas unpack JSON data into proper DataFrame instead of list of dicts
Question:
I’m trying to parse the data at http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z into a Pandas DataFrame. Using read_json gives me a list of dicts instead of a proper DataFrame with the variable names as columns:
In [1]:
data = pd.read_json("http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z")
print(data)
Out[1]:
result
0 {'name': '001 Kaivopuisto', 'coordinates': '60...
1 {'name': '002 Laivasillankatu', 'coordinates':...
.. ...
149 {'name': '160 Nokkala', 'coordinates': '60.147...
150 {'name': '997 Workshop Helsinki', 'coordinates...
[151 rows x 1 columns]
This happens with every orient option. I’ve also tried json_normalize() and a few other things I found here, to no avail. How can I turn this into a sensible DataFrame? Thanks!
Answers:
Option 1
Use pd.DataFrame on the list of dictionaries
pd.DataFrame(data['result'].values.tolist())
   avl_bikes          coordinates  free_slots                    name  operative  style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True     CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                  12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                  16
3          0  60.160944,24.941859          14           004 Viiskulma       True                  14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                  32
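To try Option 1 without depending on the URL, here is a minimal sketch that uses a hypothetical two-station sample shaped like the feed (a single top-level "result" key holding a list of records). The field values are copied from the output above; the empty "style" string for the second station is an assumption, as are the variable names raw, wrapped and stations.

```python
import io

import pandas as pd

# Hypothetical sample with the same shape as the feed: one top-level
# "result" key whose value is a list of station records.
raw = """
{"result": [
  {"name": "001 Kaivopuisto", "coordinates": "60.155411,24.950391",
   "avl_bikes": 12, "free_slots": 18, "total_slots": 30,
   "operative": true, "style": "CB"},
  {"name": "002 Laivasillankatu", "coordinates": "60.159715,24.955212",
   "avl_bikes": 3, "free_slots": 9, "total_slots": 12,
   "operative": true, "style": ""}
]}
"""

# read_json treats "result" as the only column, so every cell holds a dict.
wrapped = pd.read_json(io.StringIO(raw))

# Turning that column into a plain list of dicts lets the DataFrame
# constructor spread the dict keys out as columns.
stations = pd.DataFrame(wrapped["result"].tolist())
print(stations)
```

This also shows why the orient options do not help: the records sit one level down, under "result", so the extra level has to be removed before the keys can become columns.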
Option 2
Use apply
data.result.apply(pd.Series)
   avl_bikes          coordinates  free_slots                    name  operative  style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True     CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                  12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                  16
3          0  60.160944,24.941859          14           004 Viiskulma       True                  14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                  32
Option 3
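Or you could fetch the JSON yourself and pull out the list under the result key
import json
import urllib.request
import pandas as pd
url = "http://dev.hsl.fi/tmp/citybikes/stations_20170503T071501Z"
# Download the raw payload and decode it into a Python dict
response = urllib.request.urlopen(url)
data = json.loads(response.read())
# data['result'] is already a list of dicts, one per station
df = pd.DataFrame(data['result'])
df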
   avl_bikes          coordinates  free_slots                    name  operative  style  total_slots
0         12  60.155411,24.950391          18         001 Kaivopuisto       True     CB           30
1          3  60.159715,24.955212           9     002 Laivasillankatu       True                  12
2          0  60.158172,24.944808          16  003 Kapteeninpuistikko       True                  16
3          0  60.160944,24.941859          14           004 Viiskulma       True                  14
4         16  60.157935,24.936083          16           005 Sepänkatu       True                  32
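If you go the Option 3 route, the decoding step can be wrapped in a small helper so that a payload without the expected key fails loudly. This is only a sketch: frame_from_payload and its key parameter are hypothetical names, and the sample bytes stand in for what response.read() would return.

```python
import json

import pandas as pd


def frame_from_payload(raw, key="result"):
    """Decode a JSON document and build a DataFrame from the list under `key`."""
    payload = json.loads(raw)  # accepts str or bytes
    records = payload.get(key) if isinstance(payload, dict) else None
    if not isinstance(records, list):
        raise ValueError(f"expected a list of records under {key!r}")
    return pd.DataFrame(records)


# Stand-in for response.read(): raw bytes with a top-level "result" list.
sample = (
    b'{"result": ['
    b'{"name": "001 Kaivopuisto", "avl_bikes": 12, "free_slots": 18},'
    b'{"name": "002 Laivasillankatu", "avl_bikes": 3, "free_slots": 9}'
    b']}'
)

df = frame_from_payload(sample)
print(df)
```

With the real feed you would pass response.read() instead of sample.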
The approaches in the accepted answer work great, so this is just a more recent (2022) FYI:
In Pandas 1.0 and later, you can also use pd.json_normalize (see the pandas documentation), which flattens nested dicts into columns.
json_obj = {
    'key': 123,
    'field1': 'blah',
    'info': {
        'contacts': {
            'email': {
                'foo': '[email protected]',
                'bar': '[email protected]'
            },
            'tel': '123456789',
        }
    }
}
pd.json_normalize(json_obj)
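As a rough illustration of what json_normalize does with a nested dict, and how it could be pointed at the question's payload shape, here is a small sketch. The email addresses are placeholder values, and the payload dict is a hypothetical two-station stand-in for the feed; record_path is the json_normalize parameter that names the key holding the list of records.

```python
import pandas as pd

# Nested dict similar to the example above (email values are placeholders).
nested = {
    "key": 123,
    "field1": "blah",
    "info": {
        "contacts": {
            "email": {"foo": "foo@example.com", "bar": "bar@example.com"},
            "tel": "123456789",
        }
    },
}

# One row; nested keys are joined with "." to form the column names,
# e.g. "info.contacts.email.foo".
flat = pd.json_normalize(nested)
print(flat.columns.tolist())

# Hypothetical stand-in for the decoded feed: records under "result".
payload = {
    "result": [
        {"name": "001 Kaivopuisto", "avl_bikes": 12, "free_slots": 18},
        {"name": "002 Laivasillankatu", "avl_bikes": 3, "free_slots": 9},
    ]
}

# record_path tells json_normalize where the list of records lives.
stations = pd.json_normalize(payload, record_path="result")
print(stations)
```

Note that json_normalize expects a dict or a list of dicts, so it is applied to the decoded JSON rather than to the one-column DataFrame that read_json returns.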