How to convert JSON data into a DataFrame
Question:
I need help converting JSON data into a DataFrame. Could you please show me how to do this?
Example:
JSON DATA
{
"user_id": "vmani4",
"password": "*****",
"api_name": "KOL",
"body": {
"api_name": "KOL",
"columns": [
"kol_id",
"jnj_id",
"kol_full_nm",
"thrc_cd"
],
"filter": {
"kol_id": "101152",
"jnj_id": "7124166",
"thrc_nm": "VIR"
}
}
}
Desired output:
user_id  password  api_name  columns      filter   filter_value
vmani    ******    KOL       kol_id       kol_id   101152
                             jnj_id       jnj_id   7124166
                             kol_full_nm  thrc_nm  VIR
                             thrc_cd
Answers:
data is the JSON, already parsed into a Python dict.
- Use pandas.json_normalize to load the JSON into a DataFrame, and drop the unneeded columns.
- Use pandas.DataFrame.explode to expand the 'body.columns' list into separate rows.
- Create a separate DataFrame for data['body']['filter'].
- Use pandas.DataFrame.join to combine the two DataFrames.
- There isn’t a way to map all of 'filter' to all of 'body.columns'. 'thrc_nm' doesn’t map to anything in 'body.columns', so 'filter' and 'filter_value' are added as separate columns, ordered as they appear in the JSON, and not associated with 'body.columns'.
- Tested in python 3.10, pandas 1.4.3
import pandas as pd
# load the json data
df = pd.json_normalize(data).drop(columns=['body.filter.kol_id', 'body.filter.jnj_id', 'body.filter.thrc_nm'])
# explode the column
df = df.explode('body.columns', ignore_index=True)
# load and clean data[body][filter]
df_filter = pd.DataFrame.from_dict(data['body']['filter'], orient='index').reset_index().rename(columns={'index': 'filter', 0: 'filter_value'})
# join the dataframes
dfj = df.join(df_filter)
# display(dfj)
user_id password api_name body.api_name body.columns filter filter_value
0 vmani4 ***** KOL KOL kol_id kol_id 101152
1 vmani4 ***** KOL KOL jnj_id jnj_id 7124166
2 vmani4 ***** KOL KOL kol_full_nm thrc_nm VIR
3 vmani4 ***** KOL KOL thrc_cd NaN NaN
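The steps above can be sketched as one self-contained script. It assumes the question's JSON arrives as a string and is parsed with json.loads into data. Two parts are additions that are not in the original answer: renaming body.columns to the question's header columns, and a purely cosmetic step that blanks the repeated user_id, password and api_name values on rows after the first so the result looks like the desired output.

```python
import json

import pandas as pd

# assumption: the question's JSON arrives as a string and is parsed into `data`
json_text = """{
  "user_id": "vmani4",
  "password": "*****",
  "api_name": "KOL",
  "body": {
    "api_name": "KOL",
    "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
    "filter": {"kol_id": "101152", "jnj_id": "7124166", "thrc_nm": "VIR"}
  }
}"""
data = json.loads(json_text)

# flatten the dict, then keep only the fields that appear in the desired output
df = pd.json_normalize(data)[['user_id', 'password', 'api_name', 'body.columns']]

# one row per entry of body.columns; the scalar fields are repeated on every row
df = df.explode('body.columns', ignore_index=True).rename(columns={'body.columns': 'columns'})

# filter keys and values as two columns, in JSON order
df_filter = pd.DataFrame(list(data['body']['filter'].items()), columns=['filter', 'filter_value'])

# join aligns on the default integer index; the 4th row gets NaN for filter/filter_value
dfj = df.join(df_filter)

# hypothetical cosmetic step: show the top-level fields only on the first row
dfj.loc[1:, ['user_id', 'password', 'api_name']] = ''
print(dfj)
```

Selecting the needed columns up front replaces the explicit drop of the body.filter.* columns; both give the same rows.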
Option
- I think it’s easier to have each filter as its own column, with the value below it.
# load data into a dataframe
df = pd.json_normalize(data)
# explode the column
df = df.explode('body.columns', ignore_index=True)
# display(df)
user_id password api_name body.api_name body.columns body.filter.kol_id body.filter.jnj_id body.filter.thrc_nm
0 vmani4 ***** KOL KOL kol_id 101152 7124166 VIR
1 vmani4 ***** KOL KOL jnj_id 101152 7124166 VIR
2 vmani4 ***** KOL KOL kol_full_nm 101152 7124166 VIR
3 vmani4 ***** KOL KOL thrc_cd 101152 7124166 VIR
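A runnable variant of this option, assuming data is the question's JSON parsed with json.loads. The prefix stripping is an extra readability tweak, not part of the original answer: it turns body.filter.kol_id into kol_id and body.columns into columns. The duplicate body.api_name column is dropped first, because stripping its "body." prefix would otherwise collide with the top-level api_name.

```python
import json

import pandas as pd

# assumption: `data` is the question's JSON, parsed with json.loads
data = json.loads("""{
  "user_id": "vmani4", "password": "*****", "api_name": "KOL",
  "body": {"api_name": "KOL",
           "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
           "filter": {"kol_id": "101152", "jnj_id": "7124166", "thrc_nm": "VIR"}}
}""")

# flatten, drop the duplicate body.api_name, and give each requested column its own row
df = pd.json_normalize(data).drop(columns='body.api_name')
df = df.explode('body.columns', ignore_index=True)

# strip the "body.filter." and "body." prefixes (literal match, not regex)
df.columns = (df.columns
              .str.replace('body.filter.', '', regex=False)
              .str.replace('body.', '', regex=False))
print(df)
```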
I’m not very familiar with DataFrames, but I tried my best to produce your desired output.
Code
import json

import pandas as pd

json_data = """{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
        "api_name": "KOL",
        "columns": [
            "kol_id",
            "jnj_id",
            "kol_full_nm",
            "thrc_cd"
        ],
        "filter": {
            "kol_id": "101152",
            "jnj_id": "7124166",
            "thrc_nm": "VIR"
        }
    }
}"""

python_data = json.loads(json_data)

first_level = {}            # top-level scalars: user_id, password, api_name
for_columns = {}            # {'columns': [...]} taken from body
list_for_filter = []        # keys of body.filter
list_for_filter_value = []  # values of body.filter

for x, y in python_data.items():
    if isinstance(y, dict):
        # the nested "body" object
        for j, k in y.items():
            if j == 'columns':
                for_columns[j] = k
            if isinstance(k, dict):
                # the "filter" object: split it into names and values
                for m, n in k.items():
                    list_for_filter.append(m)
                    list_for_filter_value.append(n)
    else:
        # wrap each scalar in a list so it becomes a one-row column
        first_level[x] = [y]

res = {**first_level, **for_columns,
       'filter': list_for_filter, 'filter_value': list_for_filter_value}

# the lists have different lengths, so build one Series per column and
# concatenate them side by side; missing positions are filled with NaN
df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)
print(df)
Output
user_id password api_name columns filter filter_value
0 vmani4 ***** KOL kol_id kol_id 101152
1 NaN NaN NaN jnj_id jnj_id 7124166
2 NaN NaN NaN kol_full_nm thrc_nm VIR
3 NaN NaN NaN thrc_cd NaN NaN
A short explanation of my code: first, I created several lists and dicts, because your desired output contains columns that aren’t in the JSON itself, such as filter_value.
I then loop through the dict items to build another dict that matches the desired output.
Finally, because the lists have different lengths, passing the dict straight to pd.DataFrame would fail, so I used pd.concat with one pd.Series per column.
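The reason for using concat with Series can be checked directly: the pd.DataFrame constructor rejects a dict of lists with unequal lengths, whereas concatenating one Series per column along axis=1 aligns them on their index and pads the shorter ones with NaN. A minimal sketch, using a few of the question's values:

```python
import pandas as pd

cols = {
    'user_id': ['vmani4'],
    'columns': ['kol_id', 'jnj_id', 'kol_full_nm', 'thrc_cd'],
    'filter': ['kol_id', 'jnj_id', 'thrc_nm'],
}

# a plain DataFrame constructor needs every list to have the same length
try:
    pd.DataFrame(cols)
except ValueError as exc:
    print('DataFrame failed:', exc)

# one Series per column, aligned on the index; missing positions become NaN
df = pd.concat([pd.Series(v, name=k) for k, v in cols.items()], axis=1)
print(df)
```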