How to generate a JSON file with a nested dictionary from a pandas DataFrame?
Question:
I need to generate a json file with a specific format from a pandas dataframe. The dataframe looks like this:
user_id | product_id | date |
---|---|---|
1 | 23 | 01-01-2022 |
1 | 24 | 05-01-2022 |
2 | 56 | 05-06-2022 |
3 | 23 | 02-07-2022 |
3 | 24 | 01-02-2022 |
3 | 56 | 02-01-2022 |
And the json file needs to have the following format:
{
    "user_id": 1,
    "items": [{
        "product_id": 23,
        "date": "01-01-2022"
    }, {
        "product_id": 24,
        "date": "05-01-2022"
    }]
}
{
    "user_id": 2,
    "items": [{
        "product_id": 56,
        "date": "05-06-2022"
    }]
}
...etc
I’ve tried the following, but it’s not the right format:
result = (now.groupby('user_id')['product_id','date'].apply(lambda x: dict(x.values)).to_json())
Any help would be much appreciated!
Answers:
out = (df[['product_id', 'date']].apply(dict, axis=1)
       .groupby(df['user_id']).apply(list)
       .to_frame('items').reset_index()
       .to_dict('records'))
print(out)
[{'user_id': 1, 'items': [{'product_id': 23, 'date': '01-01-2022'}, {'product_id': 24, 'date': '05-01-2022'}]},
{'user_id': 2, 'items': [{'product_id': 56, 'date': '05-06-2022'}]},
{'user_id': 3, 'items': [{'product_id': 23, 'date': '02-07-2022'}, {'product_id': 24, 'date': '01-02-2022'}, {'product_id': 56, 'date': '02-01-2022'}]}]
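The list printed above still has to be written to disk to get the JSON file the question asks for. A minimal sketch, assuming the sample data from the question. The output path (users.json in the system temp directory) is a hypothetical choice, and the structure is rebuilt here with a plain loop rather than the method chain above, with explicit int()/str() casts so that json.dump never meets NumPy scalar types:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# One record per user, each holding a list of that user's items
records = []
for user_id, group in df.groupby("user_id"):
    items = [
        {"product_id": int(pid), "date": str(date)}
        for pid, date in zip(group["product_id"], group["date"])
    ]
    records.append({"user_id": int(user_id), "items": items})

# Hypothetical output location; replace with your own path
out_path = Path(tempfile.gettempdir()) / "users.json"
with out_path.open("w", encoding="utf-8") as f:
    json.dump(records, f, indent=4)
```

The file then contains a single JSON array with one object per user.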
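The code below solves the issue. It first converts the date column from datetime to string, then converts the dataframe into the desired format. The strftime step assumes pandas parsed the column as datetimes; if the dates are already strings, skip it.
Here, data is your table, loaded from an Excel file.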
# Import libraries
import pandas as pd
import openpyxl  # not called directly; pandas uses it as the engine for .xlsx files
import json
# Read the excel data
data = pd.read_excel("data.xlsx", sheet_name=0)
# Format the date column as a day-month-year string
data['date'] = data['date'].apply(lambda x: x.strftime('%d-%m-%Y'))
# Convert to desired json format
json_data = (data.groupby(['user_id'])
             .apply(lambda x: x[['product_id', 'date']].to_dict('records'))
             .reset_index()
             .rename(columns={0: 'items'})
             .to_json(orient='records'))
# Pretty print the result
# https://stackoverflow.com/a/12944035/10905535
json_data = json.loads(json_data)
print(json.dumps(json_data, indent=4, sort_keys=False))
The output:
[
    {
        "user_id": 1,
        "items": [
            {
                "product_id": 23,
                "date": "01-01-2022"
            },
            {
                "product_id": 24,
                "date": "05-01-2022"
            }
        ]
    },
    {
        "user_id": 2,
        "items": [
            {
                "product_id": 56,
                "date": "05-06-2022"
            }
        ]
    },
    {
        "user_id": 3,
        "items": [
            {
                "product_id": 23,
                "date": "02-07-2022"
            },
            {
                "product_id": 24,
                "date": "01-02-2022"
            },
            {
                "product_id": 56,
                "date": "02-01-2022"
            }
        ]
    }
]
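The sample in the question shows one object per user rather than a single array. If that is really the target (an assumption; the question does not say so), JSON Lines is one way to get it. A sketch using DataFrame.to_json with lines=True on the question's sample data; the variable names are illustrative:

```python
import json

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# Collapse each user's rows into a list of {product_id, date} dicts
grouped = (
    df.groupby("user_id")[["product_id", "date"]]
      .apply(lambda g: g.to_dict("records"))
      .reset_index(name="items")
)

# One JSON object per line instead of one enclosing array
json_lines = grouped.to_json(orient="records", lines=True)
print(json_lines)
```

Passing a file path as the first argument to to_json writes the same text to disk instead of returning it.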