How to generate a json file with a nested dictionary from pandas df?

Question:

I need to generate a json file with a specific format from a pandas dataframe. The dataframe looks like this:

user_id product_id date
1 23 01-01-2022
1 24 05-01-2022
2 56 05-06-2022
3 23 02-07-2022
3 24 01-02-2022
3 56 02-01-2022

And the json file needs to have the following format:

{
  "user_id": 1,
  "items": [{
        "product_id": 23,
        "date": "01-01-2022"
        }, {
        "product_id": 24,
        "date": "05-01-2022"
        }]
}
{
  "user_id": 2,
  "items": [{
        "product_id": 56,
        "date": "05-06-2022"
        }]
}
...etc

I’ve tried the following, but it’s not the right format:

result = (now.groupby('user_id')['product_id','date'].apply(lambda x: dict(x.values)).to_json())

Any help would be much appreciated!

Asked By: adsh


Answers:

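# Turn each row's product_id/date pair into a dict, collect those dicts
# into one list per user_id, then emit one record per user.
out = (df[['product_id','date']].apply(dict, axis=1)
       .groupby(df['user_id']).apply(list)
       .to_frame('items').reset_index()
       .to_dict('records'))
print(out)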

[{'user_id': 1, 'items': [{'product_id': 23, 'date': '01-01-2022'}, {'product_id': 24, 'date': '05-01-2022'}]},
{'user_id': 2, 'items': [{'product_id': 56, 'date': '05-06-2022'}]}, 
{'user_id': 3, 'items': [{'product_id': 23, 'date': '02-07-2022'}, {'product_id': 24, 'date': '01-02-2022'}, {'product_id': 56, 'date': '02-01-2022'}]}]
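
The snippet above ends with a Python list of dictionaries, while the question asks for a JSON file. A minimal sketch of that last step, assuming the sample data from the question is held in memory with the dates as plain strings; the output location and the file name users.json are hypothetical:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question; dates are kept as plain strings.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# Same grouping as above: one dict per row, one list of dicts per user.
out = (df[["product_id", "date"]].apply(dict, axis=1)
       .groupby(df["user_id"]).apply(list)
       .to_frame("items").reset_index()
       .to_dict("records"))

# Hypothetical output path; replace with wherever the file should live.
out_path = Path(tempfile.gettempdir()) / "users.json"

with open(out_path, "w") as fh:
    # default=int is a safeguard: if a NumPy integer reaches the encoder,
    # it is converted to a plain Python int instead of raising TypeError.
    json.dump(out, fh, indent=2, default=int)
```

The file then holds a single JSON array with one object per user.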
Answered By: Ynjxsjmh

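The code below produces the requested structure. It first converts the date column from datetime to a day-month-year string (this step assumes pandas parsed the column as datetimes when reading the file), then groups the rows by user_id and nests each user's rows under an items key.

data.xlsx is your data table saved as an Excel file.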

# Import libraries
import pandas as pd
import openpyxl  # not used directly; pandas needs it installed to read .xlsx files
import json

# Read the excel data
data = pd.read_excel("data.xlsx", sheet_name=0)

# Change the data type of the date column (day-month-year)
data['date'] = data['date'].apply(lambda x: x.strftime('%d-%m-%Y'))

# Convert to desired json format
json_data = (data.groupby(['user_id'])
               .apply(lambda x: x[['product_id','date']].to_dict('records'))
               .reset_index()
               .rename(columns={0:'items'})
               .to_json(orient='records'))

# Pretty print the result
# https://stackoverflow.com/a/12944035/10905535
json_data = json.loads(json_data)
print(json.dumps(json_data, indent=4, sort_keys=False))

The output:

[
    {
        "user_id": 1,
        "items": [
            {
                "product_id": 23,
                "date": "01-01-2022"
            },
            {
                "product_id": 24,
                "date": "05-01-2022"
            }
        ]
    },
    {
        "user_id": 2,
        "items": [
            {
                "product_id": 56,
                "date": "05-06-2022"
            }
        ]
    },
    {
        "user_id": 3,
        "items": [
            {
                "product_id": 23,
                "date": "02-07-2022"
            },
            {
                "product_id": 24,
                "date": "01-02-2022"
            },
            {
                "product_id": 56,
                "date": "02-01-2022"
            }
        ]
    }
]
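
The output above is a single JSON array, whereas the format in the question shows separate objects, one per user. If that is what is needed, one option is JSON Lines (one object per line). This is a minimal sketch using only the standard json module, assuming the sample data from the question with the dates already stored as strings:

```python
import json

import pandas as pd

# Sample data copied from the question; dates are kept as plain strings.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "product_id": [23, 24, 56, 23, 24, 56],
    "date": ["01-01-2022", "05-01-2022", "05-06-2022",
             "02-07-2022", "01-02-2022", "02-01-2022"],
})

# Build one JSON object per user, each serialized onto its own line.
lines = [
    json.dumps(
        {
            "user_id": int(user_id),
            "items": group[["product_id", "date"]].to_dict("records"),
        },
        default=int,  # safeguard: convert any NumPy integer to a Python int
    )
    for user_id, group in df.groupby("user_id")
]

json_lines = "\n".join(lines)
print(json_lines)
```

Each line can be parsed independently with json.loads, which is convenient when the file is read as a stream.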
Answered By: hsaltan