Python: create a DataFrame from a nested dict with lists
Question:
I am trying to create a dataframe / csv that looks like this:
App | id | stages | requestCpu | requestMemory |
---|---|---|---|---|
appName | 123 | dev | 1000 | 1024 |
appName | 123 | staging | 3200 | 1024 |
The dict looks like this and contains quite a lot of apps; the data inside every app follows the same layout:
test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2"...}
I used something like this before:
df = pd.DataFrame.from_dict(test_data, orient='index')
df = pd.concat([df.drop(['stages'], axis=1), (df['stages'].apply(pd.Series))], axis=1)
df.index.name = "App"
However, this did not split up the lists, and the stages ended up as columns, which is not the layout I want.
Any help is much appreciated, thanks.
Answers:
The easiest solution is to build the rows yourself by iterating over the dict before loading them into pandas:
import pandas as pd

test_data = {"appName": {"id": "123", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}, "appName2": {"id": "456", "stages": {"dev": [{"request.cpu": 1000}, {"request.memory": 1024}], "staging": [{"request.cpu": 3200}, {"request.memory": 1024}]}}}

rows = []
for app, app_data in test_data.items():
    # One output row per (app, stage) pair
    for stage, stage_data in app_data["stages"].items():
        row = {
            "App": app,
            "id": app_data["id"],
            "stages": stage
        }
        # Each list entry is a single-key dict such as {"request.cpu": 1000}
        for metric in stage_data:
            metric_name, metric_value = list(metric.items())[0]
            row[metric_name] = metric_value
        rows.append(row)

df = pd.json_normalize(rows)
# Reorder columns
df = df[["App", "id", "stages", "request.cpu", "request.memory"]]
Output:
| | App | id | stages | request.cpu | request.memory |
|---|---|---|---|---|---|
| 0 | appName | 123 | dev | 1000 | 1024 |
| 1 | appName | 123 | staging | 3200 | 1024 |
| 2 | appName2 | 456 | dev | 1000 | 1024 |
| 3 | appName2 | 456 | staging | 3200 | 1024 |
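If you also want the column names from the question (requestCpu, requestMemory) and CSV output, something along these lines may work. Treat it as a sketch under a few assumptions: `flatten_apps` is a hypothetical helper name, the rename mapping is my guess at the names you want, and it uses `row.update(metric)` instead of `list(metric.items())[0]`, so a metric dict with more than one key would also be picked up.

```python
import pandas as pd

# Sample input copied from the answer; the real data is assumed to share this layout.
test_data = {
    "appName": {"id": "123", "stages": {
        "dev": [{"request.cpu": 1000}, {"request.memory": 1024}],
        "staging": [{"request.cpu": 3200}, {"request.memory": 1024}],
    }},
    "appName2": {"id": "456", "stages": {
        "dev": [{"request.cpu": 1000}, {"request.memory": 1024}],
        "staging": [{"request.cpu": 3200}, {"request.memory": 1024}],
    }},
}


def flatten_apps(data):
    """Return one flat dict per (app, stage) pair (hypothetical helper)."""
    rows = []
    for app, app_data in data.items():
        for stage, metrics in app_data["stages"].items():
            row = {"App": app, "id": app_data["id"], "stages": stage}
            # Merge every metric dict in the list into the row.
            for metric in metrics:
                row.update(metric)
            rows.append(row)
    return rows


df = pd.DataFrame(flatten_apps(test_data))

# Assumed mapping from the raw keys to the column names used in the question.
df = df.rename(columns={
    "request.cpu": "requestCpu",
    "request.memory": "requestMemory",
})

# With no path argument, to_csv returns the CSV as a string;
# pass a file name instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Since the rows are already flat dicts here, a plain `pd.DataFrame(rows)` is enough and `json_normalize` is not strictly needed.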