Converting Dataframe to Dict and Sending to API
Question:
I have a dataframe that I need to convert to a dict and send through an API as JSON.
This is my df:
Category | Topic | Steps | Stud1 | Stud2 | Stud3 |
---|---|---|---|---|---|
Cat1 | topc1 | step1 | 10 | 15 | 30 |
Cat1 | topc2 | step2 | 16 | 26 | 42 |
Cat3 | topc3 | step3 | 05 | 62 | 50 |
I want to generate a dict something like this:
{Cat1: {
    topc1: {
        step1: [
            {Stud1: 10},
            {Stud2: 15},
            {Stud3: 30}
        ]
    },
    topc2: {
        step2: [
            {Stud1: 10},
            {Stud2: 15},
            {Stud3: 30}
        ]
    }
}
}
I know I can achieve this with a for loop and iterrows(), but that would be slow. I have tried the to_dict() function from pandas together with groupby() on Category, Topic and Steps, but that doesn't give the required output. I have also tried other approaches such as stack(), and even merging the rows that share a name, without success. Can anyone tell me the most efficient way to achieve this?
Answers:
I'm not completely sure I understand the question, but you could try this:
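result = {}
# One group per unique (Category, Topic, Steps) combination.
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    # Create the nested Category -> Topic dicts on first use.
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    # One single-key dict per student value.
    inner[key3] = [
        {key: value}
        for record in sdf[["Stud1", "Stud2", "Stud3"]].to_dict(orient="records")
        for key, value in record.items()
    ]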
Result for your example df:
{'Cat1': {'topc1': {'step1': [{'Stud1': 10}, {'Stud2': 15}, {'Stud3': 30}]},
          'topc2': {'step2': [{'Stud1': 16}, {'Stud2': 26}, {'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5}, {'Stud2': 62}, {'Stud3': 50}]}}}
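If you want to run this end to end, here is a minimal sketch that rebuilds the example frame by hand. I'm assuming the Stud columns are integers (so `05` becomes `5`), and `key_cols` / `value_cols` are names I made up so the student columns don't have to be listed explicitly:

```python
import pandas as pd

# Example frame rebuilt by hand from the question (Stud columns assumed to be ints).
df = pd.DataFrame({
    "Category": ["Cat1", "Cat1", "Cat3"],
    "Topic": ["topc1", "topc2", "topc3"],
    "Steps": ["step1", "step2", "step3"],
    "Stud1": [10, 16, 5],
    "Stud2": [15, 26, 62],
    "Stud3": [30, 42, 50],
})

key_cols = ["Category", "Topic", "Steps"]
# Hypothetical convenience: treat every non-key column as a student column.
value_cols = [col for col in df.columns if col not in key_cols]

result = {}
for (cat, topic, step), sdf in df.groupby(key_cols):
    inner = result.setdefault(cat, {}).setdefault(topic, {})
    # One single-key dict per student value, in column order.
    inner[step] = [
        {col: value}
        for record in sdf[value_cols].to_dict(orient="records")
        for col, value in record.items()
    ]

print(result)
```

This still loops over the groups in Python, so it is probably not the fastest possible approach, but it only iterates once per group rather than once per row as iterrows() would.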
This more direct variation is a bit faster, but not by much:
result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for values in zip(sdf["Stud1"], sdf["Stud2"], sdf["Stud3"])
        for key, value in zip(("Stud1", "Stud2", "Stud3"), values)
    ]
import pandas as pd

def function1(ss: pd.Series):
    return str({k: v for d in ss.tolist() for k, v in d.items()})

(
    df.groupby(['Category', 'Topic', 'Steps'])
    .apply(lambda dd: dd[['Stud1', 'Stud2', 'Stud3']].to_dict('records'))
    .reset_index(level=2).apply(lambda ss: {ss.iloc[0]: ss.iloc[1]}, axis=1)
    .reset_index(level=1).apply(lambda ss: {ss.iloc[0]: ss.iloc[1]}, axis=1)
    .groupby(level=0).apply(function1).map(eval).to_dict()
)
Output:
{'Cat1': {'topc1': {'step1': [{'Stud1': 10, 'Stud2': 15, 'Stud3': 30}]},
          'topc2': {'step2': [{'Stud1': 16, 'Stud2': 26, 'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5, 'Stud2': 62, 'Stud3': 50}]}}}
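Neither answer covers the last step from the question, sending the dict as JSON. Here is a minimal sketch using only the standard library, with the nested dict from the first answer pasted in as a literal. The endpoint URL is a placeholder I made up, and the request is only built here, not sent:

```python
import json
from urllib.request import Request

# Nested dict as produced by the first answer for the example df.
result = {
    "Cat1": {
        "topc1": {"step1": [{"Stud1": 10}, {"Stud2": 15}, {"Stud3": 30}]},
        "topc2": {"step2": [{"Stud1": 16}, {"Stud2": 26}, {"Stud3": 42}]},
    },
    "Cat3": {
        "topc3": {"step3": [{"Stud1": 5}, {"Stud2": 62}, {"Stud3": 50}]},
    },
}

# Serialize the dict to a JSON string.
payload = json.dumps(result)

# Hypothetical endpoint; replace it with the real API URL.
request = Request(
    "https://example.com/api/upload",
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Passing `request` to urllib.request.urlopen would actually send it.
```

If the values in your own dict turn out to be NumPy scalars rather than plain Python ints, `json.dumps` will refuse them, so it may be worth converting them first.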