Converting Dataframe to Dict and Sending to API

Question:

I have a dataframe that I need to convert to a dict and send to an API as JSON.

This is my df:

Category  Topic  Steps  Stud1  Stud2  Stud3
Cat1      topc1  step1     10     15     30
Cat1      topc2  step2     16     26     42
Cat3      topc3  step3     05     62     50
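To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the sample df in pandas. It assumes the Stud columns are plain integers, so the 05 in the last row becomes 5.

```python
import pandas as pd

# Sample data from the question; "05" is assumed to be the integer 5
df = pd.DataFrame(
    {
        "Category": ["Cat1", "Cat1", "Cat3"],
        "Topic": ["topc1", "topc2", "topc3"],
        "Steps": ["step1", "step2", "step3"],
        "Stud1": [10, 16, 5],
        "Stud2": [15, 26, 62],
        "Stud3": [30, 42, 50],
    }
)
print(df)
```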

I want to generate a dict something like this:

{Cat1: {
        topc1: {
               step1:
                     [
                       {Stud1: 10},
                       {Stud2: 15},
                       {Stud3: 30}
                     ]
               },
        topc2: {
               step2:
                     [
                       {Stud1: 16},
                       {Stud2: 26},
                       {Stud3: 42}
                     ]
               }
       }
}

I know I can achieve this with a for loop and iterrows(), but that would be slow.

I have tried using pandas' to_dict() with groupby() on Category, Topic, and Steps, but that doesn't give the required output. I have also tried other approaches such as stack(), and even merging rows with the same name, but none of them worked. Can anyone tell me the most efficient way to achieve this?
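Part of the reason to_dict() alone falls short is that it never nests by itself: with the three key columns set as a MultiIndex, orient="index" returns flat tuple keys such as ("Cat1", "topc1", "step1"). A minimal sketch (not taken from the answers below) that nests those tuple-keyed records with dict.setdefault, assuming the column names shown above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Category": ["Cat1", "Cat1", "Cat3"],
        "Topic": ["topc1", "topc2", "topc3"],
        "Steps": ["step1", "step2", "step3"],
        "Stud1": [10, 16, 5],
        "Stud2": [15, 26, 62],
        "Stud3": [30, 42, 50],
    }
)

# With a MultiIndex, to_dict("index") yields flat tuple keys, not nested dicts
flat = df.set_index(["Category", "Topic", "Steps"]).to_dict("index")

nested = {}
for (cat, topic, step), studs in flat.items():
    # Create the Category and Topic levels on first use, then store
    # one single-key dict per student column under the step
    nested.setdefault(cat, {}).setdefault(topic, {})[step] = [
        {name: score} for name, score in studs.items()
    ]
print(nested)
```

This relies on each (Category, Topic, Steps) combination appearing only once, since to_dict("index") requires a unique index; the groupby-based answers below do not have that restriction.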

Asked By: Sahil Mohile


Answers:

I'm not completely sure I fully understand the question, but you could try this:

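result = {}
# Each sdf holds the rows of one (Category, Topic, Steps) combination
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    # Create the Category and Topic levels on first use and get the innermost dict
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    # One single-key dict per student column, e.g. [{'Stud1': 10}, {'Stud2': 15}, {'Stud3': 30}]
    inner[key3] = [
        {key: value}
        for record in sdf[["Stud1", "Stud2", "Stud3"]].to_dict(orient="records")
        for key, value in record.items()
    ]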

Result for your example df:

{'Cat1': {'topc1': {'step1': [{'Stud1': 10}, {'Stud2': 15}, {'Stud3': 30}]},
          'topc2': {'step2': [{'Stud1': 16}, {'Stud2': 26}, {'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5}, {'Stud2': 62}, {'Stud3': 50}]}}}
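Since the goal is to send this as JSON, here is a short sketch of serializing result with the standard json module. It rebuilds the example df and repeats the loop above so that it runs on its own; default=int is an added precaution, not part of the answer.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Category": ["Cat1", "Cat1", "Cat3"],
        "Topic": ["topc1", "topc2", "topc3"],
        "Steps": ["step1", "step2", "step3"],
        "Stud1": [10, 16, 5],
        "Stud2": [15, 26, 62],
        "Stud3": [30, 42, 50],
    }
)

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for record in sdf[["Stud1", "Stud2", "Stud3"]].to_dict(orient="records")
        for key, value in record.items()
    ]

# default=int converts any NumPy integer that json cannot encode natively
payload = json.dumps(result, default=int)
print(payload)
```

To actually send it, the requests library's requests.post(url, json=result) serializes the dict and sets the JSON Content-Type header for you; url here stands for your own API endpoint and is only a placeholder.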

This more direct variation is a bit faster, but not by much:

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for values in zip(sdf["Stud1"], sdf["Stud2"], sdf["Stud3"])
        for key, value in zip(("Stud1", "Stud2", "Stud3"), values)
    ]
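To check the speed difference on your own data, here is a rough timeit sketch. The two functions wrap the loops above unchanged in logic; the synthetic frame (1,000 rows, every row its own Steps group, random scores) is an arbitrary assumption for benchmarking only, and timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

STUDS = ["Stud1", "Stud2", "Stud3"]


def nest_records(df):
    # First variation: to_dict(orient="records") per group
    result = {}
    for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
        inner = result.setdefault(key1, {}).setdefault(key2, {})
        inner[key3] = [
            {key: value}
            for record in sdf[STUDS].to_dict(orient="records")
            for key, value in record.items()
        ]
    return result


def nest_zip(df):
    # Second variation: zip the student columns directly
    result = {}
    for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
        inner = result.setdefault(key1, {}).setdefault(key2, {})
        inner[key3] = [
            {key: value}
            for values in zip(*(sdf[s] for s in STUDS))
            for key, value in zip(STUDS, values)
        ]
    return result


# Hypothetical benchmark data: 10 categories, 100 topics, 1,000 unique steps
rng = np.random.default_rng(0)
n = 1_000
big = pd.DataFrame(
    {
        "Category": [f"Cat{i // 100}" for i in range(n)],
        "Topic": [f"topc{i // 10}" for i in range(n)],
        "Steps": [f"step{i}" for i in range(n)],
        **{s: rng.integers(0, 100, n) for s in STUDS},
    }
)

# Both variations should build the same nested dict
assert nest_records(big) == nest_zip(big)
for fn in (nest_records, nest_zip):
    secs = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {secs / 3:.3f} s per call")
```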
Answered By: Timus
import ast
import pandas as pd

def function1(ss: pd.Series):
    # Merge one category's {topic: ...} dicts; return a str so groupby.apply won't expand it
    return str({k: v for d in ss.tolist() for k, v in d.items()})

out = (
    df.groupby(['Category', 'Topic', 'Steps'])
    .apply(lambda dd: dd[['Stud1', 'Stud2', 'Stud3']].to_dict('records'))
    .reset_index(level=2).apply(lambda ss: {ss.iloc[0]: ss.iloc[1]}, axis=1)
    .reset_index(level=1).apply(lambda ss: {ss.iloc[0]: ss.iloc[1]}, axis=1)
    .groupby(level=0).apply(function1).map(ast.literal_eval).to_dict()
)

out:

{'Cat1': {'topc1': {'step1': [{'Stud1': 10, 'Stud2': 15, 'Stud3': 30}]},
  'topc2': {'step2': [{'Stud1': 16, 'Stud2': 26, 'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5, 'Stud2': 62, 'Stud3': 50}]}}}
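This output keeps one dict per step ({'Stud1': 10, 'Stud2': 15, 'Stud3': 30}) rather than the list of single-key dicts shown in the question. If the exact shape matters, a small post-processing sketch can split each record; here out is assumed to be the dict printed above, typed in as a literal so the snippet runs on its own.

```python
# out copied from the printed result above
out = {'Cat1': {'topc1': {'step1': [{'Stud1': 10, 'Stud2': 15, 'Stud3': 30}]},
                'topc2': {'step2': [{'Stud1': 16, 'Stud2': 26, 'Stud3': 42}]}},
       'Cat3': {'topc3': {'step3': [{'Stud1': 5, 'Stud2': 62, 'Stud3': 50}]}}}

# Split every {Stud1: .., Stud2: .., Stud3: ..} record into single-key dicts
reshaped = {
    cat: {
        topic: {
            step: [{k: v} for rec in records for k, v in rec.items()]
            for step, records in steps.items()
        }
        for topic, steps in topics.items()
    }
    for cat, topics in out.items()
}
print(reshaped)
```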
Answered By: G.G