How to transform pandas dataframe into specific dictionary?

Question:

I have a pandas dataframe:

 id      val       label
"a1"    "ab"      "first"
"a1"    "aa"      "second"
"a1"    "ca"      "third"
"b1"    "cc"      "first"
"b1"    "kf"      "second"
"b1"    "ff"      "third"
"c1"    "wer"     "first"
"c1"    "iid"     "second"
"c1"    "ff"      "third"

I want to transform it into dictionary wwhere key will be values from columns "id" and values will be dictionaries with keys "label" and values from column "val". so the output must be:

{"a1": {"first": {"ab"}, "second": {"aa"}, "third": {"ca"}},
 "b1": {"first": {"cc"}, "second": {"kf"}, "third": {"ff"}},
 "c1": {"first": {"wer"}, "second": {"iid"}, "third": {"ff"}},
}

how could I do that?

Asked By: Ir8_mind

||

Answers:

You can groupby on id with a lambda function to convert the label/val pairs into a dict, then to_dict on the result to get your desired output:

df.groupby('id').apply(lambda x:dict(zip(x['label'], x['val']))).to_dict()

Output for your sample data:

{
 'a1': {'first': 'ab', 'second': 'aa', 'third': 'ca'},
 'b1': {'first': 'cc', 'second': 'kf', 'third': 'ff'},
 'c1': {'first': 'wer', 'second': 'iid', 'third': 'ff'}
}

If you want the values in the inner dictionaries to be sets, you can convert them on the fly:

df.groupby('id').apply(lambda x:dict(zip(x['label'], ({v} for v in x['val'])))).to_dict()

Output:

{
 'a1': {'first': {'ab'}, 'second': {'aa'}, 'third': {'ca'}},
 'b1': {'first': {'cc'}, 'second': {'kf'}, 'third': {'ff'}},
 'c1': {'first': {'wer'}, 'second': {'iid'}, 'third': {'ff'}}
}
Answered By: Nick

First I recreated your dataframe like

import pandas as pd
import io


text = """
 id      val       label
"a1"    "ab"      "first"
"a1"    "aa"      "second"
"a1"    "ca"      "third"
"b1"    "cc"      "first"
"b1"    "kf"      "second"
"b1"    "ff"      "third"
"c1"    "wer"     "first"
"c1"    "iid"     "second"
"c1"    "ff"      "third"
"""

df = pd.read_csv(io.StringIO(text), sep="s+")

Then I initialized an empty dict

reordered_dict = {}

then I calculated unique values in df.id and iterated over them to populate values:

for unique_id in df.id.unique():
    all_matched_items = df.where(df["id"] == unique_id).dropna(axis=0, how="all") # only contains rows where id matches
    items_dict = all_matched_items.set_index("label")["val"].to_dict() # change index to label and export as dict
    for key, val in items_dict.items():
        items_dict[key] = {val} # format asked by you
    reordered_dict[unique_id] = items_dict

print(reordered_dict)

The output is then

{'a1': {'first': {'ab'}, 'second': {'aa'}, 'third': {'ca'}}, 'b1': {'first': {'cc'}, 'second': {'kf'}, 'third': {'ff'}}, 'c1': {'first': {'wer'}, 'second': {'iid'}, 'third': {'ff'}}}
Answered By: Bijay Regmi