How to transform a pandas DataFrame into a specific dictionary?
Question:
I have a pandas dataframe:
id val label
"a1" "ab" "first"
"a1" "aa" "second"
"a1" "ca" "third"
"b1" "cc" "first"
"b1" "kf" "second"
"b1" "ff" "third"
"c1" "wer" "first"
"c1" "iid" "second"
"c1" "ff" "third"
I want to transform it into a dictionary where the keys are the values from column "id" and the values are dictionaries whose keys come from column "label" and whose values come from column "val". So the output must be:
{"a1": {"first": {"ab"}, "second": {"aa"}, "third": {"ca"}},
"b1": {"first": {"cc"}, "second": {"kf"}, "third": {"ff"}},
"c1": {"first": {"wer"}, "second": {"iid"}, "third": {"ff"}},
}
How could I do that?
Answers:
You can groupby on id with a lambda function to convert the label/val pairs into a dict, then call to_dict on the result to get your desired output:
df.groupby('id').apply(lambda x:dict(zip(x['label'], x['val']))).to_dict()
Output for your sample data:
{
'a1': {'first': 'ab', 'second': 'aa', 'third': 'ca'},
'b1': {'first': 'cc', 'second': 'kf', 'third': 'ff'},
'c1': {'first': 'wer', 'second': 'iid', 'third': 'ff'}
}
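For reference, here is a self-contained sketch of the same idea that can be pasted and run. The DataFrame construction is my own reconstruction of the sample data. Selecting the label and val columns before apply is an optional variation, assuming the lambda only needs those two columns; it keeps the grouping column out of the function:

```python
import pandas as pd

# Sample data rebuilt from the question (my reconstruction).
df = pd.DataFrame(
    {
        "id": ["a1", "a1", "a1", "b1", "b1", "b1", "c1", "c1", "c1"],
        "val": ["ab", "aa", "ca", "cc", "kf", "ff", "wer", "iid", "ff"],
        "label": ["first", "second", "third"] * 3,
    }
)

# Group by id, keep only the columns the lambda uses, build one
# {label: val} dict per group, then turn the resulting Series into a dict.
result = (
    df.groupby("id")[["label", "val"]]
    .apply(lambda g: dict(zip(g["label"], g["val"])))
    .to_dict()
)
print(result)
```

The intermediate object is a Series indexed by id whose values are plain dicts, which is why to_dict gives the nested structure directly.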
If you want the values in the inner dictionaries to be sets, you can convert them on the fly:
df.groupby('id').apply(lambda x:dict(zip(x['label'], ({v} for v in x['val'])))).to_dict()
Output:
{
'a1': {'first': {'ab'}, 'second': {'aa'}, 'third': {'ca'}},
'b1': {'first': {'cc'}, 'second': {'kf'}, 'third': {'ff'}},
'c1': {'first': {'wer'}, 'second': {'iid'}, 'third': {'ff'}}
}
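One caveat: dict(zip(...)) keeps only the last val if an id has the same label more than once. Your sample data has no such rows, so this may not matter. If it does, and assuming you would want every value collected into the set (the question does not say), a rough sketch could look like this. The duplicated "first" row for a1 is made up purely for illustration:

```python
import pandas as pd

# Hypothetical data: id "a1" has the label "first" twice.
df = pd.DataFrame(
    {
        "id": ["a1", "a1", "a1", "b1"],
        "val": ["ab", "zz", "aa", "cc"],
        "label": ["first", "first", "second", "first"],
    }
)

# One set of vals per (id, label) pair; the result is a Series
# with a two-level index.
sets_per_pair = df.groupby(["id", "label"])["val"].agg(set)

# Unpack the two-level index into a nested dict.
nested = {}
for (id_, label), vals in sets_per_pair.items():
    nested.setdefault(id_, {})[label] = vals
print(nested)
```

With unique labels per id this gives the same single-element sets as the one-liner above.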
First, I recreated your dataframe like this:
import pandas as pd
import io
text = """
id val label
"a1" "ab" "first"
"a1" "aa" "second"
"a1" "ca" "third"
"b1" "cc" "first"
"b1" "kf" "second"
"b1" "ff" "third"
"c1" "wer" "first"
"c1" "iid" "second"
"c1" "ff" "third"
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")  # split columns on runs of whitespace
Then I initialized an empty dict:
reordered_dict = {}
Then I computed the unique values in df.id and iterated over them to populate the dict:
for unique_id in df.id.unique():
    all_matched_items = df.where(df["id"] == unique_id).dropna(axis=0, how="all")  # only contains rows where id matches
    items_dict = all_matched_items.set_index("label")["val"].to_dict()  # change index to label and export as dict
    for key, val in items_dict.items():
        items_dict[key] = {val}  # wrap each value in a set, the format you asked for
    reordered_dict[unique_id] = items_dict
print(reordered_dict)
The output is then:
{'a1': {'first': {'ab'}, 'second': {'aa'}, 'third': {'ca'}}, 'b1': {'first': {'cc'}, 'second': {'kf'}, 'third': {'ff'}}, 'c1': {'first': {'wer'}, 'second': {'iid'}, 'third': {'ff'}}}
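As a possible simplification, the same loop can be written by iterating over df.groupby("id"), which yields each id together with its rows, so the where/dropna filtering step is not needed. This is only a sketch: it uses a shortened copy of the sample data, and the variable names are mine:

```python
import io

import pandas as pd

# Shortened copy of the sample data from the question.
text = """
id val label
"a1" "ab" "first"
"a1" "aa" "second"
"a1" "ca" "third"
"b1" "cc" "first"
"b1" "kf" "second"
"b1" "ff" "third"
"""
# r"\s+" splits on runs of whitespace; the surrounding quotes are stripped.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Iterating a groupby yields (key, sub-frame) pairs, one per id.
reordered_dict = {
    unique_id: {label: {val} for label, val in zip(group["label"], group["val"])}
    for unique_id, group in df.groupby("id")
}
print(reordered_dict)
```

The inner comprehension builds the single-element sets directly, so there is no second pass over the dict to wrap the values.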