Pandas groupby and then apply to_dict('records')

Question:

Suppose I have the following data frame:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

And I want to end with the following dict:

{1: [{'b':'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}], 2: [{'b': 'c', 'd': 4}]}

Basically, I want to group by a and, for each group, call to_dict('records') on the remaining columns.

What I tried was the following:

# keyed by 'a', but each value is a dict of lists, not a list of dicts
df.groupby('a').agg(list).to_dict('index')
# {1: {'b': ['a', 'a', 'b'], 'd': [1, 2, 3]}, 2: {'b': ['c'], 'd': [4]}}

# the index (the group keys) disappears
df.groupby('a').agg(list).to_dict('records')
# [{'b': ['a', 'a', 'b'], 'd': [1, 2, 3]}, {'b': ['c'], 'd': [4]}]

# fails because 'a' contains duplicate values
df.set_index('a').to_dict('index')
# ValueError: DataFrame index must be unique for orient='index'

I think I can do it using a for-loop but I’m almost sure there is a pythonic way to do it.

Asked By: Bruno Mello


Answers:

Following your logic, one way to avoid an explicit for-loop is to use GroupBy.apply with zip inside a list comprehension to iterate over both columns in parallel:

out = df.groupby("a").apply(lambda x: [{"b": y, "d": z}
                                       for y, z in zip(x["b"], x["d"])]).to_dict()

If you need to zip more than two columns (dynamically), use this variant, which takes every column after the first and so assumes that a is the first column:

out = df.groupby("a").apply(lambda x: [dict(zip(x.columns[1:], row))
                                       for row in x[x.columns[1:]].to_numpy()]).to_dict()


Output:

print(out)

#{1: [{'b': 'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}], 2: [{'b': 'c', 'd': 4}]}
Answered By: Timeless

You could do:

df.assign(dicts=df.drop(columns="a").to_dict("records")).groupby("a")["dicts"].agg(
    list
).to_dict()
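
To see what each step of that chain produces, it can be unpacked into separate statements. This is only a sketch of the same idea; the intermediate names (row_dicts, with_dicts, result) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

# Step 1: build one dict per row from every column except the grouping key.
row_dicts = df.drop(columns="a").to_dict("records")

# Step 2: attach those dicts as a new object column, aligned row by row.
with_dicts = df.assign(dicts=row_dicts)

# Step 3: collect each group's dicts into a list, then convert the
# resulting Series (indexed by 'a') into a plain dict.
result = with_dicts.groupby("a")["dicts"].agg(list).to_dict()

print(result)
```

Because the row dicts are built once for the whole frame, the only per-group work is gathering them into lists.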
Answered By: SomeDude

Here is a way using groupby() and apply():

df.groupby('a').apply(lambda x: x[['b','d']].to_dict('records')).to_dict()

Output:

{1: [{'b': 'a', 'd': 1}, {'b': 'a', 'd': 2}, {'b': 'b', 'd': 3}],
 2: [{'b': 'c', 'd': 4}]}
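
The for-loop mentioned in the question can also be written as a dict comprehension over the groupby object, which gives the same result without GroupBy.apply. This is a minimal sketch and not taken from the answers above; the name by_group is illustrative. It may be worth considering because recent pandas releases (2.2, as far as I am aware) emit a deprecation warning when apply operates on the grouping columns.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2], 'b': ['a', 'a', 'b', 'c'], 'd': [1, 2, 3, 4]})

# Iterating over a groupby object yields (key, sub-frame) pairs.
# Drop the grouping column from each sub-frame before converting its rows to dicts.
by_group = {
    key: group.drop(columns="a").to_dict("records")
    for key, group in df.groupby("a")
}

print(by_group)
```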
Answered By: rhug123