Combine string in a column based on groups in a pandas DataFrame

Question:

I have a Pandas DataFrame, DF:

Column A Column B Column C
Apple red Texas
Apple red California
Banana yellow Indiana
Banana yellow Florida

I would like to get it in a dictionary in the form:

{ "Apple red" : ['Texas', 'California'], "Banana yellow" : ['Indiana', 'Florida'] }

where Key = concatenation of strings in column A and column B (and)

Value = all corresponding strings from column C (based on groupby) in a list.

I am not sure how to achieve this.

Key Note: It should also work if there are more than 3 columns to be grouped for dictionary’s key

Asked By: Krishna

||

Answers:

Try:

x = dict(
    df.groupby(df["Column A"] + " " + df["Column B"])["Column C"].agg(list)
)

print(x)

Prints:

{'Apple red': ['Texas', 'California'], 'Banana yellow': ['Indiana', 'Florida']}
Answered By: Andrej Kesely

One option, which should be performant as well, is with a default dictionary:

from collections import defaultdict

out = defaultdict(list)

for a, b, c in zip(df['Column A'], df['Column B'], df['Column C']):
    key = a + " " + b
    out[key].append(c)

out
defaultdict(list,
            {'Apple red': ['Texas', 'California'],
             'Banana yellow': ['Indiana', 'Florida']})
Answered By: sammywemmy
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.