Convert a `dict[str, list[any]]` into a binary `pandas.DataFrame`

Question

I have the following dictionary:

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}

and I want to convert it into the following pandas.DataFrame:

       apple banana mango peach strawberry
anna       1      1     0     0          1
bob        0      1     0     1          1
chris      1      1     1     1          0

This is not very complicated to implement in Python (see below), but I was wondering whether pandas already has something that does it automatically, or whether the implementation below can be optimized.

Thanks in advance!

Current Python implementation

import numpy as np
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}
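# Collect every distinct fruit; np.hstack needs a sequence, not a dict view
fruits = sorted(set(np.hstack(list(d.values()))))
df = pd.DataFrame(columns=fruits)
for client, client_fruits in d.items():
    # One row per client: 1 if the client has the fruit, else 0
    s = pd.Series({
        fruit: fruit in client_fruits for fruit in fruits
    }).astype(int)
    df = pd.concat([df, pd.DataFrame({client: s}).T])
print(df)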

Asked By: Lewan

Answer 1

One option using str.get_dummies:

out = pd.Series({k: '|'.join(v) for k,v in d.items()}).str.get_dummies()

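Or from_dict and pandas.get_dummies (dtype=int keeps the 0/1 output on pandas versions where get_dummies returns booleans by default):

out = (pd.get_dummies(pd.DataFrame.from_dict(d, orient='index').stack(), dtype=int)
         .groupby(level=0).max()
       )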

Or with a crosstab:

out = pd.crosstab(*zip(*((k,v) for k,l in d.items() for v in l))).clip(upper=1)

Output:

       apple  banana  mango  peach  strawberry
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0
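
For reference, here is a self-contained sketch of the first option (str.get_dummies) that can be run as is. It assumes that no fruit name contains the "|" character, since that is the separator str.get_dummies splits on; the variable name `joined` is only illustrative.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Join each list into a single "|"-separated string per key
# (assumption: no item contains "|" itself).
joined = pd.Series({k: "|".join(v) for k, v in d.items()})

# str.get_dummies splits on the separator and builds one
# indicator column per distinct token.
out = joined.str.get_dummies(sep="|")

print(out)
```

If the items could contain "|", a different separator can be passed to both the join and `sep=`.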

Answered By: mozway

Answer 2

df1=pd.concat([pd.DataFrame({k:v}) for k,v in d.items()],axis=1).stack().droplevel(0)
pd.crosstab(df1.index,df1)

Output:

col_0  apple  banana  mango  peach  strawberry
row_0                                         
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0
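
A possible variant of the same crosstab idea, shown here only as a sketch and not as part of the original answer: build the long Series with Series.explode instead of concat/stack. The names `long` and `tab` are illustrative. Because crosstab counts occurrences, clip(upper=1) is added on the assumption that a list might contain duplicates, and rename_axis drops the row_0/col_0 labels.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# One row per (key, item) pair; the key is repeated in the index.
long = pd.Series(d).explode()

# Count each (key, item) pair, cap counts at 1 to get a binary table,
# then remove the automatic "row_0"/"col_0" axis names.
tab = (
    pd.crosstab(long.index, long)
      .clip(upper=1)
      .rename_axis(index=None, columns=None)
)

print(tab)
```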

Answered By: G.G

Answer 3

You can call str.join() on a Series of lists, then str.get_dummies():

pd.Series(d).str.join('|').str.get_dummies()

Output:

       apple  banana  mango  peach  strawberry
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0

Answered By: rhug123
