Convert a `dict[str, list[any]]` into a binary `pandas.DataFrame`

Question:

I have the following dictionary:

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}

and I want to convert it into the following pandas.DataFrame:

       apple banana mango peach strawberry
anna       1      1     0     0          1
bob        0      1     0     1          1
chris      1      1     1     1          0

It is not very complicated to implement in Python (see below), but I was wondering whether pandas already has something that does this automatically, or whether the implementation below can be optimized.

Thanks in advance!


Current Python implementation

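import numpy as np
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"]
}
# Sorted list of every distinct fruit.
# np.hstack needs a list/tuple; recent NumPy rejects a dict view.
fruits = sorted(set(np.hstack(list(d.values()))))
rows = []
for client, client_fruits in d.items():
    # 1 if this client has the fruit, 0 otherwise
    s = pd.Series({
        fruit: fruit in client_fruits for fruit in fruits
    }).astype(int)
    # One-row DataFrame indexed by the client's name
    rows.append(pd.DataFrame({client: s}).T)
# Concatenate once instead of repeatedly growing an empty DataFrame
df = pd.concat(rows)
print(df)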
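For comparison, here is a sketch of the same membership test that skips the per-client Series and one-row DataFrames entirely. Every row is computed as a plain list of 0/1 flags, and the DataFrame constructor is called once at the end. `to_binary_frame` is a hypothetical helper name. The sketch also assumes the list items are hashable, because they are placed in a set for fast lookups.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}


def to_binary_frame(mapping):
    """Hypothetical helper: one row per key, one 0/1 column per distinct item."""
    # Sorted vocabulary of every item that appears in any list.
    columns = sorted({item for items in mapping.values() for item in items})
    rows = []
    for items in mapping.values():
        # A set makes each membership test O(1) on average (assumes hashable items).
        present = set(items)
        rows.append([int(col in present) for col in columns])
    # Build the frame once from the complete list of rows.
    return pd.DataFrame(rows, index=list(mapping), columns=columns)


print(to_binary_frame(d))
```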
Asked By: Lewan


Answers:

One option using str.get_dummies:

out = pd.Series({k: '|'.join(v) for k,v in d.items()}).str.get_dummies()
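`Series.str.get_dummies` splits each string on `sep` (default `'|'`) and emits one 0/1 column per distinct token. This approach therefore works as long as no item itself contains the separator. The sketch below shows the intermediate joined strings. It then switches to a different separator, for the case where an item might contain `'|'`. The choice of `';'` is an arbitrary assumption.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Intermediate step: one delimited string per client.
joined = pd.Series({k: "|".join(v) for k, v in d.items()})
print(joined["anna"])  # apple|strawberry|banana

# str.get_dummies splits on "|" by default; the columns come out sorted.
out = joined.str.get_dummies()

# If an item could contain "|", join and split on another character instead.
sep = ";"
out_alt = pd.Series({k: sep.join(v) for k, v in d.items()}).str.get_dummies(sep=sep)
print(out.equals(out_alt))  # True
```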

Or with from_dict and pandas.get_dummies:

out = (pd.get_dummies(pd.DataFrame.from_dict(d, orient='index').stack(), dtype=int)
         .groupby(level=0).max()
       )
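To see what `get_dummies` receives, the sketch below prints the intermediate stacked Series. `from_dict(orient='index')` pads the shorter lists with missing values. `stack()` produces one entry per (client, position) pair, and `groupby(level=0).max()` collapses the positions back to one row per client. The padding never becomes a column, because `get_dummies` ignores missing values by default (`dummy_na=False`). The `dtype=int` argument is there because pandas 2.0 and later return boolean dummy columns by default.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# Rows of unequal length are padded with missing values.
wide = pd.DataFrame.from_dict(d, orient="index")

# Long format: a (client, position) MultiIndex pointing at each fruit.
long = wide.stack()
print(long.head())

# One 0/1 column per fruit, then one row per client (max over positions).
out = pd.get_dummies(long, dtype=int).groupby(level=0).max()
print(out)
```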

Or with a crosstab:

out = pd.crosstab(*zip(*((k,v) for k,l in d.items() for v in l))).clip(upper=1)

Output:

       apple  banana  mango  peach  strawberry
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0
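`pd.crosstab` counts how often each (client, fruit) pair occurs, so a client who lists the same fruit twice would get a 2. `.clip(upper=1)` caps those counts back to 0/1. The sketch below uses a deliberately duplicated entry: the second `"apple"` for `anna` is made up for illustration. It also builds the pairs with plain comprehensions instead of the `zip(*...)` unpacking.

```python
import pandas as pd

# Hypothetical input: "apple" appears twice for anna.
d_dup = {
    "anna": ["apple", "strawberry", "banana", "apple"],
    "bob": ["strawberry", "banana", "peach"],
}

# One (client, fruit) pair per list element.
pairs = [(client, fruit) for client, fruits in d_dup.items() for fruit in fruits]
clients = [client for client, _ in pairs]
fruits = [fruit for _, fruit in pairs]

counts = pd.crosstab(clients, fruits)  # raw pair counts
binary = counts.clip(upper=1)          # capped to 0/1
print(counts.loc["anna", "apple"], binary.loc["anna", "apple"])  # 2 1
```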
Answered By: mozway
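# One column per client (shorter lists padded with NaN), stacked into a
# long Series indexed by client name, with the fruits as values
df1 = pd.concat([pd.DataFrame({k: v}) for k, v in d.items()], axis=1).stack().droplevel(0)
# Count every (client, fruit) pair
pd.crosstab(df1.index, df1)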

Output:

col_0  apple  banana  mango  peach  strawberry
row_0                                         
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0
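The same long (client, fruit) Series can also be obtained with `Series.explode`, which repeats each index label once per list element. This is a swapped-in alternative to the concat/stack/droplevel chain above and is not part of the original answer. The trailing `rename_axis` call only removes the `row_0`/`col_0` axis labels that `crosstab` adds.

```python
import pandas as pd

d = {
    "anna": ["apple", "strawberry", "banana"],
    "bob": ["strawberry", "banana", "peach"],
    "chris": ["apple", "banana", "peach", "mango"],
}

# One row per (client, fruit); the client name is repeated in the index.
long = pd.Series(d).explode()

# Count pairs, then drop the default "row_0"/"col_0" axis names.
out = pd.crosstab(long.index, long).rename_axis(index=None, columns=None)
print(out)
```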
Answered By: G.G

You can use str.join() on a Series.

pd.Series(d).str.join('|').str.get_dummies()

Output:

       apple  banana  mango  peach  strawberry
anna       1       1      0      0           1
bob        0       1      0      1           1
chris      1       1      1      1           0
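The title allows lists of arbitrary values, and that raises one caveat. `Series.str.join` only joins lists whose elements are all strings and returns a missing value for any other row. `str.get_dummies` then turns such a row into all zeros. The sketch below uses hypothetical integer-valued lists and converts each element with `str` first. Note that the column labels then become strings too.

```python
import pandas as pd

# Hypothetical input whose lists hold integers rather than strings.
d_int = {"anna": [1, 3], "bob": [2, 3]}

# str.join cannot join non-string elements, so every row comes back missing.
print(pd.Series(d_int).str.join("|").isna().all())  # True

# Converting each element to str first makes the get_dummies route work again.
out = pd.Series(d_int).map(lambda items: "|".join(map(str, items))).str.get_dummies()
print(out)
```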
Answered By: rhug123