Getting binary labels from a dataframe and a list of labels

Question:

Suppose I have the following list of labels,

labs = ['G1','G2','G3','G4','G5','G6','G7']

and also suppose that I have the following df:

   group entity_label
0      0           G1
1      0           G2
3      1           G5
4      1           G1
5      2           G1
6      2           G2
7      2           G3

To produce the above df, you can use:

import pandas as pd

df_test = pd.DataFrame({'group': [0,0,0,1,1,2,2,2,2],
                        'entity_label':['G1','G2','G2','G5','G1','G1','G2','G3','G3']})

# drop_duplicates returns a new frame, so assign it back
df_test = df_test.drop_duplicates(subset=['group','entity_label'], keep='first')

For each group, I want to look up its labels against labs and build a new dataframe holding one binary vector per group:

   group    entity_label_binary
0      0  [1, 1, 0, 0, 0, 0, 0]
1      1  [1, 0, 0, 0, 1, 0, 0]
2      2  [1, 1, 1, 0, 0, 0, 0]

Namely, group 0 contains G1 and G2, hence the 1s in the first two positions of its vector, and so on for the other groups. How can I do this?

Asked By: Wiliam


Answers:

One option, based on crosstab:

labs = ['G1','G2','G3','G4','G5','G6','G7']

(pd.crosstab(df_test['group'], df_test['entity_label'])
   .clip(upper=1)
   .reindex(columns=labs, fill_value=0)
   .agg(list, axis=1)
   .reset_index(name='entity_label_binary')
)
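To see what each step of the crosstab chain does, here is a step-by-step sketch. It rebuilds df_test from the question's constructor without calling drop_duplicates, so the repeated pairs (G2 twice in group 0, G3 twice in group 2) appear as counts of 2, which .clip(upper=1) then caps at 1. The intermediate names counts, binary and result are only for illustration.

```python
import pandas as pd

labs = ['G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7']
df_test = pd.DataFrame({'group': [0, 0, 0, 1, 1, 2, 2, 2, 2],
                        'entity_label': ['G1', 'G2', 'G2', 'G5', 'G1',
                                         'G1', 'G2', 'G3', 'G3']})

# Step 1: count how often each label occurs per group (rows = groups, columns = labels seen)
counts = pd.crosstab(df_test['group'], df_test['entity_label'])
print(counts)

# Step 2: cap every count at 1 so the table becomes a presence/absence indicator
binary = counts.clip(upper=1)

# Step 3: add labels that never occur (G4, G6, G7) as all-zero columns, in labs order
binary = binary.reindex(columns=labs, fill_value=0)

# Step 4: collapse each row into one list and turn the group index back into a column
result = binary.agg(list, axis=1).reset_index(name='entity_label_binary')
print(result)
```

Without the clip step, group 0 would get [1, 2, 0, 0, 0, 0, 0] because G2 is counted twice.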

Variant, with get_dummies and groupby.max:

(pd.get_dummies(df_test['entity_label'], dtype=int)  # dtype=int: pandas >= 2.0 returns bool columns by default
   .groupby(df_test['group']).max()
   .reindex(columns=labs, fill_value=0)
   .agg(list, axis=1)
   .reset_index(name='entity_label_binary')
)

Output:

   group    entity_label_binary
0      0  [1, 1, 0, 0, 0, 0, 0]
1      1  [1, 0, 0, 0, 1, 0, 0]
2      2  [1, 1, 1, 0, 0, 0, 0]
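The two variants build the same group-by-label indicator table before the reindex/list steps: crosstab counts and then caps at 1, while get_dummies one-hot encodes each row and groupby(...).max() yields 1 whenever a label appears at least once in the group. A small sketch comparing them, assuming the question's df_test with duplicates kept; via_crosstab, via_dummies and same are illustrative names. dtype=int is passed to get_dummies because newer pandas (2.0+) returns bool columns by default, which would otherwise mix True/False with the 0s that reindex fills in.

```python
import pandas as pd

df_test = pd.DataFrame({'group': [0, 0, 0, 1, 1, 2, 2, 2, 2],
                        'entity_label': ['G1', 'G2', 'G2', 'G5', 'G1',
                                         'G1', 'G2', 'G3', 'G3']})

# Variant 1: count occurrences per (group, label), then cap at 1
via_crosstab = pd.crosstab(df_test['group'], df_test['entity_label']).clip(upper=1)

# Variant 2: one 0/1 row per original row, then max within each group
via_dummies = (pd.get_dummies(df_test['entity_label'], dtype=int)
               .groupby(df_test['group']).max())

# Compare the raw values (axis names differ between the two results)
same = bool((via_crosstab.to_numpy() == via_dummies.to_numpy()).all())
print(same)  # True
```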
Answered By: mozway
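For comparison, a plain-Python sketch that follows the question's "mapping" idea more literally: a dict maps each label to its position in labs, and each group's vector starts as zeros with those positions set to 1. This is an alternative, not part of the answer above; pos and to_binary are hypothetical helper names, and a label missing from labs would raise a KeyError.

```python
import pandas as pd

labs = ['G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7']
df_test = pd.DataFrame({'group': [0, 0, 0, 1, 1, 2, 2, 2, 2],
                        'entity_label': ['G1', 'G2', 'G2', 'G5', 'G1',
                                         'G1', 'G2', 'G3', 'G3']})

# Hypothetical mapping: label -> index in the output vector
pos = {lab: i for i, lab in enumerate(labs)}

def to_binary(labels):
    """Turn an iterable of labels into a 0/1 list aligned with labs."""
    vec = [0] * len(labs)
    for lab in labels:
        vec[pos[lab]] = 1  # duplicates just set the same slot again
    return vec

# Iterate over groups and build one (group, vector) row per group
rows = [(g, to_binary(labels))
        for g, labels in df_test.groupby('group')['entity_label']]
result = pd.DataFrame(rows, columns=['group', 'entity_label_binary'])
print(result)
```

This avoids building the intermediate wide table, at the cost of a Python-level loop per group; for many groups the vectorized crosstab/get_dummies versions are likely faster.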