Translate list of labels into array of labels per ID in Python

Question:

I have a data frame with texts and labels. Each text spans multiple rows, with one label per row.

import pandas as pd

dummy_df = pd.DataFrame([['Text1', 'label1'], ['Text1', 'label2']], columns=["TEXT", "LABELS"])

I would like to get the following so that I can apply MultiLabelBinarizer():

TEXT  | LABELS
Text1 | [[label1, label2]]

Asked By: sveer

Answers:

If you need nested lists, use a lambda function in GroupBy.agg:

df = dummy_df.groupby('TEXT')['LABELS'].agg(lambda x: [x.tolist()]).reset_index()
print (df)
    TEXT              LABELS
0  Text1  [[label1, label2]]

For flat (non-nested) lists, pass list directly:

df1 = dummy_df.groupby('TEXT')['LABELS'].agg(list).reset_index()
print (df1)
    TEXT            LABELS
0  Text1  [label1, label2]
Answered By: jezrael
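
The question mentions MultiLabelBinarizer as the next step, so here is a minimal sketch of how the flat-list result might be fed into it. It assumes scikit-learn is installed; the extra Text2 row and the names mlb, encoded and result are hypothetical additions for illustration and do not come from the original post.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data: the question's two rows plus one extra text (assumption)
dummy_df = pd.DataFrame(
    [["Text1", "label1"], ["Text1", "label2"], ["Text2", "label2"]],
    columns=["TEXT", "LABELS"],
)

# Collect one flat list of labels per text
df1 = dummy_df.groupby("TEXT")["LABELS"].agg(list).reset_index()

# Each element of df1["LABELS"] is one sample's collection of labels
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df1["LABELS"])

# Attach the indicator columns (one per label in mlb.classes_) to the texts
result = pd.concat(
    [df1[["TEXT"]], pd.DataFrame(encoded, columns=mlb.classes_, index=df1.index)],
    axis=1,
)
print(result)
```

The flat-list form is used here because fit_transform expects one collection of hashable labels per sample. In the nested form each row contains a single inner list, which is probably not what the binarizer should receive.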