Translate list of labels into array of labels per ID in Python

Question

I have a data frame with texts and labels. Each text spans multiple rows, with one label per row.

import pandas as pd
dummy_df = pd.DataFrame([['Text1','label1'], ['Text1', 'label2']], columns=["TEXT", "LABELS"])

I would like to get the following layout so that I can apply the MultiLabelBinarizer() function.

TEXT | LABEL
Text1| [[label1,label2]]

Asked By: sveer

Answer 1

If you need nested lists, use a lambda function in GroupBy.agg:

df = dummy_df.groupby('TEXT')['LABELS'].agg(lambda x: [x.tolist()]).reset_index()
print (df)
    TEXT              LABELS
0  Text1  [[label1, label2]]

For flat (non-nested) lists:

df1 = dummy_df.groupby('TEXT')['LABELS'].agg(list).reset_index()
print (df1)
    TEXT            LABELS
0  Text1  [label1, label2]

Answered By: jezrael
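
As a follow-up to the question's goal, here is a minimal sketch of feeding the flat-list result into scikit-learn's MultiLabelBinarizer. It assumes scikit-learn is installed, and the Text2 row is a hypothetical addition (not in the original data) so that the output has more than one row. MultiLabelBinarizer.fit_transform takes one collection of labels per sample, which is why the flat-list variant is used here.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Frame from the question, plus one hypothetical extra row (Text2)
# added only to make the result easier to read.
dummy_df = pd.DataFrame(
    [["Text1", "label1"], ["Text1", "label2"], ["Text2", "label1"]],
    columns=["TEXT", "LABELS"],
)

# Collect one flat list of labels per text.
grouped = dummy_df.groupby("TEXT")["LABELS"].agg(list).reset_index()

# Encode each list of labels as a row of 0/1 indicators.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(grouped["LABELS"])

# Wrap the indicator matrix in a frame: one column per label,
# one row per text, in the same order as `grouped`.
encoded_df = pd.DataFrame(encoded, columns=mlb.classes_, index=grouped["TEXT"])
print(encoded_df)
```

Each row of encoded_df corresponds to one text, and each column holds 1 if that text carries the label and 0 otherwise.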
