Translate list of labels into array of labels per ID in Python
Question:
I have a data frame with texts and labels. Each text spans multiple rows, with one label per row.
import pandas as pd
dummy_df = pd.DataFrame([['Text1','label1'], ['Text1', 'label2']], columns=["TEXT", "LABELS"])
I would like to get the following shape so that I can apply MultiLabelBinarizer():
TEXT | LABEL
Text1| [[label1,label2]]
Answers:
If you need nested lists, use a lambda function in GroupBy.agg:
df = dummy_df.groupby('TEXT')['LABELS'].agg(lambda x: [x.tolist()]).reset_index()
print (df)
TEXT LABELS
0 Text1 [[label1, label2]]
For flat (non-nested) lists, pass list directly:
df1 = dummy_df.groupby('TEXT')['LABELS'].agg(list).reset_index()
print (df1)
TEXT LABELS
0 Text1 [label1, label2]
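Since the goal is MultiLabelBinarizer, here is a minimal sketch of how the flat-list result could be fed into it. It assumes scikit-learn is installed; the extra Text2 row and the names mlb, encoded and result are only there for illustration and are not part of the original question. The flat version is used because fit_transform expects one iterable of labels per sample.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Same layout as the question, plus a hypothetical second text for illustration
dummy_df = pd.DataFrame(
    [["Text1", "label1"], ["Text1", "label2"], ["Text2", "label2"]],
    columns=["TEXT", "LABELS"],
)

# One flat list of labels per text
df1 = dummy_df.groupby("TEXT")["LABELS"].agg(list).reset_index()

# Each element of df1["LABELS"] is one sample's list of labels
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df1["LABELS"])

# One indicator column per label, one row per text
result = pd.DataFrame(encoded, columns=mlb.classes_, index=df1["TEXT"])
print(result)
```

Each row of result corresponds to one TEXT value, and each column holds 1 if that label was present for the text and 0 otherwise.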