Manipulating a pandas DataFrame built from a list of dictionaries containing lists
Question:
import pandas as pd
data = [{'sequence': 'he left me',
         'labels': ['relationship', 'sad', 'happy', 'depression', 'suicidal'],
         'scores': [0.9898561835289001,
                    0.9809304475784302,
                    0.3625302314758301,
                    0.31606775522232056,
                    0.04021124914288521]},
        {'sequence': 'I lost my job',
         'labels': ['sad', 'relationship', 'depression', 'happy', 'suicidal'],
         'scores': [0.123456,
                    0.56789,
                    0.78901,
                    0.12345,
                    0.67890]}]
df = pd.DataFrame(data)
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)
print(df)
That’s my code, but it’s not giving me the right output.
Here’s the output:
sequence relationship sad happy depression suicidal
0 he left me 0.989856 0.98093 0.36253 0.316068 0.040211
1 I lost my job 0.123456 0.56789 0.78901 0.123450 0.678900
You can see that the scores are not correct: in the second row, ‘sad’ should be 0.123456, but instead it’s 0.56789. I’m fairly new to pandas, so I’m having a hard time with this.
I think the problem is in this line:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)
I started with this:
df = df.rename(columns={'scores': df['labels'].iloc[0]})
then this:
df = df.rename(columns={'scores': df['labels'].iloc[0][0]})
after that, I tried this:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'])], axis=1)
and finally this:
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'].iloc[0])], axis=1)
I want each label to have its correct score in every row, not just in the first row.
Answers:
I’d suggest preprocessing your data so that each label is paired directly with its score, rather than relying on the order in which they appear in their respective lists:
data_processed = [
    {
        "sequence": record["sequence"],
        **{
            label: value
            for label, value in zip(record["labels"], record["scores"])
        },
    }
    for record in data
]
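To check what the preprocessing step produces before building the DataFrame, here is a self-contained sketch that reuses the question’s data. It uses `dict(zip(...))`, which should be equivalent to the inner dict comprehension above; the only assumption is that each record’s `labels` and `scores` lists have the same length.

```python
# Same input as in the question
data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]

# Pair each label with the score at the same position in *its own* record,
# so the label order is allowed to differ from one record to the next
data_processed = [
    {"sequence": record["sequence"], **dict(zip(record["labels"], record["scores"]))}
    for record in data
]

print(data_processed[1])
# {'sequence': 'I lost my job', 'sad': 0.123456, 'relationship': 0.56789, 'depression': 0.78901, 'happy': 0.12345, 'suicidal': 0.6789}
```

Because every score now travels with its own label as a dict key, the position of a label in the list no longer matters.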
Now you can convert this directly to a DataFrame:
df = pd.DataFrame(data_processed)
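If you’d rather stay inside pandas, a different technique (not the one suggested above) is to explode the two list columns into long format and pivot back, so that scores are matched to columns by label name rather than by position. This is a sketch that assumes pandas 1.3 or later (needed to explode several columns at once) and that each `sequence` value is unique, since `pivot` raises an error on duplicate index/column pairs.

```python
import pandas as pd

# Same input as in the question
data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]

df = pd.DataFrame(data)

# One row per (sequence, label, score); the two list columns are exploded together,
# then scores are cast back to float because explode leaves them as object dtype
long_df = df.explode(["labels", "scores"]).astype({"scores": float})

# Pivot back to one row per sequence and one column per label;
# each score lands in the column named by its own label
wide = (
    long_df.pivot(index="sequence", columns="labels", values="scores")
    .rename_axis(columns=None)
    .reset_index()
)

print(wide)
```

Note that `pivot` sorts both the rows and the label columns, so the row order may differ from the original list; use `wide.set_index("sequence")` if you want to look rows up by sequence.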