Split Pandas column [{}] into several columns in Python?

Question:

I'm getting confused by the data types in my pandas DataFrame and don't know how to split my entries into several columns.

The data looks like this:

       Name1                           Name2 
0  [0.1,0.2,0.3]     [{'label': 'Neutral',  'score': 0.60}]
1  [0.4,0.5,0.6]     [{'label': 'Negative', 'score': 0.60}]
2  [0.7,0.8,0.9]     [{'label': 'Positive', 'score': 0.60}]

The result should look like:

       Name1       N1    N2    N3                  Name2                    Label     Score
0  [0.1,0.2,0.3]  0.1   0.2   0.3   [{'label': 'Neutral','score': 0.60}]   Neutral    0.60
1  [0.4,0.5,0.6]  0.4   0.5   0.6   [{'label': 'Negative','score': 0.60}]  Negative   0.60
2  [0.7,0.8,0.9]  0.7   0.8   0.9   [{'label': 'Positive','score': 0.60}]  Positive   0.60


I'm not very confident with Python, but I need to work with a large dataset of a few hundred thousand (100k+) entries.

Any help is much appreciated!

Best

Asked By: Laurence Bach


Answers:

You can call to_list() on a specific column and pass the result to pd.DataFrame to turn a column of lists into separate columns.

You can find more on this approach at the following link:
https://datascienceparichay.com/article/split-pandas-column-of-lists-into-multiple-columns/
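For the list column, a minimal sketch of the to_list() approach might look like this. The sample values are copied from the question, the names N1, N2, N3 come from the desired output, and it assumes every Name1 list has exactly three numbers.

```python
import pandas as pd

# Sample data mirroring the question's Name1 column (values taken from the post)
df = pd.DataFrame({'Name1': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]})

# to_list() turns the column into a plain list of lists; pd.DataFrame then
# spreads each inner list across the columns N1..N3 (assumes 3 values per list).
# index=df.index keeps the new rows aligned with df when joining.
expanded = pd.DataFrame(df['Name1'].to_list(), columns=['N1', 'N2', 'N3'], index=df.index)
df = df.join(expanded)

print(df[['N1', 'N2', 'N3']])
```

Building the new frame from one plain list is generally much faster than expanding each row with apply(pd.Series), which matters at a few hundred thousand rows.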

To do the same with a column of dicts, refer to this page:
https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas
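Because each Name2 cell here is a one-element list wrapping a dict, the dict has to be pulled out first. A minimal sketch, assuming every list holds exactly one dict (sample values from the question): `.str[0]` takes the first element of each list, and pd.DataFrame creates one column per dict key.

```python
import pandas as pd

# Sample data mirroring the question's Name2 column: each cell is a
# one-element list holding a dict (values taken from the post)
df = pd.DataFrame({'Name2': [
    [{'label': 'Neutral', 'score': 0.60}],
    [{'label': 'Negative', 'score': 0.60}],
    [{'label': 'Positive', 'score': 0.60}],
]})

# .str[0] picks the first element of each list (here, the only dict);
# pd.DataFrame then builds one column per dict key ('label', 'score').
dicts = pd.DataFrame(df['Name2'].str[0].to_list(), index=df.index)
df = df.join(dicts)

print(df[['label', 'score']])
```

If some lists could contain more than one dict, this sketch silently keeps only the first one.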

Answered By: mlokos

You can use pandas.DataFrame.join and pandas.Series.tolist.

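import pandas as pd
# Name1 lists -> N1..N3; the dict inside each Name2 list -> 'label'/'score'.
# index=df.index keeps the new rows aligned with df when joining.
df = df.join(
    pd.DataFrame(df['Name1'].tolist(), columns=['N1', 'N2', 'N3'], index=df.index)
).join(pd.DataFrame(df['Name2'].apply(lambda x: x[0]).tolist(), index=df.index))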

print(df)

Output:

             Name1                                  Name2   N1   N2   N3     label  score
0  [0.1, 0.2, 0.3]   [{'label': 'Neutral', 'score': 0.6}]  0.1  0.2  0.3   Neutral    0.6
1  [0.4, 0.5, 0.6]  [{'label': 'Negative', 'score': 0.6}]  0.4  0.5  0.6  Negative    0.6
2  [0.7, 0.8, 0.9]  [{'label': 'Positive', 'score': 0.6}]  0.7  0.8  0.9  Positive    0.6

Input DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Name1': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    'Name2': [
        [{'label': 'Neutral',  'score': 0.60}],
        [{'label': 'Negative', 'score': 0.60}],
        [{'label': 'Positive', 'score': 0.60}],
    ],
})
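Putting both steps together on the same input, here is a minimal end-to-end sketch. It is a variant rather than the exact code above: it uses pd.concat instead of chained join, renames label/score to Label/Score, and orders the columns as in the question's desired result. It assumes three numbers per Name1 list and exactly one dict per Name2 list.

```python
import pandas as pd

# Same input DataFrame as in the answer
df = pd.DataFrame({
    'Name1': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
    'Name2': [
        [{'label': 'Neutral', 'score': 0.60}],
        [{'label': 'Negative', 'score': 0.60}],
        [{'label': 'Positive', 'score': 0.60}],
    ],
})

# One column per list element (assumes every Name1 list has exactly 3 values)
nums = pd.DataFrame(df['Name1'].to_list(), columns=['N1', 'N2', 'N3'], index=df.index)

# One column per dict key, from the first (only) dict in each Name2 list,
# renamed to the capitalised names used in the question
meta = (pd.DataFrame(df['Name2'].str[0].to_list(), index=df.index)
          .rename(columns={'label': 'Label', 'score': 'Score'}))

# Place each group of new columns right after its source column
result = pd.concat([df[['Name1']], nums, df[['Name2']], meta], axis=1)
print(result)
```

pd.concat with axis=1 aligns on the index, which is why both helper frames reuse df.index.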
Answered By: I'mahdi