Explode is not working on pandas dataframe

Question:

I have a dataframe with the following columns:

    col1        col2            col3                  col4            
0   HP:0005709  ['HP:0001770']  Toe syndactyly        SNOMEDCT_US:32113001, C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe    C2674738
2   EFO:0009136 ['HP:0001507']  Growth abnormality    C0262361

I would like to explode "col4". I tried several ways to do it, but none of them works.
The dtype of the column is "object".

My attempts so far:

  1. df.explode('col4')

  2. df['col4'] = df['col4'].str.split(',')
     df = df.set_index(['col2']).apply(pd.Series.explode).reset_index()

  3. import ast
     df[['col4']] = df[['col4']].applymap(ast.literal_eval)
     df = df.apply(pd.Series.explode)

The expected output is:

    col1        col2            col3                col4                
0   HP:0005709  ['HP:0001770']  Toe syndactyly      SNOMEDCT_US:32113001
0   HP:0005709  ['HP:0001770']  Toe syndactyly      C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe  C2674738
2   EFO:0009136 ['HP:0001507']  Growth abnormality  C0262361
Asked By: rshar

Answers:

Your input data is confusing because it has 5 headers for only 4 columns (or is the index a "normal" column?). To explode col4, first split each string so that every element becomes a list, then explode:

df['col4'] = df['col4'].str.split(r',\s*', regex=True)
df = df.explode('col4')

Output:

          col1            col2                col3                  col4
0   HP:0005709  ['HP:0001770']      Toe syndactyly  SNOMEDCT_US:32113001
0   HP:0005709  ['HP:0001770']      Toe syndactyly              C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe              C2674738
2  EFO:0009136  ['HP:0001507']  Growth abnormality              C0262361
Answered By: Tranbi
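
As a side note on why the first attempt in the question changes nothing: explode only expands list-like cells, and a comma-separated string is a scalar that passes through untouched. The snippet below is a minimal, self-contained sketch of that behaviour; the frame is rebuilt by hand from the question's sample, and it assumes a pandas version recent enough to accept the regex argument of str.split.

```python
import pandas as pd

# Sample frame rebuilt from the question; col4 holds plain strings, not lists.
df = pd.DataFrame({
    "col1": ["HP:0005709", "HP:0005709", "EFO:0009136"],
    "col2": ["['HP:0001770']", "['HP:0001780']", "['HP:0001507']"],
    "col3": ["Toe syndactyly", "Abnormality of toe", "Growth abnormality"],
    "col4": ["SNOMEDCT_US:32113001, C0265660", "C2674738", "C0262361"],
})

# explode() only expands list-like cells; strings are scalars and pass through.
unchanged = df.explode("col4")
print(unchanged.equals(df))

# Split on a comma followed by optional whitespace, then explode the lists.
exploded = df.assign(
    col4=df["col4"].str.split(r",\s*", regex=True)
).explode("col4")
print(exploded)
```

The exploded frame keeps the original index, so the two rows that came from row 0 both carry index 0.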

IIUC, try:

out = df.assign(**{'col4': df['col4'].str.split(', ')}).explode('col4')
print(out)

# Output
          col1            col2                col3                  col4
0   HP:0005709  ['HP:0001770']      Toe syndactyly  SNOMEDCT_US:32113001
0   HP:0005709  ['HP:0001770']      Toe syndactyly              C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe              C2674738
2  EFO:0009136  ['HP:0001507']  Growth abnormality              C0262361
Answered By: Corralien

Treating the index as a regular column (five columns in total), split the last column and explode it:

import pandas as pd

columns = ['col1', 'col2', 'col3', 'col4', 'col5']
data = [['0', 'HP:0005709', "['HP:0001770']", 'Toe syndactyly', 'SNOMEDCT_US:32113001, C0265660'],
        ['1', 'HP:0005709', "['HP:0001780']", 'Abnormality of toe', 'C2674738'],
        ['2', 'EFO:0009136', "['HP:0001507']", 'Growth abnormality', 'C0262361']]

df = pd.DataFrame(data, columns=columns)
print(df)

    col1    col2        col3            col4                col5
0   0       HP:0005709  ['HP:0001770']  Toe syndactyly      SNOMEDCT_US:32113001, C0265660
1   1       HP:0005709  ['HP:0001780']  Abnormality of toe  C2674738
2   2       EFO:0009136 ['HP:0001507']  Growth abnormality  C0262361

column = 'col5'
# Split on ', ' (comma plus space) so the second value has no leading space
df[column] = df[column].str.split(', ')
new_df = df.explode(column)
print(new_df)

    col1    col2        col3            col4                col5
0   0       HP:0005709  ['HP:0001770']  Toe syndactyly      SNOMEDCT_US:32113001
0   0       HP:0005709  ['HP:0001770']  Toe syndactyly      C0265660
1   1       HP:0005709  ['HP:0001780']  Abnormality of toe  C2674738
2   2       EFO:0009136 ['HP:0001507']  Growth abnormality  C0262361
Answered By: Mazhar
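
If the spacing after the commas is not consistent, one option is to split on a bare comma and strip whitespace after exploding. The sketch below also passes ignore_index=True so the result gets a fresh 0..n-1 index instead of repeating the original row labels. The two-column frame is a cut-down stand-in for the question's data, not the asker's actual frame.

```python
import pandas as pd

# Cut-down stand-in for the question's data (only the columns needed here).
df = pd.DataFrame({
    "col3": ["Toe syndactyly", "Abnormality of toe", "Growth abnormality"],
    "col4": ["SNOMEDCT_US:32113001, C0265660", "C2674738", "C0262361"],
})

# Split on a bare comma, then explode with a renumbered index.
out = df.assign(col4=df["col4"].str.split(",")).explode("col4", ignore_index=True)

# Remove the whitespace left over from ", " separators.
out["col4"] = out["col4"].str.strip()
print(out)
```

Whether to keep the duplicated index or renumber it depends on whether later steps need to trace each row back to its source row.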