Splitting a column on a delimiter and placing each value in the right column

Question:

I have a data frame with a column that can contain any combination of three options (a, b, and/or c), separated by commas.

import pandas as pd

df = pd.DataFrame({'col1':['a,b,c', 'b', 'a,c', 'b,c', 'a,b']})

I want to split this column on ',':

df['col1'].str.split(',', expand=True)

The problem is that the new columns are filled from left to right, whereas I want each value placed in a column determined by the value itself.

For example, all a's in the first column, all b's in the second column, and all c's in the third column.
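
To make the problem concrete, here is a minimal sketch of what the positional split produces for the sample frame. The names parts and first_column are illustrative only and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a,b,c', 'b', 'a,c', 'b,c', 'a,b']})

# expand=True creates one column per split position, filled left to right
parts = df['col1'].str.split(',', expand=True)
print(parts)

# Row 1 contains only 'b', yet it lands in column 0,
# which is where the 'a' values were supposed to go.
first_column = parts[0].tolist()
print(first_column)
```

Column 0 therefore ends up holding a mix of a's and b's rather than a's only.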

Asked By: Yun Tae Hwang


Answers:

Instead of using expand=True, explode the split lists into long format, then pivot back to wide format:

(df['col1'].str.split(',')
           .explode()
           .reset_index()
           .pivot(index='index', columns='col1', values='col1'))
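
The same chain, broken into its two stages as a self-contained sketch. The intermediate names long_df and wide are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a,b,c', 'b', 'a,c', 'b,c', 'a,b']})

# Stage 1: split each string into a list, then give every list element
# its own row. reset_index() keeps the original row label in an
# 'index' column so the rows can be regrouped afterwards.
long_df = df['col1'].str.split(',').explode().reset_index()

# Stage 2: pivot back to wide format. Each distinct value becomes a
# column, the cell holds the value itself, and NaN marks absent values.
wide = long_df.pivot(index='index', columns='col1', values='col1')
print(wide)
```

Because the column is chosen by the value rather than by its position in the string, every 'b' ends up under b regardless of what preceded it.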
Answered By: Michael Cao

Here is another method, using pd.crosstab:

df = df.assign(col1=df["col1"].str.split(",")).explode("col1")
df = pd.crosstab(df.index, df["col1"]).rename_axis(index=None, columns=None)
df = df * df.columns  # omit this step if you only want 0/1 indicators instead of the values

print(df)

Prints:

   a  b  c
0  a  b  c
1     b   
2  a     c
3     b  c
4  a  b   

To rename columns:

df = df.rename(columns={"a": "col1", "b": "col2", "c": "col3"})
Answered By: Andrej Kesely

Using str.get_dummies:

tmp = df['col1'].str.get_dummies(',')

out = tmp.mul(tmp.columns)

Output:

   a  b  c
0  a  b  c
1     b   
2  a     c
3     b  c
4  a  b   
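
A short sketch of why the multiplication works, assuming the column labels are plain Python strings (object dtype): get_dummies yields a 0/1 indicator matrix, and multiplying an integer by a string repeats the string that many times.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a,b,c', 'b', 'a,c', 'b,c', 'a,b']})

# One 0/1 indicator column per distinct token found in the strings
tmp = df['col1'].str.get_dummies(',')
print(tmp)

# In plain Python, int * str repeats the string:
# 1 * 'a' gives 'a', and 0 * 'a' gives the empty string.
assert 1 * 'a' == 'a'
assert 0 * 'a' == ''

# mul() applies that element-wise, pairing each column with its own label
out = tmp.mul(tmp.columns)
print(out)
```

The blank cells in the output above are therefore empty strings, not missing values.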

With NaNs and custom headers:

tmp = df['col1'].str.get_dummies(',')

out = (tmp.mul(tmp.columns).where(tmp>0)
          .rename(columns={'a': 'X', 'b': 'Y', 'c': 'Z'})
       )

Output:

     X    Y    Z
0    a    b    c
1  NaN    b  NaN
2    a  NaN    c
3  NaN    b    c
4    a    b  NaN
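
One caveat, sketched here under stated assumptions: get_dummies only creates columns for tokens that actually occur in the data, so if no row contains 'a', there is no a column at all. If the full set of options is known in advance, reindexing the indicator frame should guarantee one column per option. The names options and df_missing below are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample in which 'a' never appears
df_missing = pd.DataFrame({'col1': ['b,c', 'b', 'c']})

# Assumed to be the complete, known list of allowed values
options = ['a', 'b', 'c']

# reindex adds any missing option as an all-zero indicator column
tmp = (df_missing['col1'].str.get_dummies(',')
                         .reindex(columns=options, fill_value=0))

# Same technique as above: labels where the indicator is 1, NaN elsewhere
out = tmp.mul(tmp.columns).where(tmp > 0)
print(out)
```

The result keeps a fixed a, b, c layout even though 'a' is absent from every row.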
Answered By: mozway