Duplicating rows where a cell contains multiple pieces of data

Question:

I would like to take a dataframe and duplicate certain rows.
One column, called name, may have multiple names.
An example dataframe is contructed below:

data = [
    ['Joe', '17-11-2018', '2'],
    ['Karen', '17-11-2018', '4'],
    ['Bill, Avery', '17-11-2018', '6'],
    ['Sam', '18-11-2018', '4'],
    ['Alex, Frank', '18-11-2018', '6'],
    ['Chris', '18-11-2018', '8'],
]
df = pd.DataFrame(data, columns = ['name','date','number'])

This yields the following dataframe:

          name        date number
0          Joe  17-11-2018      2
1        Karen  17-11-2018      4
2  Bill, Avery  17-11-2018      6
3          Sam  18-11-2018      4
4  Alex, Frank  18-11-2018      6
5        Chris  18-11-2018      8

I would like to take all rows where there are multiple names (comma-separated) and duplicate them for each individual name. The resulting dataframe should look like this:

    name        date number
0    Joe  17-11-2018      2
1  Karen  17-11-2018      4
2   Bill  17-11-2018      6
3  Avery  17-11-2018      6
4    Sam  18-11-2018      4
5   Alex  18-11-2018      6
6  Frank  18-11-2018      6
7  Chris  18-11-2018      8
Asked By: Jack Walsh

||

Answers:

After str.split , it become a unnest problem

df['name']=df.name.str.split(',')

unnesting(df,['name'])
Out[97]: 
     name        date number
0     Joe  17-11-2018      2
1   Karen  17-11-2018      4
2    Bill  17-11-2018      6
2   Avery  17-11-2018      6
3     Sam  18-11-2018      4
4    Alex  18-11-2018      6
4   Frank  18-11-2018      6
5   Chris  18-11-2018      8

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, 1), how='left')
Answered By: BENY

Jack. I don’t use dataframes much, but the following code should work before df = pd.DataFrame(data, columns = ['name','date','number'])

new_data = []
for item in data:
    if "," in item[0]:
        new_data.append([item[0].split(", ")[0], item[1], item[2]])
        new_data.append([item[0].split(", ")[1], item[1], item[2]])
    else:
        new_data.append(item)
Answered By: TechPerson

Update 2023

All the answers have old methods, we now have Series.explode method, which can unnest a list. So the modern way to do this is:

df.assign(name=df["name"].str.split(", ")).explode("name", ignore_index=True)

    name        date number
0    Joe  17-11-2018      2
1  Karen  17-11-2018      4
2   Bill  17-11-2018      6
3  Avery  17-11-2018      6
4    Sam  18-11-2018      4
5   Alex  18-11-2018      6
6  Frank  18-11-2018      6
7  Chris  18-11-2018      8

Old answer

For a string with a separator you can use the following function found in this answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})

explode_str(df, 'name', ',')

     name        date number
0     Joe  17-11-2018      2
1   Karen  17-11-2018      4
2    Bill  17-11-2018      6
2   Avery  17-11-2018      6
3     Sam  18-11-2018      4
4    Alex  18-11-2018      6
4   Frank  18-11-2018      6
5   Chris  18-11-2018      8
Answered By: Erfan

I believe I read something like this.
As soon as I locate the link, I will share it.

from itertools import chain
import time
from numpy.matlib import randn
from pandas import DataFrame as df
import numpy as np
import pandas as pd 
from itertools import chain
from numpy.matlib import randn
from pandas import DataFrame as df
import numpy as np
import pandas as pd 
import re

data = [
    ['J', '17-11-2018', '2'],
    ['K', '17-11-2018', '4'],
    ['B, A', '17-11-2018', '6'],
    ['S', '18-11-2018', '4'],
    ['L, F', '18-11-2018', '6'],
    ['C', '18-11-2018', '8'],
]
df = pd.DataFrame(data, columns = ['P','Q','R'])
print(df)

"""
      P           Q  R
0     J  17-11-2018  2
1     K  17-11-2018  4
2  B, A  17-11-2018  6
3     S  18-11-2018  4
4  L, F  18-11-2018  6
5     C  18-11-2018  8
                    
"""


m1 = (lambda col:pd.Series(col).str.split(','))
aa = df.set_index(['R','Q']).apply(m1)
print(aa)
"""
                   P
R Q                  
2 17-11-2018      [J]
4 17-11-2018      [K]
6 17-11-2018  [B,  A]
4 18-11-2018      [S]
6 18-11-2018  [L,  F]
8 18-11-2018      [C]

"""
res = aa.explode('P').reset_index().reindex(df.columns,axis=1)
print(res)

"""
  P           Q  R
0   J  17-11-2018  2
1   K  17-11-2018  4
2   B  17-11-2018  6
3   A  17-11-2018  6
4   S  18-11-2018  4
5   L  18-11-2018  6
6   F  18-11-2018  6
7   C  18-11-2018  8

"""
Answered By: Soudipta Dutta
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.