Pandas – Duplicate rows with phone numbers based on type

Question

My dataframe is in the following format:

name	phone	other_phone
alice	(111) 111-1111	(222) 222-2222, (333) 333-3333
bob	(444) 444-4444	(555) 555-5555, (666) 666-6666
colin	(777) 777-7777	(888) 888-8888
david	(999) 999-9999	NaN

I want to split the other_phone column on the comma (if there is one) and duplicate rows such that the output is:

name	phone	phone_type
alice	(111) 111-1111	work
alice	(222) 222-2222	other
alice	(333) 333-3333	other
bob	(444) 444-4444	work
bob	(555) 555-5555	other
bob	(666) 666-6666	other
colin	(777) 777-7777	work
colin	(888) 888-8888	other
david	(999) 999-9999	work

The overall goal is to have at most one phone number per row. How could I accomplish this?

Asked By: KLG

Answer 1

This takes a little reshaping. Specifically, we’ll create 2 separate DataFrames that each hold "name" and "phone"; "phone_type" is then derived from which DataFrame a row came from.

The easiest part is the work_phones DataFrame, since the data already represents it directly.
Splitting out the "other_phone" data is a little trickier. We need to split these values and then explode them to stack all of the values vertically. Then we reach back into the source DataFrame to grab the correct "name" for each phone number.
Finally, we stack these 2 parts on top of each other via pd.concat.
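
To see why reaching back for the names works, here is a small sketch of split and explode in isolation. The Series values and the index labels 10 and 20 are made up for illustration. The point is that explode repeats the original index label for every piece, so a join on the index can line each piece up with its source row.

```python
import pandas as pd

# Hypothetical data, only for illustration (not from the question)
s = pd.Series(["a, b", "c"], index=[10, 20])
names = pd.Series(["x", "y"], index=[10, 20], name="name")

# split -> one list per row; explode -> one row per list element,
# each keeping the index label of the row it came from
parts = s.str.split(",").explode().str.strip()

# join on the index: every piece picks up the name of its source row
joined = parts.to_frame("value").join(names)
print(joined)
```

Because the index labels repeat after explode, they act as a pointer back to the source row rather than as unique row identifiers.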

Assuming your DataFrame is stored in a variable called df…
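
For a runnable starting point, this sketch rebuilds the table from the question as df. The values are copied from the question; representing david's missing entry with np.nan is an assumption about how the original data stores it.

```python
import numpy as np
import pandas as pd

# Example data copied from the question; np.nan stands in for the missing value
df = pd.DataFrame({
    "name": ["alice", "bob", "colin", "david"],
    "phone": [
        "(111) 111-1111",
        "(444) 444-4444",
        "(777) 777-7777",
        "(999) 999-9999",
    ],
    "other_phone": [
        "(222) 222-2222, (333) 333-3333",
        "(555) 555-5555, (666) 666-6666",
        "(888) 888-8888",
        np.nan,
    ],
})
```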

import pandas as pd

# easy
work_phones = df[['name', 'phone']]

# little trickier
other_phones = (
    df['other_phone'].str.split(',')
    .explode()
    # remove the space left behind after each comma
    .str.strip()
    .to_frame('phone')

    # grab the correct name from the original DataFrame
    .join(df['name'])
)

Now that we have our parts, we just need to stack them on top of each other with pd.concat and a few arguments: keys labels the rows of each input DataFrame, and names gives that new index level the name "phone_type".

final = (
    pd.concat([work_phones, other_phones], names=['phone_type'], keys=['work', 'other'])
    .reset_index(level=0)

    # david has no other_phone, so drop the empty row that explode leaves behind
    .dropna(subset=['phone'])

    # sort the data to match OP output
    .sort_values(['name', 'phone_type'], ascending=[True, False])
)

print(final)
  phone_type   name           phone
0       work  alice  (111) 111-1111
0      other  alice  (222) 222-2222
0      other  alice  (333) 333-3333
1       work    bob  (444) 444-4444
1      other    bob  (555) 555-5555
1      other    bob  (666) 666-6666
2       work  colin  (777) 777-7777
2      other  colin  (888) 888-8888
3       work  david  (999) 999-9999
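
If this reshaping is needed in more than one place, the steps can be wrapped in a function. The following is a sketch of a variant, not the exact code above: split_phones is a hypothetical helper name, it labels rows with assign instead of the keys argument of pd.concat, and it assumes the column names from the question.

```python
import numpy as np
import pandas as pd


def split_phones(df):
    """Return one row per phone number (hypothetical helper, for illustration)."""
    work = df[["name", "phone"]].assign(phone_type="work")
    other = (
        df.set_index("name")["other_phone"]
        .dropna()                # people without an other_phone contribute no rows
        .str.split(",")
        .explode()
        .str.strip()             # remove the space left after each comma
        .rename("phone")
        .reset_index()           # bring "name" back as a column
        .assign(phone_type="other")
    )
    return (
        pd.concat([work, other], ignore_index=True)
        # descending phone_type puts "work" before "other" for each name
        .sort_values(["name", "phone_type"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Example data copied from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "colin", "david"],
    "phone": ["(111) 111-1111", "(444) 444-4444", "(777) 777-7777", "(999) 999-9999"],
    "other_phone": [
        "(222) 222-2222, (333) 333-3333",
        "(555) 555-5555, (666) 666-6666",
        "(888) 888-8888",
        np.nan,
    ],
})
result = split_phones(df)
print(result)
```

Unlike the version above, this variant resets the index at the end, so the row labels run from 0 upward instead of repeating the source row numbers.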

Answered By: Cameron Riddell

Answer 2

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['alice', 'bob', 'colin', 'david'],
                   'phone': ['111', '444', '777', '999'],
                   'other_phone':
                   ['222, 333', '555, 666', '888', np.nan]})
print(df)
"""
    name phone other_phone
0  alice   111    222, 333
1    bob   444    555, 666
2  colin   777         888
3  david   999         NaN
"""

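df = (
    df.join(
        df.pop('other_phone').str.split(',', expand=True)
        .stack()
        # make dropping the empty pieces explicit instead of relying on stack()'s defaults
        .dropna()
        # remove the space left behind after each comma
        .str.strip()
        .reset_index(level=1, drop=True)
        .rename('other_phone')
    )
)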
print(df)
"""
    name phone other_phone
0  alice   111         222
0  alice   111         333
1    bob   444         555
1    bob   444         666
2  colin   777         888
3  david   999         NaN
"""
# after the join, each name/phone pair repeats once per other_phone value,
# so keep a single office row per person
workfones = df[['name', 'phone']].drop_duplicates()
print(workfones)
"""
    name phone
0  alice   111
1    bob   444
2  colin   777
3  david   999
"""
# david has no other_phone, so drop his empty row
homefones = (
    df[['name', 'other_phone']]
    .rename(columns={'other_phone': 'phone'})
    .dropna(subset=['phone'])
)
print(homefones)
"""
    name phone
0  alice   222
0  alice   333
1    bob   555
1    bob   666
2  colin   888
"""
# keys labels the rows of each input; use keys=['work', 'other'] to match the question's labels
res = pd.concat([workfones, homefones], names=['phone_type'], keys=['office', 'home'])
print(res)
"""
               name phone
phone_type               
office     0  alice   111
           1    bob   444
           2  colin   777
           3  david   999
home       0  alice   222
           0  alice   333
           1    bob   555
           1    bob   666
           2  colin   888
"""

# turn the phone_type index level into a regular column
res1 = res.reset_index(level=0)
print(res1)
"""
  phone_type   name phone
0     office  alice   111
1     office    bob   444
2     office  colin   777
3     office  david   999
0       home  alice   222
0       home  alice   333
1       home    bob   555
1       home    bob   666
2       home  colin   888
"""
# descending phone_type puts each person's office number first, as in the question
res2 = res1.sort_values(by=['name', 'phone_type'], ascending=[True, False])
print(res2)
"""
  phone_type   name phone
0     office  alice   111
0       home  alice   222
0       home  alice   333
1     office    bob   444
1       home    bob   555
1       home    bob   666
2     office  colin   777
2       home  colin   888
3     office  david   999
"""

# replace the repeated index labels with a fresh 0..n-1 index
res3 = res2.reset_index(drop=True)
print(res3)
"""
  phone_type   name phone
0     office  alice   111
1       home  alice   222
2       home  alice   333
3     office    bob   444
4       home    bob   555
5       home    bob   666
6     office  colin   777
7       home  colin   888
8     office  david   999
"""
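
The split(expand=True) plus stack() chain used here and the split() plus explode() chain from Answer 1 are two routes to the same long format. The sketch below compares them on a small hypothetical Series. Missing values are dropped explicitly in both chains, since explode() keeps a missing entry as a NaN row and the comparison should not depend on how stack() treats missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the shortened numbers used in this answer
s = pd.Series(["222, 333", "888", np.nan])

# Route 1: lists in each cell, then one row per list element
via_explode = s.str.split(",").explode().dropna().str.strip()

# Route 2: one column per piece, then stack the columns into rows
via_stack = (
    s.str.split(",", expand=True)
    .stack()
    .dropna()
    .str.strip()
    .reset_index(level=1, drop=True)  # drop the piece-number index level
)

print(via_explode.tolist())
print(via_stack.tolist())
```

Both results keep the source row label as their index, which is what lets either one be joined back onto the original DataFrame.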

Answered By: Soudipta Dutta
