Setting specific rows to the value found in a row with a differing index

Question:

I work with a lot of CSV data for my job. I am trying to use Pandas to copy a member's ‘Email’ into the ‘PrimaryMemberEmail’ column of their spouse's row. Here is a sample of what I mean:

import pandas as pd

user_data = {'FirstName':['John','Jane','Bob'],
             'Lastname':['Snack','Snack','Tack'],
             'EmployeeID':['12345','12345S','54321'],
             'Email':['[email protected]','NaN','[email protected]'],
             'DOB':['09/07/1988','12/25/1990','07/13/1964'],
             'Role':['Employee On Plan','Spouse On Plan','Employee Off Plan'],
             'PrimaryMemberEmail':['NaN','NaN','NaN'],
             'PrimaryMemberEmployeeId':['NaN','12345','NaN']
             }

df = pd.DataFrame(user_data)

I have thousands of rows like this. I need to populate ‘PrimaryMemberEmail’ only when the user is a spouse, using the ‘Email’ of their associated primary holder. So in this case I would want to autopopulate the ‘PrimaryMemberEmail’ for Jane Snack with that of her spouse, John Snack, which is ‘[email protected]’. I cannot find a good way to do this. Currently I am using:

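p = -1  # row position; starts just before the first row
for i in df['EmployeeID']:
    p = p + 1  # same step as p + len(...) - (len(...) - 1)
    EEID = df['EmployeeID'].iloc[p]

    if 'S' in EEID:
        # copy the email from the row directly above
        df.loc[df.index[p], 'PrimaryMemberEmail'] = df['Email'].iloc[p-1]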

What bothers me is that this only works if my file comes in with the rows in the right order, with each spouse directly below their primary member, as in the example DataFrame. Also, my NaN values do not work with dropna() or other methods, but that is a question for another time.

I am new to Python and programming. I am trying to add value to myself in my current health career, and I find this all very fascinating. Any help is appreciated.

Asked By: Pongotan

Answers:

IIUC, map each PrimaryMemberEmployeeId to the matching member's Email, then use fillna to keep the existing values everywhere else:

df['PrimaryMemberEmail'] = (df['PrimaryMemberEmployeeId']
                            .map(df.set_index('EmployeeID')['Email'])
                            .fillna(df['PrimaryMemberEmail'])
                           )
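As a minimal, self-contained sketch of the map-and-fillna idea on data shaped like the question's sample: the two email addresses are hypothetical placeholders (the real ones are not shown), the name email_by_id is only illustrative, and the lookup table is built from the 'Email' column because that is where the sample stores each member's address. The rows are deliberately shuffled to suggest that the result depends on EmployeeID rather than on row order.

```python
import pandas as pd

# Sample rows from the question, in a deliberately scrambled order.
# The email addresses are hypothetical placeholders.
df = pd.DataFrame({
    'FirstName': ['Jane', 'Bob', 'John'],
    'EmployeeID': ['12345S', '54321', '12345'],
    'Email': ['NaN', 'bob@example.com', 'john@example.com'],
    'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
    'PrimaryMemberEmployeeId': ['12345', 'NaN', 'NaN'],
})

# Lookup table: EmployeeID -> Email (EmployeeID values must be unique).
email_by_id = df.set_index('EmployeeID')['Email']

# IDs that are not found in the lookup (here the string 'NaN') map to a
# real missing value, which fillna() replaces with the existing entry.
df['PrimaryMemberEmail'] = (df['PrimaryMemberEmployeeId']
                            .map(email_by_id)
                            .fillna(df['PrimaryMemberEmail']))

print(df[['FirstName', 'PrimaryMemberEmail']])
```

Only Jane's row should pick up John's address here; the other rows keep whatever 'PrimaryMemberEmail' already held.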

Alternatively, if you have real NaNs (not strings), use boolean indexing:

df.loc[df['PrimaryMemberEmployeeId'].notna(),
       'PrimaryMemberEmail'] = df['PrimaryMemberEmployeeId'].map(df.set_index('EmployeeID')['Email'])
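The 'NaN' entries in the question's sample are plain strings, which is probably why dropna() does not treat them as missing. A minimal sketch, assuming the same sample shape with hypothetical placeholder emails, that first converts those strings to real missing values and then applies boolean indexing with a lookup built from the 'Email' column (the name has_primary is only illustrative):

```python
import numpy as np
import pandas as pd

# Same shape as the question's sample; the emails are hypothetical.
df = pd.DataFrame({
    'EmployeeID': ['12345', '12345S', '54321'],
    'Email': ['john@example.com', 'NaN', 'bob@example.com'],
    'PrimaryMemberEmail': ['NaN', 'NaN', 'NaN'],
    'PrimaryMemberEmployeeId': ['NaN', '12345', 'NaN'],
})

# Turn the literal string 'NaN' into a real missing value. astype(object)
# keeps every column able to hold strings, even one that is all missing.
df = df.replace('NaN', np.nan).astype(object)

# Only rows that reference a primary member are assigned.
has_primary = df['PrimaryMemberEmployeeId'].notna()
df.loc[has_primary, 'PrimaryMemberEmail'] = (
    df['PrimaryMemberEmployeeId'].map(df.set_index('EmployeeID')['Email'])
)

print(df['PrimaryMemberEmail'].tolist())
```

After the conversion, methods such as notna() and dropna() see the missing entries, so the mask selects only the spouse row.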

Output (for the sample data above):

  FirstName Lastname EmployeeID              Email         DOB               Role PrimaryMemberEmail PrimaryMemberEmployeeId
0      John    Snack      12345  [email protected]  09/07/1988   Employee On Plan                NaN                     NaN
1      Jane    Snack     12345S                NaN  12/25/1990     Spouse On Plan  [email protected]                   12345
2       Bob     Tack      54321  [email protected]  07/13/1964  Employee Off Plan                NaN                     NaN
Answered By: mozway