Pandas Dataframe: Fillna based on existing values within another column

Question:

So I am trying to fill a dataframe like this:

import pandas as pd
import numpy as np

data = {'Name': ['James Martin', 'James Martin', 'Jill Valentine',
                 'Ben Murphy', 'Ben Murphy'],
        'Email': [np.nan, '[email protected]', '[email protected]', np.nan, '[email protected]']}
df = pd.DataFrame(data)
print(df)

# Output
             Name      Email
0    James Martin        NaN
1    James Martin  [email protected]
2  Jill Valentine  [email protected]
3      Ben Murphy        NaN
4      Ben Murphy  [email protected]

I want to fill the NaN values in the ‘Email’ column, but only with an email that already exists in this column for the same name. For example, James Martin appears twice in this dataframe, and I’d like to use the entry that contains his email to fill the entry that doesn’t.

Like this:

# Output
             Name      Email
0    James Martin  [email protected]
1    James Martin  [email protected]
2  Jill Valentine  [email protected]
3      Ben Murphy  [email protected]
4      Ben Murphy  [email protected]

I’ve tried

df['Email'] = pd.np.where((df['Name']==df['Name']), df['Email'], '0')

But I’ve had no luck with this. Does anyone have a solution? Thanks.

(Apologies for the poor format, I really would like to know how to build a dataframe and see its output in my SO question.)

Asked By: shobiwankenobi


Answers:

You can use groupby with transform to get the expected result:

df['Email'] = df['Email'].fillna(df.groupby('Name')['Email'].transform('first'))
print(df)

# Output
             Name      Email
0    James Martin  [email protected]
1    James Martin  [email protected]
2  Jill Valentine  [email protected]
3      Ben Murphy  [email protected]
4      Ben Murphy  [email protected]
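
To make the mechanics a bit more visible, here is a minimal sketch of the same idea on made-up data. The addresses are placeholders, and the extra 'Ada Wong' row is an assumption added only to show what happens when a name has no email anywhere in the column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: these addresses are placeholders, not from the question.
df = pd.DataFrame({
    'Name': ['James Martin', 'James Martin', 'Jill Valentine',
             'Ben Murphy', 'Ben Murphy', 'Ada Wong'],
    'Email': [np.nan, 'james@example.com', 'jill@example.com',
              np.nan, 'ben@example.com', np.nan],
})

# transform('first') takes the first non-null Email in each Name group
# and repeats it on every row of that group, keeping the original index.
first_email = df.groupby('Name')['Email'].transform('first')

# fillna only touches rows that are missing; existing values are kept.
df['Email'] = df['Email'].fillna(first_email)
print(df)
```

Rows 0 and 3 pick up the address found elsewhere in their group. The 'Ada Wong' row stays missing, since there is nothing in her group to copy from. If a name can have several different emails, 'first' simply takes the earliest non-null one, so that choice may need adjusting for your data.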

Input dataframe:

import pandas as pd
import numpy as np

data = {'Name': ['James Martin', 'James Martin', 'Jill Valentine',
                 'Ben Murphy', 'Ben Murphy'],
        'Email': [np.nan, '[email protected]', '[email protected]', np.nan, '[email protected]']}
df = pd.DataFrame(data)
print(df)

# Output
             Name      Email
0    James Martin        NaN
1    James Martin  [email protected]
2  Jill Valentine  [email protected]
3      Ben Murphy        NaN
4      Ben Murphy  [email protected]
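
As a side note on the attempt in the question: as far as I can tell it leaves the column unchanged because the condition compares 'Name' with itself, so it is True on every row and where() always returns the existing 'Email' value. A small sketch on placeholder data (the address is made up), using np.where directly since the pd.np alias was deprecated and is gone in recent pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['James Martin', 'James Martin', 'Ben Murphy'],
    'Email': [np.nan, 'james@example.com', np.nan],  # placeholder address
})

# Comparing a column with itself is True on every row here
# (it would only be False where Name itself is NaN).
mask = df['Name'] == df['Name']
print(mask.all())  # True

# Because the mask is all True, np.where always picks df['Email'],
# so the missing values come back untouched.
result = np.where(mask, df['Email'], '0')
print(pd.isna(result).sum())  # 2
```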
Answered By: Corralien