Pandas DataFrame: fillna based on existing values within another column
Question:
So I am trying to fill a dataframe like this:
import pandas as pd
import numpy as np
data = {'Name': ['James Martin', 'James Martin', 'Jill Valentine',
'Ben Murphy', 'Ben Murphy'],
'Email': [np.nan, '[email protected]', '[email protected]', np.nan, '[email protected]']}
df = pd.DataFrame(data)
print(df)
# Output
Name Email
0 James Martin NaN
1 James Martin [email protected]
2 Jill Valentine [email protected]
3 Ben Murphy NaN
4 Ben Murphy [email protected]
I want to fill the NaN values in the 'Email' column, but only with an email that already exists in that column for the same person. For example, James Martin appears twice in this dataframe, and I'd like to use the entry that contains his email to fill the entry that doesn't.
Like this:
# Output
Name Email
0 James Martin [email protected]
1 James Martin [email protected]
2 Jill Valentine [email protected]
3 Ben Murphy [email protected]
4 Ben Murphy [email protected]
I’ve tried
df['Email'] = pd.np.where((df['Name']==df['Name']), df['Email'], '0')
But I've had no luck with this. Does anyone have a solution? Thanks.
(Apologies for the poor formatting; I would really like to know how to build a dataframe and show its output in my SO question.)
Answers:
You can use groupby with transform('first') to get the expected result:
df['Email'] = df['Email'].fillna(df.groupby('Name')['Email'].transform('first'))
print(df)
# Output
Name Email
0 James Martin [email protected]
1 James Martin [email protected]
2 Jill Valentine [email protected]
3 Ben Murphy [email protected]
4 Ben Murphy [email protected]
Input dataframe:
import pandas as pd
import numpy as np
data = {'Name': ['James Martin', 'James Martin', 'Jill Valentine',
'Ben Murphy', 'Ben Murphy'],
'Email': [np.nan, '[email protected]', '[email protected]', np.nan, '[email protected]']}
df = pd.DataFrame(data)
print(df)
# Output
Name Email
0 James Martin NaN
1 James Martin [email protected]
2 Jill Valentine [email protected]
3 Ben Murphy NaN
4 Ben Murphy [email protected]
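To make it clearer why the one-liner above works, here is a minimal sketch that splits it into two steps. The email addresses are hypothetical placeholders (the real ones are obscured in the post), and the variable name first_email is only for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder addresses (hypothetical): the originals are obscured in the post.
df = pd.DataFrame({
    'Name': ['James Martin', 'James Martin', 'Jill Valentine',
             'Ben Murphy', 'Ben Murphy'],
    'Email': [np.nan, 'james@example.com', 'jill@example.com',
              np.nan, 'ben@example.com'],
})

# Step 1: for each Name group, take the first non-null Email and
# broadcast it back to every row of that group (same index as df).
first_email = df.groupby('Name')['Email'].transform('first')

# Step 2: fillna aligns on the index and only touches rows where Email is missing.
df['Email'] = df['Email'].fillna(first_email)
print(df)
```

The attempt in the question leaves the column unchanged because df['Name'] == df['Name'] is True for every row, so np.where always picks the existing 'Email' value. Note also that a name with no email in any of its rows should simply stay NaN with this approach, since there is nothing to copy from.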