Fill empty rows based on a condition in another column (pandas)
Question:
I’m struggling in Python (pandas) to find a way to fill the empty cells of one column based on the values in another column, as in the following example:
| email | run | other cols ....
| [email protected] | 12345 |
| [email protected] | 134254 |
| [email protected] | 23232 |
| [email protected] | |
| | 134254 |
| | 134254 |
| [email protected] | |
Because I have other columns, the rows aren’t duplicates, so I would like to fill each empty cell using the matching value from other rows that share the same information, like this:
| email | run | other cols ....
| [email protected] | 12345 |
| [email protected] | 134254 |
| [email protected] | 23232 |
| [email protected] | 23232 |
| [email protected] | 134254 |
| [email protected] | 134254 |
| [email protected] | 12345 |
Could anyone help me, please?
Answers:
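You can perform several groupby operations, filling each column with the first non-null value found in the matching group of the other column:
out = df.assign(
    run=df['run'].fillna(df.groupby('email')['run'].transform('first')),
    email=df['email'].fillna(df.groupby('run')['email'].transform('first')),
)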
Using a helper function:
def fill_from(target, group, df=df):
    # Fill missing values in `target` with the first non-null value from the same `group`
    return df[target].fillna(df.groupby(group)[target].transform('first'))

out = df.assign(run=fill_from('run', 'email'), email=fill_from('email', 'run'))
Output:
email run other cols
0 [email protected] 12345.0 NaN
1 [email protected] 134254.0 NaN
2 [email protected] 23232.0 NaN
3 [email protected] 23232.0 NaN
4 [email protected] 134254.0 NaN
5 [email protected] 134254.0 NaN
6 [email protected] 12345.0 NaN
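For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The email addresses are hypothetical placeholders (the real ones are obscured in the question), the "other cols" column is left out, and the helper takes the frame as an explicit argument instead of a default; none of that comes from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder addresses standing in for the obscured ones.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "c@example.com", None, None, "a@example.com"],
    "run": [12345, 134254, 23232, np.nan, 134254, 134254, np.nan],
})

# Optional sanity check (an addition, not part of the answer):
# each email should map to at most one distinct run.
assert df.groupby("email")["run"].nunique().le(1).all()


def fill_from(target, group, frame):
    """Fill missing `target` values with the first non-null value in the same `group`."""
    return frame[target].fillna(frame.groupby(group)[target].transform("first"))


# Both fills read from the original df, so neither depends on the other.
out = df.assign(
    run=fill_from("run", "email", df),
    email=fill_from("email", "run", df),
)
print(out)
```

Note that 'first' simply takes the first non-null value in each group, so if one email were associated with several different runs (or the reverse), the earliest one would be used silently. The `nunique` check above is one way to catch that case before filling.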