Filling data in one column if there are matching values in another column
Question:
I have a DataFrame of parent/child items, and I need to associate each parent's time with all of its child items. The time is only listed on the row where Parent matches Child, and I need that time to populate every row for that parent.
This is a simple example.
data = {
'Parent' : ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
'Child' : ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
'Time' : [51, 0, 0, 0, 0, 39, 0, 0],
}
The expected results are:
results= {
'Parent' : ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
'Child' : ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
'Time' : [51, 51, 51, 51, 39, 39, 39, 39],
}
Seems like it should be easy, but I can’t wrap my head around where to start.
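For reference, the answers below operate on a DataFrame named df. A minimal sketch, assuming it is built directly from the data dict in the question:

```python
import pandas as pd

# Sample data copied from the question
data = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
}

# Assumption: df is simply the dict wrapped in a DataFrame
df = pd.DataFrame(data)
print(df)
```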
Answers:
We could try this:
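# Keep Time only where Parent == Child, then fill forward and backward within each Parent group
# (.ffill()/.bfill() replace the deprecated fillna(method=...))
(df.groupby('Parent')
.apply(lambda x: x['Time'].where(x['Parent'].eq(x['Child'])).ffill().bfill())
.reset_index())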
Parent level_1 Time
0 a123 0 51.0
1 a123 1 51.0
2 a123 2 51.0
3 a123 3 51.0
4 a234 4 39.0
5 a234 5 39.0
6 a234 6 39.0
7 a234 7 39.0
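The result above is a separate frame with a float Time column. If the goal is to write the values back into df, one possible variant is sketched below. It assumes every Parent has at least one row where Parent equals Child (otherwise the integer cast would fail); the names masked and filled are illustrative.

```python
import pandas as pd

data = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
}
df = pd.DataFrame(data)

# Keep Time only on the rows where the item is its own parent; other rows become NaN
masked = df['Time'].where(df['Parent'].eq(df['Child']))

# Fill forward, then backward, inside each Parent group so every row gets the parent's Time
filled = masked.groupby(df['Parent']).transform(lambda s: s.ffill().bfill())

# Assumption: no NaN is left, so casting back to int is safe
df['Time'] = filled.astype(int)
print(df)
```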
You can create a Series that maps each Parent to its Time and then use it to set the Time column. This assumes there is only ever a single unique ‘Time’ for each Parent.
s = (df.query('Parent == Child')
.drop_duplicates('Parent')
.set_index('Parent')['Time'])
#Parent
#a123 51
#a234 39
#Name: Time, dtype: int64
df['Time'] = df['Parent'].map(s)
print(df)
# Parent Child Time
#0 a123 a123 51
#1 a123 a1231 51
#2 a123 a1232 51
#3 a123 a1233 51
#4 a234 a2341 39
#5 a234 a234 39
#6 a234 a2342 39
#7 a234 a2343 39
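The single-Time assumption can be checked before mapping. A minimal sketch, assuming df is built from the question's data; the names parents, times_per_parent, assumption_holds, and missing are illustrative and not part of the answer above.

```python
import pandas as pd

data = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
}
df = pd.DataFrame(data)

# Rows where the item is its own parent carry the real Time
parents = df[df['Parent'] == df['Child']]

# The mapping is only unambiguous if each Parent has one distinct Time
times_per_parent = parents.groupby('Parent')['Time'].nunique()
assumption_holds = bool((times_per_parent <= 1).all())

# Parents without a self-row would end up as NaN after .map()
missing = sorted(set(df['Parent']) - set(parents['Parent']))

print(assumption_holds, missing)
```

If assumption_holds is False, drop_duplicates('Parent') silently keeps the first Time it sees for that Parent, so it may be worth inspecting those groups by hand.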
If the parent's time is positive (or null) and the children's times are 0, as in the example, a simple groupby.transform('max') is enough:
df['Time'] = df.groupby('Parent')['Time'].transform('max')
Otherwise, mask the rows where Parent differs from Child and take the first remaining value in each group:
df['Time'] = (df['Time']
.where(df['Parent'].eq(df['Child']))
.groupby(df['Parent']).transform('first')
.convert_dtypes()
)
Output:
Parent Child Time
0 a123 a123 51
1 a123 a1231 51
2 a123 a1232 51
3 a123 a1233 51
4 a234 a2341 39
5 a234 a234 39
6 a234 a2342 39
7 a234 a2343 39
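To see why the sign of the parent's time matters, here is a small sketch on made-up data. The negative Time is hypothetical and does not come from the question; the names neg, by_max, and by_first are illustrative.

```python
import pandas as pd

# Hypothetical data: the parent's Time is negative, the children's are 0
neg = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123'],
    'Child': ['a123', 'a1231', 'a1232'],
    'Time': [-5, 0, 0],
})

# 'max' picks the children's 0 instead of the parent's -5
by_max = neg.groupby('Parent')['Time'].transform('max')

# Masking first leaves only the parent's value, so 'first' (which skips NaN) returns it
by_first = (neg['Time']
            .where(neg['Parent'].eq(neg['Child']))
            .groupby(neg['Parent']).transform('first')
            .convert_dtypes())

print(by_max.tolist())
print(by_first.tolist())
```

The mask introduces NaN and therefore a float column; convert_dtypes() is what brings the result back to an integer dtype.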