Filling data in one column if there are matching values in another column

Question:

I have a DataFrame of parent/child items, and I need to assign each parent's time to all of its children. The time is only listed on the row where Parent matches Child, and I need that value to populate every row for that parent.

This is a simple example.

import pandas as pd

data = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
}
df = pd.DataFrame(data)

The expected results are:

results = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 51, 51, 51, 39, 39, 39, 39],
}

Seems like it should be easy, but I can’t wrap my head around where to start.

Asked By: Josh L

||

Answers:

We could try this:

# Keep Time only where Parent == Child, then fill the gaps within each Parent group
(df.groupby('Parent')
   .apply(lambda x: x['Time'].where(x['Parent'].eq(x['Child'])).ffill().bfill())
   .reset_index())


  Parent  level_1  Time
0   a123        0  51.0
1   a123        1  51.0
2   a123        2  51.0
3   a123        3  51.0
4   a234        4  39.0
5   a234        5  39.0
6   a234        6  39.0
7   a234        7  39.0
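
The result above drops the Child column and returns floats. As a rough sketch of the same mask-then-fill idea written back into the original frame (assuming every Parent has at least one row where Parent equals Child, otherwise the integer cast at the end would fail):

```python
import pandas as pd

df = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
})

# Blank out Time on every row that is not the parent's own row.
masked = df['Time'].where(df['Parent'].eq(df['Child']))

# Fill forward, then backward, but only within each Parent group.
filled = masked.groupby(df['Parent']).ffill()
filled = filled.groupby(df['Parent']).bfill()

# Assumption: no NaN is left, i.e. every Parent has a self-matching row.
df['Time'] = filled.astype('int64')
print(df)
```

Grouping the masked Series by df['Parent'] keeps the original index, so the result can be assigned straight back to the column.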
Answered By: Anoushiravan R

You can create a Series that maps each Parent to its Time and then use it to set the Time column. This assumes there is only ever a single unique 'Time' for each Parent.

s = (df.query('Parent == Child')
      .drop_duplicates('Parent')
      .set_index('Parent')['Time'])
#Parent
#a123    51
#a234    39
#Name: Time, dtype: int64

df['Time'] = df['Parent'].map(s)

print(df)
#  Parent  Child  Time
#0   a123   a123    51
#1   a123  a1231    51
#2   a123  a1232    51
#3   a123  a1233    51
#4   a234  a2341    39
#5   a234   a234    39
#6   a234  a2342    39
#7   a234  a2343    39
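
If you are not sure the single-Time assumption holds for your data, one possible way to check it before mapping is sketched below. The names `parents`, `conflicting`, and `missing` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
})

# Rows where the item is its own parent carry the Time we want to spread.
parents = df[df['Parent'] == df['Child']]

# Check 1: each Parent has at most one distinct Time on those rows.
n_times = parents.groupby('Parent')['Time'].nunique()
conflicting = n_times[n_times > 1].index.tolist()

# Check 2: every Parent has a self-matching row at all;
# otherwise map() would leave NaN for that Parent.
missing = sorted(set(df['Parent']) - set(parents['Parent']))

if not conflicting and not missing:
    s = parents.drop_duplicates('Parent').set_index('Parent')['Time']
    df['Time'] = df['Parent'].map(s)

print(conflicting, missing)
print(df)
```

With the sample data both lists come back empty, so the mapping is applied as in the answer.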
Answered By: ALollz

If the parent's Time is always positive (and the children's Time is 0 or null), a simple groupby.transform('max') is enough:

df['Time'] = df.groupby('Parent')['Time'].transform('max')

Otherwise, mask the non-parent rows and take the first non-null value per group:

df['Time'] = (df['Time']
 .where(df['Parent'].eq(df['Child']))
 .groupby(df['Parent']).transform('first')
 .convert_dtypes()
)

Output:

  Parent  Child  Time
0   a123   a123    51
1   a123  a1231    51
2   a123  a1232    51
3   a123  a1233    51
4   a234  a2341    39
5   a234   a234    39
6   a234  a2342    39
7   a234  a2343    39
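
To see why the 'max' shortcut depends on that condition, here is a small sketch with made-up data (a hypothetical negative parent Time, not from the question):

```python
import pandas as pd

# Hypothetical data: the parent's Time is negative, the children's are 0.
df2 = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123'],
    'Child': ['a123', 'a1231', 'a1232'],
    'Time': [-5, 0, 0],
})

# 'max' picks the children's 0 because it is larger than the parent's -5.
by_max = df2.groupby('Parent')['Time'].transform('max')

# Masking first means only the parent's own value can be picked.
by_first = (df2['Time']
            .where(df2['Parent'].eq(df2['Child']))
            .groupby(df2['Parent']).transform('first')
            .convert_dtypes())

print(by_max.tolist())
print(by_first.tolist())
```

Here by_max ends up as all zeros while by_first carries the parent's -5 to every row, so the masked version is the safer choice when Time can be zero or negative.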
Answered By: mozway