Groupby and if else condition

Question:

I have a dataframe like this:

import pandas as pd

df1 = pd.DataFrame({
    "ID1": [1,1,1,1,1,1,1,1,1,1,1,1,
            2,2,2,2,2,2,2,2,2,2,2,2,
            3,3,3,3,3,3,3,3,3,3,3,3],
    "ID2": ["A","A","A","A", "B","B","B","B", "C","C","C","C",
            "A","A","A","A", "B","B","B","B", "C","C","C","C",
            "A","A","A","A", "B","B","B","B", "C","C","C","C"],
    "value": [1,2,3,4,10,20,30,40,100,200,300,400,
              11,12,13,14,101,202,303,404,1001,2002,3003,4004,
              15,23,33,45,107,204,302,405,1005,2006,3070,4080],
    "label": ["old", "new","old", "new","old", "new","old", "new","old", "new","old", "new",
              "old", "new","old", "new","old", "new","old", "new","old", "new","old", "new",
              "old", "new","old", "new","old", "new","old", "new","outdated", "new","outdated", "new"]})

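For each combination of ID1 and ID2, I need to replace every "new" value with the first "old" value of that combination. For example, all "new" values for (ID1=1, ID2=A) become 1, those for (ID1=1, ID2=B) become 10, and so on for every ID1 + ID2 combination. Where a group has no "old" rows (ID1=3, ID2=C is labelled "outdated"), the first non-"new" value of the group (1005) should be used instead.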
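The value each group's "new" rows should receive can be inspected directly by dropping those rows and taking the first remaining value per (ID1, ID2). A minimal sketch, assuming df1 as above (rebuilt compactly here so the block runs on its own; not_new and lookup are illustrative names):

```python
import pandas as pd

# Compact rebuild of the question's df1: 3 ID1 values x 3 ID2 values x 4 rows each
df1 = pd.DataFrame({
    "ID1": [1] * 12 + [2] * 12 + [3] * 12,
    "ID2": (["A"] * 4 + ["B"] * 4 + ["C"] * 4) * 3,
    "value": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400,
              11, 12, 13, 14, 101, 202, 303, 404, 1001, 2002, 3003, 4004,
              15, 23, 33, 45, 107, 204, 302, 405, 1005, 2006, 3070, 4080],
    "label": ["old", "new"] * 16 + ["outdated", "new"] * 2,
})

# Keep only rows whose label is not "new" (i.e. "old" or "outdated")
not_new = df1[df1["label"].ne("new")]
# first() takes the first value in row order within each (ID1, ID2) group
lookup = not_new.groupby(["ID1", "ID2"])["value"].first()
print(lookup)
```

Each entry of lookup is the replacement for that group, e.g. 1 for (1, A) and 1005 for (3, C).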
The resulting dataset should look something like this:


    ID1 ID2 value label
0   1   A   1   old
1   1   A   1   new
2   1   A   3   old
3   1   A   1   new
4   1   B   10  old
5   1   B   10  new
6   1   B   30  old
7   1   B   10  new
8   1   C   100 old
9   1   C   100 new
10  1   C   300 old
11  1   C   100 new
12  2   A   11  old
13  2   A   11  new
14  2   A   13  old
15  2   A   11  new
16  2   B   101 old
17  2   B   101 new
18  2   B   303 old
19  2   B   101 new
20  2   C   1001 old
21  2   C   1001 new
22  2   C   3003 old
23  2   C   1001 new
24  3   A   15  old
25  3   A   15  new
26  3   A   33  old
27  3   A   15  new
28  3   B   107 old
29  3   B   107 new
30  3   B   302 old
31  3   B   107 new
32  3   C   1005 outdated
33  3   C   1005 new
34  3   C   3070 outdated
35  3   C   1005 new

I tried defining a function for this and then applying that with the groupby statement but this doesn’t work:

def new_f(df_group):
    if df_group['label'=='new']:
        df_group['modified'] = df_group['value'][0]
    else:
        df_group['modified'] = df_group['value']
df2 = df1.groupby(["ID1","ID2"],as_index = False ).apply(new_f)
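A few things stop that attempt from working: 'label'=='new' compares two string literals and evaluates to the plain bool False, so df_group[False] looks up a column named False; even with a real boolean mask, an if on a whole Series raises "truth value of a Series is ambiguous"; df_group['value'][0] is a label lookup, so it fails for every group whose index does not contain 0; and the function never returns anything. A minimal corrected sketch of the same per-group-function idea, assuming the question's df1 (rebuilt here) and keeping the hypothetical "modified" column from the attempt:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID1": [1] * 12 + [2] * 12 + [3] * 12,
    "ID2": (["A"] * 4 + ["B"] * 4 + ["C"] * 4) * 3,
    "value": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400,
              11, 12, 13, 14, 101, 202, 303, 404, 1001, 2002, 3003, 4004,
              15, 23, 33, 45, 107, 204, 302, 405, 1005, 2006, 3070, 4080],
    "label": ["old", "new"] * 16 + ["outdated", "new"] * 2,
})

def new_f(df_group):
    # Boolean Series marking the rows labelled "new"
    is_new = df_group["label"].eq("new")
    # .iloc is positional, so this works whatever index labels the group carries
    first_value = df_group.loc[~is_new, "value"].iloc[0]
    out = df_group.copy()
    # Row-wise if/else: keep "value" where the row is not "new", else use first_value
    out["modified"] = out["value"].where(~is_new, first_value)
    return out

# Apply the function to each (ID1, ID2) group and stack the results back together
df2 = pd.concat([new_f(g) for _, g in df1.groupby(["ID1", "ID2"])])
print(df2)
```

Iterating the groups and concatenating is used here instead of groupby(...).apply only to keep the example free of version-dependent apply behaviour; the function itself could also be passed to apply.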

Any help would be much appreciated, thanks!

Asked By: aseb


Answers:

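You can use mask within each (ID1, ID2) group: flag the rows labelled "new", then replace them with the first value in the group whose label is not "new":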

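# For each (ID1, ID2) group, replace the "new" values with the first non-"new" value.
# The walrus operator (:=) requires Python 3.8+.
df1['value'] = (df1
    .groupby(['ID1', 'ID2'], group_keys=False)
    .apply(lambda g: g['value'].mask((m:=g['label'].eq('new')),   # m: rows labelled "new"
                                     g.loc[~m, 'value'].iloc[0]))  # first value not labelled "new"
)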
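If you would rather not run a Python function per group, a vectorised variant along the same lines is sketched below, swapping apply for groupby(...).transform("first"). One possible reason to prefer it: pandas 2.2 deprecated DataFrameGroupBy.apply operating on the grouping columns, so the code above may emit a warning on newer versions. The names is_new and first_not_new are illustrative, and df1 is rebuilt so the block runs standalone:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID1": [1] * 12 + [2] * 12 + [3] * 12,
    "ID2": (["A"] * 4 + ["B"] * 4 + ["C"] * 4) * 3,
    "value": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400,
              11, 12, 13, 14, 101, 202, 303, 404, 1001, 2002, 3003, 4004,
              15, 23, 33, 45, 107, 204, 302, 405, 1005, 2006, 3070, 4080],
    "label": ["old", "new"] * 16 + ["outdated", "new"] * 2,
})

is_new = df1["label"].eq("new")
# Hide the "new" values (they become NaN), then broadcast each group's first
# remaining value back onto every row; GroupBy "first" skips NaN
first_not_new = (
    df1["value"].where(~is_new)
    .groupby([df1["ID1"], df1["ID2"]])
    .transform("first")
)
# Overwrite only the "new" rows; the NaN step made the data float, so cast back
df1["value"] = df1["value"].mask(is_new, first_not_new).astype("int64")
print(df1)
```

transform returns a Series aligned with df1's index, so the result can be assigned straight back. The final astype assumes every group has at least one non-"new" row; otherwise NaN would remain and the cast would fail.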

output:

    ID1 ID2  value label
0     1   A      1   old
1     1   A      1   new
2     1   A      3   old
3     1   A      1   new
4     1   B     10   old
5     1   B     10   new
6     1   B     30   old
7     1   B     10   new
8     1   C    100   old
9     1   C    100   new
10    1   C    300   old
11    1   C    100   new
12    2   A     11   old
13    2   A     11   new
14    2   A     13   old
15    2   A     11   new
16    2   B    101   old
17    2   B    101   new
18    2   B    303   old
19    2   B    101   new
20    2   C   1001   old
21    2   C   1001   new
22    2   C   3003   old
23    2   C   1001   new
24    3   A     15   old
25    3   A     15   new
26    3   A     33   old
27    3   A     15   new
28    3   B    107   old
29    3   B    107   new
30    3   B    302   old
31    3   B    107   new
32    3   C   1005   outdated
33    3   C   1005   new
34    3   C   3070   outdated
35    3   C   1005   new
Answered By: mozway