Getting the first row that meets the conditions of two masks and creating a new column
Question:
This is my dataframe:
import pandas as pd
df = pd.DataFrame({'a': [20, 21, 333, 444, 1, 666], 'b': [20, 20, 20, 20, 20, 20], 'c': [222, 211, 2, 1, 100, 200]})
I want to use two masks. The first one finds the second row where a is greater than or equal to b and creates column d. This mask is:
mask = (df.a >= df.b)
df.loc[mask.cumsum().eq(2) & mask, 'd'] = 'x'
Now I want to add another mask. I want to find the first row that meets two conditions:
a) It comes after the row found by the first mask (that is, after the second row where a >= b).
b) Column c is greater than column b.
My desired output is as follows:
     a   b    c    d
0   20  20  222  NaN
1   21  20  211  NaN
2  333  20    2  NaN
3  444  20    1  NaN
4    1  20  100    x
5  666  20  200  NaN
I tried a couple of ways but the fact that it has to be after the first mask made it difficult for me.
Answers:
You can try the following monstrosity:
mask2 = mask.cumsum().eq(2) & mask  # "& mask" drops later rows where a < b and the count is still 2
cond = mask2.cumsum().ge(1) & ~mask2 & (df.c >= df.b)  # rows after the first match where c >= b
df.loc[cond & cond.cumsum().eq(1), 'd'] = 'x'  # "cond &" keeps only the first such row
Though probably someone smart will have a better way =)
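If the chained expression is hard to follow, it may help to split it into named steps and look at each mask on its own. The sketch below does that for the question's data. The names after, cond, first_hit and the steps frame are made up for illustration, and it keeps this answer's c >= b comparison (swap in > if you need strictly greater). The final step also ANDs with cond so that later rows where the condition is false are not marked while the running count is still 1.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [20, 21, 333, 444, 1, 666],
    'b': [20, 20, 20, 20, 20, 20],
    'c': [222, 211, 2, 1, 100, 200],
})

mask = df.a >= df.b
mask2 = mask.cumsum().eq(2) & mask        # the second row where a >= b
after = mask2.cumsum().ge(1) & ~mask2     # every row strictly after that row
cond = after & (df.c >= df.b)             # rows after it where c >= b
first_hit = cond & cond.cumsum().eq(1)    # only the first of those rows

# Hypothetical helper frame, only for inspecting the intermediate masks
steps = pd.DataFrame({'mask': mask, 'mask2': mask2, 'after': after,
                      'cond': cond, 'first_hit': first_hit})
print(steps)

df.loc[first_hit, 'd'] = 'x'
```

Printing steps makes it easy to check each stage against the rows you expect before assigning anything to d.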
With a single expression and pandas.Series.argmax:
df.loc[(mask.cumsum().gt(2) & (df['c'] > df['b'])).argmax(), 'd'] = 'x'
     a   b    c    d
0   20  20  222  NaN
1   21  20  211  NaN
2  333  20    2  NaN
3  444  20    1  NaN
4    1  20  100    x
5  666  20  200  NaN
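A few caveats with the one-liner, as far as I can tell: argmax returns a position while df.loc expects a label, so it relies on the default 0..n-1 index; when no row matches, argmax still returns 0 and row 0 gets marked; and cumsum().gt(2) only becomes True at the third a >= b row, so a qualifying row between the second and third match would be skipped. Below is a possible variant that tries to handle these cases. It is only a sketch: after_second and cond are illustrative names, and it assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [20, 21, 333, 444, 1, 666],
    'b': [20, 20, 20, 20, 20, 20],
    'c': [222, 211, 2, 1, 100, 200],
})

mask = df['a'] >= df['b']

# ge(2) is True from the second a >= b row onward;
# shifting by one row makes it start strictly after that row
after_second = mask.cumsum().ge(2).shift(fill_value=False)
cond = after_second & (df['c'] > df['b'])

# Only assign when at least one row qualifies;
# idxmax returns the index label of the first True, which is what .loc expects
if cond.any():
    df.loc[cond.idxmax(), 'd'] = 'x'

print(df)
```

On the question's data this marks the same row as the one-liner; the difference only shows up with a non-default index, with no matching row, or with a match between the second and third a >= b rows.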