Finding first occurrence of even numbers
Question:
This is my dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [20, 21, 333, 55, 444, 1000, 900, 44, 100, 200, 100],
        'b': [2, 2, 2, 4, 4, 4, 4, 3, 2, 2, 6]
    }
)
And this is the output that I want:
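| | a | b | c |
|---|---|---|---|
| 0 | 20 | 2 | x |
| 1 | 21 | 2 | NaN |
| 2 | 333 | 2 | NaN |
| 3 | 55 | 4 | x |
| 4 | 444 | 4 | NaN |
| 5 | 1000 | 4 | NaN |
| 6 | 900 | 4 | NaN |
| 7 | 44 | 3 | NaN |
| 8 | 100 | 2 | x |
| 9 | 200 | 2 | NaN |
| 10 | 100 | 6 | x |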
I want to create column c, which marks the first occurrence of an even number in column b. It does not matter whether the even number is repeated consecutively or not; only the first occurrence in each streak should be marked.
For example, the first row is marked because it is the first occurrence of 2 in column b. When the streak of 2s ends and a streak of 4s begins, the first 4 is marked for the same reason.
I tried this code:
def finding_first_even_number(df):
    mask = (df.b % 2 == 0)
    df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'x'
    return df

df = df.groupby('b').apply(finding_first_even_number)
But it does not give me the output that I want.
Answers:
Solution
# counter to identify different blocks of
# consecutive rows having same value in b
b = df['b'].diff().ne(0).cumsum()
# boolean mask to identify if the value is even
# and it is the first occurrence in its block
mask = (df['b'] % 2 == 0) & ~b.duplicated()
# boolean indexing to set `x` on the rows where the mask is True
df.loc[mask, 'c'] = 'x'
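To make the block counter in the first step more concrete, here is a minimal sketch that computes it on the sample values of b alone. The names b_values, starts_new_streak and streak_id are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Sample values of column b from the question
b_values = pd.Series([2, 2, 2, 4, 4, 4, 4, 3, 2, 2, 6], name="b")

# True wherever the value differs from the previous row.
# The first row has no predecessor, so diff() yields NaN there,
# and NaN != 0 evaluates to True.
starts_new_streak = b_values.diff().ne(0)

# The running count of changes gives every consecutive streak its own id
streak_id = starts_new_streak.cumsum()

print(streak_id.tolist())
# [1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 5]
```

The second streak of 2s (rows 8 and 9) gets id 4 rather than sharing id 1 with the first streak, which is what lets row 8 be marked again. Combining this id with the evenness check, as the solution does, selects rows 0, 3, 8 and 10.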
Result
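| | a | b | c |
|---|---|---|---|
| 0 | 20 | 2 | x |
| 1 | 21 | 2 | NaN |
| 2 | 333 | 2 | NaN |
| 3 | 55 | 4 | x |
| 4 | 444 | 4 | NaN |
| 5 | 1000 | 4 | NaN |
| 6 | 900 | 4 | NaN |
| 7 | 44 | 3 | NaN |
| 8 | 100 | 2 | x |
| 9 | 200 | 2 | NaN |
| 10 | 100 | 6 | x |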
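As a possible variation, the same idea can be wrapped in a small helper that leaves the input dataframe untouched. This is only a sketch: the function name mark_first_even_in_streak and its parameters are hypothetical, and it swaps diff().ne(0) for a comparison against shift(), which detects the same streak boundaries on this data.

```python
import pandas as pd


def mark_first_even_in_streak(df, value_col="b", flag_col="c", flag="x"):
    """Return a copy of df with `flag` on the first row of each even-valued streak."""
    out = df.copy()
    values = out[value_col]
    # A streak starts wherever the value differs from the previous row.
    # The first row is compared against NaN, so it always starts a streak.
    starts_streak = values.ne(values.shift())
    is_even = values % 2 == 0
    # Rows outside the mask are left as NaN in the new column
    out.loc[starts_streak & is_even, flag_col] = flag
    return out


df = pd.DataFrame(
    {
        "a": [20, 21, 333, 55, 444, 1000, 900, 44, 100, 200, 100],
        "b": [2, 2, 2, 4, 4, 4, 4, 3, 2, 2, 6],
    }
)

marked = mark_first_even_in_streak(df)
print(marked.index[marked["c"].eq("x")].tolist())
# [0, 3, 8, 10]
```

This also suggests why the groupby('b') attempt in the question falls short: grouping on the raw values of b puts rows 0 to 2 and rows 8 to 9 into the same group, so only row 0 is flagged for the value 2 and row 8 is missed.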