Increment counter the first time a number is reached

Question:

This is probably a very silly question, but I’ll go ahead and ask anyway. How would you increment a counter only the first time a particular value is reached?

For example, suppose the df has a column called ‘step’, and I want to add a column called ‘counter’ that increments the first time the ‘step’ column reaches a value of 6.

Asked By: hegdep

Answers:

If your DataFrame is called df, one possible way without iteration is

df['counter'] = 0
df.loc[1:, 'counter'] = ((df['step'].values[1:] == 6) & (df['step'].values[:-1] != 6)).cumsum()

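This creates two boolean arrays whose conjunction is True where the current row contains a 6 and the previous row does not. The cumulative sum of that array is the counter. Note that the first row is left at 0, so a 6 in the very first row is not counted, and df.loc[1:, ...] assumes the default 0..n-1 integer index.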
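
For reference, here is a minimal self-contained sketch of the same comparison, assuming the sample data used in the other answers and a column named 'step'. The names values, is_six and starts are only illustrative. It assigns the whole column by position instead of through df.loc[1:, ...], so it does not depend on the index labels.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the other answers in this thread.
df = pd.DataFrame({'step': [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]})

values = df['step'].to_numpy()
is_six = values == 6

# True where the current row is 6 and the previous row is not 6.
starts = is_six[1:] & ~is_six[:-1]

# Row 0 has no previous row, so its counter stays 0
# (as in the answer above, a 6 in the very first row is not counted).
df['counter'] = np.concatenate(([0], starts.cumsum()))
print(df)
```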

Answered By: nnnmmm

That’s not a silly question. To get the desired output in your counter column, you can try (for example) this:

steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
counter = [idx for idx in range(len(steps)) if steps[idx] == 6 and (idx==0 or steps[idx-1] != 6)]
print(counter)

This prints:

[7, 13, 18]

These are the indices in steps where a run of 6s starts. len(counter) gives the total number of times that happened, and you can reproduce the counter column exactly as you described it with

counter_column = []
count = 0
for idx in range(len(steps)):
    # bump the count on every index where a run of 6s starts
    if idx in counter:
        count += 1
    counter_column.append(count)
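
If you do not need the list of start indices, the same column can probably be built in a single pass. This is only a sketch in plain Python; the variable names previous and count are my own and not part of the code above.

```python
steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]

counter_column = []
count = 0
previous = None  # there is no value before the first element

for value in steps:
    # Increment only on the first 6 of a run, i.e. when the previous value was not 6.
    if value == 6 and previous != 6:
        count += 1
    counter_column.append(count)
    previous = value

print(counter_column)
```
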
Answered By: Arne

If your DataFrame is called df, you can iterate over the rows and use a flag to track whether you are currently inside a run of 6s:

import pandas as pd

q_list = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
df = pd.DataFrame(q_list, columns=['step'])
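counter = 0
flag = False  # True while we are inside a run of 6s
df['counter'] = 0  # create the column up front so it keeps an integer dtype
for index, row in df.iterrows():
    if row['step'] == 6 and not flag:
        # first 6 of a new run
        counter += 1
        flag = True
    elif row['step'] != 6 and flag:
        flag = False
    # DataFrame.set_value was removed in pandas 1.0; .at is the replacement
    df.at[index, 'counter'] = counter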
Answered By: Rene B.

You can use .shift() in pandas

Notice that you only want to increment when the value of df['step'] is 6
and the value of df.shift(1)['step'] (the previous row) is not 6.

df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()
print(df)

Output

      step  counter
0      2        0
1      2        0
2      2        0
3      3        0
4      4        0
5      4        0
6      5        0
7      6        1
8      6        1
9      6        1
10     6        1
11     7        1
12     5        1
13     6        2
14     6        2
15     6        2
16     7        2
17     5        2
18     6        3
19     7        3
20     5        3

Explanation

a. df['step']==6 gives boolean values – True if the step is 6

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7      True
8      True
9      True
10     True
11    False
12    False
13     True
14     True
15     True
16    False
17    False
18     True
19    False
20    False
Name: step, dtype: bool

b. df.shift(1)['step']!=6 shifts the data down by 1 row and then checks whether the value (now the previous row's value) is not equal to 6.

When both conditions are satisfied, you want to increment, and .cumsum() takes care of that. Hope that helps!
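
One detail worth checking is what happens when the very first row is a 6. After the shift, the first entry is NaN, and NaN != 6 evaluates to True, so a leading 6 is counted. Here is a small sketch on made-up data (the four values are hypothetical, chosen only to start with a 6):

```python
import pandas as pd

# Hypothetical data that starts with a 6.
df = pd.DataFrame({'step': [6, 6, 2, 6]})

prev = df['step'].shift(1)                # first entry becomes NaN
starts = (df['step'] == 6) & (prev != 6)  # NaN != 6 is True, so row 0 counts as a start
df['counter'] = starts.cumsum()
print(df)
```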

P.S. Although it’s a good question, please avoid pasting images in the future. You can paste the data directly and format it as code, which lets the people answering copy and paste it.

Answered By: Vivek Kalyanarangan

Use:

import pandas as pd

df = pd.DataFrame({'step':[2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]})

a = df['step'] == 6
b = (~a).shift()
b[0] = a[0]
df['counter'] = (a & b).cumsum()

print (df)
    step  counter
0      2        0
1      2        0
2      2        0
3      3        0
4      4        0
5      4        0
6      5        0
7      6        1
8      6        1
9      6        1
10     6        1
11     7        1
12     5        1
13     6        2
14     6        2
15     6        2
16     7        2
17     5        2
18     6        3
19     7        3
20     5        3

Explanation:

Get boolean mask for comparing with 6:

a = df['step'] == 6

Invert Series and shift:

b = (~a).shift()

The shift leaves a missing value in the first position. If the first value of step is 6, the first group would not be counted, so set the first value of b to the first value of a:

b[0] = a[0]

Chain the conditions with bitwise AND (&):

c = a & b

Get cumulative sum:

d = c.cumsum()

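Put all the steps side by side. With keys=('abcde'), column a below is step and columns b to e are the Series a to d from above:

print (pd.concat([df['step'], a, b, c, d], axis=1, keys=('abcde')))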

    a      b      c      d  e
0   2  False  False  False  0
1   2  False   True  False  0
2   2  False   True  False  0
3   3  False   True  False  0
4   4  False   True  False  0
5   4  False   True  False  0
6   5  False   True  False  0
7   6   True   True   True  1
8   6   True  False  False  1
9   6   True  False  False  1
10  6   True  False  False  1
11  7  False  False  False  1
12  5  False   True  False  1
13  6   True   True   True  2
14  6   True  False  False  2
15  6   True  False  False  2
16  7  False  False  False  2
17  5  False   True  False  2
18  6   True   True   True  3
19  7  False  False  False  3
20  5  False   True  False  3

If performance is important, use the numpy solution:

import numpy as np

a = (df['step'] == 6).values
b = np.insert((~a)[:-1], 0, a[0])
df['counter'] = np.cumsum(a & b)
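
To reuse this for other values, the numpy version could be wrapped in a small function. The sketch below is an assumption-laden variant: count_run_starts and its target parameter are hypothetical names, and it builds the "previous row is not the target" mask with np.concatenate and a leading True instead of np.insert. For non-empty input the result should be the same, because the first position of a & b then reduces to a[0].

```python
import numpy as np

def count_run_starts(values, target=6):
    """Hypothetical helper: running count of how many runs of `target` have started."""
    a = np.asarray(values) == target
    # Previous element is not the target; the first element has no predecessor,
    # so it is treated as "previous was not the target".
    b = np.concatenate(([True], ~a[:-1]))
    return np.cumsum(a & b)

steps = [2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 5, 6, 6, 6, 7, 5, 6, 7, 5]
print(count_run_starts(steps))
```

With a DataFrame, this would be used as df['counter'] = count_run_starts(df['step'].values).
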
Answered By: jezrael