How to get the 'number' of occurrence of a value in a column in another column in a pandas dataframe
Question:
I need to get the exact ‘number’ of occurrence of a particular value in a column in another column in dataframe.
To make it simpler, need the cumulative count of a repetitive value in a column
Consider the following example:
col_A
col_B
test1
some1
test2
some2
test2
some3
test2
some4
test3
some5
test1
some6
test3
some7
I need a col_C that gets the exact occurence number of col_A values
Consider the following:
col_A
col_B
col_C
test1
some1
1
test2
some2
1
test2
some3
2
test2
some4
3
test3
some5
1
test1
some6
2
test3
some7
2
In row1 – ‘test1’ appears for the 1st time, so we got ‘1’ in col_C; in row2 – ‘test2’ appears for the 1st time, so we got ‘1’ in col_C; in row3 – ‘test2’ appears for the 2nd time, so we got ‘2’ in col_C and so on
Did use value_counts, but am only getting the count of col_A values in col_C. I need the exact occurrence.
Thanks for the help!
Answers:
What you’re looking for is cumcount() from groupby. It counts the amount of the same values in each column starting from 0.
This code should work.
s = df.groupby('col_A').cumcount()
df['col_C'] = s+1
I need to get the exact ‘number’ of occurrence of a particular value in a column in another column in dataframe.
To make it simpler, need the cumulative count of a repetitive value in a column
Consider the following example:
col_A | col_B |
---|---|
test1 | some1 |
test2 | some2 |
test2 | some3 |
test2 | some4 |
test3 | some5 |
test1 | some6 |
test3 | some7 |
I need a col_C that gets the exact occurence number of col_A values
Consider the following:
col_A | col_B | col_C |
---|---|---|
test1 | some1 | 1 |
test2 | some2 | 1 |
test2 | some3 | 2 |
test2 | some4 | 3 |
test3 | some5 | 1 |
test1 | some6 | 2 |
test3 | some7 | 2 |
In row1 – ‘test1’ appears for the 1st time, so we got ‘1’ in col_C; in row2 – ‘test2’ appears for the 1st time, so we got ‘1’ in col_C; in row3 – ‘test2’ appears for the 2nd time, so we got ‘2’ in col_C and so on
Did use value_counts, but am only getting the count of col_A values in col_C. I need the exact occurrence.
Thanks for the help!
What you’re looking for is cumcount() from groupby. It counts the amount of the same values in each column starting from 0.
This code should work.
s = df.groupby('col_A').cumcount()
df['col_C'] = s+1