Running Distinct Count by Group in Pandas
Question:
I have a dataframe 'df' with the following structure:
Input:
ID | Product | Price |
---|---|---|
1 | P1 | 10 |
2 | P1 | 11 |
3 | P2 | 12 |
4 | P2 | 12 |
5 | P2 | 15 |
Expected Output:
ID | Product | Price | Distinct_Running_Count |
---|---|---|---|
1 | P1 | 10 | 1 |
2 | P1 | 11 | 2 |
3 | P2 | 12 | 1 |
4 | P2 | 12 | 1 |
5 | P2 | 15 | 2 |
Problem:
I want to create a new column called 'Distinct_Running_Count' with the following logic:
- Perform a running distinct count of prices within each 'Product'
- Some products don't have any price change, so 'Distinct_Running_Count' stays at 1
- Every subsequent price change increments 'Distinct_Running_Count'
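For reference, the example frame above can be built directly (a minimal sketch using pandas):

```python
import pandas as pd

# Reconstruct the example input from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})
print(df)
```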
Solutions Tried:
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).cumcount() + 1
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).transform('nunique')
Issue:
The solutions above produce either a running count within each (Product, Price) group or the total unique count per group, but not the running distinct count I expect.
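To illustrate the gap: on the sample data, `cumcount` numbers rows within each identical (Product, Price) pair, while `nunique` reports a static total per group (shown here on the `ID` column to keep the result one-dimensional):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})

# Running count within each (Product, Price) pair -- restarts at every new price
cumcount = df.groupby(["Product", "Price"]).cumcount() + 1
print(cumcount.tolist())  # [1, 1, 1, 2, 1]

# Distinct IDs per (Product, Price) pair -- a static total, not a running count
nunique = df.groupby(["Product", "Price"])["ID"].transform("nunique")
print(nunique.tolist())  # [1, 1, 2, 2, 1]
```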
Answers:
You can compare each row with the previous row in the Price column (via `shift()`) and take the cumulative sum of the changes:
df['Distinct_Running_Count'] = (df.groupby(['Product'])['Price']
.transform(lambda col: col.ne(col.shift().fillna(col)).cumsum().add(1)))
print(df)
ID Product Price Distinct_Running_Count
0 1 P1 10 1
1 2 P1 11 2
2 3 P2 12 1
3 4 P2 12 1
4 5 P2 15 2
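The transform can be traced step by step for the P2 group to see why it works:

```python
import pandas as pd

prices = pd.Series([12, 12, 15])      # P2 prices in order
prev = prices.shift().fillna(prices)  # [12, 12, 12] -- first row compares to itself
changed = prices.ne(prev)             # [False, False, True]
running = changed.cumsum().add(1)     # cumulative count of changes, starting at 1
print(running.tolist())  # [1, 1, 2]
```

Filling the first `shift()` value with the row's own price guarantees the first comparison is `False`, so every group starts at 1 after the `.add(1)`.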
My answer uses a few steps.
First, get the unique rows (based on Product and Price).
Then, use cumcount() to create your desired column.
Finally, merge this dataframe with your original dataframe.
df_without_dup = df[~df[['Product', 'Price']].duplicated()][['Product', 'Price']]
df_without_dup['Distinct_Running_Count'] = df_without_dup.groupby(['Product']).cumcount() + 1
df = df.merge(df_without_dup, on=['Product', 'Price'], how='left')
df_without_dup =
Product Price Distinct_Running_Count
0 P1 10 1
1 P1 11 2
2 P2 12 1
4 P2 15 2
Output:
ID Product Price Distinct_Running_Count
0 1 P1 10 1
1 2 P1 11 2
2 3 P2 12 1
3 4 P2 12 1
4 5 P2 15 2
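One caveat worth adding (my observation, not part of the original answer): the merge keys on (Product, Price), so this approach assumes a product never returns to an earlier price. If it does, the merge reuses the first occurrence's count instead of incrementing:

```python
import pandas as pd

# Hypothetical input where P2 returns to a previous price
df = pd.DataFrame({
    "Product": ["P2", "P2", "P2"],
    "Price": [12, 15, 12],
})

df_without_dup = df[~df[["Product", "Price"]].duplicated()][["Product", "Price"]]
df_without_dup["Distinct_Running_Count"] = df_without_dup.groupby(["Product"]).cumcount() + 1
merged = df.merge(df_without_dup, on=["Product", "Price"], how="left")
print(merged["Distinct_Running_Count"].tolist())  # [1, 2, 1] -- a true running count would end in 3
```

The cumsum-based answer above does not have this limitation, since it counts every change rather than every distinct value.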