Running Distinct Count in Pandas by a group

Question:

I have a dataframe ‘df’, with the following structure:

Input:

ID Product Price
1 P1 10
2 P1 11
3 P2 12
4 P2 12
5 P2 15

Expected Output:

ID Product Price Distinct_Running_Count
1 P1 10 1
2 P1 11 2
3 P2 12 1
4 P2 12 1
5 P2 15 2

Problem:

I want to create a new column called ‘Distinct_Running_Count’, with the following logic:

  • Perform a running distinct count of ‘Price’ within each ‘Product’
  • Some products don’t have any price change, so ‘Distinct_Running_Count’ stays at 1
  • On every subsequent price change, ‘Distinct_Running_Count’ is incremented

Solutions Tried:

df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).cumcount() + 1
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).transform('nunique')

Issue:

The above solutions give either a running count or the total count of unique values, but not the output I expect.

Asked By: Anubhav Dikshit

Answers:

You can compare each row's Price with the previous row's Price within each Product, then take the cumulative sum of the changes:

df['Distinct_Running_Count'] = (df.groupby(['Product'])['Price']
                                .transform(lambda col: col.ne(col.shift().fillna(col)).cumsum().add(1)))
print(df)

   ID Product  Price  Distinct_Running_Count
0   1      P1     10                       1
1   2      P1     11                       2
2   3      P2     12                       1
3   4      P2     12                       1
4   5      P2     15                       2
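
As a self-contained sketch of the same idea, the snippet below rebuilds the sample data from the question and splits the one-liner into named steps. The intermediate names `prev_price` and `changed` are illustrative only and do not come from the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})

# Previous price within the same product (NaN on each product's first row).
prev_price = df.groupby("Product")["Price"].shift()

# A row is a "change" if it has a previous row and its price differs from it.
changed = df["Price"].ne(prev_price) & prev_price.notna()

# Running total of changes per product, starting at 1.
df["Distinct_Running_Count"] = (
    changed.astype(int).groupby(df["Product"]).cumsum() + 1
)
print(df)
```

Note that this counts consecutive price changes, so a price that returns to an earlier value is counted again.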
Answered By: Ynjxsjmh

My answer uses a few steps.
First, get the unique rows (based on Product and Price).

Then, use cumcount() to create your desired column.

Finally, merge this dataframe with your original dataframe.

# Keep the first occurrence of each (Product, Price) pair.
df_without_dup = df[['Product', 'Price']].drop_duplicates().copy()
# Number the distinct prices within each product.
df_without_dup['Distinct_Running_Count'] = df_without_dup.groupby(['Product']).cumcount() + 1
# Attach the counts back to every original row.
df = df.merge(df_without_dup, on=['Product', 'Price'], how='left')

df_without_dup:

  Product Price  Distinct_Running_Count
0      P1    10                       1
1      P1    11                       2
2      P2    12                       1
4      P2    15                       2

Output:

  ID Product Price  Distinct_Running_Count
0  1      P1    10                       1
1  2      P1    11                       2
2  3      P2    12                       1
3  4      P2    12                       1
4  5      P2    15                       2
Answered By: Tobias Molenaar
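
The two answers agree on the sample data but can differ when a product returns to an earlier price. The sketch below runs both on hypothetical data (prices 10, 11, 10 for a single product, which is not in the question) to show the difference. The variable names `by_change` and `by_first_seen` are illustrative.

```python
import pandas as pd

# Hypothetical data: product P1 returns to an earlier price.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Product": ["P1", "P1", "P1"],
    "Price": [10, 11, 10],
})

# First answer: count consecutive price changes within each product.
by_change = (df.groupby("Product")["Price"]
               .transform(lambda col: col.ne(col.shift().fillna(col)).cumsum().add(1)))

# Second answer: number each distinct (Product, Price) pair by first
# appearance, then merge the numbers back onto every row.
pairs = df[["Product", "Price"]].drop_duplicates()
pairs = pairs.assign(
    Distinct_Running_Count=pairs.groupby("Product").cumcount() + 1
)
by_first_seen = df.merge(pairs, on=["Product", "Price"], how="left")[
    "Distinct_Running_Count"
]

print(by_change.tolist())      # [1, 2, 3]
print(by_first_seen.tolist())  # [1, 2, 1]
```

Which result is right depends on whether a repeated price should count as a new change or reuse the number it was first given.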