Pandas: get unique rows with unique value in column grouped by another column value

Question:

Let’s say I hace a datframe like this:

df = pd.DataFrame({
'COUNTRY_CODE': ['CO','CO','CO','BR','BR','BR'], 
'VERTICAL_GROUP_ID': [2,2,3,2,3,3], 
'SUB_VERTICAL': ['SUPER','SUPER','HOME','LICOR','SPORTS','HOME'], 
'PRODUCT_ID': [1111,1111,1111,1111,1111,2222], 
'SHOWN': [7,8,12,14,16,1], 
})

I want to get in another dataframe, for each COUNTRY_CODE/PRODUCT_ID combination, only ONE row for each VERTICAL_GROUP.

So for the df above, I’d like to get something like this:

COUNTRY_CODE VERTICAL_GROUP_ID SUB_VERTICAL PRODUCT_ID SHOWN
CO 2 SUPER 1111 7
CO 3 HOME 1111 12
BR 2 LICOR 1111 14
BR 3 SPORTS 1111 16
BR 3 HOME 2222 1

It doesn’t matter which rows of each COUNTRY_CODE/PRODUCT_ID/VERTICAL_GROUP combination I keep, as long as I get only 1 for each VERTICAL_GROUP.

Whats the best way of doing this?
I tried mixing a grouby("PRODUCT_ID") with a drop_duplicates(subset=[‘VERTICAL_GROUP_ID’]), but I keepe doing something wrong,because I keep getting errors like

AttributeError: ‘DataFrameGroupBy’ object has no attribute ‘drop_duplicates’

Asked By: Alain

||

Answers:

You could try:

df.groupby(['COUNTRY_CODE', 'VERTICAL_GROUP_ID', 'PRODUCT_ID']).agg('first').reset_index()

  COUNTRY_CODE  VERTICAL_GROUP_ID  PRODUCT_ID SUB_VERTICAL  SHOWN
0           BR                  2        1111        LICOR     14
1           BR                  3        1111       SPORTS     16
2           BR                  3        2222         HOME      1
3           CO                  2        1111        SUPER      7
4           CO                  3        1111         HOME     12
Answered By: Anoushiravan R
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.