How to make a new dataframe from an existing dataframe with the unique values of one column and the corresponding values from other columns?

Question:

I have a dataframe ‘raw’ that looks like this:
[image of the raw dataframe, not reproduced]

It has many rows with duplicate values in each column.
I want to make a new dataframe ‘new_df’ which has each unique customer_code and its corresponding market_code.
The new_df should look like this:
[image of the desired new_df, not reproduced]

Asked By: Atif


Answers:

It sounds like you simply want to create a DataFrame with unique customer_code which also shows market_code. Here’s a way to do it:

df = df[['customer_code','market_code']].drop_duplicates('customer_code')

Output:

  customer_code market_code
0        Cus001     Mark001
1        Cus003     Mark003
3        Cus004     Mark003
4        Cus005     Mark004

The df[['customer_code','market_code']] part gives us a DataFrame containing only the two columns of interest. The drop_duplicates('customer_code') part then keeps the first row for each value of customer_code and drops the later duplicates. To keep the last occurrence of each duplicate instead, pass keep='last'.
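
For anyone who wants to try this end to end, here is a minimal sketch. The original data was only posted as an image, so the rows below are made up for illustration, and the extra amount column is an assumption added just to show that the column selection drops it. The names raw and new_df follow the question; new_df_last is a hypothetical name for the keep='last' variant.

```python
import pandas as pd

# Made-up stand-in for 'raw'; the real data was only shown as an image.
raw = pd.DataFrame({
    "customer_code": ["Cus001", "Cus003", "Cus003", "Cus004", "Cus005"],
    "market_code": ["Mark001", "Mark003", "Mark002", "Mark003", "Mark004"],
    "amount": [10, 20, 30, 40, 50],  # hypothetical extra column
})

# Select the two columns of interest.
pairs = raw[["customer_code", "market_code"]]

# Default keep='first': keep the first row seen for each customer_code.
new_df = pairs.drop_duplicates(subset="customer_code")

# keep='last': keep the last row seen for each customer_code instead.
new_df_last = pairs.drop_duplicates(subset="customer_code", keep="last")

# Optional: renumber the index so it runs 0..n-1 without gaps.
new_df = new_df.reset_index(drop=True)
new_df_last = new_df_last.reset_index(drop=True)

print(new_df)
print(new_df_last)
```

With this sample, the two results differ only for Cus003, which appears twice with different market codes. If a customer can legitimately belong to several markets, dropping duplicates on customer_code alone silently discards the other markets, so it is worth checking whether that is what you want.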

Answered By: constantstranger