Exclude/Filter values from dataframe with function .isin() in Pandas

Question:

I’m working on a Pandas dataframe with transactional data (customer purchases) and want to exclude rows with certain customer numbers contained in a column ‘CUSTOMER_ID’.

To achieve this, I created a list with the customer numbers to be excluded:
excluded_customers = ['2000', '2100', '3100', '4000', '4100', '4200', '4300', '4400', '4700', '6802']

Then I used the .isin() function to filter my df accordingly and save it in a new df2:
df2 = df[(df['CUSTOMER_ID'].isin(excluded_customers) == False)]

Then I want to sort the new df2 by column ‘CUSTOMER_ID’ in ascending order. However, the excluded customer numbers still appear in the dataframe:
df2.sort_values(by=['CUSTOMER_ID'])

I would much appreciate some hints as to why they aren’t dropped from the df.

Thank you!

Asked By: codesign


Answers:

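The column probably holds integers while your list holds strings, so .isin() finds no matches and nothing is excluded. Convert the column to strings first, and use ~ to invert the mask: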

df2 = (df[~df['CUSTOMER_ID'].astype(str).isin(excluded_customers)]
          .sort_values(by=['CUSTOMER_ID']))
Answered By: jezrael
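
To see why the original filter kept every row, here is a minimal sketch of the likely cause. It assumes CUSTOMER_ID is stored as integers (not confirmed in the question), and the sample values and the AMOUNT column are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: CUSTOMER_ID held as integers (an assumption,
# this is common after reading numeric-looking IDs from a CSV file).
df = pd.DataFrame({
    "CUSTOMER_ID": [2000, 1500, 4100, 3000],
    "AMOUNT": [10.0, 20.0, 30.0, 40.0],
})
excluded_customers = ["2000", "4100"]

# Check the dtype first: an integer dtype here means string values cannot match.
print(df["CUSTOMER_ID"].dtype)

# Integers are never equal to strings, so this mask is False for every row.
no_match = df["CUSTOMER_ID"].isin(excluded_customers)

# Casting to str makes the comparison like-for-like; ~ inverts the mask.
mask = df["CUSTOMER_ID"].astype(str).isin(excluded_customers)
df2 = df[~mask].sort_values(by="CUSTOMER_ID")
print(df2)
```

If the column really is numeric, an alternative is to keep it as is and convert the list instead, for example with [int(c) for c in excluded_customers]. Which side to convert depends on how the IDs are used elsewhere in your code.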