Kaggle Data Clean Up

Question:

I am trying to clean unwanted values from my dataset. I am currently cleaning the gender column, and there are a lot of ‘joke’ answers that I wish to remove, but I only know how to remove them one by one. Is there a more efficient way to clean the data so that I am left with only Male and Female?

Here is what the unique values currently look like:

['Male' 'Female' 'Nonbinary' 'Nonbinary woman' 'Trans male' 'human'
 'Trans guy' 'Athlet' 'Living being ' 'I am a meat popsicle' 'Transmale'
 'Transmasculine Genderqueer' 'Nonbinary girl' 'Diverse and flexible'
 'He/Him' 'Male teen' 'PINEAPPLE (rpz le meilleur sex)' 'Agender' 'Mixed'
 'Gamer' 'trans man' 'Female High-schooler' 'Trans female' 'Trans Female'
 'trans male' 'a human person :) (male)' "this question doesn't matter"
 'A guy who is determined to learn web development' 'non-binary woman'
 'student?' 'Male, just Male' 'questioning '
 'possibly a descendant of a Norse God.' 'Ajith' 'Helicopter '
 "I'm human. What else matters?" 'bi sexual' 'transguy' 'Trans Girl'
 'Carbon-15' 'Life'
 'I am a woman but "female" and "male" are not the best terms - you should use man or woman since you're asking about gender not biological sex. Also, the term "female" is often used in a dehumanizing manner. '
 'pre anything MtF' 'transman' 'Motiviert, zielstrebend und zuverlässig'
 'demigirl' 'My gender is: Apache Helicopter' 'There are just two genders'
 'WTF' 'Transgender man'
 'Male but I like how you phrased that question :P' '(Trans) Woman'
 'Trans-NB' 'transgender' 'Cyborg' 'Attack Helicopter' 'Alpha ;)'
 'Straight forward lol.' 'Genderfluid'
 'I don't "think of myself" as anything, I am a male.'
 "LOL how dumb. Why do people get sucked into this nonsense? I'm giraffe okay!?"
 'A Creative, compassionate citizen of the Global Garden.' 'bi'
 'Bigender She/her he/him' 'Genetically and scientifically male'
 'Am a human not an alien' 'Trans Masc ' 'attack helicopter'
 'Michelle "Big Mike" Obama' 'Homosexual']

I have tried to do:
df_clean = df_clean[df_clean["Gender"] == 'Male' or 'Female'] but I cannot have both conditions on the same line, and when I put them on two separate lines it just removes the whole list.

Asked By: Richard Bradley

Answers:

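When filtering a DataFrame, use the element-wise | operator instead of or, and wrap each comparison in parentheses. Python's or tries to reduce the whole Series to a single True/False value, which pandas refuses to do, so the comparisons have to be combined element by element: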

df_clean = df_clean[(df_clean["Gender"] == 'Male') | (df_clean["Gender"] == 'Female')]

See the pandas documentation on boolean indexing for more information.
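If the list of values to keep grows, chaining | comparisons gets unwieldy. One alternative is Series.isin, which checks each value against a list in a single call. The following is a minimal sketch rather than a definitive solution; the sample DataFrame and the allowed list are made up for illustration, and only the column name "Gender" comes from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the survey's Gender column
df_clean = pd.DataFrame(
    {"Gender": ["Male", "Female", "human", "Gamer", "Female", "Trans male"]}
)

# Values to keep (assumed to be exactly these two strings)
allowed = ["Male", "Female"]

# isin() returns a boolean Series that is True where the value is in `allowed`
df_clean = df_clean[df_clean["Gender"].isin(allowed)]

print(df_clean["Gender"].unique())
```

Note that isin matches exact strings only, so answers such as 'Male, just Male' or values with trailing spaces would be dropped as well. If those should be kept, they would need to be normalized first (for example with str.strip()) before filtering.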

Answered By: Marcelo Paco