How to remove all rows of a dataframe column that contain a question mark instead of an occupation

Question:

This is my attempt:

df['occupation']= df['occupation'].str.replace('?',  '')

df.dropna(subset=['occupation'], inplace=True)

but it is not working. How do I remove all rows where the occupation column, which I read from a CSV file, contains a ? rather than an occupation?

Asked By: John Johnson


Answers:

You can try this…

df = df[df.occupation != "?"]
Answered By: Wardy

If you’re reading the csv with pd.read_csv(), you can pass na_values.

# to treat '?' as NaN in all columns:
pd.read_csv(fname, na_values='?')

# to treat '?' as NaN in just the occupation column:
pd.read_csv(fname, na_values={'occupation': '?'})

Then, you can dropna or fillna('') on that column as you see fit.
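As a minimal sketch of the steps above: the CSV text and column names here are made up for illustration, and io.StringIO stands in for the real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file (assumption)
csv_text = "name,occupation\nAnn,Engineer\nBob,?\nCara,Teacher\n"

# Treat '?' as NaN, but only in the occupation column
df = pd.read_csv(io.StringIO(csv_text), na_values={"occupation": "?"})

# Now that '?' has become NaN, dropna can remove those rows
df = df.dropna(subset=["occupation"])
print(df)
```

If the file has a space after each comma (so the value is really ' ?'), the match will likely fail unless you also pass skipinitialspace=True to pd.read_csv().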

Answered By: user5002062

Clean up the white space and use an ‘unselect’ filter:

import pandas as pd
bugs = ['grasshopper','cricket','ant','spider']
fruit = ['lemon','kumquat','watermelon','apple']
squashed = ['  ?   ','Yes','No','Eww']

df = pd.DataFrame(list(zip(bugs,fruit,squashed)), columns = ['Bugs','Fruit','Squashed'])
print(df.head())


df = df[df['Squashed'].apply(lambda x: x.strip()) != '?']
print('after stripping white space and after unselect')
print(df.head())


Why

The DataFrame method .dropna() won’t detect blank strings (i.e. '') but will look for NaN, NaT, or None.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

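However, using .str.replace() to swap '?' for an empty string, as in the question, won’t help: the result is still an ordinary string rather than a missing value, so .dropna() keeps the row. Replacing with None is also unreliable, because some older pandas versions treat a None replacement value as a request to fill from neighbouring rows instead of marking the value as missing.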

It is better to clean up the whitespace first (the simple case), using a lambda on each entry to apply the string transformation, and then filter out the rows that equal '?'.
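For comparison, here is a sketch of the same idea using the vectorized .str.strip() accessor instead of a lambda. It also shows an alternative that turns '?' into a real missing value so that .dropna() works. The variable names (stripped, cleaned, dropped) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Bugs": ["grasshopper", "cricket", "ant", "spider"],
    "Squashed": ["  ?   ", "Yes", "No", "Eww"],
})

# Strip surrounding whitespace from every entry in one vectorized call
stripped = df["Squashed"].str.strip()

# Option 1: keep only the rows whose stripped value is not '?'
cleaned = df[stripped != "?"].reset_index(drop=True)

# Option 2: turn '?' into a missing value first, then let dropna remove it
marked = df.assign(Squashed=stripped.mask(stripped == "?"))
dropped = marked.dropna(subset=["Squashed"]).reset_index(drop=True)

print(cleaned)
print(dropped)
```

Both options should leave the same three rows. The second may be handier if you also want to count or inspect the missing values before dropping them.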

Answered By: MisterJT