Find non-numeric values in pandas dataframe column

Question:

I have a column in a dataframe that contains numbers and strings, so I replaced the strings with numbers via df.column.replace(["A", "B", "C", "D"], [1, 2, 3, 4], inplace=True).

But the column is still dtype "object", and I cannot sort it (TypeError: '<' not supported between instances of 'str' and 'int').

Now how can I identify those numbers that are strings? I tried print(df[pd.to_numeric(df['column']).isnull()]) and it gives back an empty dataframe, as expected. However I read that this does not work in my case (actual numbers saved as strings). So how can I identify those numbers saved as a string?

Am I right that if a column only contains REAL numbers (int or float) it will automatically change to dtype int or float?

Thank you!

Asked By: Scrabyard

||

Answers:

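You can change the dtype by converting the column and assigning the result back (this raises a ValueError if any value cannot be converted to int):

    df['column'] = df['column'].astype(int)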
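
As a minimal sketch of how that conversion behaves, assuming a small made-up column (the sample values below are hypothetical, not from the question): astype(int) converts strings that look like integers, but fails as soon as one value is not a valid integer literal.

```python
import pandas as pd

# Hypothetical sample: integers mixed with an integer stored as a string.
clean = pd.Series([1, "2", 3], dtype=object)
as_int = clean.astype(int)      # "2" is parsed into the integer 2
print(as_int.tolist())

# A column that still contains a non-numeric string cannot be converted.
dirty = pd.Series([1, "2", "abc"], dtype=object)
try:
    dirty.astype(int)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("conversion failed:", exc)
```

So astype(int) is mainly useful once you are sure every remaining value is a valid integer.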
Answered By: NANDHA KUMAR

You can use pd.to_numeric with something like:

    df['column'] = pd.to_numeric(df['column'], errors='coerce')

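The errors argument has a few options: the default, 'raise', throws an exception on a value that cannot be parsed, while 'coerce' replaces it with NaN. See the pandas.to_numeric reference documentation for details.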
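
A small sketch of what errors='coerce' does, using made-up sample values (the data is an assumption for illustration only): numbers stored as strings are parsed, and anything that cannot be parsed becomes NaN.

```python
import pandas as pd

# Hypothetical sample: real numbers, a number stored as a string, and one non-numeric string.
df = pd.DataFrame({"column": [1, "2", 3.5, "abc"]})

converted = pd.to_numeric(df["column"], errors="coerce")
print(converted.tolist())   # "2" is parsed as 2.0, "abc" becomes NaN
print(converted.dtype)      # a float dtype, because NaN is a float
```

Note that the result is a new Series; the original column keeps dtype object until you assign the result back.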

Answered By: Francesco

Expanding on Francesco's answer, you can create a mask of the non-numeric values and list their unique instances to decide how to handle or remove them.
This relies on the fact that values that can't be coerced are turned into nulls (NaN).

    is_non_numeric = pd.to_numeric(df['column'], errors='coerce').isnull()
    df[is_non_numeric]['column'].unique()

Or alternatively in a single line:

    df[pd.to_numeric(df['column'], errors='coerce').isnull()]['column'].unique()
Answered By: kowpow