How to check if a pandas dataframe contains only numeric values column-wise?

Question:

I want to check, for every column in a dataframe, whether it contains only numeric data. My question is not about the column's datatype; I want to check whether each individual value in the column is numeric.

How can I find this out?

Asked By: Raja Sahe S


Answers:

Let’s say you have a dataframe called df. If you run:

df.select_dtypes(include=["float", 'int'])

This returns only the numeric columns; you can then check whether the result has the same columns as the original df.

Otherwise, you can also use the exclude parameter:

df.select_dtypes(exclude=["float", 'int'])

and check if this gives you an empty dataframe.
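
A minimal sketch of the exclude-based check, assuming a small made-up dataframe (the column names and values are hypothetical). It uses "number" rather than "float" and "int" so that all integer and float widths are covered. Note that this inspects column dtypes only, not the individual values.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "a": [1, 2, 3],          # integer column
    "b": [0.5, 1.5, 2.5],    # float column
    "c": ["x", "y", "z"],    # string column
})

# Columns whose dtype is NOT numeric.
non_numeric = df.select_dtypes(exclude="number")

# The frame is all-numeric (by dtype) only if nothing is left over.
all_numeric = non_numeric.shape[1] == 0

print(list(non_numeric.columns))
print(all_numeric)
```

Here the leftover frame still contains column "c", so the check reports that df is not all-numeric.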

Answered By: TYZ

This will return True if all columns are numeric, False otherwise.

df.shape[1] == df.select_dtypes(include=np.number).shape[1]

To select numeric columns:

new_df = df.select_dtypes(include=np.number)
Answered By: Vaishali

You can get a True / False result for each value using str.isnumeric()

Example:

>>> df
       A      B
0      1      1
1    NaN      6
2    NaN    NaN
3      2      2
4    NaN    NaN
5      4      4
6   some   some
7  value  other

Results:

>>> df.A.str.isnumeric()
0     True
1      NaN
2      NaN
3     True
4      NaN
5     True
6    False
7    False
Name: A, dtype: object

# df.B.str.isnumeric()

Using the apply() method seems more robust if you need to check every cell of the dataframe:

A test DataFrame with two columns, one of mixed type and one with numbers only:

>>> df
       A   B
0      1   1
1    NaN   6
2    NaN  33
3      2   2
4    NaN  22
5      4   4
6   some  66
7  value  11

Result:

>>> df.apply(lambda x: x.str.isnumeric())
       A     B
0   True  True
1    NaN  True
2    NaN  True
3   True  True
4    NaN  True
5   True  True
6  False  True
7  False  True
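
One caveat worth seeing in code: str.isnumeric() tests characters, not numeric value, so strings with a minus sign or a decimal point are reported as non-numeric. A small sketch with made-up strings (the values are hypothetical):

```python
import pandas as pd

# Hypothetical strings; dtype=object keeps them as plain Python str values.
s = pd.Series(["0", "42", "-22", "1.5", "some"], dtype=object)

# True only when every character in the string is a numeric character,
# so "-22" and "1.5" come back False even though they parse as numbers.
flags = s.str.isnumeric()

print(flags.tolist())  # [True, True, False, False, False]
```

If negative numbers or decimals should count as numeric, a parse-based check such as pd.to_numeric is likely a better fit.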

Another example:

Consider the dataframe below, which has columns of different data types:

>>> df
   num  rating    name  age
0    0    80.0  shakir   33
1    1   -22.0   rafiq   37
2    2   -10.0     dev   36
3  num     1.0   suraj   30

This example follows a comment from the OP on this answer: the data contains negative values and zeros.

1- This is a pseudo-internal method to return only the numeric type data.

>>> df._get_numeric_data()
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30

OR

2- There is also the select_dtypes method (defined in module pandas.core.frame), which returns a subset of the DataFrame’s columns based on the column dtypes. It accepts include and exclude parameters.

>>> df.select_dtypes(include=['int64','float64']) # choosing int & float
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30

>>> df.select_dtypes(include=['int64'])  # choose int
   age
0   33
1   37
2   36
3   30
Answered By: Karn Kumar

You can check that using to_numeric and coercing errors:

pd.to_numeric(df['column'], errors='coerce').notnull().all()

For all columns, you can iterate through the columns or just use apply:

df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

E.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

Outputs

col     False
col2    False
col3     True
dtype: bool
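
Note that the check above also reports False for a column that merely contains NaN. If missing values should be tolerated, the idea can be wrapped in a helper. This is a sketch only: column_is_numeric and its allow_missing flag are hypothetical names, and col4 is an extra column added for illustration.

```python
import numpy as np
import pandas as pd

def column_is_numeric(s, allow_missing=False):
    """Return True if every value in Series s can be parsed as a number.

    allow_missing is a hypothetical flag for this sketch: when True,
    values that were already missing (NaN/None) do not count as failures.
    """
    converted = pd.to_numeric(s, errors="coerce")
    ok = converted.notna()
    if allow_missing:
        # Accept positions that were missing before the conversion.
        ok = ok | s.isna()
    return bool(ok.all())

df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],
    "col2": ["a", 10, 30, 40, 50],
    "col3": [1, 2, 3, 4, 5.0],
    "col4": [1, 2, np.nan, "4", 5.5],  # extra column, not in the original example
})

strict = df.apply(column_is_numeric)
lenient = df.apply(column_is_numeric, allow_missing=True)

print(strict.to_dict())
print(lenient.to_dict())
```

With allow_missing=True, col4 passes because its only unparseable position was already NaN, while col still fails because of the string 'a'.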
Answered By: rafaelc

The accepted answers seem a bit overkill, as they sub-select the entire dataframe.

To check types, only the metadata needs to be used, which can be done with pd.api.types.is_numeric_dtype.

import pandas as pd
df = pd.DataFrame(data=[[1,'a']],columns=['numeric_col','string_col'])

print(df.columns[list(map(pd.api.types.is_numeric_dtype,df.dtypes))]) # one way
print(df.dtypes.map(pd.api.types.is_numeric_dtype)) # another way
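
Because this approach reads dtypes only, a column that stores numbers under object dtype is reported as non-numeric. A small sketch of that difference, assuming a made-up frame (column names are hypothetical):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame: "as_object" holds numbers but is stored with object dtype.
df = pd.DataFrame({
    "ints": pd.Series([1, 2, 3]),
    "as_object": pd.Series([1, 2, 3], dtype=object),
    "text": pd.Series(["a", "b", "c"], dtype=object),
})

# Dtype-level check: looks at column metadata, never at the values themselves.
by_dtype = {name: is_numeric_dtype(dtype) for name, dtype in df.dtypes.items()}

print(by_dtype)  # {'ints': True, 'as_object': False, 'text': False}
```

So this is fast and cheap, but it answers the dtype question rather than the per-value question.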
Answered By: Maciej S.

To check for numeric columns, you could use df[c].dtype.kind in 'iufcb' where c is any given column name. The comparison yields True or False.

It can be iterated through all the column names with a list comprehension:

>>> [(c, df[c].dtype.kind in 'iufcb') for c in df.columns]

[('col', False), ('col2', False), ('col3', True)]

In numpy.dtype.kind notation, each character of 'iufcb' stands for one kind of dtype: signed integer (i), unsigned integer (u), float (f), complex number (c), or boolean (b). The string can be modified to exclude any of the above (e.g., 'iufc' to exclude boolean).
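
To make the kind codes concrete, here is a sketch that builds one column per dtype and lists the resulting codes. The frame and its column names are hypothetical and exist only to show the mapping.

```python
import pandas as pd

# Hypothetical frame with one column per dtype, named after the expected kind.
df = pd.DataFrame({
    "i": pd.Series([1, 2], dtype="int64"),
    "u": pd.Series([1, 2], dtype="uint8"),
    "f": pd.Series([1.0, 2.0], dtype="float64"),
    "c": pd.Series([1 + 2j, 3 + 4j], dtype="complex128"),
    "b": pd.Series([True, False], dtype="bool"),
    "o": pd.Series(["x", "y"], dtype=object),
    "m": pd.to_datetime(pd.Series(["2020-01-01", "2020-01-02"])),
})

# Map each column name to its single-character kind code.
kinds = {name: dtype.kind for name, dtype in df.dtypes.items()}

# Keep only the columns whose kind is one of the numeric codes.
numeric_cols = [name for name, k in kinds.items() if k in "iufcb"]

print(kinds)
print(numeric_cols)
```

The object column reports kind 'O' and the datetime column reports 'M', so neither is picked up by the 'iufcb' test.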

This solves the original question in relation to checking column data types. It also provides the benefits of (1) a shorter line of code which (2) remains sufficiently intuitive to the user.

Answered By: fact_finder