Selecting Pandas Columns by dtype


I was wondering if there is an elegant, shorthand way to select the columns of a Pandas DataFrame by data type (dtype), e.g. select only the int64 columns of a DataFrame.

To elaborate, something along the lines of

Asked By: caner



df.loc[:, df.dtypes == np.float64]
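
As a hedged illustration of why this works: df.dtypes is a Series of dtypes indexed by column name, so comparing it with a dtype gives a boolean mask that .loc can apply along the column axis. The small frame and the names mask and float_cols below are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer, one float and one string column.
df = pd.DataFrame({"A": [1, 2], "B": [1.5, 2.5], "C": ["x", "y"]})

# One boolean per column: True where the column dtype is float64.
mask = df.dtypes == np.float64
print(mask.tolist())  # [False, True, False]

# Use the mask on the column axis to keep only the float64 columns.
float_cols = df.loc[:, mask]
print(list(float_cols.columns))  # ['B']
```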
Answered By: Dan Allan
Answered By: normonics

Since pandas 0.14.1 there’s a select_dtypes method, so you can do this more elegantly and more generally.

In [11]: df = pd.DataFrame([[1, 2.2, 'three']], columns=['A', 'B', 'C'])

In [12]: df.select_dtypes(include=['int'])
   A
0  1

To select all numeric types, use the NumPy dtype numpy.number:

In [13]: df.select_dtypes(include=[np.number])
   A    B
0  1  2.2

In [14]: df.select_dtypes(exclude=[object])
   A    B
0  1  2.2
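
A small sketch, reusing the frame from above, of two related uses: include and exclude can be passed together, and .columns on the result gives just the column names. The variable names ints_only and numeric_names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2.2, 'three']], columns=['A', 'B', 'C'])

# Numeric columns, minus the floating-point ones.
ints_only = df.select_dtypes(include=[np.number], exclude=['floating'])
print(ints_only.columns.tolist())  # ['A']

# If only the names are needed, take .columns from the selection.
numeric_names = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric_names)  # ['A', 'B']
```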
Answered By: Andy Hayden

I’d like to extend the existing answers by adding options for selecting all floating-point dtypes or all integer dtypes:



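In [1]: df = pd.DataFrame({
        # columns a-e are reconstructed from the dtypes shown in In [3]
        'a':np.random.rand(3),
        'b':np.random.rand(3).astype('float32'),
        'c':np.random.randint(10, size=3).astype('int16'),
        'd':np.arange(3).astype('int32'),
        'e':np.random.randint(10**7, size=3).astype('int64'),
        'f':np.random.choice([True, False], 3),
        'g':pd.date_range('2000-01-01', periods=3)
})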

In [2]: df
          a         b  c  d        e      f          g
0  0.191519  0.785359  6  0  7578569  False 2000-01-01
1  0.622109  0.779976  8  1  7981439   True 2000-01-02
2  0.437728  0.272593  0  2  2558462   True 2000-01-03

In [3]: df.dtypes
a           float64
b           float32
c             int16
d             int32
e             int64
f              bool
g    datetime64[ns]
dtype: object

Selecting all floating number columns:

In [4]: df.select_dtypes(include=['floating'])
          a         b
0  0.191519  0.785359
1  0.622109  0.779976
2  0.437728  0.272593

In [5]: df.select_dtypes(include=['floating']).dtypes
a    float64
b    float32
dtype: object

Selecting all integer number columns:

In [6]: df.select_dtypes(include=['integer'])
   c  d        e
0  6  0  7578569
1  8  1  7981439
2  0  2  2558462

In [7]: df.select_dtypes(include=['integer']).dtypes
c    int16
d    int32
e    int64
dtype: object

Selecting all numeric columns:

In [8]: df.select_dtypes(include=['number'])
          a         b  c  d        e
0  0.191519  0.785359  6  0  7578569
1  0.622109  0.779976  8  1  7981439
2  0.437728  0.272593  0  2  2558462

In [9]: df.select_dtypes(include=['number']).dtypes
a    float64
b    float32
c      int16
d      int32
e      int64
dtype: object
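
The same string aliases can be combined into a quick overview of which columns fall into which group. This is only a sketch: the frame uses fixed values instead of random ones, and kinds and columns_by_kind are names invented here. The 'bool' and 'datetime' aliases are also accepted by select_dtypes.

```python
import numpy as np
import pandas as pd

# Fixed-value stand-in for the frame above, with the same dtypes.
df = pd.DataFrame({
    "a": np.array([0.1, 0.2, 0.3], dtype="float64"),
    "b": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "c": np.array([6, 8, 0], dtype="int16"),
    "d": np.arange(3, dtype="int32"),
    "e": np.array([1, 2, 3], dtype="int64"),
    "f": [False, True, True],
    "g": pd.date_range("2000-01-01", periods=3),
})

# Map each alias to the names of the columns it selects.
kinds = ["floating", "integer", "bool", "datetime"]
columns_by_kind = {
    kind: df.select_dtypes(include=[kind]).columns.tolist() for kind in kinds
}

for kind, names in columns_by_kind.items():
    print(kind, names)
```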

Optionally, if you don’t want to create a subset of the DataFrame in the process, you can iterate over the columns and check each dtype directly.

I haven’t benchmarked the code below, but I assume it will be faster on very large datasets.

[col for col in df.columns.tolist() if df[col].dtype not in ['object','<M8[ns]']] 
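
A possible variant of the same idea, sketched with the helper predicates in pandas.api.types instead of comparing against dtype strings. It iterates df.dtypes, so no column data is touched. The frame and the name keep are assumptions for the example, and like the one-liner above it is not benchmarked.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_object_dtype

# Hypothetical frame; the string column is forced to object dtype on purpose.
df = pd.DataFrame({
    "n": [1, 2],
    "x": [0.5, 1.5],
    "s": pd.Series(["a", "b"], dtype=object),
    "t": pd.date_range("2000-01-01", periods=2),
})

# Keep every column whose dtype is neither object nor any datetime64 variant.
keep = [
    name
    for name, dtype in df.dtypes.items()
    if not (is_object_dtype(dtype) or is_datetime64_any_dtype(dtype))
]
print(keep)  # ['n', 'x']
```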
Answered By: hui chen

You can pass a list of several types to include, for example float64 and int64:

df_numeric = df.select_dtypes(include=[np.float64,np.int64])
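
One thing to keep in mind, shown here as a small sketch with invented column names: concrete types such as np.float64 and np.int64 match only those exact dtypes, so narrower columns like float32 or int16 are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two widths of float and two widths of int.
df = pd.DataFrame({
    "f64": np.array([1.5, 2.5], dtype="float64"),
    "f32": np.array([1.5, 2.5], dtype="float32"),
    "i64": np.array([1, 2], dtype="int64"),
    "i16": np.array([1, 2], dtype="int16"),
})

# Only the exact float64 and int64 columns are selected.
df_numeric = df.select_dtypes(include=[np.float64, np.int64])
print(df_numeric.columns.tolist())  # ['f64', 'i64']
```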
Answered By: Gurubux


Answered By: Anjan Prasad

If you want to select int64 columns and then update “in place”, you can use:

int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)]

For example, notice that I update all the int64 columns in df to zero below:

In [1]:

    import pandas as pd
    from pandas.api.types import is_int64_dtype

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

    int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)] 
    print('int64 Cols: ',int64_cols)

    print(df[int64_cols])

    df[int64_cols] = 0

    print(df[int64_cols])
Out [1]:

    int64 Cols:  ['a', 'f']

           a  f
        0  1  1
        1  2  2
        2  1  1
        3  2  2
        4  1  1
        5  2  2
           a  f
        0  0  0
        1  0  0
        2  0  0
        3  0  0
        4  0  0
        5  0  0
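
The two approaches can also be combined. The sketch below (frame and variable names are illustrative) uses select_dtypes only to find the column names and then assigns through df itself, so the update lands in the original frame.

```python
import pandas as pd

# Hypothetical frame with an explicit int64 column.
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype="int64"),
    "c": [1.0, 2.0, 3.0],
    "d": ["red", "green", "blue"],
})

# Take only the names from the selection, not the selected data.
int64_cols = df.select_dtypes(include=["int64"]).columns

# Assigning via df[...] modifies df itself.
df[int64_cols] = 0
print(df["a"].tolist())  # [0, 0, 0]
```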

Just for completeness:

df.loc[] with a boolean column mask and df.select_dtypes() both give you a copy of the selected data, not a view of the dataframe. This means that if you try to update values through the result of df.select_dtypes(), you will get a SettingWithCopyWarning and df itself will not be updated.
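
A minimal sketch of that behaviour on a made-up two-column frame: the write to the selection stays in the selection, while a write through df.loc with explicit labels reaches the original. Depending on the pandas version, the first assignment may emit a warning.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [True, False, True]})

# The selection is separate data; changing it leaves df alone.
bools = df.select_dtypes(include="bool")
bools.loc[0, "b"] = False
print(bool(bools.loc[0, "b"]), bool(df.loc[0, "b"]))

# Writing through df itself does update the original.
df.loc[0, "b"] = False
print(bool(df.loc[0, "b"]))
```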

For example, notice that when I try to update df through a selection made with .loc[] or .select_dtypes(), nothing happens to df:

In [2]:

    import numpy as np

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

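    # select_dtypes() returns a copy, so this assignment does not reach df
    df_bool = df.select_dtypes(include='bool')
    df_bool.b[0] = False

    # the boolean-mask .loc selection is also a copy; df.a is left unchanged
    df.loc[:, df.dtypes == np.int64].a[0]=7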

Out [2]:

Answered By: Jake Drew

You can use:

for i in x.columns[x.dtypes == 'object']:
    print(i)

in case you only want to display the names of the columns with a particular dtype rather than a sliced dataframe. I don’t know if a dedicated function for this exists in Python.

PS: replace object with the datatype you want.
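
For reference, a short sketch of two ways to get those names as a plain list without a loop. The frame x and its columns are invented for the example, and the string column is given object dtype explicitly.

```python
import pandas as pd

# Hypothetical frame; "name" is stored with object dtype on purpose.
x = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype=object),
    "n": [1, 2],
})

# Boolean mask on the column index, as in the loop above.
names_via_mask = x.columns[x.dtypes == "object"].tolist()

# The same names taken from a select_dtypes() result.
names_via_select = x.select_dtypes(include=["object"]).columns.tolist()

print(names_via_mask)    # ['name']
print(names_via_select)  # ['name']
```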

Answered By: Rahul Bordoloi