Get list of pandas dataframe columns based on data type

Question

If I have a dataframe with the following columns:

1. NAME                                     object
2. On_Time                                      object
3. On_Budget                                    object
4. %actual_hr                                  float64
5. Baseline Start Date                  datetime64[ns]
6. Forecast Start Date                  datetime64[ns]

I would like to be able to say: for this dataframe, give me a list of the columns which are of type ‘object’ or of type ‘datetime’?

I have a function which converts numbers (‘float64’) to two decimal places, and I would like to use this list of dataframe columns, of a particular type, and run it through this function to convert them all to 2dp.

Maybe something like:

For c in col_list: if c.dtype = "Something"
list[]
List.append(c)?

Asked By: yoshiserry

||

Source

Answer 1

You can use boolean mask on the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 2.3456, 'c']])

In [12]: df.dtypes
Out[12]: 
0      int64
1    float64
2     object
dtype: object

In [13]: msk = df.dtypes == np.float64  # or object, etc.

In [14]: msk
Out[14]: 
0    False
1     True
2    False
dtype: bool

You can look at just those columns with the desired dtype:

In [15]: df.loc[:, msk]
Out[15]: 
        1
0  2.3456

Now you can use round (or whatever) and assign it back:

In [16]: np.round(df.loc[:, msk], 2)
Out[16]: 
      1
0  2.35

In [17]: df.loc[:, msk] = np.round(df.loc[:, msk], 2)

In [18]: df
Out[18]: 
   0     1  2
0  1  2.35  c

Answered By: Andy Hayden

Answer 2

If you want a list of columns of a certain type, you can use groupby:

>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df
   A       B  C  D   E
0  1  2.3456  c  d  78

[1 rows x 5 columns]
>>> df.dtypes
A      int64
B    float64
C     object
D     object
E      int64
dtype: object
>>> g = df.columns.to_series().groupby(df.dtypes).groups
>>> g
{dtype('int64'): ['A', 'E'], dtype('float64'): ['B'], dtype('O'): ['C', 'D']}
>>> {k.name: v for k, v in g.items()}
{'object': ['C', 'D'], 'int64': ['A', 'E'], 'float64': ['B']}

Answered By: DSM

Answer 3

As of pandas v0.14.1, you can utilize select_dtypes() to select columns by dtype

In [2]: df = pd.DataFrame({'NAME': list('abcdef'),
    'On_Time': [True, False] * 3,
    'On_Budget': [False, True] * 3})

In [3]: df.select_dtypes(include=['bool'])
Out[3]:
  On_Budget On_Time
0     False    True
1      True   False
2     False    True
3      True   False
4     False    True
5      True   False

In [4]: mylist = list(df.select_dtypes(include=['bool']).columns)

In [5]: mylist
Out[5]: ['On_Budget', 'On_Time']

Answered By: qmorgan

Answer 4

If you want a list of only the object columns you could do:

non_numerics = [x for x in df.columns 
                if not (df[x].dtype == np.float64 
                        or df[x].dtype == np.int64)]

and then if you want to get another list of only the numerics:

numerics = [x for x in df.columns if x not in non_numerics]

Answered By: user4322543

Answer 5

Using dtype will give you desired column’s data type:

dataframe['column1'].dtype

if you want to know data types of all the column at once, you can use plural of dtype as dtypes:

dataframe.dtypes

Answered By: Ashish25

Answer 6

list(df.select_dtypes(['object']).columns)

This should do the trick

Answered By: Tanmoy

Answer 7

use df.info(verbose=True) where df is a pandas datafarme, by default verbose=False

Answered By: Koo

Answer 8

The most direct way to get a list of columns of certain dtype e.g. ‘object’:

df.select_dtypes(include='object').columns

For example:

>>df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>df.dtypes

A      int64
B    float64
C     object
D     object
E      int64
dtype: object

To get all ‘object’ dtype columns:

>>df.select_dtypes(include='object').columns

Index(['C', 'D'], dtype='object')

For just the list:

>>list(df.select_dtypes(include='object').columns)

['C', 'D']

Answered By: MLKing

Answer 9

I came up with this three liner.

Essentially, here’s what it does:

Fetch the column names and their respective data types.
I am optionally outputting it to a csv.

inp = pd.read_csv('filename.csv') # read input. Add read_csv arguments as needed
columns = pd.DataFrame({'column_names': inp.columns, 'datatypes': inp.dtypes})
columns.to_csv(inp+'columns_list.csv', encoding='utf-8') # encoding is optional

This made my life much easier in trying to generate schemas on the fly. Hope this helps

Answered By: geekidharsh

Answer 10

for yoshiserry;

def col_types(x,pd):
    dtypes=x.dtypes
    dtypes_col=dtypes.index
    dtypes_type=dtypes.value
    column_types=dict(zip(dtypes_col,dtypes_type))
    return column_types

Answered By: itthrill

Answer 11

I use infer_objects()

Docstring: Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object
and unconvertible columns unchanged. The inference rules are the same
as during normal Series/DataFrame construction.

df.infer_objects().dtypes

Answered By: as – if

Answer 12

If after 6 years you still have the issue, this should solve it 🙂

cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]

Answered By: Rafael Neves

Answer 13

df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'bool_1': [False],
                   'datetime': [pd.Timestamp('20180310')],
                   'bool_2': [True],
                   'string': ['foo']})
df.dtypes

# float              float64
# int                  int64
# bool_1                bool
# datetime    datetime64[ns]
# bool_2                bool
# string              object
# dtype: object


[column for column, is_type in (df.dtypes==bool).items() if is_type]
# ['bool_1', 'bool_2']

Answered By: Oleks

Answer 14

Many of the posted solutions use df.select_dtypes which unnecessarily creates a temporary intermediate dataframe. If all you want is "a list of the columns which are of" non-numeric (not float32/int64/complex128/etc.) types, just do one of these (remove the "not" if you do want just the numeric types):

import numpy as np
[c for c in df.columns if not np.issubdtype(df[c].dtype, np.number)]

from pandas.api.types import is_numeric_dtype
[c for c in df.columns if not is_numeric_dtype(c)]

Note: if you want to distinguish floating (float32/float64) from integer and complex then you could use np.floating instead of np.number in the first of the two solutions above or in the first of the two just below.

If you want the result to be a pd.Index rather than just a list of column name strings as above, here are two ways (first is based on @juanpa.arrivillaga):

import numpy as np
df.columns[[not np.issubdtype(dt, np.number) for dt in df.dtypes]]

from pandas.api.types import is_numeric_dtype
df.columns[[not is_numeric_dtype(c) for c in df.columns]]

Some other methods may consider a bool column to be numeric, but the solutions above do not (tested with numpy 1.22.3 / pandas 1.4.2).

Answered By: dabru

Get list of pandas dataframe columns based on data type

Question:

Answers: