Python using Pandas – Retrieving the name of all columns that contain numbers
Question:
I searched for a solution on the site, but I couldn’t find anything relevant, only outdated code. I am new to the Pandas library and I have the following dataframe as an example:
A | B | C | D | E |
---|---|---|---|---|
142 | 0.4 | red | 108 | front |
164 | 1.3 | green | 98 | rear |
71 | -1.0 | blue | 234 | front |
109 | 0.2 | black | 120 | front |
I would like to extract the name of the columns that contain numbers (integers and floats). It is completely fine to use the first row to achieve this.
So the result should look like this: ['A', 'B', 'D']
I tried the following command to get some of the columns that contained numbers:
dataframe.loc[0, dataframe.dtypes == 'int64']
Out:
A 142
D 108
There are two problems with this. First of all, I just need the name of the columns, but not the values. Second, this captures only the integer columns. My next attempt just gave an error:
dataframe.loc[0, dataframe.dtypes == 'int64' or dataframe.dtypes == 'float64']
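For what it's worth, the second attempt fails because Python's `or` asks for a single truth value of a whole Series, which pandas refuses with a ValueError. Combining the two masks element-wise with `|` (and parentheses) avoids that. A minimal sketch, assuming the example frame is rebuilt by hand under the name `dataframe`:

```python
import pandas as pd

# Example frame rebuilt by hand from the table above (an assumption about
# how the original frame was created).
dataframe = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})

# Element-wise OR of the two boolean masks. The parentheses are required
# because | binds more tightly than ==.
mask = (dataframe.dtypes == "int64") | (dataframe.dtypes == "float64")

# Index the column labels (not the rows) with the mask to get names only.
numeric_names = dataframe.columns[mask.to_numpy()].tolist()
print(numeric_names)
```

This only catches columns whose dtype is exactly int64 or float64, so other numeric dtypes would still be missed.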
Answers:
Based on Marcelo’s comment, you can use:
from pandas.api.types import is_numeric_dtype

numeric_columns = []
for column in df.columns:
    # keep the column name if its dtype is numeric
    if is_numeric_dtype(df[column]):
        numeric_columns.append(column)
print(numeric_columns)
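The same check can be written as a list comprehension. A minimal, self-contained sketch, assuming the example frame is rebuilt by hand as `df`:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Example frame rebuilt by hand from the question (an assumption).
df = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})

# Keep each column label whose Series has a numeric dtype.
numeric_columns = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_columns)  # ['A', 'B', 'D']
```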
You can use .dtype and then .kind while filtering the column names with a list comprehension:
# import pandas as pd
# df = pd.read_html('https://stackoverflow.com/questions/75909965')[0] # scraped your q
[c for c in df.columns if df[c].dtype.kind in 'iufc']
This should return ['A', 'B', 'D']. Note that 'iufc' covers signed and unsigned integers as well as real and complex floating-point numbers. Add 'b' if you want to cover Booleans as well, since bool is a subclass of int in Python.
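To illustrate the effect of adding 'b' to the kind string, here is a small sketch. The Boolean column `F` is hypothetical and not part of the original data; the frame itself is rebuilt by hand.

```python
import pandas as pd

# Example frame rebuilt by hand, plus a hypothetical Boolean column "F".
df = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
    "F": [True, False, True, True],
})

# Kind codes: i = signed int, u = unsigned int, f = float, c = complex, b = bool.
without_bool = [c for c in df.columns if df[c].dtype.kind in "iufc"]
with_bool = [c for c in df.columns if df[c].dtype.kind in "iufcb"]

print(without_bool)  # ['A', 'B', 'D']
print(with_bool)     # ['A', 'B', 'D', 'F']
```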
Use select_dtypes:
It first selects all the numeric columns, then takes their column labels, which are finally converted into a list.
df.select_dtypes(include="number").columns.to_list()
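As a quick check of that one-liner, the sketch below runs it on a hand-built copy of the example frame (an assumption about how `df` was created) and also uses the `exclude` parameter to list the remaining, non-numeric columns.

```python
import pandas as pd

# Example frame rebuilt by hand from the question (an assumption).
df = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})

# Columns with a numeric dtype, as a plain Python list of labels.
numeric_cols = df.select_dtypes(include="number").columns.to_list()

# The complement: everything that is not numeric.
other_cols = df.select_dtypes(exclude="number").columns.to_list()

print(numeric_cols)  # ['A', 'B', 'D']
print(other_cols)    # ['C', 'E']
```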
Another possible solution:
import re
# keep the columns whose dtype name starts with "int" or "float"
df.columns[
    [re.match(r'^(int|float)', x.name) is not None for x in df.dtypes]
].to_list()
Output:
['A', 'B', 'D']
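One caveat with matching on dtype names: the pattern only sees names that start with "int" or "float", so a dtype such as uint8 is skipped. A small sketch of the difference, where the unsigned column `U` is hypothetical and added purely for illustration:

```python
import re

import pandas as pd

# Example frame rebuilt by hand, plus a hypothetical unsigned-integer column "U".
df = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})
df["U"] = pd.Series([1, 2, 3, 4], dtype="uint8")

# Regex on the dtype name: "uint8" does not start with "int" or "float".
by_regex = df.columns[
    [re.match(r"^(int|float)", x.name) is not None for x in df.dtypes]
].to_list()

# select_dtypes treats unsigned integers as numbers too.
by_select = df.select_dtypes(include="number").columns.to_list()

print(by_regex)   # ['A', 'B', 'D']
print(by_select)  # ['A', 'B', 'D', 'U']
```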