How to check if a column exists in Pandas

Question:

How do I check if a column exists in a Pandas DataFrame df?

   A   B    C
0  3  40  100
1  6  30  200

How would I check if the column "A" exists in the above DataFrame so that I can compute:

df['sum'] = df['A'] + df['C']

And if "A" doesn’t exist:

df['sum'] = df['B'] + df['C']
Asked By: npires


Answers:

This will work:

if 'A' in df:
    df['sum'] = df['A'] + df['C']

But for clarity, I’d probably write it as:

if 'A' in df.columns:
    df['sum'] = df['A'] + df['C']
Answered By: chrisb
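A minimal sketch of the full if/else from the question, assuming the sample data is rebuilt from the table above. `add_sum` is a hypothetical helper name used only for illustration. Note that `in` on a DataFrame checks the column labels (like dictionary keys), not the cell values.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})


def add_sum(frame):
    """Hypothetical helper: add a 'sum' column, preferring A + C, else B + C."""
    out = frame.copy()  # avoid mutating the caller's DataFrame
    # `'A' in out` would behave the same: `in` on a DataFrame checks column labels
    if 'A' in out.columns:
        out['sum'] = out['A'] + out['C']
    else:
        out['sum'] = out['B'] + out['C']
    return out


with_a = add_sum(df)
without_a = add_sum(df.drop(columns='A'))
print(with_a['sum'].tolist())     # [103, 206]
print(without_a['sum'].tolist())  # [140, 230]
```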

To check if one or more columns all exist, you can use set.issubset, as in:

if set(['A','C']).issubset(df.columns):
    df['sum'] = df['A'] + df['C']
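A short sketch of the issubset approach on the question's sample data (rebuilt here as an assumption). Because the set method issubset() accepts any iterable, the columns Index can be passed directly; the companion method difference() is one way to report which required columns are missing.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

# issubset() accepts any iterable, so the Index can be passed directly
has_a_and_c = {'A', 'C'}.issubset(df.columns)  # True
has_a_and_d = {'A', 'D'}.issubset(df.columns)  # False, 'D' is missing

# difference() reports which required columns are absent
missing = {'A', 'D'}.difference(df.columns)    # {'D'}

if has_a_and_c:
    df['sum'] = df['A'] + df['C']
```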

As @brianpck points out in a comment, the set can alternatively be written as a set literal with curly braces:

if {'A', 'C'}.issubset(df.columns):
    df['sum'] = df['A'] + df['C']

See this question for a discussion of the curly-braces syntax.

Or, you can use a generator expression with all(), as in:

if all(item in df.columns for item in ['A','C']):
    df['sum'] = df['A'] + df['C']
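A small sketch of the generator-expression approach, again assuming the question's sample data. all() stops at the first label that is not found, and swapping in any() gives the "at least one of these columns exists" variant.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

required = ['A', 'C']
# all() short-circuits on the first label that is not a column
all_present = all(col in df.columns for col in required)

# any() answers "does at least one of these columns exist?"
some_present = any(col in df.columns for col in ['D', 'C'])
none_present = any(col in df.columns for col in ['D', 'E'])
```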
Answered By: C8H10N4O2

Just to suggest another way without using if statements, you can use the DataFrame get() method. To compute the sum from the question:

df['sum'] = df.get('A', df['B']) + df['C']

DataFrame.get() behaves much like dict.get() in Python: it returns the column if it exists and the default value otherwise.
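A sketch of the get() approach on the question's sample data (assumed), with and without column A. One caveat worth knowing: the default argument `df['B']` is evaluated before get() runs, so this one-liner raises KeyError if B is missing, even when A is present.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
df_no_a = df.drop(columns='A')

# get() returns column 'A' if present, otherwise the default (column 'B')
with_a = df.get('A', df['B']) + df['C']
without_a = df_no_a.get('A', df_no_a['B']) + df_no_a['C']

# With no default given, get() returns None for a missing column
missing = df_no_a.get('A')
```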

Answered By: Gerges

You can use the set method issuperset:

set(df).issuperset(['A', 'B'])
# set(df.columns).issuperset(['A', 'B'])
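A brief sketch of issuperset, assuming the question's sample data. Iterating over a DataFrame yields its column labels, which is why set(df) and set(df.columns) are the same. The method form accepts any iterable, while the `>=` operator form needs a set on both sides.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

# Iterating a DataFrame yields column labels, so set(df) == set(df.columns)
cols = set(df)
ab = cols.issuperset(['A', 'B'])  # True
ad = cols.issuperset(['A', 'D'])  # False, 'D' is missing

# Operator form: both operands must be sets
ab_op = cols >= {'A', 'B'}
```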
Answered By: Mykola Zotko

You can also call isin() on the columns to check whether specific column(s) exist and then call any() on the result to reduce it to a single boolean value (see footnote 1). For example, to check whether a DataFrame contains column A or column C, one could do:

if df.columns.isin(['A', 'C']).any():
    pass  # do something
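A sketch of the isin().any() check on the question's sample data (assumed). It also shows a pitfall: `df.columns.isin(lst).all()` asks whether every DataFrame column is in `lst`, not whether every item of `lst` exists. For the latter, one option is to flip the direction and call isin() on an Index built from the required labels.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})

# "Does at least one of A, C exist?": mask over df's columns, then any()
either = df.columns.isin(['A', 'C']).any()

# Pitfall: this is False because column 'B' is not in the list
every_df_col_listed = df.columns.isin(['A', 'C']).all()

# "Do all of A, C exist?": build the mask over the required labels instead
both = pd.Index(['A', 'C']).isin(df.columns).all()
both_with_d = pd.Index(['A', 'D']).isin(df.columns).all()
```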

To check that a column name is not present, use the not in operator in the if clause:

if 'A' not in df:
    pass  # do something

or negate the isin().any() call with not:

if not df.columns.isin(['A', 'C']).any():
    pass  # do something
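A sketch of both negated checks, assuming a DataFrame like the question's but without column A, which is the case where the question falls back to B + C.

```python
import pandas as pd

# Assumed variant of the sample data with no column 'A'
df = pd.DataFrame({'B': [40, 30], 'C': [100, 200]})

# Fall back to B + C when 'A' is absent
if 'A' not in df:
    df['sum'] = df['B'] + df['C']

# True when neither 'A' nor 'D' is a column
neither = not df.columns.isin(['A', 'D']).any()
```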

1: Calling isin() on the columns returns a boolean array that is True for each column label equal to A or C and False otherwise. The truth value of an array with more than one element is ambiguous, so any() reduces it to a single True/False value.
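A small demonstration of the footnote, assuming the question's sample data: using the raw mask in a boolean context raises NumPy's ValueError, while any() collapses it to one value.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 6], 'B': [40, 30], 'C': [100, 200]})
mask = df.columns.isin(['A', 'C'])  # one boolean per column: A, B, C

try:
    bool(mask)  # what `if mask:` would do implicitly
    error = None
except ValueError as exc:  # "truth value of an array ... is ambiguous"
    error = exc

found = mask.any()  # a single True/False
```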

Answered By: cottontail