pandas dataframe remove constant column


I have a dataframe that may or may not have columns in which every value is the same. For example

    row    A    B
    1      9    0
    2      7    0
    3      5    0
    4      2    0

I’d like to return just

    row    A
    1      9
    2      7
    3      5
    4      2

Is there a simple way to identify if any of these columns exist and then remove them?

Asked By: user1802143



Ignoring NaNs like usual, a column is constant if nunique() == 1. So:

>>> df
   A  B  row
0  9  0    1
1  7  0    2
2  5  0    3
3  2  0    4
>>> df = df.loc[:,df.apply(pd.Series.nunique) != 1]
>>> df
   A  row
0  9    1
1  7    2
2  5    3
3  2    4
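
To make the "ignoring NaNs" remark concrete, here is a minimal sketch of how the `dropna` argument of `nunique()` changes which columns count as constant. The extra column `C` is made up for illustration and is not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "C": [1.0, np.nan, 1.0, 1.0],  # hypothetical column: constant apart from one NaN
})

# Default behaviour: NaN is ignored, so C has a single unique value and is dropped.
ignoring_nan = df.loc[:, df.nunique() != 1]

# dropna=False: NaN counts as a value of its own, so C has two values and is kept.
counting_nan = df.loc[:, df.nunique(dropna=False) != 1]

print(list(ignoring_nan.columns))
print(list(counting_nan.columns))
```

Note that a column consisting only of NaN has `nunique()` equal to 0, so the `!= 1` test keeps it; use `> 1` instead if such columns should go too.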
Answered By: DSM

I believe this option will be faster than the other answers here as it will traverse the data frame only once for the comparison and short-circuit if a non-unique value is found.

>>> df

   0  1  2
0  1  9  0
1  2  7  0
2  3  7  0

>>> df.loc[:, (df != df.iloc[0]).any()] 

   0  1
0  1  9
1  2  7
2  3  7
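
One thing to watch with the first-row comparison is NaN: since NaN is never equal to itself, a column that is entirely NaN looks non-constant to this test. The sketch below shows that on a made-up frame and, as one possible workaround, drops all-NaN columns afterwards with `dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the answer above.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0, 0, 0],
    "c": [np.nan, np.nan, np.nan],
})

# True for every column that has at least one cell differing from row 0.
# Column c is flagged because NaN != NaN evaluates to True.
mask = (df != df.iloc[0]).any()

# Keep the flagged columns, then remove the ones that are entirely NaN.
result = df.loc[:, mask].dropna(axis=1, how="all")
print(list(result.columns))
```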
Answered By: chthonicdaemon

Assuming that every column of the DataFrame is numeric, you can try:

>>> df = df.loc[:, df.var() != 0.0]

which keeps only the columns with non-zero variance, i.e. it removes the constant (variance = 0) columns.

If the DataFrame has both numeric and object columns, then you should try:

>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)

which will drop constant columns of numeric type only.

If you also want to ignore/delete constant enum columns, you should try:

>>> import numpy as np
>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
>>> enum_df = enum_df.loc[:, [len(np.unique(x)) != 1 for x in enum_df.T.to_numpy()]]
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)
Answered By: Hng

Here is my solution, since I needed to handle both object and numerical columns. I'm not claiming it's super efficient or anything, but it gets the job done.

def drop_constants(df):
    """iterate through columns and remove columns with constant values (all same)"""
    columns = df.columns.values
    for col in columns:
        # drop col if unique values is 1
        if df[col].nunique(dropna=False) == 1:
            del df[col]
    return df

Extra caveat, it won’t work on columns of lists or arrays since they are not hashable.
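
If columns of lists or arrays do need handling, one possible workaround is to fall back to comparing string representations when hashing fails. The sketch below is only that, a sketch: `drop_constants_safe` is a hypothetical name, it assumes unique column names, and it relies on `repr` distinguishing the cell values, which may not hold for large NumPy arrays whose `repr` is abbreviated. Unlike the function above, it returns a new frame instead of modifying its argument.

```python
import pandas as pd


def drop_constants_safe(df):
    """Return a copy of df without constant columns (hypothetical helper)."""
    keep = []
    for col in df.columns:
        try:
            n_unique = df[col].nunique(dropna=False)
        except TypeError:
            # Unhashable cells (lists, arrays): compare their repr strings instead.
            n_unique = df[col].map(repr).nunique(dropna=False)
        if n_unique != 1:
            keep.append(col)
    return df[keep].copy()


# Made-up example data: x and z are constant, y and w are not.
example = pd.DataFrame({
    "x": [[1, 2], [1, 2]],
    "y": [[1], [2]],
    "z": [5, 5],
    "w": [1, 2],
})
cleaned = drop_constants_safe(example)
print(list(cleaned.columns))
```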

Answered By: dreyco676

I compared various methods on a data frame of size 120*10000 and found the most efficient one to be

def drop_constant_column(dataframe):
    """Drops constant value columns of pandas dataframe."""
    return dataframe.loc[:, (dataframe != dataframe.iloc[0]).any()]

1 loop, best of 3: 237 ms per loop

The other contenders are

def drop_constant_columns(dataframe):
    """Drops constant value columns of pandas dataframe."""
    result = dataframe.copy()
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            result = result.drop(column,axis=1)
    return result

1 loop, best of 3: 19.2 s per loop

def drop_constant_columns_2(dataframe):
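    """Drops constant value columns of pandas dataframe."""
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            # Drop the constant column from the frame in place.
            dataframe.drop(column, inplace=True, axis=1)
    return dataframe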

1 loop, best of 3: 317 ms per loop

def drop_constant_columns_3(dataframe):
    """Drops constant value columns of pandas dataframe."""
    keep_columns = [col for col in dataframe.columns if len(dataframe[col].unique()) > 1]
    return dataframe[keep_columns].copy()

1 loop, best of 3: 358 ms per loop

def drop_constant_columns_4(dataframe):
    """Drops constant value columns of pandas dataframe."""
    keep_columns = dataframe.columns[dataframe.nunique()>1]
    return dataframe.loc[:,keep_columns].copy()

1 loop, best of 3: 1.8 s per loop
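
The timings above depend on the machine, the pandas version, and the data. For anyone who wants to repeat the comparison, a rough harness might look like the following. The frame shape, the random data, the number of repetitions, and the function names `by_first_row` and `by_nunique` are all assumptions made for illustration, not the setup used for the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 120 rows, 1000 integer columns, every tenth column constant.
rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(120, 1000))
data[:, ::10] = 7
frame = pd.DataFrame(data)


def by_first_row(df):
    """Keep columns with at least one value differing from the first row."""
    return df.loc[:, (df != df.iloc[0]).any()]


def by_nunique(df):
    """Keep columns with more than one distinct value."""
    return df.loc[:, df.nunique() > 1]


repeats = 3
for fn in (by_first_row, by_nunique):
    seconds = timeit.timeit(lambda: fn(frame), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```

Both functions should select the same columns on this data; only the time taken differs.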

Answered By: Yantraguru

Many examples in this thread do not work properly. See my answer, which collects examples that do work.

Answered By: vasili111