# How do I delete a column that contains only zeros in Pandas?

## Question:

I currently have a dataframe consisting of columns with 1’s and 0’s as values, I would like to iterate through the columns and delete the ones that are made up of only 0’s. Here’s what I have tried so far:

```
ones = []
zeros = []
for year in years:
for i in range(0,599):
if year[str(i)].values.any() == 1:
ones.append(i)
if year[str(i)].values.all() == 0:
zeros.append(i)
for j in ones:
if j in zeros:
zeros.remove(j)
for q in zeros:
del year[str(q)]
```

In which years is a list of dataframes for the various years I am analyzing, ones consists of columns with a one in them and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I have to check whether the ones columns are in the zeros list as well and remove them from the zeros list to obtain a list of all the zero columns.

## Answers:

```
df.loc[:, (df != 0).any(axis=0)]
```

Here is a break-down of how it works:

```
In [74]: import pandas as pd
In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])
In [76]: df
Out[76]:
0 1 2 3
0 1 0 0 0
1 0 0 1 0
[2 rows x 4 columns]
```

`df != 0`

creates a boolean DataFrame which is True where `df`

is nonzero:

```
In [77]: df != 0
Out[77]:
0 1 2 3
0 True False False False
1 False False True False
[2 rows x 4 columns]
```

`(df != 0).any(axis=0)`

returns a boolean Series indicating which columns have nonzero entries. (The `any`

operation aggregates values along the 0-axis — i.e. along the rows — into a single boolean value. Hence the result is one boolean value for each column.)

```
In [78]: (df != 0).any(axis=0)
Out[78]:
0 True
1 False
2 True
3 False
dtype: bool
```

And `df.loc`

can be used to select those columns:

```
In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]:
0 2
0 1 0
1 0 1
[2 rows x 2 columns]
```

To “delete” the zero-columns, reassign `df`

:

```
df = df.loc[:, (df != 0).any(axis=0)]
```

Here is an alternative way to use is

`df.replace(0,np.nan).dropna(axis=1,how="all")`

Compared with the solution of unutbu, this way is obviously slower:

```
%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

In case you’d like a more **expressive** way of getting the zero-column names so you can print / log them, and drop them, in-place, by their **names**:

```
zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)
```

Some break down:

```
# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
(df == 0).sum() == df.shape[0])
# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]
```

In case there are some NaN values in your columns, you may want to use this approach if you want to remove columns that have both 0 and NaN :

```
df.loc[:, (df**2).sum() != 0]
```

This should do the work:

```
zero_cols = df.columns[(df == 0).all()]
df.drop(labels=zero_cols, axis=1, inplace=True)
```