Python Pandas find all rows where all values are NaN
Question:
So I have a dataframe with 5 columns. I would like to pull the indices where all of the columns are NaN. I was using this code:
nan = pd.isnull(df.all)
but that is just returning false because it is logically saying no not all values in the dataframe are null. There are thousands of entries so I would prefer to not have to loop through and check each entry. Thanks!
Answers:
It should just be:
df.isnull().all(1)
The index
can be accessed like:
df.index[df.isnull().all(1)]
Demonstration
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice((1, np.nan), (10, 2)))
df
idx = df.index[df.isnull().all(1)]
nans = df.ix[idx]
nans
Timing
code
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice((1, np.nan), (10000, 5)))
Assuming your dataframe is named df
, you can use boolean indexing to check if all columns (axis=1
) are null. Then take the index of the result.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3))
df.iloc[-2:, :] = np.nan
>>> df
0 1 2
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 NaN NaN NaN
4 NaN NaN NaN
nan = df[df.isnull().all(axis=1)].index
>>> nan
Int64Index([3, 4], dtype='int64')
From the master himself: https://stackoverflow.com/a/14033137/6664393
nans = pd.isnull(df).all(1).nonzero()[0]
So I have a dataframe with 5 columns. I would like to pull the indices where all of the columns are NaN. I was using this code:
nan = pd.isnull(df.all)
but that is just returning false because it is logically saying no not all values in the dataframe are null. There are thousands of entries so I would prefer to not have to loop through and check each entry. Thanks!
It should just be:
df.isnull().all(1)
The index
can be accessed like:
df.index[df.isnull().all(1)]
Demonstration
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice((1, np.nan), (10, 2)))
df
idx = df.index[df.isnull().all(1)]
nans = df.ix[idx]
nans
Timing
code
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice((1, np.nan), (10000, 5)))
Assuming your dataframe is named df
, you can use boolean indexing to check if all columns (axis=1
) are null. Then take the index of the result.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3))
df.iloc[-2:, :] = np.nan
>>> df
0 1 2
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 NaN NaN NaN
4 NaN NaN NaN
nan = df[df.isnull().all(axis=1)].index
>>> nan
Int64Index([3, 4], dtype='int64')
From the master himself: https://stackoverflow.com/a/14033137/6664393
nans = pd.isnull(df).all(1).nonzero()[0]