# How to drop rows of Pandas DataFrame whose value in a certain column is NaN

## Question:

I have this `DataFrame` and want only the records whose `EPS` column is not `NaN`:

```
>>> df
                 STK_ID   EPS  cash
STK_ID RPT_Date
601166 20111231  601166   NaN   NaN
600036 20111231  600036   NaN    12
600016 20111231  600016   4.3   NaN
601009 20111231  601009   NaN   NaN
601939 20111231  601939   2.5   NaN
000001 20111231  000001   NaN   NaN
```

…i.e. something like `df.drop(....)` to get this resulting dataframe:

```
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
```

How do I do that?

## Answers:

Don’t drop, just take the rows where EPS is not NA:

```
df = df[df['EPS'].notna()]
```
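As a quick self-contained check (using a toy frame that mirrors the question's `EPS`/`cash` values, without the multi-index), boolean indexing with `notna()` keeps exactly the rows whose `EPS` is present:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question's EPS/cash columns
df = pd.DataFrame({
    'EPS':  [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
    'cash': [np.nan, 12, np.nan, np.nan, np.nan, np.nan],
})

# Boolean indexing: keep only the rows where EPS is not NaN
filtered = df[df['EPS'].notna()]
print(filtered)
```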

This question is already resolved, but…

…also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including `dropna()`, is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

```
In [24]: df = pd.DataFrame(np.random.randn(10, 3))

In [25]: df.iloc[::2, 0] = np.nan; df.iloc[::4, 1] = np.nan; df.iloc[::3, 2] = np.nan

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN
```

```
In [27]: df.dropna()  # drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
```

```
In [28]: df.dropna(how='all')  # drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN
```

```
In [29]: df.dropna(thresh=2)  # drop a row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN
```

```
In [30]: df.dropna(subset=[1])  # drop only if NaN in a specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN
```

There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.

Pretty handy!

I know this has already been answered, but just for the sake of a purely pandas solution to this specific question (as opposed to Aman's more general description, which was wonderful), in case anyone else happens upon this:

```
import pandas as pd
df = df[pd.notnull(df['EPS'])]
```

You could use the DataFrame method `notnull`, the inverse of `isnull`, or `numpy.isnan`:

```
In [332]: df[df.EPS.notnull()]
Out[332]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

In [334]: df[~df.EPS.isnull()]
Out[334]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

In [347]: df[~np.isnan(df.EPS)]
Out[347]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN
```

It may be added that `&` can be used to combine additional conditions, e.g.

```
df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]
```

Notice that when evaluating the statements, pandas needs parentheses around each condition.
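A minimal sketch of why the parentheses matter: `&` binds more tightly than the comparison operators, so without them the expression is parsed in the wrong order.

```python
import pandas as pd

df = pd.DataFrame({'EPS': [1.5, 2.5, 3.0, 4.5]})

# Each comparison is parenthesized before combining with &,
# because & has higher precedence than > and <
ok = df[(df.EPS > 2.0) & (df.EPS < 4.0)]
print(ok)

# Without parentheses, df.EPS > 2.0 & df.EPS < 4.0 would be parsed as
# df.EPS > (2.0 & df.EPS) < 4.0, which raises a TypeError.
```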

Yet another solution, which uses the fact that `np.nan != np.nan`:

```
In [149]: df.query("EPS == EPS")
Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
```

You can use this:

```
df.dropna(subset=['EPS'], how='all', inplace=True)
```

**Simplest of all solutions:**

```
filtered_df = df[df['EPS'].notnull()]
```

The above solution is way better than using `np.isfinite()`.

**Simple and easy way**

`df.dropna(subset=['EPS'], inplace=True)`

source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

In datasets with a large number of columns, it's even better to see how many columns contain null values and how many don't.

```
print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))
print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))
print("Total no. of columns in the dataframe")
print(len(df.columns))
```

For example, my dataframe contained 82 columns, of which 19 contained at least one null value.

Furthermore, you can also **automatically remove columns and rows** depending on which has more null values.

Here is code which does this intelligently:

```
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
df = df.dropna(axis=0).reset_index(drop=True)
```

**Note:** The above code removes all of your null values. If you need to keep them, process them beforehand.
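A runnable sketch of this idea with an explicit, tunable threshold (the half-of-rows cutoff below is an arbitrary assumption, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mostly_null': [np.nan, np.nan, np.nan, 1.0],
    'a': [1.0, 2.0, np.nan, 4.0],
    'b': [1.0, 2.0, 3.0, 4.0],
})

# Drop columns where more than half of the rows are NaN
# (the half-of-rows cutoff is a judgment call; tune it to your data)
df = df.drop(df.columns[df.isna().sum() > len(df) / 2], axis=1)

# Then drop any remaining rows that still contain NaN
df = df.dropna(axis=0).reset_index(drop=True)
print(df)
```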

Another version:

```
df[~df['EPS'].isna()]
```

## How to drop rows of Pandas DataFrame whose value in a certain column is NaN

This is an old question which has been beaten to death but I do believe there is some more useful information to be surfaced on this thread. Read on if you’re looking for the answer to any of the following questions:

- Can I drop rows if any of their values are NaN? What about if all of them are NaN?
- Can I only look at NaNs in specific columns when dropping rows?
- Can I drop rows with a specific count of NaN values?
- How do I drop columns instead of rows?
- I tried all of the options above but my DataFrame just won’t update!

## `DataFrame.dropna`: Usage and Examples

It’s already been said that `df.dropna` is the canonical method to drop NaNs from DataFrames, but there’s nothing like a few visual cues to help along the way.

```
# Setup
df = pd.DataFrame({
    'A': [np.nan, 2, 3, 4],
    'B': [np.nan, np.nan, 2, 3],
    'C': [np.nan] * 3 + [3]})

df
     A    B    C
0  NaN  NaN  NaN
1  2.0  NaN  NaN
2  3.0  2.0  NaN
3  4.0  3.0  3.0
```

Below is a detail of the most important arguments and how they work, arranged in an FAQ format.

## Can I drop rows if any of their values are NaN? What about if all of them are NaN?

This is where the `how=...` argument comes in handy. It can be one of:

- `'any'` (default) – drops a row if at least one of its columns has a NaN
- `'all'` – drops a row only if all of its columns have NaNs

```
# Removes all but the last row since there are no NaNs
df.dropna()
     A    B    C
3  4.0  3.0  3.0

# Removes the first row only
df.dropna(how='all')
     A    B    C
1  2.0  NaN  NaN
2  3.0  2.0  NaN
3  4.0  3.0  3.0
```

**Note:** If you just want to see which rows are null (IOW, if you want a boolean mask of rows), use `isna`:

```
df.isna()
       A      B      C
0   True   True   True
1  False   True   True
2  False  False   True
3  False  False  False

df.isna().any(axis=1)
0     True
1     True
2     True
3    False
dtype: bool
```

To get the inversion of this result, use `notna` instead.

## Can I only look at NaNs in specific columns when dropping rows?

This is a use case for the `subset=[...]` argument.

Specify a list of columns (or indexes with `axis=1`) to tell pandas you only want to look at these columns (or rows with `axis=1`) when dropping rows (or columns with `axis=1`).

```
# Drop all rows with NaNs in A
df.dropna(subset=['A'])
     A    B    C
1  2.0  NaN  NaN
2  3.0  2.0  NaN
3  4.0  3.0  3.0

# Drop all rows with NaNs in A OR B
df.dropna(subset=['A', 'B'])
     A    B    C
2  3.0  2.0  NaN
3  4.0  3.0  3.0
```

## Can I drop rows with a specific count of NaN values?

This is a use case for the `thresh=...`

argument. Specify the minimum number of NON-NULL values as an integer.

```
df.dropna(thresh=1)
     A    B    C
1  2.0  NaN  NaN
2  3.0  2.0  NaN
3  4.0  3.0  3.0

df.dropna(thresh=2)
     A    B    C
2  3.0  2.0  NaN
3  4.0  3.0  3.0

df.dropna(thresh=3)
     A    B    C
3  4.0  3.0  3.0
```

The thing to note here is you need to specify how many NON-NULL values you want to *keep*, rather than how many NULL values you want to *drop*. This is a pain point for new users.

Luckily the fix is easy: if you have a count of NULL values you want to drop rows by, subtract it from the column count and add one to get the correct `thresh` argument for the function.

```
required_min_null_values_to_drop = 2  # drop rows with at least 2 NaN values
df.dropna(thresh=df.shape[1] - required_min_null_values_to_drop + 1)
     A    B    C
2  3.0  2.0  NaN
3  4.0  3.0  3.0
```

## How do I drop columns instead of rows?

Use the `axis=...` argument. It can be `axis=0` or `axis=1`, and tells the function whether you want to drop rows (`axis=0`) or columns (`axis=1`).

```
df.dropna()
     A    B    C
3  4.0  3.0  3.0

# Every column contains at least one NaN, so dropping columns with
# any NaN leaves an empty result.
df.dropna(axis=1)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

# Here's a different example requiring a column to be all NaN
# to be dropped. In this case no columns satisfy the condition.
df.dropna(axis=1, how='all')
     A    B    C
0  NaN  NaN  NaN
1  2.0  NaN  NaN
2  3.0  2.0  NaN
3  4.0  3.0  3.0

# Here's a different example requiring a column to have at least 2 NON-NULL
# values. Column C has fewer than 2 NON-NULL values, so it should be dropped.
df.dropna(axis=1, thresh=2)
     A    B
0  NaN  NaN
1  2.0  NaN
2  3.0  2.0
3  4.0  3.0
```

## I tried all of the options above but my DataFrame just won’t update!

`dropna`, like most other functions in the pandas API, returns a new DataFrame (a copy of the original with changes) as the result, so you should assign it back if you want to see changes.

```
df.dropna(...) # wrong
df.dropna(..., inplace=True) # right, but not recommended
df = df.dropna(...) # right
```
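A quick self-contained sketch confirming the copy semantics described above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0]})

df.dropna()          # returns a new DataFrame; the result is discarded here
assert len(df) == 3  # ...so the original is unchanged

df = df.dropna()     # assign the result back to keep the change
assert len(df) == 2
```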

## Reference

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

`DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)`

The following method worked for me; it can help if none of the above methods do:

```
df[df['column_name'].str.len() >= 1]
```

The basic idea is that you keep a record only if its string length is at least 1. This is especially useful if you are dealing with string data.
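For illustration (the `name` column below is a hypothetical example), this works because `.str.len()` yields NaN for missing values and `NaN >= 1` evaluates to False, so null rows are filtered out along with empty strings:

```python
import numpy as np
import pandas as pd

# 'name' is a hypothetical string column for illustration
df = pd.DataFrame({'name': ['alice', '', np.nan, 'bob']})

# .str.len() returns NaN for missing values, and NaN >= 1 is False,
# so both NaN entries and empty strings are dropped
out = df[df['name'].str.len() >= 1]
print(out)
```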

Best!

You can try:

```
df['EPS'].dropna()
```

Note that this returns the `EPS` column as a Series with NaNs removed, not the filtered DataFrame.
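To see the difference in a self-contained sketch: `df['EPS'].dropna()` yields only a Series, whereas `df.dropna(subset=['EPS'])` filters the whole frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5], 'cash': [1.0, np.nan, 2.0]})

s = df['EPS'].dropna()             # a Series: just the EPS column, NaNs removed
frame = df.dropna(subset=['EPS'])  # a DataFrame: rows with NaN EPS removed

print(type(s).__name__, type(frame).__name__)
```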