Keeping NaN values and dropping nonmissing values

Question:

I have a DataFrame where I would like to keep the rows when a particular variable has a NaN value and drop the non-missing values.

Example:

ticker  opinion  x1       x2  
aapl    GC       100      70  
msft    NaN      50       40  
goog    GC       40       60  
wmt     GC       45       15  
abm     NaN      80       90  

In the above DataFrame, I would like to drop all observations where opinion is not missing (so, I would like to drop the rows where ticker is aapl, goog, and wmt).

Is there anything in pandas that is the opposite to .dropna()?

Asked By: tan


Answers:

Use pandas.Series.isnull on the column to find the missing values and index with the result.

import numpy as np
import pandas as pd

data = pd.DataFrame({'ticker': ['aapl', 'msft', 'goog'],
                     'opinion': ['GC', np.nan, 'GC'],
                     'x1': [100, 50, 40]})

# Keep only the rows where 'opinion' is missing
data = data[data['opinion'].isnull()]
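
For reference, here is a minimal sketch of the same boolean-mask approach applied to the full example from the question. It uses isna(), which is an alias of isnull(); the names df, missing and kept are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    'ticker': ['aapl', 'msft', 'goog', 'wmt', 'abm'],
    'opinion': ['GC', np.nan, 'GC', 'GC', np.nan],
    'x1': [100, 50, 40, 45, 80],
    'x2': [70, 40, 60, 15, 90],
})

# Boolean mask: True where 'opinion' is missing
missing = df['opinion'].isna()

# Keep only the rows flagged by the mask
kept = df[missing]
print(kept)
```

Keeping the mask in its own variable makes it easy to reuse, for example df[~missing] for the rows that .dropna(subset=['opinion']) would keep.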
Answered By: Roger Fan

Alternatively you can use query:

In [4]: df.query('opinion != opinion')
Out[4]: 
  ticker opinion  x1  x2
1   msft     NaN  50  40
4    abm     NaN  80  90

This works because NaN never compares equal to itself, so the condition is True only on the rows where opinion is missing:

In [5]: np.nan != np.nan
Out[5]: True
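
The same self-comparison can also be written as an ordinary boolean mask, which may make it clearer why the query selects the missing rows. This is only a sketch: the data is copied from the question, the name self_unequal is illustrative, and the column is built with dtype=object as an assumption so that the gaps are stored as plain NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ticker': ['aapl', 'msft', 'goog', 'wmt', 'abm'],
    # dtype=object is an assumption made here so the gaps stay plain NaN
    'opinion': pd.Series(['GC', np.nan, 'GC', 'GC', np.nan], dtype=object),
})

# NaN != NaN, so this mask is True only where 'opinion' is missing
self_unequal = df['opinion'] != df['opinion']

# Compare against the explicit isna() check, row by row
agrees = (self_unequal == df['opinion'].isna()).all()
print(df[self_unequal])
print(agrees)
```

The isna() form is usually easier to read; the self-comparison is mainly handy inside query strings.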
Answered By: rachwa

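Not what the OP was asking, but in case you’re here for the inverse of df.dropna(), that is, keeping every row that has a NaN in any column, the equivalent of a hypothetical df.keepna() (pandas has no such method) would be: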

df[~df.index.isin(df.dropna().index)]
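
A more direct way to express the same selection is to build the mask from isna().any(axis=1). The sketch below uses made-up data and illustrative names (by_index, by_mask) and checks that both forms agree when the index is unique. Note that the index-based version depends on index labels: with duplicated labels, a row containing NaN can be left out if another row sharing its label survives dropna().

```python
import numpy as np
import pandas as pd

# Made-up data: NaN can appear in any column
df = pd.DataFrame({
    'ticker': ['aapl', 'msft', 'goog', 'abm'],
    'opinion': ['GC', np.nan, 'GC', 'GC'],
    'x1': [100.0, 50.0, np.nan, 80.0],
})

# Index-based form: drop complete rows by looking up their labels
by_index = df[~df.index.isin(df.dropna().index)]

# Mask-based form: rows with at least one NaN in any column
by_mask = df[df.isna().any(axis=1)]

print(by_mask)
print(by_index.equals(by_mask))
```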
Answered By: csaroff