Keeping NaN values and dropping non-missing values
Question:
I have a DataFrame in which I would like to keep the rows where a particular column is NaN and drop the rows where it is not missing.
Example:
ticker opinion  x1  x2
aapl   GC      100  70
msft   NaN      50  40
goog   GC       40  60
wmt    GC       45  15
abm    NaN      80  90
In the above DataFrame, I would like to drop all observations where opinion is not missing (so, I would like to drop the rows where ticker is aapl, goog, and wmt).
Is there anything in pandas that is the opposite of .dropna()?
Answers:
Use pandas.Series.isnull on the column to find the missing values, then index the DataFrame with the resulting boolean mask.
import numpy as np
import pandas as pd

data = pd.DataFrame({'ticker': ['aapl', 'msft', 'goog'],
                     'opinion': ['GC', np.nan, 'GC'],
                     'x1': [100, 50, 40]})
# Keep only the rows where 'opinion' is missing
data = data[data['opinion'].isnull()]
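As a minimal sketch on the question's full five-row table, the snippet below shows that the isnull() mask and dropna(subset=...) split the rows into two complementary sets. Series.isna() is an alias of Series.isnull(), so either spelling works; the names mask, kept and dropped are only illustrative.

```python
import numpy as np
import pandas as pd

# The example table from the question
data = pd.DataFrame({'ticker': ['aapl', 'msft', 'goog', 'wmt', 'abm'],
                     'opinion': ['GC', np.nan, 'GC', 'GC', np.nan],
                     'x1': [100, 50, 40, 45, 80],
                     'x2': [70, 40, 60, 15, 90]})

# Boolean mask: True where 'opinion' is missing (isna() is an alias of isnull())
mask = data['opinion'].isna()

kept = data[mask]                          # rows with a missing opinion
dropped = data.dropna(subset=['opinion'])  # what dropna() would keep instead

print(kept['ticker'].tolist())     # ['msft', 'abm']
print(dropped['ticker'].tolist())  # ['aapl', 'goog', 'wmt']
```

Inverting the mask with ~ gives the same rows as dropna(subset=['opinion']), which is why indexing with the mask itself is the "opposite" being asked for.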
Alternatively, you can use query:
In [4]: df.query('opinion != opinion')
Out[4]:
  ticker opinion  x1  x2
1   msft     NaN  50  40
4    abm     NaN  80  90
This works because NaN is not equal to itself:
In [5]: np.nan != np.nan
Out[5]: True
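A minimal, self-contained sketch of the query approach on the question's full table, which also checks the NaN self-inequality it relies on (IEEE 754 floating point defines NaN as unequal to every value, itself included). The variable names self_unequal and nan_rows are illustrative only.

```python
import numpy as np
import pandas as pd

# NaN compares unequal to everything, including itself
print(np.nan != np.nan)  # True

df = pd.DataFrame({'ticker': ['aapl', 'msft', 'goog', 'wmt', 'abm'],
                   'opinion': ['GC', np.nan, 'GC', 'GC', np.nan],
                   'x1': [100, 50, 40, 45, 80],
                   'x2': [70, 40, 60, 15, 90]})

# Element-wise, 'opinion != opinion' is True exactly where opinion is NaN
self_unequal = df['opinion'] != df['opinion']
print(self_unequal.tolist())  # [False, True, False, False, True]

# query() evaluates the same comparison and keeps the matching rows
nan_rows = df.query('opinion != opinion')
print(nan_rows['ticker'].tolist())  # ['msft', 'abm']
```

The explicit mask df[df['opinion'].isna()] selects the same rows; query is mainly a matter of taste when the filter is written as a string.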
Not what the OP was asking, but in case you’re here for the inverse of df.dropna() (pandas has no df.keepna() method), the equivalent would be:
df[~df.index.isin(df.dropna().index)]
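As a sketch of a mask-based alternative, the keepna function below is a hypothetical helper (not part of pandas) that keeps every row with at least one missing value, optionally restricted to a subset of columns, mirroring dropna's default how='any'. Because the index-based one-liner compares index labels, it can also drop NaN rows whose label is shared with a complete row; the duplicated-index frame dup is an illustrative assumption to show that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ticker': ['aapl', 'msft', 'goog', 'wmt', 'abm'],
                   'opinion': ['GC', np.nan, 'GC', 'GC', np.nan],
                   'x1': [100, 50, 40, 45, 80],
                   'x2': [70, 40, 60, 15, 90]})


def keepna(frame, subset=None):
    """Hypothetical helper: keep the rows that dropna(subset=...) would drop.

    A row is kept if any of the checked columns is missing.
    """
    cols = frame.columns if subset is None else subset
    return frame[frame[cols].isna().any(axis=1)]


# Index-based version from the answer above
by_index = df[~df.index.isin(df.dropna().index)]
print(keepna(df)['ticker'].tolist())  # ['msft', 'abm']

# With duplicate index labels, the isin() trick loses 'msft' because its
# label 0 is shared with the complete 'aapl' row; the mask does not
dup = df.set_index(pd.Index([0, 0, 1, 1, 2]))
print(dup[~dup.index.isin(dup.dropna().index)]['ticker'].tolist())  # ['abm']
print(keepna(dup)['ticker'].tolist())  # ['msft', 'abm']
```

On a DataFrame with a unique index both versions return the same rows; the mask-based form simply avoids depending on the index at all.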