Trying to drop NaN indexed row in dataframe
Question:
I’m using Python 2.7.3 and pandas version 0.12.0.
I want to drop the row with the NaN index so that I only have valid site_id values.
print df.head()
special_name
site_id
NaN Banana
OMG Apple
df.drop(df.index[0])
TypeError: 'NoneType' object is not iterable
If I try dropping a range, like this:
df.drop(df.index[0:1])
I get this error:
AttributeError: 'DataFrame' object has no attribute 'special_name'
Answers:
I’ve found that the easiest way is to reset the index, drop the NaNs, and then set the index again.
In [26]: dfA.reset_index()
Out[26]:
index special_name
0 NaN Apple
1 OMG Banana
In [30]: df = dfA.reset_index().dropna().set_index('index')
In [31]: df
Out[31]:
special_name
index
OMG Banana
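Here is a minimal sketch of the same reset/drop/set round trip, run on a frame shaped like the one in the question (index named site_id; the sample values are assumed). Passing subset= to dropna() restricts the NaN check to the former index. A bare dropna() would also drop rows that have NaN in any other column.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question: the index is named 'site_id'
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

clean = (
    df.reset_index()                 # 'site_id' becomes an ordinary column
    .dropna(subset=["site_id"])      # drop only rows whose site_id is NaN
    .set_index("site_id")            # restore site_id as the index
)
print(clean)
```

Because the index already has a name, set_index() restores it as site_id instead of 'index'.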
Edit: the following probably only applies to MultiIndexes, and is in any case obsoleted by the new df.index.isnull() function (see other answers). I’ll leave this answer just for historical interest.
For people who come to this now, one can do this directly without reindexing by relying on the fact that NaNs in the index will be represented with the label -1. So:
df = dfA[dfA.index.labels!=-1]
Even better, in Pandas>0.16.1, one can use drop() to do this inplace without copying:
dfA.drop(labels=[-1], level='index', inplace=True)
NB: It’s a bit misleading that the index level is called ‘index’: it would usually be something more use-specific, like ‘date’ or ‘experimental_run’.
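In current pandas, MultiIndex.labels has been renamed to MultiIndex.codes (the rename happened in 0.24). The code for a missing value is still -1. The sketch below uses a hypothetical two-level index (site_id, run) and keeps only the rows whose first-level code is not -1.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the first level contains a missing site_id
index = pd.MultiIndex.from_arrays(
    [[np.nan, "OMG", "RLY"], [1, 2, 3]], names=["site_id", "run"]
)
df = pd.DataFrame({"special_name": ["Banana", "Apple", "Orange"]}, index=index)

# codes[0] maps each row to a position in levels[0]; a missing value is coded -1
print(df.index.codes[0])

# Keep rows whose first-level code is not -1 (i.e. site_id is present)
clean = df[df.index.codes[0] != -1]
print(clean)
```

A version that does not depend on the internal codes is df[df.index.get_level_values("site_id").notna()].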
With pandas version >= 0.20.0 you can:
df = df[df.index.notnull()]
With older versions:
df = df[pandas.notnull(df.index)]
To break it down: notnull generates a boolean mask, e.g. [False, False, True], where True denotes that the value at the corresponding position is not null (i.e. neither numpy.nan nor None). We then select the rows whose index corresponds to a True value in the mask by using df[boolean_mask].
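To make the mask concrete, here is a small sketch with assumed sample data: one valid site_id, one NaN and one None. It prints the mask produced by both spellings and then applies it.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): one valid site_id, one NaN, one None
df = pd.DataFrame(
    {"special_name": ["Apple", "Banana", "Orange"]},
    index=pd.Index(["OMG", np.nan, None], name="site_id"),
)

mask = df.index.notnull()          # True where the index value is NOT missing
print(mask)
print(pd.notnull(df.index))        # older spelling, same mask

clean = df[mask]                   # boolean indexing keeps only the True rows
print(clean)
```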
Tested and working:
df.reset_index(inplace=True)
df.drop(df[df['index'].isnull()].index, inplace=True)
How I checked the above: I replicated the table in the original question using
df = pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'], columns=['special_name'])
then ran the two lines above, which I explain in plain language below:
- The 1st line resets the index to integers and moves the old index values, NaN included, into a column named after the original index (‘index’ in the example above, since no name was specified); pandas does this automatically with reset_index().
- The 2nd line, reading from the innermost brackets: df[df['index'].isnull()] uses isnull() to select the rows whose ‘index’ column is NaN. Taking .index of that result gives an unambiguous Index holding the labels of all ‘index’=NaN rows. That Index is then passed to df.drop() in the outermost part of the expression.
NB: I also tested this with multiple NaN values in the column, using Python 3.5.1 and pandas 0.17.1 (32-bit Anaconda package).
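The sketch below repeats the two-line approach on the same construction with one extra NaN row (the extra data is assumed). It shows that every NaN row is dropped and that the cleaned column can optionally be turned back into the index.

```python
import numpy as np
import pandas as pd

# Same construction as above, plus a second NaN row (extra data is assumed)
df = pd.DataFrame(
    data=["Banana", "Apple", "Cherry"],
    index=[np.nan, "OMG", np.nan],
    columns=["special_name"],
)

df.reset_index(inplace=True)               # unnamed index -> column 'index'
nan_rows = df[df["index"].isnull()].index  # integer labels 0 and 2
df.drop(nan_rows, inplace=True)            # drop those rows by label
print(df)

# Optionally turn the cleaned column back into the index
restored = df.set_index("index")
print(restored)
```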
None of the answers worked 100% for me. Here’s what worked:
In [26]: print df
Out[26]:
site_id special_name
0 OMG Apple
1 NaN Banana
2 RLY Orange
In [27]: df.dropna(inplace=True)
Out[27]:
site_id special_name
0 OMG Apple
2 RLY Orange
In [28]: df.reset_index(inplace=True)
Out[28]:
index site_id special_name
0 0 OMG Apple
1 2 RLY Orange
In [29]: df.drop('index', axis='columns', inplace=True)
Out[29]:
site_id special_name
0 OMG Apple
1 RLY Orange
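The reset_index() and drop('index', ...) steps can be collapsed into one call: reset_index(drop=True) discards the old index instead of inserting it as an 'index' column. Here is a minimal sketch on the same data, with subset= limiting the NaN check to site_id.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "site_id": ["OMG", np.nan, "RLY"],
    "special_name": ["Apple", "Banana", "Orange"],
})

# Drop rows whose site_id is missing, then renumber the rows 0..n-1;
# drop=True throws the old index away rather than adding an 'index' column
clean = df.dropna(subset=["site_id"]).reset_index(drop=True)
print(clean)
```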
As of pandas 0.19, Indexes do have a .notnull() method, so the answer by timdiels can be simplified to:
df[df.index.notnull()]
which I think is (currently) the simplest you can get.
Alternatively, you can use query:
In [4]: df.query('index == index')
Out[4]:
special_name
site_id
OMG Apple
This works because NaN compared with itself returns False:
In [5]: np.nan == np.nan
Out[5]: False
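Here is a self-contained version of the query approach on a frame shaped like the question's (sample values assumed). Inside query(), the name index refers to the row index even when the index has its own name. The engine="python" argument is an addition of this sketch; it only removes the dependency on numexpr.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# NaN == NaN evaluates to False, so rows whose index is NaN fail the test
clean = df.query("index == index", engine="python")
print(clean)

print(np.nan == np.nan)
```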
Another version:
df[df.index.notna()]
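In later pandas releases, notna() was added as an alias of notnull(), and inverting isna() with ~ gives the same mask. The sketch below, on assumed sample data, checks that all three spellings select the same rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# notna() is an alias of notnull(); ~isna() inverts the "is missing" mask
by_notna = df[df.index.notna()]
by_notnull = df[df.index.notnull()]
by_isna = df[~df.index.isna()]

print(by_notna)
```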