Trying to drop NaN indexed row in dataframe

Question:

I’m using python 2.7.3 and Pandas version 0.12.0.

I want to drop the row with the NaN index so that I only have valid site_id values.

print df.head()
            special_name
site_id
NaN          Banana
OMG          Apple

df.drop(df.index[0])

TypeError: 'NoneType' object is not iterable

If I try dropping a range, like this:

df.drop(df.index[0:1])

I get this error:

AttributeError: 'DataFrame' object has no attribute 'special_name'
Asked By: Alison S


Answers:

I’ve found that the easiest way is to reset the index, drop the NaNs, and then reset the index again.

In [26]: dfA.reset_index()
Out[26]: 
  index special_name
0   NaN        Apple
1   OMG       Banana

In [30]: df = dfA.reset_index().dropna().set_index('index')

In [31]: df
Out[31]: 
      special_name
index             
OMG         Banana
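
One caveat with the chain above: a bare dropna() removes rows with a NaN in any column, not only in the former index. A minimal sketch of a narrower variant, assuming the index is unnamed (so reset_index() produces a column called 'index') and using a made-up third row 'RLY' with a missing value for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one NaN index label, plus a missing value in a data column.
dfA = pd.DataFrame(
    {"special_name": ["Apple", "Banana", None]},
    index=[np.nan, "OMG", "RLY"],
)

# reset_index() moves the unnamed index into a column called 'index'.
flat = dfA.reset_index()

# subset= limits the NaN check to the former index column, so the 'RLY'
# row survives even though its special_name is missing.
df = flat.dropna(subset=["index"]).set_index("index")
print(df)
```

If the index has a name (for example site_id), that name would replace 'index' in both subset= and set_index().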
Answered By: TomAugspurger

Edit: the following probably only applies to MultiIndexes, and is in any case obsoleted by the newer df.index.isnull() method (see other answers). I’ll leave this answer just for historical interest.

For people who come to this now, one can do this directly without reindexing by relying on the fact that NaNs in the index will be represented with the label -1. So:

df = dfA[dfA.index.labels!=-1]

Even better, in Pandas>0.16.1, one can use drop() to do this inplace without copying:

dfA.drop(labels=[-1], level='index', inplace=True)

NB: It’s a bit misleading that the index level is called ‘index’: it would usually be something more use-specific like ‘date’ or ‘experimental_run’.

Answered By: Robert Muil

With pandas version >= 0.20.0 you can:

df = df[df.index.notnull()]

With older versions:

df = df[pandas.notnull(df.index)]

To break it down:

notnull generates a boolean mask, e.g. [False, False, True], where True denotes that the value at the corresponding position is not null (i.e. neither numpy.nan nor None). We then keep only the rows whose mask value is True by using df[boolean_mask].
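
A small sketch of the mask in action, rebuilding the two-row frame from the question (the construction itself is an assumption, since the question only shows printed output):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: the first index label is NaN.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# One boolean per index label: False for the NaN label, True for a valid one.
mask = pd.notnull(df.index)

# Boolean indexing keeps only the rows where the mask is True.
valid = df[mask]
print(mask)
print(valid)
```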

Answered By: Tim Diels

I tested the following and it works:

df.reset_index(inplace=True)

df.drop(df[df['index'].isnull()].index, inplace=True)


How I checked the above

Replicated the table in the original question using
df=pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'],columns=['Special_name'])

Then I ran the two lines above, which I try to explain in plain language below:

  • 1st line: resets the index to integers. The NaN now sits in an ordinary column named after the original index (‘index’ in the example above, because the index had no name); reset_index() does this automatically.
  • 2nd line, reading from the innermost brackets outward: df[df['index'].isnull()] selects the rows whose ‘index’ column is NaN, using isnull(). The trailing .index then yields the integer labels of exactly those rows, which are passed to the outer df.drop( so that it removes them.
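
The two steps above can be sketched as follows, here with a named index to show that the new column takes that name. The frame is an assumed reconstruction of the question's data, and the non-inplace form is used only so the intermediate result can be inspected:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame, with the index named site_id.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# Step 1: the index becomes a regular column called 'site_id',
# and the rows get a fresh integer index 0, 1, ...
df = df.reset_index()

# Step 2: collect the integer labels of rows whose site_id is NaN, then drop them.
rows_to_drop = df[df["site_id"].isnull()].index
df = df.drop(rows_to_drop)
print(df)
```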

nb: tested the above command to work on multiple NaN values in a column

Using Python 3.5.1 and Pandas 0.17.1 via the 32-bit Anaconda package.

Answered By: Mrumble

None of the answers worked 100% for me. Here’s what worked:

In [26]: print df
Out[26]:            
          site_id      special_name
0         OMG          Apple
1         NaN          Banana
2         RLY          Orange


In [27]: df.dropna(inplace=True)
Out[27]:            
          site_id      special_name
0         OMG          Apple
2         RLY          Orange

In [28]: df.reset_index(inplace=True)
Out[28]:            
          index     site_id      special_name
0         0         OMG          Apple
1         2         RLY          Orange

In [29]: df.drop('index', axis='columns', inplace=True)
Out[29]:             
          site_id      special_name
0         OMG          Apple
1         RLY          Orange
Answered By: Joakim

As of pandas 0.19, Indexes do have a .notnull() method, so the answer by timdiels can be simplified to:

df[df.index.notnull()]

which I think is (currently) the simplest you can get.

Answered By: Pietro Battiston

Alternatively you can use query:

In [4]: df.query('index == index')
Out[4]: 
        special_name
site_id             
OMG            Apple

This works because NaN compared to itself returns False, so rows with a NaN index fail the index == index test and are filtered out:

In [5]: np.nan == np.nan
Out[5]: False
Answered By: rachwa

Another version:

df[df.index.notna()]
Answered By: keramat