Checking if a pandas DataFrame is indexed?

Question:

Is it possible to check whether a pandas DataFrame is indexed, that is, whether DataFrame.set_index(...) was ever called on it? I could check whether df.index is a numeric list, but that’s not a perfect test for this.

Asked By: user248237


Answers:

One way would be to compare df.index to the plain default index 0, 1, ..., len(df) - 1 (with numpy imported as np and pandas as pd):

pd.Index(np.arange(0, len(df))).equals(df.index)

For example:

In [11]: df = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  a  b
1  c  d

In [13]: pd.Index(np.arange(0, len(df))).equals(df.index)
Out[13]: True

and if it’s not the plain index, it will return False:

In [14]: df = df.set_index('A')

In [15]: pd.Index(np.arange(0, len(df))).equals(df.index)
Out[15]: False
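
The comparison above can be wrapped in a small helper. This is only a sketch: the name has_default_index is made up for illustration and is not a pandas function. It also shows a limit of the approach: Index.equals compares values only, so an index that was set explicitly but happens to hold 0..n-1 is indistinguishable from the default one.

```python
import numpy as np
import pandas as pd


def has_default_index(df):
    """True if df.index holds exactly 0, 1, ..., len(df) - 1 (hypothetical helper)."""
    return bool(pd.Index(np.arange(len(df))).equals(df.index))


df = pd.DataFrame([["a", "b"], ["c", "d"]], columns=["A", "B"])
plain = has_default_index(df)                     # index was never set
after_set = has_default_index(df.set_index("A"))  # index is now column A

# Caveat: a column containing 0..n-1 that is set as the index
# passes the same test, because only the values are compared.
lookalike = pd.DataFrame({"k": [0, 1], "v": ["x", "y"]}).set_index("k")
fooled = has_default_index(lookalike)
```

So the check answers "does the index look like the default one?" rather than "was set_index ever called?".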
Answered By: Andy Hayden

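I just ran into this myself. The problem is that a DataFrame always has an index, even before .set_index() is called, so the question is really whether or not the index is named. For that purpose, df.index.name appears to be less reliable than df.index.names: with a MultiIndex, df.index.name is None even though the levels are named, as the last example in the session shows.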

>>> import pandas as pd
>>> df = pd.DataFrame({"id1": [1, 2, 3], "id2": [4,5,6], "word": ["cat", "mouse", "game"]})
>>> df
   id1  id2   word
0    1    4    cat
1    2    5  mouse
2    3    6   game
>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.index.name, df.index.names[0]
(None, None)
>>> "indexed" if df.index.names[0] else "no index"
'no index'
>>> df1 = df.set_index("id1")
>>> df1
     id2   word
id1            
1      4    cat
2      5  mouse
3      6   game
>>> df1.index
Int64Index([1, 2, 3], dtype='int64', name='id1')
>>> df1.index.name, df1.index.names[0]
('id1', 'id1')
>>> "indexed" if df1.index.names[0] else "no index"
'indexed'
>>> df12 = df.set_index(["id1", "id2"])
>>> df12
          word
id1 id2       
1   4      cat
2   5    mouse
3   6     game
>>> df12.index
MultiIndex([(1, 4),
            (2, 5),
            (3, 6)],
           names=['id1', 'id2'])
>>> df12.index.name, df12.index.names[0]
(None, 'id1')
>>> "indexed" if df12.index.names[0] else "no index"
'indexed'
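
The same idea could be packaged as a function. This is a rough sketch with a made-up name (index_is_named). It differs from the session above in two ways: it tests `is not None` instead of truthiness, so an index named 0 or "" still counts as named, and it looks at every level rather than only the first.

```python
import pandas as pd


def index_is_named(df):
    """True if any level of df.index has a name (hypothetical helper)."""
    return any(name is not None for name in df.index.names)


df = pd.DataFrame({"id1": [1, 2, 3], "id2": [4, 5, 6], "word": ["cat", "mouse", "game"]})

default_named = index_is_named(df)                          # default RangeIndex
single_named = index_is_named(df.set_index("id1"))          # one named level
multi_named = index_is_named(df.set_index(["id1", "id2"]))  # MultiIndex

# Integer column labels: the index ends up named 0, which a plain
# truthiness test would report as "no index".
zero_named = index_is_named(pd.DataFrame([[1, 2], [3, 4]]).set_index(0))
```

Keep in mind that index names can be set or cleared without set_index (for example with rename_axis), so this is still a heuristic.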

Answered By: Victor Davis

The following worked for me: I call set_index([label], append=False) if the DataFrame has the default RangeIndex, and set_index([label], append=True) otherwise.

append = not isinstance(df.index, pd.RangeIndex)
df.set_index([label], drop=True, append=append, inplace=True)

My assumption is that when the index is the default RangeIndex, I can drop it when setting another column as the index.
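
For illustration, here is the same logic as a function that returns a new DataFrame instead of using inplace=True. The function name add_index_level and the sample data are assumptions made for this sketch only.

```python
import pandas as pd


def add_index_level(df, label):
    """Set `label` as the index, keeping any non-default index as an outer level."""
    # Only a RangeIndex is treated as "not indexed yet" and replaced.
    append = not isinstance(df.index, pd.RangeIndex)
    return df.set_index([label], drop=True, append=append)


df = pd.DataFrame({"id1": [1, 2, 3], "id2": [4, 5, 6], "word": ["cat", "mouse", "game"]})

once = add_index_level(df, "id1")     # RangeIndex is replaced by id1
twice = add_index_level(once, "id2")  # id2 is appended as a second level

once_names = list(once.index.names)
twice_names = list(twice.index.names)
remaining_columns = list(twice.columns)
```

Calling it twice builds up a two-level index (id1, id2) and leaves word as the only column.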

Answered By: alant