Getting the integer index of a Pandas DataFrame row fulfilling a condition?

Question:

I have the following DataFrame:

   a  b  c
b
2  1  2  3
5  4  5  6

As you can see, column b is used as an index. I want to get the ordinal number of the row fulfilling ('b' == 5), which in this case would be 1.

The column being tested can be either an index column (as with b in this case) or a regular column, e.g. I may want to find the index of the row fulfilling ('c' == 6).

Asked By: Dun Peal


Answers:

You could use np.where like this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(1,7).reshape(2,3),
                  columns = list('abc'), 
                  index=pd.Series([2,5], name='b'))
print(df)
#    a  b  c
# b         
# 2  1  2  3
# 5  4  5  6
print(np.where(df.index==5)[0])
# [1]
print(np.where(df['c']==6)[0])
# [1]

The value returned is an array since there could be more than one row with a particular index or value in a column.
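As a rough sketch of that multi-match behaviour, the snippet below uses np.flatnonzero, which should be equivalent to np.where(mask)[0] for a one-dimensional mask. The helper name positions and the three-row frame with a duplicated index value are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

def positions(mask):
    """Return the integer positions where a boolean mask is True.

    `positions` is a hypothetical helper name, not a pandas or NumPy API.
    """
    return np.flatnonzero(np.asarray(mask))

# Illustrative data: index value 5 and column value 6 each occur twice.
df = pd.DataFrame(
    {"a": [1, 4, 7], "c": [3, 6, 6]},
    index=pd.Index([2, 5, 5], name="b"),
)

by_index = positions(df.index == 5)    # rows whose index equals 5
by_column = positions(df["c"] == 6)    # rows whose column c equals 6
no_match = positions(df["c"] == 99)    # empty array when nothing matches

print(by_index, by_column, no_match)
```

An empty array for a missing value means the caller can check the length of the result instead of catching an exception.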

Answered By: unutbu

Use Index.get_loc instead.

Reusing @unutbu’s setup code, you get the same result:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(1,7).reshape(2,3),
...                   columns = list('abc'),
...                   index=pd.Series([2,5], name='b'))
>>> df
   a  b  c
b
2  1  2  3
5  4  5  6
>>> df.index.get_loc(5)
1
Answered By: hlin117

Using Index.get_loc with a general condition (this returns the position of the first matching row):

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.arange(1,7).reshape(2,3),
...                   columns = list('abc'),
...                   index=pd.Series([2,5], name='b'))
>>> df
   a  b  c
b
2  1  2  3
5  4  5  6
>>> df.index.get_loc(df.index[df['b'] == 5][0])
1
Answered By: Gabriele Picco

The other answers based on Index.get_loc() do not give a consistent result type: the method returns an integer if the index values are all unique, but a slice or a boolean mask array if they are not. A more consistent approach, which returns a list of integer positions every time, is shown below for an index with non-unique values:

df = pd.DataFrame([
    {"A":1, "B":2}, {"A":2, "B":2}, 
    {"A":3, "B":4}, {"A":1, "B":3}
], index=[1,2,3,1])

If searching based on index value:

[i for i,v in enumerate(df.index == 1) if v]
# [0, 3]

If searching based on a column value:

[i for i,v in enumerate(df["B"] == 2) if v]
# [0, 1]
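To make the inconsistency concrete, here is a small sketch of what Index.get_loc can return, as far as I understand the pandas documentation: an integer for a unique index, a slice for a sorted index with duplicates, and a boolean mask otherwise. The helper loc_to_positions is a hypothetical name introduced here to normalize all three cases into an array of positions.

```python
import numpy as np
import pandas as pd

unique_idx = pd.Index([2, 5])            # all values unique
sorted_dupes = pd.Index([1, 1, 2])       # duplicates, monotonic
unsorted_dupes = pd.Index([1, 2, 3, 1])  # duplicates, not monotonic

loc_unique = unique_idx.get_loc(5)        # a single integer position
loc_sorted = sorted_dupes.get_loc(1)      # a slice covering the matches
loc_unsorted = unsorted_dupes.get_loc(1)  # a boolean mask array

def loc_to_positions(loc, n):
    """Normalize a get_loc result into an array of integer positions.

    Hypothetical helper; `n` is the length of the index that was searched.
    """
    if isinstance(loc, slice):
        return np.arange(n)[loc]
    if isinstance(loc, np.ndarray):
        return np.flatnonzero(loc)
    return np.array([loc])

print(loc_to_positions(loc_unique, len(unique_idx)))
print(loc_to_positions(loc_sorted, len(sorted_dupes)))
print(loc_to_positions(loc_unsorted, len(unsorted_dupes)))
```

In practice the mask-based approaches in this thread avoid the need for such a wrapper, since they return positions directly.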
Answered By: BioData41