How to get row number in dataframe in Pandas?
Question:
How can I get the number of the row in a dataframe that contains a certain value in a certain column using Pandas? For example, I have the following dataframe:
ClientID LastName
0 34 Johnson
1 67 Smith
2 53 Brows
How can I find the number of the row that has ‘Smith’ in ‘LastName’ column?
Answers:
Note that a dataframe’s index could be out of order, or not even numerical at all. If you don’t want to use the current index and instead renumber the rows sequentially, then you can use df.reset_index() together with the suggestions below.
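As a minimal sketch of that renumbering, assuming a hypothetical frame whose index labels are out of order (the labels 10, 5, 7 are made up for illustration), `reset_index(drop=True)` makes the label of each row equal to its position:

```python
import pandas as pd

# Hypothetical frame with out-of-order index labels (not from the question).
df = pd.DataFrame(
    {"ClientID": [34, 67, 53], "LastName": ["Johnson", "Smith", "Brows"]},
    index=[10, 5, 7],
)

# drop=True discards the old labels instead of keeping them as a new column.
renumbered = df.reset_index(drop=True)

# Same lookup before and after renumbering.
label_before = df[df["LastName"] == "Smith"].index[0]
label_after = renumbered[renumbered["LastName"] == "Smith"].index[0]
print(label_before, label_after)
```

Before the reset the lookup yields the original label; afterwards it yields the 0-based row position.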
To get all indices that match ‘Smith’:
>>> df[df['LastName'] == 'Smith'].index
Int64Index([1], dtype='int64')
or as a numpy array
>>> df[df['LastName'] == 'Smith'].index.to_numpy() # .values on older versions
array([1])
or if there is only one and you want the integer, you can subset
>>> df[df['LastName'] == 'Smith'].index[0]
1
You could use the same boolean expressions with .loc, but it is not needed unless you also want to select a certain column, which is redundant when you only want the row number/index.
df.index[df.LastName == 'Smith']
Or
df.query('LastName == "Smith"').index
will return all row indices where LastName is Smith:
Int64Index([1], dtype='int64')
df.loc[df.LastName == 'Smith']
will return the row
ClientID LastName
1 67 Smith
and
df.loc[df.LastName == 'Smith'].index
will return the index
Int64Index([1], dtype='int64')
NOTE: Column names ‘LastName’ and ‘Last Name’ or even ‘lastname’ are three different names. The best practice would be to first check the exact name using df.columns. If you really need to strip all whitespace from the column names, you can first do
df.columns = [x.strip().replace(' ', '') for x in df.columns]
count_smiths = (df['LastName'] == 'Smith').sum()
len(df[df["LastName"] == "Smith"])
You can simply use the shape attribute
df[df['LastName'] == 'Smith'].shape
Output
(1, 2)
which indicates 1 row and 2 columns. This tells you how many rows match, not where they are.
Let me explain the above code
DataframeName[DataframeName['Column_name'] == 'Value to match in column']
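To make the pattern above more concrete, here is a small sketch that splits it into its two steps, using the frame from the question. The variable names `mask` and `matches` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"ClientID": [34, 67, 53], "LastName": ["Johnson", "Smith", "Brows"]}
)

# Step 1: the comparison yields a boolean Series, one entry per row.
mask = df["LastName"] == "Smith"

# Step 2: indexing the frame with that Series keeps only the True rows.
matches = df[mask]

print(mask.tolist())
print(matches.shape)   # (rows, columns) of the filtered frame
print(int(mask.sum())) # number of matching rows
```

The filtered frame keeps the original index labels, which is why `.index` on it gives the labels of the matching rows.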
I know it’s many years later, but don’t try the above solutions without reindexing your dataframe first. As many have pointed out already, the number you see to the left of the dataframe (0, 1, 2 in the initial question) is the index of that dataframe. When you extract a subset of it with a condition, you might end up with index labels 0,2 or 2,1 or 2,1,0, depending on your condition. So by using that number (called the "index") you will not get the position of the row in the subset. You will get the label that row has in the main dataframe.
use:
np.where([df['LastName'] == 'Smith'])[1][0]
and play with the string ‘Smith’ to see the various outcomes. Because the condition is wrapped in a list, it is two-dimensional and np.where returns 2 arrays. The 2nd one (index 1) is the one you care about.
NOTE:
When the value you search for does not exist, the second array is empty and [1][0] raises an IndexError. When it is the first value of the list, [1][0] returns 0. Make sure you validate the existence first.
NOTE #2:
If the value in your condition occurs multiple times, [1] holds the positions of all occurrences. You can use the length of [1] for further processing if needed.
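The existence check mentioned in the notes could be sketched roughly like this. `find_positions` is a hypothetical helper, not part of pandas or NumPy, and it uses `np.flatnonzero` on the one-dimensional mask instead of wrapping the condition in a list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"ClientID": [34, 67, 53], "LastName": ["Johnson", "Smith", "Brows"]},
    index=["A", "B", "C"],
)

def find_positions(frame, column, value):
    """Return the 0-based positions of the rows where frame[column] == value."""
    mask = (frame[column] == value).to_numpy()
    return np.flatnonzero(mask)

positions = find_positions(df, "LastName", "Smith")
missing = find_positions(df, "LastName", "Nobody")

# Check that something was found before taking the first element.
first = int(positions[0]) if positions.size else None
first_missing = int(missing[0]) if missing.size else None
print(first, first_missing)
```

Returning the whole array leaves it to the caller to decide how to handle zero, one, or several matches.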
If the index of the dataframe and the ordinal number of the rows differ, most solutions posted here won’t work anymore. Given your dataframe with an alphabetical index:
In [2]: df = pd.DataFrame({"ClientID": {"A": 34, "B": 67, "C": 53}, "LastName": {"A": "Johnson", "B": "Smith", "C": "Brows"}})
In [3]: df
Out[3]:
ClientID LastName
A 34 Johnson
B 67 Smith
C 53 Brows
You have to use get_loc to access the ordinal row number:
In [4]: df.index.get_loc(df.query('LastName == "Smith"').index[0])
Out[4]: 1
If there may exist multiple rows where the condition holds, e.g. find the ordinal row numbers that have ‘Smith’ or ‘Brows’ in the LastName column, you can use list comprehensions:
In [5]: [df.index.get_loc(idx) for idx in df.query('LastName == "Smith" | LastName == "Brows"').index]
Out[5]: [1, 2]
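As a possible alternative to the list comprehension, `Index.get_indexer` translates several labels to positions in one call. This is a sketch that assumes the index labels are unique; with duplicate labels `get_indexer` raises an error.

```python
import pandas as pd

df = pd.DataFrame(
    {"ClientID": {"A": 34, "B": 67, "C": 53},
     "LastName": {"A": "Johnson", "B": "Smith", "C": "Brows"}}
)

# Labels of the rows whose LastName is one of the wanted values.
labels = df.index[df["LastName"].isin(["Smith", "Brows"])]

# Translate the labels to 0-based positions (requires a unique index).
positions = df.index.get_indexer(labels)
print(list(labels), positions.tolist())
```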
If "row number" in the question means the actual row number/position (rather than the index label), then
pandas.Index.get_loc(key)
seems to be the answer (older pandas versions also accepted method and tolerance arguments), i.e. something like:
row_number = df.index.get_loc(df.query(f'numbers == {m}').index[0])
The current answers, except one, explain how to get the index label rather than the row number.
Trivial code with index labels not corresponding to row numbers:
import pandas as pd

n = 3
m = n - 1
# Index labels run in reverse (2, 1, 0), so they differ from the row positions.
df = pd.DataFrame({'numbers': range(n)},
                  index=range(n - 1, -1, -1))
print(df, "\n")
label = df[df['numbers'] == m].index[0]  # index label of the matching row
row_number = df.index.get_loc(df.query(f'numbers == {m}').index[0])  # 0-based position
print(f'index label: {label}\nrow number: {row_number}', "\n")
print(f"df.loc[{label},'numbers']: {df.loc[label, 'numbers']}")
print(f"df.iloc[{row_number}, 0]: {df.iloc[row_number, 0]}")
numbers
2 0
1 1
0 2
index label: 0
row number: 2
df.loc[0,'numbers']: 2
df.iloc[2, 0]: 2
- To get the row number of a single occurrence (this is the index label, which equals the row number only with the default 0, 1, 2, ... index)
row_number = df[df["LastName"] == 'Smith'].index[0]
- To get the row numbers of multiple occurrences of ‘Smith’
row_numbers = df[df["LastName"] == 'Smith'].index.tolist()