How to get the index of the ith item in a pandas.Series or pandas.DataFrame

Question:

I’m trying to get the index of the 6th item in a Series I have.

This is what the head looks like:

United States    1.536434e+13
China            6.348609e+12
Japan            5.542208e+12
Germany          3.493025e+12
France           2.681725e+12

To get the 6th index value (the 6th country after sorting), I usually use s.head(6) and read the 6th index label from there.

s.head(6) gives me:

United States     1.536434e+13
China             6.348609e+12
Japan             5.542208e+12
Germany           3.493025e+12
France            2.681725e+12
United Kingdom    2.487907e+12

and from this Series, I get United Kingdom as the 6th index.

So, is there a better way to get the index than this? Also, for a DataFrame, is there a function that gets the 6th index label based on a given column after sorting?

If it’s a DataFrame, I usually sort it, create a new column named index with reset_index, and then use the iloc attribute to get the 6th row (since the index is a range after the reset).

Is there any better way to do this with pd.Series and pd.DataFrame?

Asked By: D3VLPR


Answers:

You could get it straight from the index

s.index[5]

Or

s.index.values[5]

It all depends on what you consider better. I can tell you that a numpy approach will probably be faster.

For example, numpy.argsort returns an array of positions. Its first element is the position, in the array being sorted, of the value that should come first. Its second element is the position of the value that should come second, and so on.
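As a small illustration of that description, here is a minimal sketch on a three-element array. The names values, labels and order are made up for this example.

```python
import numpy as np

# Hypothetical toy data: three values and a label for each position
values = np.array([30, 10, 20])
labels = np.array(["a", "b", "c"])

# order[k] is the position in `values` of the element that ranks k-th (ascending)
order = np.argsort(values)

print(order)             # [1 2 0]
print(values[order])     # [10 20 30]
print(labels[order[0]])  # b  (label of the smallest value)
```

Indexing the labels with a position taken from order is the same pattern as indexing s.index.values with a position taken from s.values.argsort().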

So you could do this to get the index value of the 6th item after sorting in ascending order:

s.index.values[s.values.argsort()[5]]

Or more transparently

s.sort_values().index[5]

Or more creatively

s.nsmallest(6).idxmax()
Answered By: piRSquared

If you are trying to get the index of the ith item, then as piRSquared mentioned, s.index[i-1] suffices.

If you want to get the index of the ith largest value as in the OP, then instead of sorting the whole column / Series, a faster way is a combination of nlargest and idxmin:

i = 6
s.nlargest(i).idxmin()

or use argpartition on the underlying array and look up the resulting position in the index. It is particularly fast because argpartition only guarantees that the ith element is in its final sorted position (which is the only thing we care about here), so it avoids a full sort of the elements (a timeit test shows that it’s about 15 times faster than a full sort and 3 times faster than nlargest.idxmin).

s.index[s.values.argpartition(len(s)-i)[-i]]
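Note that argpartition works on the raw values and returns integer positions, so with a non-default index the position has to be passed through s.index to get the label. A minimal sketch, assuming a small hypothetical Series with string labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels (not from the question)
s = pd.Series(
    [50.0, 10.0, 40.0, 20.0, 30.0, 60.0],
    index=["a", "b", "c", "d", "e", "f"],
)
i = 2  # look for the 2nd largest value

# After partitioning around position len(s) - i, that slot holds the
# position of the ith largest element; [-i] reads the same slot.
pos = s.to_numpy().argpartition(len(s) - i)[-i]

# Translate the integer position into an index label
label = s.index[pos]
print(pos, label)  # 0 a
```

On this toy data the result agrees with s.nlargest(i).idxmin() and with s.sort_values(ascending=False).index[i-1].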

To get the index of the ith smallest value,

s.nsmallest(i).idxmax()         # suggested by piRSquared
# or (partition around position i-1, so that slot holds the ith smallest)
s.index[s.values.argpartition(i-1)[i-1]]

A working example to get the index of the 6th largest value in a Series:

import pandas as pd

s = pd.Series(range(1_000_000)).sample(frac=1).reset_index(drop=True)

x = s.sort_values(ascending=False).index[5]
y = s.index[s.values.argsort()[-6]]
z = s.nlargest(6).idxmin()
w = s.index[s.values.argpartition(len(s)-6)[-6]]

x == y == z == w   # True
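
For the DataFrame case raised in the question, the same ideas can be applied to a single column. The following is only a sketch under stated assumptions: the DataFrame, its labels and the column name gdp are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame; the column name "gdp" is an assumption
df = pd.DataFrame(
    {"gdp": [3.0, 9.0, 1.0, 7.0, 5.0]},
    index=["A", "B", "C", "D", "E"],
)
i = 3  # look for the row with the 3rd largest gdp

# Full sort by the column, then read the label at position i-1
by_sort = df.sort_values("gdp", ascending=False).index[i - 1]

# Select only the top i values of the column, then find the smallest of them
by_nlargest = df["gdp"].nlargest(i).idxmin()

print(by_sort, by_nlargest)  # E E
```

Neither version needs reset_index or an extra column. If the column contains tied values, the two approaches may pick different labels.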
Answered By: cottontail