pandas Series sort_index with key

Question:

I am trying to sort a Series using sort_index(key = lambda idx: foo(idx)), which should take the first item of the list and put it at the end. My sorting function foo looks like this:

def foo(idx):
    print("pre",idx)
    if idx.name == "pca_n":
        ret = pd.Index(list(idx[1:]) + list(idx[:1]),name=idx.name)
    else:
        ret = idx.copy()
    print("post",ret)
    return ret

I call it like this:

print("index before sort",byHyp.index)
byHyp = byHyp.sort_index(key = lambda x: foo(x))
print("index after sort",byHyp.index)

This results in the following output:

index before sort Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after sort Int64Index([20, -1, 2, 5, 10], dtype='int64', name='pca_n')

In other words, the output of foo gives a list of indices, but they are not retained in the Series. (I am expecting [2,5,10,20,-1], as is the output of foo). Perhaps I am misunderstanding how to use the key argument of sort_index?

Asked By: GregarityNow

||

Answers:

If you have foo return the order you want as a regular list and then call df.loc[returned_list], the data will be ordered the way you want. In the screenshot below the index runs from 1912 to 1916, but you can put it in any order you wish with df.loc[your_new_order].
[screenshot: data with an index running from 1912 to 1916]
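
As a rough sketch of the .loc approach described above: the values below are made up, and only the 1912 to 1916 index labels come from the answer. It assumes every label in the list exists in the index, since .loc raises a KeyError otherwise.

```python
import pandas as pd

# Hypothetical data: the values are invented for illustration;
# only the index labels (1912..1916) echo the answer above.
s = pd.Series([10, 20, 30, 40, 50], index=[1912, 1913, 1914, 1915, 1916])

# The order we want: first label moved to the end.
new_order = [1913, 1914, 1915, 1916, 1912]

# .loc with a list of labels returns the rows in exactly that order.
reordered = s.loc[new_order]
print(list(reordered.index))  # [1913, 1914, 1915, 1916, 1912]
```

The same call works on a DataFrame, where df.loc[new_order] selects rows by label in the given order.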

Answered By: Weezy.F

The docs explain:

key: callable, optional

If not None, apply the key function to the index values before sorting.

In other words, foo gets called on the index and returns the key [2,5,10,20,-1]. pandas then sorts that key and applies the resulting permutation to your original index; the key values themselves are never written back to the Series:

  • the output of foo in your example is already nearly sorted; sorting it only requires moving the final element, -1, to the front
  • applying that same move (last position to first) to your original index turns [-1, 2, 5, 10, 20] into [20, -1, 2, 5, 10], which is exactly what your output shows.
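
To make this concrete, here is a minimal sketch that reproduces the result by hand. The Series values and the helper name rotate are placeholders (rotate applies the same transformation as foo, without the prints), and the manual argsort is only meant to mirror what sort_index does with the key, not to describe pandas internals exactly.

```python
import numpy as np
import pandas as pd

# Placeholder Series with the same index as in the question.
s = pd.Series(list("abcde"), index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

def rotate(idx):
    # Same transformation as foo: first label moved to the end.
    return pd.Index(list(idx[1:]) + list(idx[:1]), name=idx.name)

# What sort_index(key=...) effectively does: compute the key,
# find the permutation that sorts the key, apply it to the original data.
key_values = rotate(s.index)                       # [2, 5, 10, 20, -1]
order = np.argsort(key_values.to_numpy(), kind="stable")  # [4, 0, 1, 2, 3]
manual = s.iloc[order]

via_key = s.sort_index(key=rotate)

print(list(manual.index))   # [20, -1, 2, 5, 10]
print(list(via_key.index))  # [20, -1, 2, 5, 10]
```

Both results carry the original labels; the key only decides their order.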

I think what you are trying to do is not sort the index, but rather reorder it using foo like this:

print("index before-ordering",byHyp.index)
byHyp = byHyp.loc[foo(byHyp.index), :]
print("index after re-ordering",byHyp.index)

… or, as pointed out by OP in a comment, if the input is a Series then:

byHyp = byHyp[foo(byHyp.index)]

Output:

index before-ordering Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after re-ordering Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
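
If you do want to stay with sort_index(key=...), the key has to return values whose sorted order is the order you want. A possible sketch, under the assumption (not stated in the question) that -1 is a special label that should always come last; the helper name sentinel_last and the Series values are hypothetical:

```python
import pandas as pd

# Placeholder Series with the same index as in the question.
s = pd.Series(list("abcde"), index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

def sentinel_last(idx):
    # Leave any other index level untouched.
    if idx.name != "pca_n":
        return idx
    # Assumption: -1 is a sentinel that should sort after every real value,
    # so map it to +infinity in the key (the stored labels are not changed).
    return pd.Index(
        [float("inf") if label == -1 else label for label in idx],
        name=idx.name,
    )

result = s.sort_index(key=sentinel_last)
print(list(result.index))  # [2, 5, 10, 20, -1]
```

Unlike the reordering above, this also sorts the remaining labels, so it only matches "move the first item to the end" when the rest of the index is already in ascending order.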
Answered By: constantstranger