pandas Series sort_index with key
Question:
I am trying to sort a Series using sort_index(key = lambda idx: foo(idx)), which should take the first item of the index and move it to the end. My sorting function foo looks like this:
import pandas as pd

def foo(idx):
    print("pre", idx)
    if idx.name == "pca_n":
        # Rotate the index: move the first label to the end
        ret = pd.Index(list(idx[1:]) + list(idx[:1]), name=idx.name)
    else:
        ret = idx.copy()
    print("post", ret)
    return ret
I call it like this:
print("index before sort",byHyp.index)
byHyp = byHyp.sort_index(key = lambda x: foo(x))
print("index after sort",byHyp.index)
This results in the following output:
index before sort Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after sort Int64Index([20, -1, 2, 5, 10], dtype='int64', name='pca_n')
In other words, foo returns the labels in the order I want, but that order is not retained in the Series. (I am expecting [2, 5, 10, 20, -1], which is exactly the output of foo.) Perhaps I am misunderstanding how to use the key argument of sort_index?
Answers:
The docs explain:
key: callable, optional
If not None, apply the key function to the index values before sorting.
In other words, foo gets called and returns the index [2, 5, 10, 20, -1]. After that, your Series is sorted according to the output of foo: each key value is used as the sort value of the label in the same position.
- The key values [2, 5, 10, 20, -1] are already nearly sorted; only the final value, -1, has to move to the front.
- The label at that final position is 20, so your Series' index goes from [-1, 2, 5, 10, 20] to [20, -1, 2, 5, 10], which is exactly what your output shows.
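To make the mechanism concrete, here is a minimal sketch that reproduces the surprising order by hand. It assumes a Series named byHyp with placeholder values (the original data is not shown; only the index matters) and a print-free copy of foo. Sorting the rows by numpy.argsort of the key output should give the same order as sort_index(key=foo) in this case:

```python
import numpy as np
import pandas as pd

def foo(idx):
    # Same rotation as in the question, without the debug prints
    if idx.name == "pca_n":
        return pd.Index(list(idx[1:]) + list(idx[:1]), name=idx.name)
    return idx.copy()

# Hypothetical placeholder values; only the index is taken from the question
byHyp = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5],
                  index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

keys = foo(byHyp.index)  # key values, aligned position by position with the labels
print(list(keys))        # [2, 5, 10, 20, -1]

# Order rows by their key values: the smallest key (-1) sits at position 4,
# whose original label is 20, so label 20 comes first
order = np.argsort(keys.to_numpy(), kind="stable")
manual = byHyp.iloc[order]

sorted_with_key = byHyp.sort_index(key=foo)
print(list(manual.index))           # [20, -1, 2, 5, 10]
print(list(sorted_with_key.index))  # [20, -1, 2, 5, 10]
```

The key output is never used as the new index itself; it only decides where each existing label ends up.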
I think what you are trying to do is not sort the index, but rather reorder it using foo. For a DataFrame that looks like this:
print("index before-ordering",byHyp.index)
byHyp = byHyp.loc[foo(byHyp.index), :]
print("index after re-ordering",byHyp.index)
Or, as the OP pointed out in a comment, if byHyp is a Series (which has no column axis, so the trailing ", :" must be dropped):
byHyp = byHyp[foo(byHyp.index)]
Output:
index before-ordering Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after re-ordering Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
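A minimal, self-contained sketch of the reordering for a Series, again with hypothetical placeholder values. It uses .loc, which is unambiguously label-based even for an integer index, and shows reindex as an equivalent. It also sketches an alternative if you would rather keep sort_index: make the key return a sort value for each label instead of a rotated index. Mapping -1 to infinity so that it sorts last is my own illustrative choice, not something from the question:

```python
import pandas as pd

def foo(idx):
    # Move the first label to the end (same logic as in the question)
    if idx.name == "pca_n":
        return pd.Index(list(idx[1:]) + list(idx[:1]), name=idx.name)
    return idx.copy()

# Hypothetical Series; the values are placeholders
byHyp = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5],
                  index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

# Label-based selection returns the rows in exactly the order foo gives
reordered = byHyp.loc[foo(byHyp.index)]
print(list(reordered.index))  # [2, 5, 10, 20, -1]

# reindex yields the same order (it would insert NaN for labels not present)
reindexed = byHyp.reindex(foo(byHyp.index))

# Keeping sort_index: the key returns a sort value per label,
# here +inf for -1 (so it sorts last) and the label itself otherwise
by_key = byHyp.sort_index(
    key=lambda idx: idx.map(lambda v: float("inf") if v == -1 else v)
)
print(list(by_key.index))  # [2, 5, 10, 20, -1]
```

Reordering with .loc or reindex states the desired order directly, whereas the key approach only works when the desired order can be expressed as "sort by some value".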