sort_values() with key in Python
Question:
I have a dataframe where the column names are times (0:00, 0:10, 0:20, …, 23:50). Right now, they’re sorted in string order (so 0:00 is first and 9:50 is last), but I want to sort them by time (so 0:00 is first and 23:50 is last).
If time is a column, you can use
df = df.sort(columns='Time',key=float)
But 1) that only works if time is a column itself, rather than the column names, and 2) sort() is deprecated so I try to abstain from using it.
I’m trying to use
df = df.sort_index(axis = 1)
but since the column names are in string format, they get sorted according to a string key. I’ve tried
df = df.sort_index(key=float, axis=1)
but that gives an error message:
Traceback (most recent call last):
File "<ipython-input-112-5663f277da66>", line 1, in <module>
df.sort_index(key=float, axis=1)
TypeError: sort_index() got an unexpected keyword argument 'key'
Does anyone have ideas for how to fix this? So annoying that sort_index() – and sort_values() for that matter – don’t have the key argument!!
Answers:
Just prepend a leading zero to one-digit hours. This should be the simplest solution as you can simply sort lexically then.
E.g. 5:30 -> 05:30.
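A minimal sketch of that idea, assuming every label has the form H:MM or HH:MM. The sample frame below is made up for illustration; str.zfill(5) is used to do the padding.

```python
import pandas as pd

# Hypothetical sample data; only the column labels matter here.
df = pd.DataFrame([[1, 2, 3]], columns=["9:50", "10:00", "0:00"])

# zfill(5) left-pads with zeros: "9:50" -> "09:50"; "10:00" is left unchanged.
padded = df.rename(columns=lambda label: label.zfill(5))

# With two-digit hours, plain lexical order matches chronological order.
result = padded.sort_index(axis=1)
print(list(result.columns))  # ['00:00', '09:50', '10:00']
```

Note that this changes the labels themselves (0:00 becomes 00:00), which may or may not be acceptable for your data.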
Try sorting the columns with the sorted builtin function and passing the output to the dataframe for indexing. The following should serve as a working example:
import pandas as pd
records = [(2, 33, 23, 45), (3, 4, 2, 4), (4, 5, 7, 19), (4, 6, 71, 2)]
df = pd.DataFrame.from_records(records, columns = ('0:00', '23:40', '12:30', '11:23'))
df
# 0:00 23:40 12:30 11:23
# 0 2 33 23 45
# 1 3 4 2 4
# 2 4 5 7 19
# 3 4 6 71 2
df[sorted(df,key=pd.to_datetime)]
# 0:00 11:23 12:30 23:40
# 0 2 45 23 33
# 1 3 4 2 4
# 2 4 19 7 5
# 3 4 2 71 6
I hope this helps
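If you would rather not go through date parsing, the same approach should work with a small hand-written key. This is only a sketch: minutes_since_midnight is a hypothetical helper (not part of pandas), and it assumes every label looks like H:MM or HH:MM.

```python
import pandas as pd

def minutes_since_midnight(label):
    """Convert a label such as '9:50' to an integer number of minutes."""
    hours, minutes = label.split(":")
    return int(hours) * 60 + int(minutes)

# Hypothetical sample data with unpadded hour labels.
df = pd.DataFrame([[2, 33, 23, 45]], columns=["0:00", "23:40", "12:30", "9:50"])

# sorted() orders the labels numerically; indexing with the resulting
# list reorders the columns while keeping the labels unchanged.
ordered = df[sorted(df.columns, key=minutes_since_midnight)]
print(list(ordered.columns))  # ['0:00', '9:50', '12:30', '23:40']
```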
Here is a working demo, which implements @MartinKrämer’s idea:
import re
In [259]: df
Out[259]:
23:40 0:00 19:19 12:30 09:00 11:23
0 33 2 1 23 12 45
1 4 3 1 2 13 4
2 5 4 1 7 14 19
3 6 4 1 71 14 2
In [260]: df.rename(columns=lambda x: re.sub(r'^(\d{1}):', r'0\1:', x)).sort_index(axis=1)
Out[260]:
00:00 09:00 11:23 12:30 19:19 23:40
0 2 12 45 23 1 33
1 3 13 4 2 1 4
2 4 14 19 7 1 5
3 4 14 2 71 1 6
I know this question is a few years old, but since it’s the top Google result for this question, I wanted to provide the root cause of the error.
The ‘key’ argument was added to sort_values (and to sort_index) in pandas version 1.1.0. See the note in the documentation linked below.
This feature will very likely work as you intended if you upgrade to 1.1.0 or higher.
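For the case from the question where the time is a regular column, a sketch with pandas 1.1.0 or later might look like this. The column names Time and Value and the helper to_minutes are made up for illustration. Unlike the key of the builtin sorted, the key here receives the whole Series and should return a Series of the same shape.

```python
import pandas as pd

def to_minutes(label):
    """Hypothetical helper: '9:50' -> 590 (minutes since midnight)."""
    hours, minutes = label.split(":")
    return int(hours) * 60 + int(minutes)

# Hypothetical sample data with the times stored as strings in a column.
df = pd.DataFrame({"Time": ["9:50", "0:00", "23:40", "10:00"],
                   "Value": [1, 2, 3, 4]})

# key is applied to the column as a whole, so map the helper over it.
by_time = df.sort_values("Time", key=lambda col: col.map(to_minutes))
print(by_time["Time"].tolist())  # ['0:00', '9:50', '10:00', '23:40']
```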
It seems sort_values() with key may not work here. However, sort_index() with key does the job.
This builds on Abdou’s answer.
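A sketch of sorting the column labels directly with sort_index and key, assuming pandas 1.1.0 or later. The helper to_minutes and the sample frame are illustrative only. The key receives the whole Index of labels and should return something of the same length.

```python
import pandas as pd

def to_minutes(label):
    """Hypothetical helper: '9:50' -> 590 (minutes since midnight)."""
    hours, minutes = label.split(":")
    return int(hours) * 60 + int(minutes)

# Hypothetical sample data with time-like column labels.
df = pd.DataFrame([[33, 2, 23, 45]], columns=["23:40", "0:00", "12:30", "9:50"])

# axis=1 sorts the columns; the key maps each label to a sortable integer
# without changing the labels in the result.
result = df.sort_index(axis=1, key=lambda idx: idx.map(to_minutes))
print(list(result.columns))  # ['0:00', '9:50', '12:30', '23:40']
```

Passing pd.to_datetime as the key, in the spirit of Abdou’s answer, should also work, since it likewise accepts the whole Index at once.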