sort_values() with 'key' to sort a column of tuples in a dataframe
Question:
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'Params': {0: (400, 30),
                              1: (2000, 10),
                              2: (1200, 10),
                              3: (2000, 30),
                              4: (1600, None)},
                   'mean_test_score': {0: -0.6197478578718253,
                                       1: -0.6164605619489576,
                                       2: -0.6229674626212879,
                                       3: -0.7963084775995496,
                                       4: -0.7854265341671137}})
I wish to sort it according to the first element of the tuples in the first column.
First column of the desired output:
{'Params': {0: (400, 30),
            2: (1200, 10),
            4: (1600, None),
            1: (2000, 10),
            3: (2000, 30)}}
I have tried df.sort_values(by=('Params'), key=lambda x:x[0]), as I would with list.sort, but I get the following error: ValueError: User-provided key function must not change the shape of the array.
I have looked at the documentation of sort_values(), but it did not help me understand why the lambda does not work.
EDIT: Following @DeepSpace's suggestion, I tried plain df.sort_values(by='Params'), but that does not work either: it raises TypeError: '<' not supported between instances of 'NoneType' and 'int'.
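That TypeError comes from ordinary Python tuple comparison, which proceeds element by element and only looks at the second elements when the first ones are equal. A minimal plain-Python illustration; the tuple (2000, None) is hypothetical and does not appear in the sample data above.

```python
# First elements differ (1600 vs 2000), so the None is never compared
first_differs = (1600, None) < (2000, 10)

# First elements tie (2000 vs 2000), so Python has to evaluate None < 10
try:
    (2000, None) < (2000, 10)
    message = None
except TypeError as exc:
    message = str(exc)

print(first_differs)  # True
print(message)        # '<' not supported between instances of 'NoneType' and 'int'
```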
Answers:
The documentation of sort_values() says that key should expect a Series and return a Series with the same shape as the input.
In df.sort_values(by=('Params'), key=lambda x:x[0]), x is not a single tuple: it is the entire Params column, passed in as a Series. So x[0] returns the first element of that Series (here the single tuple (400, 30)) instead of a Series with the same shape as the input. That shape mismatch is what raises the error.
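A small sketch contrasting the two call patterns on the question's Params values: list.sort calls its key once per element, whereas sort_values calls key once with the whole column as a Series. The helper bad_key and the variable names are hypothetical, for illustration only.

```python
import pandas as pd

params = [(400, 30), (2000, 10), (1200, 10), (2000, 30), (1600, None)]
df = pd.DataFrame({"Params": params})

# list.sort calls the key once per element, so t is a single tuple
as_list = list(params)
as_list.sort(key=lambda t: t[0])

# sort_values calls the key once, with the whole "Params" column as a Series
received = []

def bad_key(x):  # hypothetical helper mirroring lambda x: x[0]
    received.append(type(x))
    return x[0]  # a single tuple, (400, 30), not a Series of length 5

try:
    df.sort_values(by="Params", key=bad_key)
    error = None
except ValueError as exc:
    # raised because the returned object's length differs from the column's
    error = exc

print(as_list)
print(received[0].__name__, type(error).__name__)
```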
If you want to sort by the first element of each tuple, you can do:
df.sort_values(by='Params', key=lambda col: col.map(lambda x: x[0]))
# or
df.sort_values(by='Params', key=lambda col: col.str[0])
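A runnable sketch of both variants on the question's DataFrame. Passing kind='mergesort' is an addition not in the answer above: it asks pandas for a stable sort, so the two rows tied at 2000 keep their original relative order (index 1 before index 3), as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "Params": {0: (400, 30), 1: (2000, 10), 2: (1200, 10),
               3: (2000, 30), 4: (1600, None)},
    "mean_test_score": {0: -0.6197478578718253, 1: -0.6164605619489576,
                        2: -0.6229674626212879, 3: -0.7963084775995496,
                        4: -0.7854265341671137},
})

# Variant 1: map every tuple in the column to its first element
by_map = df.sort_values(by="Params",
                        key=lambda col: col.map(lambda t: t[0]),
                        kind="mergesort")  # stable: ties keep input order

# Variant 2: .str[0] indexes into each tuple element-wise
by_str = df.sort_values(by="Params",
                        key=lambda col: col.str[0],
                        kind="mergesort")

print(by_map.index.tolist())  # [0, 2, 4, 1, 3]
print(by_str.index.tolist())  # [0, 2, 4, 1, 3]
```

Because the key reduces every tuple to its integer first element, the None in (1600, None) is never compared, which also sidesteps the TypeError mentioned in the question's EDIT.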