sort_values() with 'key' to sort a column of tuples in a dataframe

Question

I have the following dataframe:

df = pd.DataFrame({'Params': {0: (400, 30),
  1: (2000, 10),
  2: (1200, 10),
  3: (2000, 30),
  4: (1600, None)},
 'mean_test_score': {0: -0.6197478578718253,
  1: -0.6164605619489576,
  2: -0.6229674626212879,
  3: -0.7963084775995496,
  4: -0.7854265341671137}})

I wish to sort it according to the first element of the tuples in the first column.

First column of the desired output:

{'Params': {0: (400, 30),
  2: (1200, 10),
  4: (1600, None),
  1: (2000, 10),
  3: (2000, 30)}}

I tried df.sort_values(by=('Params'), key=lambda x:x[0]), as I would with a list and .sort(), but I get the following error: ValueError: User-provided key function must not change the shape of the array.

I have looked at the documentation of sort_values(), but it does not explain why the lambda fails.

EDIT: Following @DeepSpace's suggestion, I tried sorting without a key, but
df.sort_values(by='Params') fails with TypeError: '<' not supported between instances of 'NoneType' and 'int'

Asked By: vpvinc


Answer 1

The documentation for sort_values() says:

key should expect a Series and return a Series with the same shape as the input.

In df.sort_values(by=('Params'), key=lambda x:x[0]), x is the whole Params column (a Series), not an individual tuple. x[0] therefore returns the single element stored at index label 0, which does not have the same shape as the input Series. That is what raises the error.
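To make the shape mismatch concrete, here is a small sketch that reproduces what the key function sees. The names col, first_row and first_items are only illustrative, and the data is the Params column from the question rebuilt as a plain list.

```python
import pandas as pd

df = pd.DataFrame({
    "Params": [(400, 30), (2000, 10), (1200, 10), (2000, 30), (1600, None)],
})

col = df["Params"]   # the Series that sort_values hands to `key`

# Label-based lookup: a single tuple, not one value per row.
first_row = col[0]

# Element-wise extraction: one value per row, same shape as `col`.
first_items = col.map(lambda t: t[0])

print(type(first_row).__name__)        # a plain tuple
print(col.shape, first_items.shape)    # both Series have five rows
```

Only the second form gives sort_values something it can line up row by row with the original column.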

If you want to sort by the first element of each tuple, you can do:

df.sort_values(by='Params', key=lambda col: col.map(lambda x: x[0]))
# or
df.sort_values(by='Params', key=lambda col: col.str[0])
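For reference, a self-contained version of both variants run against the data from the question might look like the sketch below. The kind="stable" argument is an addition of this example, not part of the original answer: it is there so that the two rows whose first element is 2000 keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    "Params": [(400, 30), (2000, 10), (1200, 10), (2000, 30), (1600, None)],
    "mean_test_score": [
        -0.6197478578718253,
        -0.6164605619489576,
        -0.6229674626212879,
        -0.7963084775995496,
        -0.7854265341671137,
    ],
})

# Variant 1: map each tuple to its first element.
by_map = df.sort_values(
    by="Params", key=lambda col: col.map(lambda t: t[0]), kind="stable"
)

# Variant 2: the .str accessor also indexes into tuples element-wise.
by_str = df.sort_values(by="Params", key=lambda col: col.str[0], kind="stable")

print(by_map.index.tolist())   # [0, 2, 4, 1, 3]
print(by_str.index.tolist())   # [0, 2, 4, 1, 3]
```

Because the key only looks at the first element, the None in (1600, None) is never compared, which avoids the TypeError mentioned in the question's edit.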

Answered By: Ynjxsjmh
