Groupby with no operation to dataframe / How to sort on total per group?

Question:

I’m trying to sort a dataframe grouped by a column but sorting by the sum of each group:

I have a dataframe with the columns ["make", "model", "year", "fuel", "variant", "num registered cars"] (cars info). My goal is to sort for each year every make, model by the sum of the number of registered cars but grouping by ‘year’, ‘make’, ‘model’, ‘fuel’. Here is the code I already implemented:

sorted_table = simix_table.groupby(["make", "model", "year"])['matriculas'].sum().reset_index(name="matriculas").sort_values(["year", "matriculas"], ascending=[True, False]).reset_index(drop=True)

How can I sort my original dataframe using the order in my new sorted_table grouping by ‘year’, ‘make’, ‘model’, ‘fuel’?

the output has to be as folows:

Result output

This example I used the grupby operation to ilustrate an example. In this case Opel Corsa comes first because the sum of opels corsa in the 2013 (gasolina + diesel+ Gas Licuado..) is the biggest this year.

Example:

cars_data = [[2013, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 6282],
[2014, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 2210],
[2013, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2014, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2013, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5316],
[2014, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5224],
[2013, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 3126],
[2014, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 1513],
[2013, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 1351],
[2014, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 4351],
[2013, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 4523],
[2014, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 3523]]

columns = ['year', 'model', 'make', 'fuel', 'variant', 'version', 'num_registered_cars']

cars_df = pd.DataFrame(cars_data, columns=columns)

Output of the group by operation:

cars_df.groupby(["make", "model", "year"])['num_registered_cars'].sum().reset_index(name="num_registered_cars").sort_values(["year", "num_registered_cars"], ascending=[True, False]).reset_index(drop=True)

Order by result

So the final result of the sorted dataframe has to be a dataframe with this rows:

[2013, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5316],
[2013, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 3126],
[2013, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 6282],
[2013, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2013, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 4523],
[2013, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 1351],
[2014, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 4351],
[2014, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 3523],
[2014, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5224],
[2014, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 1513]
[2014, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 2210],
[2014, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432]

The 2013 starts with the Volkswagen Polo because 5316 + 3126 = 8442 greater than opel corsa 6282 + 1432 = 7714 …

The 2014 starts with the Ford Fiesta because 4351 + 3523 = 7874 greater than Volkswagen Polo 5224 + 1513 = 6737 …

How can I do this type of sorting?

Asked By: Nacho R.

||

Answers:

You can use groupby.transform to add a temporary column, sort, then drop:

out = (cars_df
 .assign(total=cars_df.groupby(['make', 'model', 'year'])
               ['num_registered_cars'].transform('sum'))
 .sort_values(by=['year', 'total'], ascending=[True, False])
# or, for further sorting
# .sort_values(by=['year', 'total', 'num_registered_cars'],
#              ascending=[True, False, False])
 .drop(columns='total')
)

output:

    year   model        make      fuel           variant           version  num_registered_cars
4   2013    Polo  Volkswagen       Gas      variant polo      version polo                 5316
6   2013    Polo  Volkswagen  Electric    variant polo 2    version polo 2                 3126
0   2013   Corsa        Opel       Gas     variant corsa     version corsa                 6282
2   2013   Corsa        Opel    Diesel   variant corsa 2   version corsa 2                 1432
8   2013  Fiesta        Ford       Gas    variant fiesta    version fiesta                 1351
10  2013  Fiesta        Ford    Diesel  variant fiesta 2  version fiesta 2                 4523
9   2014  Fiesta        Ford       Gas    variant fiesta    version fiesta                 4351
11  2014  Fiesta        Ford    Diesel  variant fiesta 2  version fiesta 2                 3523
5   2014    Polo  Volkswagen       Gas      variant polo      version polo                 5224
7   2014    Polo  Volkswagen  Electric    variant polo 2    version polo 2                 1513
1   2014   Corsa        Opel       Gas     variant corsa     version corsa                 2210
3   2014   Corsa        Opel    Diesel   variant corsa 2   version corsa 2                 1432
Answered By: mozway
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.