Pandas scoring system: sort_values

Question:

So my task is pretty simple.
We have a .CSV file with the results of a decathlon competition. The results need to be converted into points, ranked, and assigned places. Everything works fine apart from one line:

modified_data.sort_values(by=["Total points"])

Why doesn’t it sort the result for me?

My work below:

import pandas as pd
import numpy as np

# Modification of CSV file data by adding header names and splitting data
data = pd.read_csv("static/data/Decathlon.csv", delimiter=';', header=None)
data = data.assign(Total_points=0)
data = data.assign(Ranking=0)
header_list = ['Player', '100 metres', 'Long jump', 'Short put', 'High jump', '400 metres', '110 metres hurdles',
               'Discus throw', 'Pole vault', 'Javelin throw', '1500 metres', 'Total points', 'Ranking']
data.to_csv("static/data/Decathlon_modified.csv", header=header_list, index=False)
modified_data = pd.read_csv("static/data/Decathlon_modified.csv", delimiter=',')
print(modified_data)

# Conversion of CSV data into the necessary units of measurement,
# so that it can be applied to the calculation of the resulting formulas:
temporary_list = []
changed_list = []
for time in modified_data["1500 metres"]:
    temporary_list.append(time.split('.'))
for new_value in temporary_list:
    value = (int(new_value[0]) * 60) + int(new_value[1]) + int(new_value[2]) * 0.01
    changed_list.append(value)
for index, new_value in enumerate(changed_list):
    modified_data.loc[index, "1500 metres"] = new_value

# Results are calculated according to formulas:
# Points = INT(A * (B - P)^C) for track events (a faster time produces a higher score)
modified_data["100 metres"] = round((25.4347 * (18 - modified_data["100 metres"]) ** 1.81))
modified_data["400 metres"] = round(1.53775 * (82 - modified_data["400 metres"]) ** 1.81)
modified_data["110 metres hurdles"] = round(5.74352 * (28.5 - modified_data["110 metres hurdles"]) ** 1.92)
modified_data["1500 metres"] = round(0.03768 * (480 - modified_data["1500 metres"].astype(float)) ** 1.85)

# Points = INT(A * (P - B)^C) for field events (a greater distance or height produces a higher score)
modified_data["Long jump"] = round(0.14354 * ((modified_data["Long jump"] * 100) - 220) ** 1.4)
modified_data["Short put"] = round(51.39 * (modified_data["Short put"] - 1.5) ** 1.05)
modified_data["High jump"] = round(0.8465 * ((modified_data["High jump"] * 100) - 75) ** 1.42)
modified_data["Discus throw"] = round(12.91 * (modified_data["Discus throw"] - 4) ** 1.1)
modified_data["Pole vault"] = round(0.2797 * (modified_data["Pole vault"] * 100 - 100) ** 1.35)
modified_data["Javelin throw"] = round(10.14 * (modified_data["Javelin throw"] - 7) ** 1.08)

# Total calculation and rewriting of each player's result in a common table
total_points = (modified_data["100 metres"] + modified_data["Long jump"] + modified_data["Short put"]
                + modified_data["High jump"] + modified_data["400 metres"] + modified_data["110 metres hurdles"]
                + modified_data["Discus throw"] + modified_data["Pole vault"] + modified_data["Javelin throw"]
                + modified_data["1500 metres"])
for index, new_value in enumerate(total_points):
    modified_data.loc[index, "Total points"] = new_value


# Ranking according to collected points
modified_data.reset_index(drop=False)
modified_data.index = np.arange(1, len(modified_data) + 1)

# TODO
modified_data.sort_values(by=["Total points"])
print(modified_data)

modified_data["Ranking"] = (
    modified_data["Total points"]
    .apply(lambda score:
           modified_data.index[modified_data["Total points"] == score].astype(str))
    .str.join("-")
)
print(modified_data)

modified_data.to_json(r'static/data/Decathlon.json')

I tried:

modified_data["Total points"] = modified_data["Total points"].astype(int)
modified_data.sort_values(by=["Total points"])

and

modified_data["Total points"] = modified_data["Total points"].astype(int)
modified_data.sort_values('Total points')

I also read the documentation:
(https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html)

Answers:

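By default, sort_values returns a new, sorted DataFrame and leaves the original unchanged, so the result of your call is simply discarded. Either pass inplace=True or assign the result back to the same variable: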

modified_data.sort_values(by=["Total points"], inplace=True)
# Or alternatively
modified_data = modified_data.sort_values(by=["Total points"])
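
To see the difference, here is a minimal sketch on a tiny made-up table (the `scores` frame and its values are hypothetical, not taken from your CSV). It also sorts in descending order and renumbers the index afterwards, on the assumption that you want the highest total in first place:

```python
import pandas as pd

# Hypothetical sample data standing in for the decathlon table.
scores = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Total points": [7100, 8200, 6500],
})

# Without assignment the sorted copy is thrown away; `scores` keeps its order.
scores.sort_values(by="Total points")
unchanged_order = scores["Player"].tolist()

# Assign the result back. ascending=False puts the highest total first
# (an assumption about the desired ranking order).
ranked = scores.sort_values(by="Total points", ascending=False).reset_index(drop=True)

# Renumber the index from 1 AFTER sorting, so it reflects the final places.
ranked.index = ranked.index + 1
ranked_order = ranked["Player"].tolist()

print(unchanged_order)
print(ranked)
```

Note that in your script the index is set to 1..N before the sort, so the index labels travel with the rows when they are reordered. If the index is meant to represent the place, it probably makes sense to set it after sorting, as in the sketch.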