Sort dataframe by dates

Question:

I am importing an excel file in which a column is a date in the format dd/mm/yyyy.

When I import it from the excel file, I think it is understood as a string. I need to sort the whole DataFrame by date, so I perform this code:
import pandas as pd

import pandas as pd

dictionary = {
      "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
      "CONCEPT" : ['Supermarket','Restaurant', 'Gas', 'Suscription'],
      "EUR" : [-150,-50,-45,-95]
          }
df = pd.DataFrame(dictionary)

df['DATE'] =  pd.to_datetime(df['DATE']).dt.strftime('%d/%m/%Y')

df = df.sort_values(by=['DATE'],axis=0, ascending=True)

If you perform this example, you will see it works perfectly fine, as the first-row date, 12/02/2023, is sorted in the last position. However, when I am using my real excel file, this date is interpreted as the 2nd of December 2023. Moreover, it sorts the date column as strings as not as dates; therefore, 31/01/2023 goes after 28/02/2023.

How could I solve this problem?

Asked By: nekovolta

||

Answers:

You need to use the dayfirst argument to to_datetime to make the conversion work as expected. Using your sample data:

pd.to_datetime(df['DATE'])
# 0   2023-12-02
# 1   2023-02-01
# 2   2023-02-01
# 3   2023-10-02
pd.to_datetime(df['DATE'], dayfirst=True)
# 0   2023-02-12
# 1   2023-01-02
# 2   2023-01-02
# 3   2023-02-10

Your DATE column is then being sorted as strings because you have converted it to a string using strftime. You should sort first and then convert to a string just for display purposes.

df['DATE'] =  pd.to_datetime(df['DATE'], dayfirst=True)
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
df['DATE'] = df['DATE'].dt.strftime('%d/%m/%Y')

Output

         DATE      CONCEPT  EUR
1  02/01/2023   Restaurant  -50
2  02/01/2023          Gas  -45
3  10/02/2023  Suscription  -95
0  12/02/2023  Supermarket -150

If you need to reset the index, just use reset_index:

df = df.reset_index(drop=True)

Output:

         DATE      CONCEPT  EUR
0  02/01/2023   Restaurant  -50
1  02/01/2023          Gas  -45
2  10/02/2023  Suscription  -95
3  12/02/2023  Supermarket -150
Answered By: Nick

The ideal format to store dates in, is ISO format: yyyy-mm-dd

Change the string format to:

dt.strftime('%Y-%m-%d')

This is not only an international standard, but also sorts neatly in date order, without any further hacks.

You can use hacks to display it on screen, since different countries have different conventions for ordering items, e.g. dd-mm-yyyy for us in Europe; mm-dd-yyyy in North America and yyyy-mm-dd for China.

Answered By: Eureka

You can use the parse_dates parameter of the read_excel() function to ensure that the date column is parsed as a date during import. You can also specify the date format explicitly using the date_format parameter:

import pandas as pd
df = pd.read_excel('your_file.xlsx', parse_dates=['DATE'], date_format='%d/%m/%Y')
df = df.sort_values(by=['DATE'], axis=0, ascending=True)

Alternatively, you can convert the date column to a datetime object using pd.to_datetime() with the format parameter to specify the input date format:

import pandas as pd

dictionary = {
  "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
  "CONCEPT" : ['Supermarket','Restaurant', 'Gas', 'Suscription'],
  "EUR" : [-150,-50,-45,-95]
      }
df = pd.DataFrame(dictionary)
df['DATE'] = pd.to_datetime(df['DATE'], format='%d/%m/%Y')
df = df.sort_values(by=['DATE'],axis=0, ascending=True)
Answered By: Ukeme Edet
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.