Polars read_excel converting dates to strings

Question:

I'm using the Polars read_excel function to read some dates from an Excel file. However, they come in as strings in the format "mm-dd-yy". This causes problems down the line: a date of 01/01/1950 in the Excel file is read as '01-01-50', and when I later use that value, my code treats it as 01/01/2050 because the full year was never brought in.

You can see in the print() output below that even though the file contains dates from both 2050 and 1950, they appear to be the same date in the DataFrame once Polars has read them. Is there a way to bring in the full four-digit year so that the actual dates can be told apart?
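The ambiguity can be reproduced without Excel. This is a minimal sketch, assuming the downstream code parses the string with Python's `%y` directive (the question does not show that code); the sample strings are illustrative values, not taken from the file.

```python
from datetime import datetime

# Illustrative value: what 01/01/1950 looks like once reduced to "mm-dd-yy".
two_digit = datetime.strptime("01-01-50", "%m-%d-%y")

# Python's %y follows the POSIX convention:
# 69-99 map to 1969-1999 and 00-68 map to 2000-2068.
print(two_digit.year)

# With a 4-digit year in the string, the century is no longer guessed.
four_digit = datetime.strptime("1950-01-01", "%Y-%m-%d")
print(four_digit.year)
```

Once the century has been dropped from the string, no parser can recover it, so the fix has to happen at read time.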

Code:

import polars as pl
    
extracted = pl.read_excel('file_name.xlsx')
print(extracted)

file_name.xlsx:

[image: screenshot of the spreadsheet contents]

print(extracted) =

[image: screenshot of the printed DataFrame]

Asked By: Aaron


Answers:

Specify a 4-digit year in the format by passing dateformat through xlsx2csv_options:

import polars as pl

# Pass a 4-digit-year date format through to xlsx2csv so the century is kept.
extracted = pl.read_excel('testdate.xlsx', xlsx2csv_options={"dateformat": "%Y-%m-%d"})
print(extracted)
┌────────────┬─────────────┬─────────────┐
│ Hire Date  ┆ Hire Date 2 ┆ Hire Date 3 │
│ ---        ┆ ---         ┆ ---         │
│ str        ┆ str         ┆ str         │
╞════════════╪═════════════╪═════════════╡
│ 2005-02-05 ┆ 1950-01-02  ┆ 2050-01-02  │
│ 2005-02-05 ┆ 1950-01-03  ┆ 2050-01-03  │
│ 2020-04-06 ┆ 1950-01-04  ┆ 2050-01-04  │
│ 2008-12-20 ┆ 1950-01-05  ┆ 2050-01-05  │
│ 2009-03-12 ┆ 1950-01-06  ┆ 2050-01-06  │
│ 2018-05-26 ┆ 1950-01-07  ┆ 2050-01-07  │
│ 2018-05-26 ┆ 1950-01-08  ┆ 2050-01-08  │
└────────────┴─────────────┴─────────────┘