How to fix date format issues while reading xlsx files using polars?
Question:
I have an Excel file (.xlsx) that includes a field called date_of_birth. This field contains dates with years ranging from 1860 and 1870 up to 2000.
Here is the command I used to load the Excel file:
df_pl = pl.read_excel('Data_Set_14_Data.xlsx',
read_csv_options={'ignore_errors':True,'infer_schema_length':0,'parse_dates':True})
On running this it gives an error:
XlsxValueError: Error: potential invalid date format.
How can I ignore or fix this error while reading the file, so that I get the data into the data frame as it is? Is there any workaround for this?
Answers:
Well, the error is probably due to the strftime function, which did not support years before 1900 in older Python versions.
Polars (or the library it uses to read the file) is probably calling it, and that causes the problem.
You could try not parsing the dates in the polars function, so that the file is read and the dates stay as strings. Then, when you need to parse the dates, just use strptime, like:
import datetime
datetime.datetime.strptime("1800/04/10", "%Y/%m/%d")
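To make the strptime suggestion a bit more concrete, here is a minimal sketch using only the standard library. The helper name parse_birth_date, the sample values, and the '%m/%d/%Y' format are assumptions for illustration; adjust the format to whatever the strings in your file actually look like.

```python
from datetime import date, datetime

def parse_birth_date(text, fmt="%m/%d/%Y"):
    """Parse one date string; return None for empty or malformed input."""
    if text is None or not text.strip():
        return None
    try:
        # strptime accepts years before 1900, unlike strftime on old Pythons
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None

# Hypothetical values, standing in for a date_of_birth column read as strings
raw_values = ["04/10/1860", "12/31/1870", "", "not a date", "01/15/2000"]
parsed = [parse_birth_date(value) for value in raw_values]
print(parsed)
```

Returning None for bad values keeps the load from failing on a single malformed cell, which seems to be what the question is after; raise instead if you would rather find out about bad rows.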
Also, you could try the with_columns method of polars to convert the column after loading (I couldn’t test it yet; will update after trying it):
import polars as pl
# Load every column as a string (no date parsing), then convert the date column with strptime
df_pl = pl.read_excel('Data_Set_14_Data.xlsx',
read_csv_options={'ignore_errors':True,'infer_schema_length':0,'parse_dates':False}).with_columns(pl.col('<last_col_name>').str.strptime(pl.Date, '%m/%d/%Y'))