having issue when converting a string into date frame to filter certain values

Question:

I am reading a csv file and would like to filter rows based on a column that is in date format (i..e, yyyy/mm/dd). However, I am receiving an error '<' not supported between instances of 'int' and 'datetime.date'. It seems like the conversion that I apply does not go through.

import pandas as pd 
import datetime

#Suppose I read the following from an Excel file. 
x = pd.DataFrame([
    "4/11/2022",
    "6/16/2022",
    "12/18/2021",
    "11/13/2020",
    "5/17/2023"
                 ], columns = ["My Date"])

#I first convert the values into date format
x["My Date"] = pd.to_datetime(x["My Date"], errors = 'coerce').dt.date

#Then, I go ahead and try to extract values between two dates.
startdate = pd.to_datetime("2021-1-1").date()
enddate = pd.to_datetime("2022-1-1").date()
x["My Date"].loc[startdate:enddate]

My guess is that the conversion that I do with x["My Date"] = pd.to_datetime(x["My Date"], errors = 'coerce').dt.date is not working properly and was wondering if someone can help me with that.

Asked By: sergey_208

||

Answers:

There is no problem with regards to the conversion, the issue is loc[] accesses index values, which are integers. In your example your index is [0,1,2,3,4] which correctly is of type int, so when you pass datetime objects to loc[] essentially it is comparing int and datetime giving you the error. Consider using:

x.set_index('My Date').loc[startdate:enddate]

However, if you’d like to slice between two dates, then I suggest you use:

x_filtered = x[(x['My Date'] > startdate) & (x['My Date'] < enddate)]
Answered By: Celius Stingher
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.