Pandas: Filter datetime column using regex

Question:

I have a dataframe whose datetime column holds sports match times, as follows:

df:

1    05 Mar 2023 16:00
2    05 Mar 2023 16:00
3       05 Mar 2023 HT
4      05 Mar 2023 FIN
5    05 Mar 2023 90+ '
6    05 Mar 2023 18:00
7    05 Mar 2023 18:30

I am trying:

df = df[~df.datetime.str.contains(r"'|\w+$", regex=True, na=False)]

Expected df:

df:

1    05 Mar 2023 16:00
2    05 Mar 2023 16:00
6    05 Mar 2023 18:00
7    05 Mar 2023 18:30

But I get an empty dataframe.

What's the correct way to filter this datetime column with a regex?

Asked By: PyNoob


Answers:

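Your issue is that \w+$ matches one or more word characters (letters, digits or underscores) immediately before the end of the string. Every row except the one ending in ' ends with a letter or a digit, and that remaining row is caught by the ' alternative, so every row matches and negating the mask leaves nothing. Use [A-Za-z]$ instead, so that only entries ending in a letter (HT, FIN) are matched. You don't need the +, because an entry that ends with several letters also ends with one letter: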

df = df[~df.datetime.str.contains("'|[A-Za-z]$", regex=True, na=False)]

Output:

1    05 Mar 2023 16:00
2    05 Mar 2023 16:00
6    05 Mar 2023 18:00
7    05 Mar 2023 18:30
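A self-contained way to check this, assuming the column holds plain strings and rebuilding the sample rows from the question (the names old_mask, fixed and kept are only illustrative). It shows that the original pattern matches every row, while the fixed pattern drops only HT, FIN and the 90+ ' row. The last filter is an alternative that is not part of the answer above: it keeps rows ending in an HH:MM time instead of listing what to drop.

```python
import pandas as pd

# Sample rows copied from the question; the column is assumed to contain plain strings.
df = pd.DataFrame(
    {"datetime": [
        "05 Mar 2023 16:00",
        "05 Mar 2023 16:00",
        "05 Mar 2023 HT",
        "05 Mar 2023 FIN",
        "05 Mar 2023 90+ '",
        "05 Mar 2023 18:00",
        "05 Mar 2023 18:30",
    ]},
    index=range(1, 8),
)

# Original pattern: \w+$ matches rows ending in a letter or digit, and the
# ' alternative catches the one remaining row, so the mask is True everywhere.
old_mask = df["datetime"].str.contains(r"'|\w+$", regex=True, na=False)
print(old_mask.all())  # True, so df[~old_mask] is empty

# Fixed pattern: drop rows containing ' or ending in a letter (HT, FIN).
fixed = df[~df["datetime"].str.contains(r"'|[A-Za-z]$", regex=True, na=False)]
print(fixed)

# Alternative (assumption, not from the answer): keep only rows ending in HH:MM.
kept = df[df["datetime"].str.contains(r"\d{2}:\d{2}$", regex=True, na=False)]
print(kept.equals(fixed))  # True for this sample
```

The positive match is stricter: any new status text is excluded automatically, but so would be a valid row written in a different time format.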
Answered By: Nick