Pandas: Filter datetime column using regex
Question:
I have a dataframe that has sports times as follows:
df:
1 05 Mar 2023 16:00
2 05 Mar 2023 16:00
3 05 Mar 2023 HT
4 05 Mar 2023 FIN
5 05 Mar 2023 90+ '
6 05 Mar 2023 18:00
7 05 Mar 2023 18:30
I am trying:
df = df[~df.datetime.str.contains(r"'|\w+$", regex=True, na=False)]
expected df:
df:
1 05 Mar 2023 16:00
2 05 Mar 2023 16:00
6 05 Mar 2023 18:00
7 05 Mar 2023 18:30
But I get an empty dataframe.
What's the correct way to filter this datetime column with a regex?
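A minimal reproduction of the problem follows. It assumes the column is named datetime (as in df.datetime above) and holds plain strings, and it rebuilds the sample rows from the listing. It shows that the pattern matches every row, so negating the mask leaves nothing.

```python
import pandas as pd

# Sample data rebuilt from the listing above (assumed to be plain strings);
# the index 1..7 mirrors the displayed row numbers.
df = pd.DataFrame(
    {
        "datetime": [
            "05 Mar 2023 16:00",
            "05 Mar 2023 16:00",
            "05 Mar 2023 HT",
            "05 Mar 2023 FIN",
            "05 Mar 2023 90+ '",
            "05 Mar 2023 18:00",
            "05 Mar 2023 18:30",
        ]
    },
    index=range(1, 8),
)

# The attempted pattern: an apostrophe anywhere, OR word characters at the end.
mask = df.datetime.str.contains(r"'|\w+$", regex=True, na=False)
print(mask.tolist())  # every row is True

# Negating an all-True mask keeps no rows.
filtered = df[~mask]
print(filtered.empty)  # True
```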
Answers:
Your issue is that \w+$ matches one or more word characters (letters, digits or underscore) immediately before the end of the string. Every row except the one ending in ' matches it, since they all end with either letters or digits, and that remaining row is caught by the ' alternative, so the negated mask drops every row. You should use [A-Za-z] instead. Note that you don't need +, since an entry that ends with multiple letters also ends with 1 letter:
df = df[~df.datetime.str.contains("'|[A-Za-z]$", regex=True, na=False)]
Output:
1 05 Mar 2023 16:00
2 05 Mar 2023 16:00
6 05 Mar 2023 18:00
7 05 Mar 2023 18:30
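The sketch below checks the fix on the same assumed sample data. It also shows an alternative that is not part of the answer above: instead of listing what to drop, keep only values that look like a full "DD Mon YYYY HH:MM" timestamp using Series.str.fullmatch (available in pandas 1.1 and later). The whitelist pattern is an assumption about the data format; adjust it if other valid formats occur.

```python
import pandas as pd

# Same sample data as in the question (assumed to be plain strings).
df = pd.DataFrame(
    {
        "datetime": [
            "05 Mar 2023 16:00",
            "05 Mar 2023 16:00",
            "05 Mar 2023 HT",
            "05 Mar 2023 FIN",
            "05 Mar 2023 90+ '",
            "05 Mar 2023 18:00",
            "05 Mar 2023 18:30",
        ]
    },
    index=range(1, 8),
)

# Answer's fix: drop rows containing an apostrophe or ending in a letter.
blacklist = df[~df.datetime.str.contains(r"'|[A-Za-z]$", regex=True, na=False)]

# Alternative (whitelist): keep rows whose entire value is "DD Mon YYYY HH:MM".
timestamp = r"\d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}"
whitelist = df[df.datetime.str.fullmatch(timestamp, na=False)]

print(blacklist.index.tolist())  # [1, 2, 6, 7]
print(whitelist.index.tolist())  # [1, 2, 6, 7]
```

The whitelist version is stricter: any new status text (for example a different abbreviation or symbol) is dropped automatically, whereas the blacklist version only drops values that end in a letter or contain an apostrophe.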