Why is this regex identification pattern not considering the \b word boundary?

Question

import re

input_text = "desde las 15:00 pm  del 2002-11-01 hast16:00 hs"

#Try with word boundary \b
input_text = re.sub(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)\b",
                   lambda m : print(m[1], m[2]),
                   input_text)

#I have tried with the ? , but I think this is only good for matching zero or more of the above group
input_text = re.sub(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)?",
                   lambda m : print(m[1], m[2]),
                   input_text)

The objective is to capture a time (hour and minute) in hh:mm format that does not have an am or pm indication at the end. That is why it should not capture the 15:00 pm, but it should capture the 16:00.

I was running some tests on the pattern that the capture groups will rely on, but I don't get any results with either of these two regex patterns.

If the pattern works, the output should look something like this:

"16" # --> m[1]
"00" # --> m[2]

Why is the recognition of this search pattern failing? What should I change to make it work?

Asked By: Matias Nicolas Rodriguez


Answer 1

If you want to find all timestamps not followed by either an am or pm marker, then use this regex:

\d{1,2}:\d{2}(?!\s*(?:am|pm))

Sample script:

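import re

input_text = "desde las 15:00 pm  del 2002-11-01 hast16:00 hs"
# The negative lookahead (?!...) rejects any time followed by optional whitespace and am/pm
matches = re.findall(r'(\d{1,2}):(\d{2})(?!\s*(?:am|pm))', input_text)
print(matches)  # [('16', '00')]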
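
As for why the patterns in the question find nothing: a likely cause is the `[\s|]` on each side of the colon. Inside square brackets, `|` is a literal pipe rather than alternation, so `[\s|]` must consume exactly one whitespace character or one `|`, and the input has neither around its colons. The optional `(am|pm)?` in the second attempt does not change that. The following is a minimal sketch of that diagnosis; the names `original` and `no_meridiem` are only illustrative.

```python
import re

input_text = "desde las 15:00 pm  del 2002-11-01 hast16:00 hs"

# Pattern from the question: [\s|] is a character class that must consume
# exactly one whitespace character or a literal "|" on each side of the colon.
original = re.compile(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)\b")
print(original.findall(input_text))  # []

# The same pattern does match once there is a space on each side of the colon.
print(original.findall("15 : 00 pm"))  # [('15', '00', 'pm')]

# Negative lookahead: match hh:mm only when no am/pm follows.
no_meridiem = re.compile(r"(\d{1,2}):(\d{2})(?!\s*(?:am|pm))")
for m in no_meridiem.finditer(input_text):
    print(m[1], m[2])  # 16 00
```

Note also that even with the character classes fixed, the original patterns require (or allow) am/pm after the time, which is the opposite of the stated goal; the lookahead is what expresses "not followed by am or pm".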

Answered By: Tim Biegeleisen
