Why is this regex pattern not taking the \b word boundary into account?
Question:
import re
input_text = "desde las 15:00 pm del 2002-11-01 hast16:00 hs"
# Try with the word boundary \b
input_text = re.sub(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)\b",
                    lambda m: print(m[1], m[2]),
                    input_text)
# I have also tried with the ?, but I think this is only good for matching zero or one occurrence of the preceding group
input_text = re.sub(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)?",
                    lambda m: print(m[1], m[2]),
                    input_text)
The goal is to capture times (hour and minute) in hh:mm format that are not followed by an am or pm marker. That is why it should not capture 15:00 pm, but it should capture 16:00.
I was testing the pattern that the capture groups will use, but I don't get any results with either of these two regex patterns.
To confirm that the pattern works, the output should look something like this:
"16" # --> m[1]
"00" # --> m[2]
Why does this pattern fail to match? What should I change to make it work?
Answers:
If you want to find all timestamps not followed by either an am or pm marker, then use this regex:
\d{1,2}:\d{2}(?!\s*(?:am|pm))
Sample script:
import re
input_text = "desde las 15:00 pm del 2002-11-01 hast16:00 hs"
matches = re.findall(r'(\d{1,2}):(\d{2})(?!\s*(?:am|pm))', input_text)
print(matches)  # [('16', '00')]
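As a rough illustration of why the question's two patterns find nothing, and how the negative lookahead behaves differently, here is a minimal sketch. The likely culprit is not the word boundary but the character class `[\s|]`, which requires exactly one whitespace character or a literal `|` on each side of the colon. The helper name `find_times_without_meridiem`, the added `\b` inside the lookahead, and the `re.IGNORECASE` flag are assumptions for this example and are not part of the answer above.

```python
import re

text = "desde las 15:00 pm del 2002-11-01 hast16:00 hs"

# Pattern from the question: [\s|] is mandatory on both sides of ":",
# so "15:00" and "16:00" (no separator around the colon) never match.
strict = re.compile(r"(\d{1,2})[\s|]:[\s|](\d{0,2})[\s|]*(am|pm)\b")
print(strict.findall(text))  # []

# Making the marker optional with "?" does not exclude it: once the
# separators are optional, both times match, with or without "pm".
optional = re.compile(r"(\d{1,2})\s*:\s*(\d{2})\s*(am|pm)?")
print(optional.findall(text))  # [('15', '00', 'pm'), ('16', '00', '')]

# Negative lookahead: match hh:mm only when no am/pm marker follows.
# Assumption: \b keeps words that merely start with "am"/"pm" from
# counting as markers, and IGNORECASE also rejects "AM"/"PM".
NO_MERIDIEM = re.compile(r"(\d{1,2}):(\d{2})(?!\s*(?:am|pm)\b)", re.IGNORECASE)


def find_times_without_meridiem(s):
    """Return (hour, minute) tuples for times not followed by am/pm."""
    return NO_MERIDIEM.findall(s)


print(find_times_without_meridiem(text))  # [('16', '00')]
```

With the `\b` in place, a string such as "10:30 amigos, 9:15 PM" would yield only `('10', '30')`, because "amigos" is not treated as an am marker while "PM" still is.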