How to perform replacements on a string only if it is not preceded and followed by a substring?

Question:

import re, datetime

input_text = "Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del 2022_-_02_-_18 llega el avion, pero no (2022_-_02_-_18 20:16 pm) a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)"

print(repr(input_text)) # --> output


input_date_structure = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"

identify_only_date_regex_00 = input_date_structure + r"[\s|]*" + r"(\b\d{2}:\d{2}[\s|]*[ap]m)?" #to identify if there is a time after  the date
identify_only_date_regex_01 = r"(\b\d{2}:\d{2}[\s|]*[ap]m)?" + r"[\s|]*" + input_date_structure #to identify if there is a time before the date


date_restructuring_structure = r"\g<year>_-_\g<month>_-_\g<startDay>"
restructuring_only_date = lambda x: x.group() if x.group(1) else "(" + fr"{x.expand(date_restructuring_structure)}" + " 00:00 am)"

#do the replace with re.sub() method and the regex patterns instructions
input_text = re.sub(identify_only_date_regex_00, restructuring_only_date, input_text)
input_text = re.sub(identify_only_date_regex_01, restructuring_only_date, input_text)

#print output
print(repr(input_text)) # --> output

The wrong output that I get:

'Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del(2022_-_02_-_18 00:00 am) llega el avion, pero no ((2022_-_02_-_18 00:00 am) 20:16 pm) a las ((2022_-_02_-_18 00:00 am) 00:16 am), de esos hay dos (22)'

The correct output, in which only dates that are neither preceded nor followed by a time (hh:mm am or pm, matched by r"(\d{2}:\d{2}[ \s|]*[ap]m)?") are modified:

"Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del (2022_-_02_-_18 00:00 am) llega el avion, pero no (2022_-_02_-_18 20:16 pm) a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)"

I don’t understand why it’s failing; I think I’m conditioning my regex correctly using \b and ?.

Not replace
"sdsdds 2022_-_02_-_18 00:16 am sdsddssd"

Not replace
"sdsdsd 00:16 am 2022_-_02_-_18 sdsdsd"

Replace
"sdsdds 2022_-_02_-_18 sdsdsd"

Answers:

You can merge the two regexes into a single expression of the form (Group1)?(...)(Group5)?. The trailing group is number 5 because the middle part contains three capturing groups, and named capturing groups are still assigned numeric IDs. Then check inside the lambda whether Group 1 or Group 5 matched:

import re

input_text = "Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del 2022_-_02_-_18 llega el avion, pero no (2022_-_02_-_18 20:16 pm) a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)"

input_date_structure = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"

identify_only_date_regex = r"(\b\d{2}:\d{2}[\s|]*[ap]m)?[\s|]*" + input_date_structure + r"[\s|]*(\b\d{2}:\d{2}[\s|]*[ap]m)?"

date_restructuring_structure = r"\g<year>_-_\g<month>_-_\g<startDay>"
restructuring_only_date = lambda x: x.group() if x.group(1) or x.group(5) else "(" + x.expand(date_restructuring_structure) + " 00:00 am)"

input_text = re.sub(identify_only_date_regex, restructuring_only_date, input_text)
print(repr(input_text)) # --> output
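
A quick way to confirm the group numbering is to inspect the compiled pattern. The sketch below rebuilds the merged pattern and the question's date-first pattern; the names `date`, `time_part`, `pattern`, and `date_first` are illustrative and not part of the code above. It suggests why the question's first pattern never replaced anything on this input: with the date first, `x.group(1)` is the year, which is non-empty for every date here, so the lambda always returned the match unchanged.

```python
import re

date = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
time_part = r"(\b\d{2}:\d{2}[\s|]*[ap]m)?"

# Merged pattern: optional time, date, optional time.
pattern = re.compile(time_part + r"[\s|]*" + date + r"[\s|]*" + time_part)
print(pattern.groups)            # 5
print(dict(pattern.groupindex))  # {'year': 2, 'month': 3, 'startDay': 4}

# The question's first pattern puts the date first, so group 1 is the year
# and the optional time ends up as group 4.
date_first = re.compile(date + r"[\s|]*" + time_part)
m = date_first.search("cerca del 2022_-_02_-_18 llega")
print(m.group(1))  # 2022
print(m.group(4))  # None (no time after this date)
```

Referring to the time groups by name (for example a hypothetical `(?P<before>...)`) instead of by number would avoid this kind of off-by-position mistake.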

See the Python demo.

The output is

Alrededor de las 00:16 am o las 23:30 pm 2022_-_02_-_18 , quizas cerca del(2022_-_02_-_18 00:00 am)llega el avion, pero no (2022_-_02_-_18 20:16 pm) a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)
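
Note that in the output above the spaces around the replaced date are gone ("del(2022..." and "am)llega"), because the `[\s|]*` separators are part of every match and are therefore overwritten. If the surrounding whitespace should survive, one possible variant is to move each separator inside its optional time group, so that a bare date matches on its own. This is a sketch under that assumption, not part of the original answer; `add_default_time`, `DATE`, `TIME`, and `PATTERN` are hypothetical names.

```python
import re

DATE = r"(?P<year>\d*)_-_(?P<month>\d{2})_-_(?P<startDay>\d{2})"
TIME = r"\b\d{2}:\d{2}[\s|]*[ap]m"

# Group 1: time before the date (with its trailing separator).
# Groups 2-4: year, month, startDay.
# Group 5: time after the date (with its leading separator).
PATTERN = re.compile(r"(" + TIME + r"[\s|]*)?" + DATE + r"([\s|]*" + TIME + r")?")


def add_default_time(text):
    """Wrap dates that have no adjacent time as '(date 00:00 am)'."""
    def repl(m):
        if m.group(1) or m.group(5):
            return m.group()  # a time is attached: leave the match untouched
        return "(" + m.expand(r"\g<year>_-_\g<month>_-_\g<startDay>") + " 00:00 am)"
    return PATTERN.sub(repl, text)


print(add_default_time("sdsdds 2022_-_02_-_18 sdsdsd"))
# sdsdds (2022_-_02_-_18 00:00 am) sdsdsd
```

With this pattern the two "Not replace" examples from the question come back unchanged, and the full input string produces the output the question asked for, spaces included.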
Answered By: Wiktor Stribiżew