How to reorder data in a character string with re.sub only in cases where it detects a certain regex pattern, and not in other cases

Question:

import re

#example
input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'


identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

restructuring_structure_00 = r"\g<hh>----\g<mm>----\g<am_or_pm>"

#replacement
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)


print(repr(input_text)) # --> output

I need to change identify_time_regex so that it does not reformat a time when it appears inside a structure like (2022_-_02_-_18 00:16 am), i.e. a time preceded by a date. That structure can be generalized as follows:

r"(\d*_-_\d{2}_-_\d{2}) " + identify_time_regex
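As a quick check of that generalized structure, here is a small sketch (assuming the question's time regex with its backslashes restored; protected_pattern and protected are illustrative names, not from the question) that uses re.finditer to list the date-plus-time occurrences that should be left untouched:

```python
import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

identify_time_regex = r"(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>(?:am|pm))"

# The generalized "date + one space + time" structure from the question
protected_pattern = re.compile(r"(\d*_-_\d{2}_-_\d{2}) " + identify_time_regex)

# Collect every time that is directly preceded by a date
protected = [m.group(0) for m in protected_pattern.finditer(input_text)]
print(protected)  # ['2022_-_02_-_18 00:16 am']
```

Only the time inside the parentheses is found; "2022_-_02_-_18 llega" is skipped because no time follows the date.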

This is the output I need; note that only the times with no date in front of them were modified:

input_text = 'Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

Answers:

You can use

import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
restructuring_structure_00 = lambda x: x.group() if x.group(1) else fr"{x.group('hh')}----{x.group('mm')}----{x.group('am_or_pm')}"
input_text = re.sub(identify_time_regex, restructuring_structure_00, input_text)
print(input_text)
# Alrededor de las 00----16----am o las 23----30----pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)


The logic is the following: if the optional capturing group (\b\d{4}_-_\d{2}_-_\d{2}\s+)? matches, the whole match is returned unchanged (i.e. no replacement occurs); if it does not, your replacement takes place.
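For comparison, the same result can be sketched with a different technique that is not the one used in this answer: a negative lookbehind that rejects a time directly preceded by a date. This assumes exactly one whitespace character between the date and the time, because Python's re module only supports fixed-width lookbehind; time_not_after_date and result are illustrative names.

```python
import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'

# Alternative technique (not the answer's): skip a time if the 15 characters
# before it are "dddd_-_dd_-_dd" followed by ONE whitespace character.
# Lookbehind must be fixed width in Python's re, so \s+ is not allowed here.
time_not_after_date = re.compile(
    r"(?<!\d{4}_-_\d{2}_-_\d{2}\s)(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"
)

# A plain template string is enough, since dated times never match at all
result = time_not_after_date.sub(r"\g<hh>----\g<mm>----\g<am_or_pm>", input_text)
print(result)
```

The trade-off is that this version needs no callable, but it would reformat a dated time if the date and time were separated by more than one whitespace character, a case the answer's \s+ handles.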

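The restructuring_structure_00 replacement must be a callable (here a lambda) rather than a template string, because the choice between keeping the match and reformatting it depends on whether group 1 participated in the match, and that can only be checked for each match as it is found.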
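To see why a callable is needed, the sketch below runs the answer's regex once with a plain template string and once with a named function equivalent to the lambda (reformat_unless_dated, with_template and with_callable are illustrative names). The template is applied to every match, so the dated time is rewritten and its date prefix is dropped:

```python
import re

input_text = 'Alrededor de las 00:16 am o las 23:30 pm , quizas cerca del 2022_-_02_-_18 llega el avion, pero no a las (2022_-_02_-_18 00:16 am), de esos hay dos (22)'
identify_time_regex = r"(\b\d{4}_-_\d{2}_-_\d{2}\s+)?(?P<hh>\d{2}):(?P<mm>\d{2})[\s|]*(?P<am_or_pm>[ap]m)"

# Template string: every match is replaced, including "2022_-_02_-_18 00:16 am",
# and group 1 (the date) is not part of the replacement, so it disappears.
with_template = re.sub(identify_time_regex, r"\g<hh>----\g<mm>----\g<am_or_pm>", input_text)

def reformat_unless_dated(m):
    """Keep the match as-is if the optional date group took part in it."""
    if m.group(1) is not None:
        return m.group(0)
    return f"{m.group('hh')}----{m.group('mm')}----{m.group('am_or_pm')}"

# Callable: decides per match whether to reformat
with_callable = re.sub(identify_time_regex, reformat_unless_dated, input_text)

print(with_template)   # ... pero no a las (00----16----am), ...
print(with_callable)   # ... pero no a las (2022_-_02_-_18 00:16 am), ...
```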

The \b\d{4}_-_\d{2}_-_\d{2}\s+ pattern matches a word boundary, four digits, _-_, two digits, _-_, two digits, and one or more whitespace characters.
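A small sketch contrasting this date prefix with the question's generalized one (the sample strings are made up for illustration): the answer's version requires exactly four year digits and accepts any run of whitespace, while the question's \d* accepts any number of year digits but exactly one space.

```python
import re

# The answer's prefix: word boundary, 4-digit year, one or more whitespace chars
answer_prefix = re.compile(r"\b\d{4}_-_\d{2}_-_\d{2}\s+")
# The question's generalized prefix: any number of year digits, one literal space
question_prefix = re.compile(r"(\d*_-_\d{2}_-_\d{2}) ")

samples = ["2022_-_02_-_18 ", "2022_-_02_-_18\t\t", "22_-_02_-_18 ", "2022_-_02_-_18"]

# (answer matches?, question matches?) for each whole sample string
results = {
    s: (bool(answer_prefix.fullmatch(s)), bool(question_prefix.fullmatch(s)))
    for s in samples
}
for s, (by_answer, by_question) in results.items():
    print(repr(s), by_answer, by_question)
```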

Answered By: Wiktor Stribiżew