Capture a substring and pass it to a function that modifies it and replaces it in the string
Question:
import re

def one_day_or_another_day_relative_to_a_date_func(input_text):
    # print(repr(input_text))  # print what was captured and should be replaced
    return "aaaaaaaa"

def identify(input_text):
    some_text = r"(?:(?!\.\s*?\n)[^;])*"
    date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"
    previous_days = r"(\d+)\s*(?:dias|dia)\s*(?:antes|previos|previo|antes|atrás|atras)\s*"
    after_days = r"(\d+)\s*(?:dias|dia)\s*(?:después|despues|luego)\s*"
    n_patterns = [
        previous_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + previous_days,
        after_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + previous_days,
        previous_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + after_days,
        after_days + r"(?:del|de\s*el|de|al|a)\s*" + some_text + date_capture_pattern + some_text + r"\s*(?:,\s*o|o)\s*" + after_days]
    # Iterate over the list of search patterns so the program tries them one by one
    for n_pattern in n_patterns:
        # This is my attempt at the replacement, although it has problems with non-greedy modifiers
        input_text = re.sub(n_pattern, one_day_or_another_day_relative_to_a_date_func, input_text, re.IGNORECASE)

input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
               "2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
               "a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
               ]

# Testing...
for input_text in input_texts:
    # print(input_text)
    print(one_day_or_another_day_relative_to_a_date_func(input_text))
This is the incorrect output I am getting. If the substrings are captured incorrectly, the replacements will be incorrect too:
"aaaaaaaa"
"aaaaaaaa"
"aaaaaaaa"
The pattern has well-defined limits, so I don't understand why it tries to capture beyond them.
The output I need is:
"aaaaaaaa"
"aaaaaaaa, dia en donde ocurrio"
"a tan solo aaaaaaaa, mmm no recuerdo bien"
Answers:
There are several errors in your code, among them:
- You are printing the result of the one_day_or_another_day_relative_to_a_date_func function. Print the result of identify instead.
- In the identify function you are not returning the resulting text. Add return input_text at the end of it.
- Make the "o…" suffix optional.
- Use regex alternation instead of multiple patterns; otherwise you may get unexpected results.
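As a rough illustration of the last point, here is a minimal sketch of one way sequential substitutions can surprise you: a later pattern can match text that an earlier pass has already rewritten. The two overlapping patterns and the tag helper are hypothetical and much simpler than the ones in the question.

```python
import re

text = "2 dias antes o 3 dias despues"

def tag(match):
    # Wrap whatever was matched in brackets so each replacement is visible
    return "[" + match.group(0) + "]"

# Hypothetical overlapping patterns, applied one after another
patterns = [r"\d+ dias antes", r"\d+ dias (?:antes|despues)"]
sequential = text
for p in patterns:
    # The second pass sees the output of the first pass, brackets included
    sequential = re.sub(p, tag, sequential)

# One pass with a single alternation-based pattern
single_pass = re.sub(r"\d+ dias (?:antes|despues)", tag, text)

print(sequential)
print(single_pass)
```

Here the loop tags "2 dias antes" twice, giving "[[2 dias antes]] o [3 dias despues]", while the single pass tags each phrase exactly once.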
Fixed code (I’ve also made it more compact):
import re

def one_day_or_another_day_relative_to_a_date_func(input_text):
    # print(repr(input_text))  # print what was captured and should be replaced
    return "aaaaaaaa"

def identify(input_text):
    some_text = r"(?:(?!\.\s*?\n)[^;])*"
    date_capture_pattern = r"([12]\d{3}-[01]\d-[0-3]\d)(\D*?)"
    previous_days = r"antes|previos|previo|atrás|atras"
    after_days = r"después|despues|luego"
    # A number of days followed by either a "before" or an "after" word
    prev_or_after = r"(\d+)\s*(?:dias|dia)\s*(?:" + previous_days + "|" + after_days + r")\s*"
    preposition = r"(?:del|de\s*el|de|al|a)\s*"
    # Optional trailing "o ..." alternative after the date
    suffix = "(?:" + r"\s*(?:,\s*o|o)\s*" + some_text + prev_or_after + ")?"
    pattern = prev_or_after + some_text + preposition + date_capture_pattern + suffix
    # flags must be passed by keyword: the 4th positional argument of re.sub is count
    input_text = re.sub(pattern, one_day_or_another_day_relative_to_a_date_func, input_text, flags=re.IGNORECASE)
    return input_text

input_texts = ["8 dias antes o 9 dias antes del 2022-12-22",
               "2 dias despues o 1 dia antes del 2022-12-22, dia en donde ocurrio",
               "a tan solo 2 dias despues de 2022-12-22 o a caso eran 6 dias despues, mmm no recuerdo bien",
               ]

# Testing...
for input_text in input_texts:
    # print(input_text)
    print(identify(input_text))
Result:
aaaaaaaa
aaaaaaaa, dia en donde ocurrio
a tan solo aaaaaaaa, mmm no recuerdo bien
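Two details about re.sub are worth keeping in mind when you replace the placeholder "aaaaaaaa" with real logic. The replacement function receives a match object rather than a string, so the captured pieces are read with groups() or group(n). The fourth positional parameter of re.sub is count, so flags such as re.IGNORECASE should be passed by keyword. The sketch below assumes a much simpler, hypothetical pattern than the one above, and the names PATTERN, describe and rewrite are made up for illustration.

```python
import re

# Hypothetical, simplified pattern: "<N> dia(s) antes|despues de(l) <YYYY-MM-DD>"
PATTERN = r"(\d+)\s*dias?\s*(antes|despues)\s*del?\s*(\d{4}-\d{2}-\d{2})"

def describe(match):
    # re.sub calls this with a re.Match object, not with the matched string
    n, direction, date = match.groups()
    sign = "-" if direction.lower() == "antes" else "+"
    return f"[{date} {sign}{n} days]"

def rewrite(text):
    # Pass flags by keyword; positionally, the 4th argument would be count
    return re.sub(PATTERN, describe, text, flags=re.IGNORECASE)

print(rewrite("ocurrio 3 DIAS ANTES del 2022-12-22, creo"))
print(rewrite("a tan solo 2 dias despues de 2022-12-22"))
```

Only the matched span is replaced; the text before and after it is left untouched, which is the behaviour the question asks for.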