Regex: when a pattern matches in a string, replace part of the match with another substring

Question:

How can I fix this regex so that the input strings below produce the outputs shown?

out = re.sub(r"(hs|h.s|h.s.)a m(\W|\b)", r"\1 am\2", out)
print(repr(out))

Input string examples…

#example 1.1
colloquial_hour = "Cerca de las 2: hs a m, hay que salir antes de esas hs a m"
#example 1.2
colloquial_hour = "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m."
#example 1.3
colloquial_hour = "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m"
#example 1.4
colloquial_hour = "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."

correct outputs:

#correct output for example 1.1
"Cerca de las 2: hs am, hay que salir antes de esas hs a m"
#correct output for example 1.2
"A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am."
#correct output for example 1.3
"A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m"
#correct output for example 1.4
"A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am."

The logic should be: whenever a numeric value is followed by "a m" (optionally with a colon and/or an hours abbreviation in between), replace that "a m" substring with "am" in the original string.

These would be all the possible cases where you have to replace the substring "a m" with "am"

X a m
X: a m
X: hs a m
X: h.s. a m
X: h.s a m
X: hs. a m
X:  a m
X : hs a m
X  : h.s. a m
X : h.s a m
X  : hs. a m
X hs a m
X h.s. a m
X h.s a m
X hs. a m

#where "X" is a numerical value ("1", "2", "3", "4", "5", "6", ... )
#in all these cases, in which this pattern is met, "a m" must be replaced by "am"

Answers:

You could match:

(\d+\s*:?\s*(?:h\.?s\.?)?)\s*a m\b

The pattern matches:

  • ( Capture group 1
    • \d+\s*:?\s* match 1+ digits and an optional : between optional whitespace chars
    • (?:h\.?s\.?)? Optionally match hs, h.s, hs. or h.s.
  • ) Close group 1
  • \s*a m\b Match optional whitespace chars and a m, followed by a word boundary

And replace with group 1 followed by am

\1 am
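
As a minimal sketch, the pattern and replacement above could be applied with re.sub to the four example strings from the question. The helper name join_am is only illustrative and is not part of the original answer.

```python
import re

# Pattern from this answer: digits, an optional colon, an optional hours
# abbreviation (hs, h.s, hs., h.s.), then "a m" ending at a word boundary.
PATTERN = re.compile(r"(\d+\s*:?\s*(?:h\.?s\.?)?)\s*a m\b")


def join_am(text):
    """Replace "a m" with "am" when it follows a number (illustrative helper)."""
    return PATTERN.sub(r"\1 am", text)


examples = [
    "Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
    "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
    "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
    "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
]

for phrase in examples:
    print(repr(join_am(phrase)))
```

One thing to be aware of: for the bare "X a m" case, group 1 already captures the space after the number, so the replacement produces two spaces ("15  am"). If that matters, the whitespace can be normalized afterwards.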


Answered By: The fourth bird

You can search using regex:

(\d\W+)(h\.?s\.?\s+)?a\s+m\b

and replace using:

\1\2am


RegEx Details:

  • (\d\W+): Match a digit followed by 1+ non-word chars in capture group #1
  • (h\.?s\.?\s+)?: Match h followed by s, each with an optional dot after it, then 1+ whitespaces. This optional group is capture group #2
  • a\s+m\b: Match a followed by 1+ whitespaces then m with a word boundary
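
A short sketch of how this search and replacement might look in Python. The helper name fix_am is illustrative only. It assumes Python 3.5 or later, where an unmatched optional group (here group #2) is substituted as an empty string in the replacement template.

```python
import re

# Group 1: one digit plus the non-word characters after it (spaces, colon).
# Group 2 (optional): the hours abbreviation and its trailing whitespace.
PATTERN = re.compile(r"(\d\W+)(h\.?s\.?\s+)?a\s+m\b")


def fix_am(text):
    """Join "a m" into "am" after a number (illustrative helper)."""
    # Assumes Python 3.5+: an unmatched group 2 is replaced by "".
    return PATTERN.sub(r"\1\2am", text)


examples = [
    "Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
    "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
    "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
    "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
]

for phrase in examples:
    print(repr(fix_am(phrase)))
```

Because the whitespace before "a m" is captured in one of the two groups, the original spacing is kept as-is, including in the bare "X a m" case.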
Answered By: anubhava

My solution uses re.sub

import re

phrases = ["Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
"A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
"A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
"A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."]

pattern = re.compile(r'\d\s*?:?\s*?h?\.?s?\.?\s(a m)')

for phrase in phrases:
    print(pattern.sub(lambda x: x.group(0)[:-3] + "am", phrase))

OUTPUT

Cerca de las 2: hs am, hay que salir antes de esas hs a m
A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am.
A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m
A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am.
Answered By: Alexander