Create a lambda function that allows me to identify if a regex capturing group is equal to another capturing group, and if so replace it

Question:

import re

input_text = "Asegurarse que halla 4(4) o 5( 5  ) de ellos, también 24(24 48), o sino 24, (24 )no 66(70 y seis), o () 56(), 56(5 6) ssd 56"

lambda_fuction_result = lambda m: re.sub(r"", "", m.group())

input_text = re.sub(capture_groups_with_regex, lambda_fuction_result, input_text)

print(repr(input_text)) # --> output

The correct output I need to obtain:

"Asegurarse que halla 4 o 5 de ellos, también 24(24 48), o sino 24, (24 )no 66(70 y seis), o () 56(), 56(5 6) ssd 56"

I need the data that is repeated between parentheses to be removed from the main string, since it is a redundant clarification. The replacement should only happen if the parentheses come immediately after the number and the number inside the parentheses is exactly equal to the number before them. The number inside the parentheses must also be alone, with nothing else inside the parentheses.

The regex should look something like this: r"\d*[\s|]\([\s|]*\d*[\s|]*\)[\s|]", but with some validation that checks that the \d* in front of the parentheses is equal to the \d* inside the parentheses.

Answers:

As far as I understand, your task is to remove a parenthesized number if it equals the number right before those parentheses.

First, you need a regular expression that matches the number and the number in parentheses, each in a separate capturing group:

(\d+)\s*\(\s*(\d+)\s*\)

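To see what each group captures, here is a quick sketch using re.findall() on the sample string from the example that follows; the names text and pairs are only illustrative. Note that "100    ( 1 0 0   )" yields no pair at all, because \d+ cannot span the spaces inside the parentheses.

```python
import re

pattern = r"(\d+)\s*\(\s*(\d+)\s*\)"
text = "this 0(   0) is 0 (1   ) test 100    ( 1 0 0   ) string 1(1)."

# With two capturing groups, findall returns one (before, inside) tuple per match
pairs = re.findall(pattern, text)
print(pairs)  # [('0', '0'), ('0', '1'), ('1', '1')]
```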

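To check whether the values of both capturing groups are equal, we can use a feature of re.sub(): instead of a replacement string, pass a function as the second argument. re.sub() calls it with a match object for every match and substitutes whatever string it returns, so our function can compare the groups and return either the number alone or the full match unchanged.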

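import re

def replace(match):
    # group 1: number before "(", group 2: number inside the parentheses
    if match.group(1) == match.group(2):
        return match.group(1)  # equal: drop the redundant parentheses
    else:
        return match.group(0)  # different: keep the whole match unchanged

input_text = "this 0(   0) is 0 (1   ) test 100    ( 1 0 0   ) string 1(1)."
result = re.sub(r"(\d+)\s*\(\s*(\d+)\s*\)", replace, input_text)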
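As a quick check against the question itself, the sketch below runs the same replace() function on the asker's original string and compares the result with the expected output quoted in the question. The names question_text and expected are illustrative only.

```python
import re

def replace(match):
    # group 1 = number before "(", group 2 = number inside the parentheses
    return match.group(1) if match.group(1) == match.group(2) else match.group(0)

question_text = "Asegurarse que halla 4(4) o 5( 5  ) de ellos, también 24(24 48), o sino 24, (24 )no 66(70 y seis), o () 56(), 56(5 6) ssd 56"
expected = "Asegurarse que halla 4 o 5 de ellos, también 24(24 48), o sino 24, (24 )no 66(70 y seis), o () 56(), 56(5 6) ssd 56"

result = re.sub(r"(\d+)\s*\(\s*(\d+)\s*\)", replace, question_text)
print(result == expected)  # True
```

Only "4(4)" and "5( 5  )" are rewritten; "24, (24 )" is untouched because the comma sits between the number and the opening parenthesis.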

Let’s shorten the function. Accessing capturing groups by indexing (e.g. match[0]) is more compact; match objects support this since Python 3.6. We can also use a small Python trick: bool is a subclass of int, so True equals 1 and False equals 0. As you may have noticed, our replace() function returns either group 0 or group 1. Putting this together, we can use the condition itself as the group index:

def replace(match):
    return match.group(match.group(1) == match.group(2))
# OR
def replace(match):
    return match[match[1] == match[2]]
# OR
replace = lambda m: m[m[1] == m[2]]
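To see the bool-as-index trick in isolation, here is a small sketch on two hand-picked strings (m and m2 are illustrative names): when the groups are equal the comparison yields True, which selects group 1; otherwise False selects group 0, the whole match.

```python
import re

pattern = r"(\d+)\s*\(\s*(\d+)\s*\)"
m = re.match(pattern, "7( 7 )")
m2 = re.match(pattern, "0 (1   )")

# bool is a subclass of int, so True acts as index 1 and False as index 0
print(isinstance(True, int))  # True
print(m[True])                # 7        (same as m[1])
print(m[False])               # 7( 7 )   (same as m[0], the whole match)

# Equal groups -> True -> group 1; different groups -> False -> whole match
print(m[m[1] == m[2]])        # 7
print(m2[m2[1] == m2[2]])     # 0 (1   )
```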

Shortened version of the code:

import re

input_text = "this 0(   0) is 0 (1   ) test 100    ( 1 0 0   ) string 1(1)."
result = re.sub(r"(\d+)\s*\(\s*(\d+)\s*\)", lambda m: m[m[1] == m[2]], input_text)
print(result)

Output:

this 0 is 0 (1   ) test 100    ( 1 0 0   ) string 1.

Update: I completely forgot about backreferences when I wrote this answer.

There’s a better way to implement this:

import re

input_text = "this 0(   0) is 0 (1   ) test 100    ( 1 0 0   ) string 1(1)."
result = re.sub(r"(\d+)\s*\(\s*\1\s*\)", r"\1", input_text)
print(result)

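Here the backreference \1 lets the regex engine do all the work: the pattern only matches when the number inside the parentheses is identical to the one captured before them, and the replacement r"\1" keeps just that number.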
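One caveat worth checking: because the engine may start a match at any position, the backreference pattern can match only the tail of a longer number, e.g. the "0(0)" inside "10(0)", turning it into "10". The sketch below adds a negative lookbehind (?<!\d) so that group 1 cannot begin right after another digit. This guard is an addition for illustration, not part of the answer above; text, naive and guarded are illustrative names.

```python
import re

text = "10(0) and 5(5)"

# Backreference pattern as written above: a match can start in the middle of "10"
naive = re.sub(r"(\d+)\s*\(\s*\1\s*\)", r"\1", text)

# Variant with (?<!\d): group 1 must not be preceded by a digit,
# so it always captures a whole number rather than its last digits
guarded = re.sub(r"(?<!\d)(\d+)\s*\(\s*\1\s*\)", r"\1", text)

print(naive)    # 10 and 5
print(guarded)  # 10(0) and 5
```

The function-based version does not seem to hit this case: its leftmost match starts at the first digit of "10", the groups differ, so the match is returned unchanged and consumed.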



Answered By: Olvin Roght