How can I capture all the substrings matched by a regex group that repeats 2 or more consecutive times?
Question:
import re
input_text = "((PERS)Marcos) ssdsdsd sdsdsdsd sdsdsd le ((VERB)empujé) hasta ((VERB)dejarle) en ese lugar. A ((PERS)Marcos) le ((VERB)dijeron) y luego le ((VERB)ayudo)"
input_text = re.sub(r"\(\(PERS\)((?:\w\s*)+)\)\s*((?!el)\w+\s+){2,}(le)",
                    lambda m: print(f"{m[2]}"),
                    input_text, flags = re.IGNORECASE)
print(repr(input_text)) # --> output
Here I have used repeat quantifiers, such as + (one or more repeats) or * (zero or more repeats), in combination with {} to specify a range of repeats.
Why does this code output only one word (the last one matched), rather than all the words that the pattern ((?!el)\w+\s+){2,} covers? This pattern should match when there are 2 or more words between "((PERS) )" and "le".
This is the output I get:
"sdsdsd "
And not this output, which is what I want:
" ssdsdsd sdsdsdsd sdsdsd "
How can I fix my regex so that capturing group 2 gives this result?
Answers:
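A quantified capturing group keeps only the text from its last repetition, which is why group 2 holds just "sdsdsd ". Wrap the entire part \s*((?!el)\w+\s+){2,} in one outer capturing group, so that group 2 spans all the repetitions:
m = re.search(r"\(\(PERS\)((?:\w\s*)+)\)(\s*((?!el)\w+\s+){2,})(le)",
              input_text, flags=re.IGNORECASE)
print(m.group(2))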
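To make the difference concrete, here is a minimal side-by-side sketch of the two patterns on a shortened copy of the input. The variable names (text, repeated, wrapped, last_only, all_words, words) are only illustrative, and making the inner group non-capturing with (?:...) is an optional variation rather than part of the answer above; note that it shifts "le" from group 4 to group 3.

```python
import re

# Shortened copy of the input from the question (illustrative only).
text = "((PERS)Marcos) ssdsdsd sdsdsdsd sdsdsd le ((VERB)empujé)"

# Quantified capturing group: group 2 keeps only its last repetition.
repeated = re.search(
    r"\(\(PERS\)((?:\w\s*)+)\)\s*((?!el)\w+\s+){2,}(le)",
    text, flags=re.IGNORECASE)

# Outer capturing group around the whole repetition.
# The inner group is non-capturing here (an optional variation).
wrapped = re.search(
    r"\(\(PERS\)((?:\w\s*)+)\)(\s*(?:(?!el)\w+\s+){2,})(le)",
    text, flags=re.IGNORECASE)

last_only = repeated.group(2)
all_words = wrapped.group(2)
words = all_words.split()  # individual words, if they are needed separately

print(repr(last_only))  # 'sdsdsd '
print(repr(all_words))  # ' ssdsdsd sdsdsdsd sdsdsd '
print(words)            # ['ssdsdsd', 'sdsdsdsd', 'sdsdsd']
```

If each repeated word is needed on its own, splitting the outer group (as with words above) is usually simpler than trying to capture every repetition inside the pattern, since Python's re module does not keep earlier repetitions of a group.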