How to add a whitespace before "((VERB)" only if it is not preceded by a space or the beginning of the string?
Question:
import re
# Example input string:
input_text = "((VERB)ayudar a nosotros) ár((VERB)ayudar a nosotros) Los computadores pueden ((VERB)ayudar a nosotros)"
# This raises: re.error: look-behind requires fixed-width pattern
#input_text = re.sub(r"(?<!^|\s)\(\(VERB\)", " ((VERB)", input_text)
# And this other option simply places a space in front of every ((VERB),
# without checking whether there is a space or the beginning of the string in front
input_text = re.sub(r"(^|\s)\(\(VERB\)", lambda match: match.group(1) + "((VERB)", input_text)
print(repr(input_text))  # --> output
I have tried using (^|\s), a capturing group that matches either the start of the string ^ or a whitespace character just before the pattern "((VERB)". Another option could be a non-capturing group (?:^|\s), or better still a look-behind assertion such as (?<!^|\s).
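The error comes from the alternation inside the look-behind: ^ is zero-width while \s matches exactly one character, so the two branches have different widths and Python's re module rejects the pattern. A minimal sketch, assuming Python 3's standard re module, that reproduces the error and then splits the alternation into two separate negative look-behinds, each of which is fixed-width on its own:

```python
import re

text = "((VERB)ayudar a nosotros) ár((VERB)ayudar a nosotros) Los computadores pueden ((VERB)ayudar a nosotros)"

# The alternation mixes a zero-width branch (^) with a one-character branch (\s),
# so compiling it fails.
compile_failed = False
try:
    re.compile(r"(?<!^|\s)\(\(VERB\)")
except re.error as exc:
    compile_failed = True
    print("re.error:", exc)

# Two negative look-behinds, each with a fixed width (0 and 1):
# "not at the start of the string" AND "not right after whitespace".
pattern = re.compile(r"(?<!^)(?<!\s)\(\(VERB\)")
result = pattern.sub(" ((VERB)", text)
print(result)
```

Both look-behinds must succeed at the same position, so together they express "neither the start of the string nor a whitespace character comes before ((VERB)".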
This is the output I expect when running the script:
"((VERB)ayudar a nosotros) ár ((VERB)ayudar a nosotros) Los computadores pueden ((VERB)ayudar a nosotros)"
Answers:
You can assert a non-whitespace character to the left:
(?<=\S)\(\(VERB\)
In the replacement, use a space followed by the full match: r" \g<0>"
import re
input_text = "((VERB)ayudar a nosotros) ár((VERB)ayudar a nosotros) Los computadores pueden ((VERB)ayudar a nosotros)"
input_text = re.sub(r"(?<=\S)\(\(VERB\)", r" \g<0>", input_text)
print(input_text)
Output
((VERB)ayudar a nosotros) ár ((VERB)ayudar a nosotros) Los computadores pueden ((VERB)ayudar a nosotros)
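Because the look-behind only inspects the preceding character without consuming it, every ((VERB) in the result ends up preceded either by a space or by the start of the string, so running the substitution a second time changes nothing. A small sketch of that property; the helper name add_space_before_verb and the sample strings are hypothetical:

```python
import re

# Hypothetical helper wrapping the answer's pattern and replacement.
VERB_PATTERN = re.compile(r"(?<=\S)\(\(VERB\)")

def add_space_before_verb(text: str) -> str:
    """Insert a space before each "((VERB)" that directly follows a non-whitespace character."""
    return VERB_PATTERN.sub(r" \g<0>", text)

samples = [
    "((VERB)ayudar a nosotros)",         # start of string: left unchanged
    "ár((VERB)ayudar a nosotros)",       # glued to a word: space added
    "pueden ((VERB)ayudar a nosotros)",  # already preceded by a space: left unchanged
]

for s in samples:
    once = add_space_before_verb(s)
    twice = add_space_before_verb(once)
    # The second pass finds nothing new to change.
    print(repr(once), once == twice)
```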
An alternative to using a look-behind is to match any single non-whitespace character before ((VERB) and capture it:
([^\s])(\(\(VERB\))
then substitute with
\1 \2
( – start of capture group 1
[^\s] – match a single character that is not whitespace
) – end of capture group 1
( – start of capture group 2
\(\(VERB\) – literal match on ((VERB)
) – end of capture group 2
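One practical difference between the two answers: the capture-group pattern consumes the character before ((VERB), so when two occurrences sit directly next to each other, the closing ) of the first one has already been used by the previous match and the second occurrence gets no space. The look-behind version only inspects that character, so it handles both. A minimal comparison on a made-up input string:

```python
import re

text = "a((VERB)((VERB)"  # hypothetical input with two adjacent occurrences

# Capture-group approach: the preceding character is part of the match.
with_groups = re.sub(r"([^\s])(\(\(VERB\))", r"\1 \2", text)

# Look-behind approach: the preceding character is only inspected, not consumed.
with_lookbehind = re.sub(r"(?<=\S)\(\(VERB\)", r" \g<0>", text)

print(with_groups)      # a ((VERB)((VERB)
print(with_lookbehind)  # a ((VERB) ((VERB)
```

For the input in the question, which never has two ((VERB) tokens back to back, both patterns produce the same expected output.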