Regexp to remove only a specific number of occurrences of a character
Question:
In Python re, I have long strings of text with > character chunks of different lengths. One string can have 3 consecutive > chars in the middle, >> in the beginning, or any such combination.
I want to write a regexp that, after splitting the string based on spaces, iterates through each word to only identify those regions with exactly 2 occurrences (>>), and I can’t be sure if it’s at the beginning, middle or end of the whole string, or what characters are before or after it, or if it’s even the only 2 characters in the string.
So far I could come up with:
word = re.sub(r'>{2}', '', word)
This ends up removing all occurrences of 2 or more. What regular expression would work for this requirement? Any help is appreciated.
Answers:
You need to make sure the character of your choice does not appear immediately to the left or right of the match, using a pair of lookarounds: a negative lookbehind and a negative lookahead. The general scheme is
(?<!X)X{n}(?!X)
where (?<!X) means no X is allowed immediately on the left, X{n} means n occurrences of X, and (?!X) means no X is allowed immediately on the right.
In this case, use
r'(?<!>)>{2}(?!>)'
See the regex demo.
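To check the pattern quickly, here is a minimal sketch. The helper name exact_run_pattern and the sample string are made up for illustration; only the (?<!X)X{n}(?!X) scheme comes from the answer above, and the helper assumes X is a single literal character.

```python
import re

def exact_run_pattern(char, n):
    """Build (?<!X)X{n}(?!X) for one literal character (hypothetical helper)."""
    x = re.escape(char)  # escape in case the character is a regex metacharacter
    return re.compile(r'(?<!%s)%s{%d}(?!%s)' % (x, x, n, x))

# Equivalent to r'(?<!>)>{2}(?!>)'
EXACTLY_TWO = exact_run_pattern('>', 2)

def strip_double_gt(text):
    # Remove runs of exactly two '>'; longer runs are left untouched
    return EXACTLY_TWO.sub('', text)

sample = '>>start a>>>b mid>>dle >>>> end>>'
print(strip_double_gt(sample))  # start a>>>b middle >>>> end
```

Runs of three or four > survive because the lookbehind and lookahead reject every position inside a longer run, so splitting on spaces first is optional with this pattern.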
There is no need to split on spaces first if you don't have to. Try
(?<![^ ])[^ >]*>>[^ >]*(?![^ ])
This finds space-delimited segments that contain exactly one >> and no other > characters.
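A small sketch of how this whole-word pattern behaves with re.findall. The function name and the sample text are assumptions for illustration, not part of the answer.

```python
import re

# A space-delimited word containing exactly one '>>' and no other '>'
WORD_WITH_ONE_PAIR = re.compile(r'(?<![^ ])[^ >]*>>[^ >]*(?![^ ])')

def find_words_with_one_pair(text):
    # Returns the full words, not just the '>>' inside them
    return WORD_WITH_ONE_PAIR.findall(text)

sample = 'keep>>this >>> a>>b>>c >> plain x>>>y'
print(find_words_with_one_pair(sample))  # ['keep>>this', '>>']
```

Note the difference from the lookaround-only pattern: this one rejects a word such as a>>b>>c, because any second > in the same word makes the match fail, and it matches the entire word rather than only the two > characters.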