Regexp to remove specific number of occurrences of character only

Question:

Using Python's re module, I have long strings of text containing runs of the > character of different lengths. A single string can have 3 consecutive > chars in the middle, >> at the beginning, or any such combination.

I want to write a regexp that, after splitting the string on spaces, is applied to each word and matches only runs of exactly 2 occurrences (>>). I can't be sure whether the run is at the beginning, middle, or end of the string, what characters come before or after it, or whether it is the only 2 characters in the string.

So far I could come up with:

word = re.sub(r'>{2}', '', word)

This also matches inside longer runs, so >>> is reduced to > and >>>> is removed entirely. What regular expression would work for this requirement? Any help is appreciated.

Asked By: Aryan poonacha


Answers:

You need to make sure the character of your choice does not appear immediately to the left or right of the match, using a pair of lookarounds: a negative lookbehind and a negative lookahead. The general scheme is

(?<!X)X{n}(?!X)

where (?<!X) means no X immediately on the left is allowed, X{n} means n occurrences of X, and (?!X) means no X immediately on the right is allowed.

In this case, use

r'(?<!>)>{2}(?!>)'
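
As a rough illustration of the scheme above, the following sketch builds the pattern for an arbitrary character and run length and applies it with re.sub. The helper name remove_exact_run and the sample strings are made up for this example and are not part of the original answer.

```python
import re

def remove_exact_run(text, char=">", n=2):
    """Remove runs of `char` that are exactly `n` long (hypothetical helper).

    Longer or shorter runs are left untouched.
    """
    c = re.escape(char)  # escape in case the character is a regex metacharacter
    # For char=">" and n=2 this builds: (?<!>)>{2}(?!>)
    pattern = rf"(?<!{c}){c}{{{n}}}(?!{c})"
    return re.sub(pattern, "", text)

samples = [">>hello", "a>>>b", "x>>y >>>> >", ">>"]
results = [remove_exact_run(s) for s in samples]
print(results)  # ['hello', 'a>>>b', 'xy >>>> >', '']
```

Because the lookarounds check the neighbouring characters without consuming them, the same call works whether the run is at the start, middle, or end of the string, and no prior split on spaces is needed.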


Answered By: Wiktor Stribiżew

There is no need to split on spaces first if you don't have to.

Try (?<![^ ])[^ >]*>>[^ >]*(?![^ ])

This finds space-delimited segments that contain exactly one >> and no other > characters.
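
A minimal sketch of how this pattern behaves with re.findall on an unsplit string; the name SEGMENT and the sample text are assumptions made for this example only.

```python
import re

# Pattern from the answer: a space-delimited segment containing exactly one ">>"
# and no other ">" characters.
SEGMENT = re.compile(r"(?<![^ ])[^ >]*>>[^ >]*(?![^ ])")

text = "foo>>bar >>> a>b >> x>>y>>z end>>"
matches = SEGMENT.findall(text)
print(matches)  # ['foo>>bar', '>>', 'end>>']
```

Note that the pattern treats only the literal space character as a separator, so tabs and newlines count as part of a segment. Segments such as >>> or x>>y>>z are skipped because they contain a > beyond the single >> pair.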

Answered By: user13469682