python regex to substitute all digits except when they are part of a substring

Question:

I want to remove all digits, except when the digits are part of one of several special substrings. In the example below, the special substrings that should be exempt from digit removal are 1s, 2s, s4, and 3s. I think I need to use a negative lookahead:

import re

s = "a61s8sa92s3s3as4s4af3s"
pattern = r"(?!1s|2s|s4|3s)[0-9.]"
re.sub(pattern, ' ', s)

To my understanding, the pattern above is:

  • starting from the end ([]) match all digits including decimals
  • only do that if we have not matched the pattern after ?!
  • which are 1s, 2s, s4, OR 3s (| = OR)

It all makes sense until you try it. The sample s above returns "a 1s sa 2s3s as s af3s", which suggests that all the exclusion patterns work except when the digit is at the end of the special substring (s4), in which case it still gets matched?!

I believe this operation should return "a 1s sa 2s3s as4s4af3s". How can I fix my pattern?

Asked By: MarcelD

Answers:

You can use

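import re

s = "a61s8sa92s3s3as4s4af3s"
# Group 1 captures a protected substring; otherwise match a single digit or dot.
pattern = r"(1s|2s|s4|3s)|[\d.]"
# Keep Group 1 when it matched; replace everything else with a space.
print(re.sub(pattern, lambda x: x.group(1) or ' ', s))
# => a 1s sa 2s3s as4s4af3s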

See the Python demo.

Details:

  • (1s|2s|s4|3s) – Group 1: 1s, 2s, s4 or 3s
  • | – or
  • [\d.] – a digit or dot.

If Group 1 matches, its value is used as the replacement; otherwise, the replacement is a space.
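If the list of protected substrings changes often, the same idea can be wrapped in a small function. This is only a sketch: the name remove_digits_except and its parameters are made up for illustration and are not part of the answer above. It assumes the protected substrings are literal text, so each one is passed through re.escape.

```python
import re

def remove_digits_except(text, keep, replacement=" "):
    """Replace digits and dots in `text`, leaving the substrings in `keep` intact.

    Hypothetical helper; `keep` is assumed to hold literal (non-regex) substrings.
    """
    if not keep:
        # Nothing to protect: an empty group would match everywhere, so skip it.
        return re.sub(r"[\d.]", replacement, text)
    # Longest first, so overlapping entries prefer the longest protected substring.
    protected = "|".join(re.escape(k) for k in sorted(keep, key=len, reverse=True))
    pattern = re.compile(rf"({protected})|[\d.]")
    # Group 1 matched: put it back unchanged. Otherwise it was a lone digit or dot.
    return pattern.sub(lambda m: m.group(1) or replacement, text)

print(remove_digits_except("a61s8sa92s3s3as4s4af3s", ["1s", "2s", "s4", "3s"]))
# => a 1s sa 2s3s as4s4af3s
```

Because the protected substring is consumed as a whole match, it does not matter whether the digit comes first (1s) or last (s4).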

Answered By: Wiktor Stribiżew

Try (regex101):

import re

s = "a61s8sa92s3s3as4s4af3s"

s = re.sub(r"(?!1s|2s|3s)(?<!s(?=4))[\d.]", " ", s)
print(s)

Prints:

a 1s sa 2s3s as4s4af3s
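
To see why the lookbehind is needed, it may help to run the question's pattern and this one side by side. The sketch below only reuses the two patterns already shown; the variable names are chosen for illustration.

```python
import re

s = "a61s8sa92s3s3as4s4af3s"

# A lookahead is tested at the digit itself, so it only ever sees text that
# starts with that digit. "s4" starts with a letter and can never match there.
lookahead_only = r"(?!1s|2s|s4|3s)[\d.]"

# Digit-first substrings stay in the lookahead; the digit-last one (s4) also
# needs a check on the character before the digit.
with_lookbehind = r"(?!1s|2s|3s)(?<!s(?=4))[\d.]"

for name, pattern in [("lookahead only", lookahead_only),
                      ("with lookbehind", with_lookbehind)]:
    print(f"{name}: {re.sub(pattern, ' ', s)}")
# => lookahead only: a 1s sa 2s3s as s af3s
# => with lookbehind: a 1s sa 2s3s as4s4af3s
```

The (?<!s(?=4)) part reads: the previous character is not an s that is immediately followed by a 4, which is exactly the s4 case.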

Answered By: Andrej Kesely