Regex: Remove all special characters that are not apostrophes between letters

Question:

I have a string like so:

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

and I am trying to use re.sub to replace every special character with a space, except for apostrophes that sit between letters, so 'gluten-free' becomes gluten free and i'm stays as i'm.

I have tried this:

import re

s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)

What I am trying to express is: replace anything that does not follow the pattern of one or more letters, then zero or one apostrophe, then one or more letters, with a space.

This returns the same string:

i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.

I would like to have:

i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread 
Asked By: Stefano Potter

||

Answers:

You may use this regex with a nested lookahead+lookbehind:

>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> import re
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread

RegEx Details:

  • (?!: Start negative lookahead
    • (?<=[a-z]): Positive lookbehind to assert that the previous character is a letter
    • ': Match an apostrophe
    • [a-z]: Match letter [a-z]
  • ): End negative lookahead
  • [^\w\s]: Match a character that is not a whitespace and not a word character
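
As a quick check of the pattern described above, here is a minimal sketch that wraps it in a helper and runs it on a few inputs. The name replace_specials and the sample strings are hypothetical and not part of the answer. Each matched character becomes its own space, so punctuation next to a space leaves two spaces, and an underscore is left alone because \w includes it.

```python
import re

# Pattern from the answer above: any character that is neither a word
# character nor whitespace, unless it is an apostrophe directly between letters.
PATTERN = re.compile(r"(?!(?<=[a-z])'[a-z])[^\w\s]", flags=re.I)

def replace_specials(text):
    """Hypothetical helper: replace each matched character with one space."""
    return PATTERN.sub(" ", text)

print(repr(replace_specials("don't")))             # "don't"
print(repr(replace_specials("'gluten-free'")))     # ' gluten free '
print(repr(replace_specials("rock'n'roll, ok?")))  # "rock'n'roll  ok "
print(repr(replace_specials("snake_case")))        # 'snake_case'
```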
Answered By: anubhava

You can use

import re
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print( re.sub(r"(?:(?!\b['‘’]\b)[\W_])+", ' ', s).strip() )
# => i'm sorry sir but this is a gluten free restaurant we don't serve bread

Details:

  • (?: – start of a non-capturing group:
    • (?!\b['‘’]\b) – a negative lookahead that fails the match if the next character is an apostrophe (straight or curly) between two word characters
    • [\W_] – a non-word or _ char
  • )+ – one or more occurrences
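
To illustrate how this pattern behaves, here is a small sketch using a hypothetical helper named collapse_specials (the name and the sample inputs are assumptions, not part of the answer). Because the group is repeated with +, a run of special characters collapses into a single space. Curly apostrophes inside words are preserved and underscores are replaced. Since \b is defined in terms of word characters, an apostrophe between digits would also be kept.

```python
import re

# Pattern from the answer above: one or more non-word characters or
# underscores, skipping any apostrophe that sits between two word characters.
PATTERN = re.compile(r"(?:(?!\b['‘’]\b)[\W_])+")

def collapse_specials(text):
    """Hypothetical helper: collapse each run of specials into one space."""
    return PATTERN.sub(" ", text).strip()

print(collapse_specials("we don’t serve 'gluten-free' bread..."))
# we don’t serve gluten free bread
print(collapse_specials("snake_case, it's fine"))
# snake case it's fine
```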
Answered By: Wiktor Stribiżew