How to replace a word suffix using re.sub() in Python?

Question:

If I had a body of text and wanted to replace "ion" or "s" with nothing but keep the rest of the word (so if the word is reflection it should output reflect), how would I go about that? I have tried:

new_llw = re.sub(r'[a-z]+ion', "", llw)
print(new_llw)

which replaces the whole word, and I tried

if re.search(r'[a-z]+ion', "", llw) is True:
    re.sub('ion', '', llw)

print(llw)

which gives me an error:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

Asked By: dmoses

||

Answers:

For the ion replacement, you may use a positive lookbehind:

import re

inp = "reflection"
output = re.sub(r'(?<=\w)ion\b', '', inp)
print(output)  # reflect
Answered By: Tim Biegeleisen

The TypeError: unsupported operand type(s) for &: 'str' and 'int' error occurs because you are calling re.search(r'[a-z]+ion', "", llw) with the argument order of re.sub. The second argument to re.search is the input string, which is empty here, and the third argument is flags, which is expected to hold regex options (like re.A or re.I) that can be combined into a bitwise mask (re.A | re.I). Since llw is a string, the bitwise operation that re applies to the flags fails.
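A minimal sketch of the corrected attempt, assuming llw holds the text from the question (the sample value below is made up for illustration). It also assigns the result of re.sub back to llw, since strings are immutable, and tests the match for truthiness, since re.search returns a match object or None rather than True:

```python
import re

llw = "reflection"  # hypothetical sample input

# re.search(pattern, string, flags=0): the text to scan is the second argument.
if re.search(r'[a-z]+ion', llw):
    # re.sub returns a new string; it does not modify llw in place.
    llw = re.sub(r'ion\b', '', llw)

print(llw)  # reflect
```

The re.search check is not strictly needed: re.sub simply returns the input unchanged when nothing matches.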

Now, if you need to match an ion as a suffix in a word, you can use

new_llw = re.sub(r'\Bion\b', '', llw)

Here, \B matches a location that is not a word boundary. Because the next char, i, is a word char, that location must be immediately preceded with a word char (a letter, digit or connector punctuation, like _). Then ion matches ion, and \b matches a location that is either at the end of string or immediately followed with a non-word char.

To also match an s suffix:

new_llw = re.sub(r'\B(?:ion|s)\b', '', llw)

The (?:...) is a non-capturing group.
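As a rough illustration of how the two-suffix pattern behaves on running text (the sample sentence is made up), note that a standalone ion is kept because nothing precedes it, only one suffix is stripped per word, and words that merely end in s, like gas, are trimmed too:

```python
import re

# Strip an "ion" or "s" suffix, but only when a word char precedes it.
suffix = re.compile(r'\B(?:ion|s)\b')

text = "The reflection shows ion and gas actions"  # hypothetical sample
print(suffix.sub('', text))
# The reflect show ion and ga action
```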


Variations

If you consider words as letter sequences only, you can use

new_llw = re.sub(r'(?<=[a-zA-Z])(?:ion|s)\b', '', llw) # ASCII only version
new_llw = re.sub(r'(?<=[^\W\d_])(?:ion|s)\b', '', llw) # Any Unicode letters supported

Here, (?<=[a-zA-Z]) matches a location that is immediately preceded with an ASCII letter.
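A small comparison of the three patterns, using a made-up sample string, may help show where they differ: the \B version also strips after digits and underscores, the ASCII lookbehind ignores accented letters, and the [^\W\d_] lookbehind accepts any Unicode letter:

```python
import re

samples = "cats item_s v2s cafés"  # hypothetical sample input

word_char = re.sub(r'\B(?:ion|s)\b', '', samples)
ascii_only = re.sub(r'(?<=[a-zA-Z])(?:ion|s)\b', '', samples)
any_letter = re.sub(r'(?<=[^\W\d_])(?:ion|s)\b', '', samples)

print(word_char)   # cat item_ v2 café
print(ascii_only)  # cat item_s v2s cafés
print(any_letter)  # cat item_s v2s café
```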

Answered By: Wiktor Stribiżew