re.sub a list of words, ignore case

Question:

I am trying to wrap each of a list of words in a sentence in the HTML <b> element. After some searching I have it almost working, except that the match is not case-insensitive.

import re

bolds = ['test', 'tested']  # I want to bold these words, ignoring case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)

Output:

Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.

This output misses the <b> element for Test and TEST. In other words, I would like the output to be:

<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.

One hack is that I explicitly add the capitalize and upper, like so …

bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]

But I am thinking there must be a better way to do this. Besides, the above hack will miss words like tesT, etc.

Thank you!

Asked By: tikka

||

Answers:

There’s no need for the dictionary or the function. Every replacement is just the matched text wrapped in the same tags, so a back-reference in the replacement string is enough.

Use flags=re.I to make the match case-insensitive.

text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)

\g<0> is a back-reference that expands to the full match of the pattern, so the original casing of each word is preserved.
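For completeness, here is a minimal end-to-end sketch that combines the question's word list with the back-reference approach. The helper name bold_words is hypothetical, and the re.escape call is an extra precaution that is not part of the answer above; it only matters if a word in the list contains regex metacharacters.

```python
import re


def bold_words(text, words):
    """Wrap every whole-word, case-insensitive match of `words` in <b> tags.

    `bold_words` is a hypothetical helper name used for illustration only.
    """
    # re.escape is an added precaution (an assumption, not in the original answer):
    # it keeps characters such as '.' or '+' in a word from being read as regex syntax.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = r'\b(?:' + alternatives + r')\b'
    # \g<0> inserts the matched text unchanged, so the original casing survives.
    return re.sub(pattern, r'<b>\g<0></b>', text, flags=re.IGNORECASE)


bolds = ['test', 'tested']
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
text_bolded = bold_words(text, bolds)
print(text_bolded)
```

With the sample sentence from the question this should print the desired output, with Test and TEST wrapped as well. The \b anchors on both sides are what let 'tested' match in full even though 'test' comes first in the alternation.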

Answered By: Barmar