Replace a regex pattern in a string with another regex pattern in Python

Question

Is there a way to replace a regex pattern in a string with another regex pattern? I tried this but it didn’t work as intended:

s = 'This is a test. There are two tests'
re.sub(r'\btest(s)??\b', "<b><font color='blue'>\btest(s)??\b</font></b>", s)

The output was:

"This is a <b><font color='blue'>\x08test(s)??\x08</font></b>. There are two <b><font color='blue'>\x08test(s)??\x08</font></b>"

Instead of the desired result of enclosing the keyword test and tests with html tags:

"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"

And if there was a workaround, how could I apply that to a text column in a dataframe?

Thanks in advance.

Asked By: Nemo

Answer 1

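You can pass a function as the replacement. re.sub calls it with each match object and uses the returned string as the replacement text.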

import re


def replacer(match):
    return f"<b><font color='blue'>{match[0]}</font></b>"


s = 'This is a test. There are two tests'
ss = re.sub(r'\btest(s)??\b', replacer, s)
print(ss)

This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>
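For the DataFrame part of the question, the same function can most likely be handed to pandas' Series.str.replace, which accepts a callable replacement when regex=True. This is only a minimal sketch: the column names text and highlighted and the sample rows are made up for illustration.

```python
import re

import pandas as pd

PATTERN = r'\btest(s)??\b'


def replacer(match):
    # match[0] is the whole matched text ("test" or "tests")
    return f"<b><font color='blue'>{match[0]}</font></b>"


# Hypothetical frame; "text" is an assumed column name
df = pd.DataFrame({'text': ['This is a test.',
                            'There are two tests',
                            'No keyword here']})

# regex=True is required when the replacement is a callable
df['highlighted'] = df['text'].str.replace(PATTERN, replacer, regex=True)
print(df['highlighted'].tolist())
```

Rows without a match are left unchanged.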

Answered By: Сергей Кох

Answer 2

If you want the result to contain the text that was found in the original string, put the regex in () to capture it, and then use \1 in the replacement to insert the captured text.

re.sub(r'(\btest(s)??\b)', r"<b><font color='blue'>\1</font></b>", s)

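BTW: the replacement also needs the r prefix so that \1 stays a backslash followed by 1. Without it, Python turns \1 into the character \x01 before re.sub sees it, the same way \b in the question's replacement became the backspace character \x08.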

Result:

"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"

If you use more () groups, each () captures a separate part of the match and each part gets its own number: \1, \2, etc.

For example

re.sub(r'(.*) (.*)', r'\2 \1', 'first second')

gives:

'second first'

In the test example above, the inner (s) is also captured, and it has number 2.
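To make the numbering concrete, here is a small sketch that prints the groups for one match. It also shows \g<0>, which re.sub expands to the whole match, so the extra outer () is optional. The variable names are only for illustration.

```python
import re

s = 'This is a test. There are two tests'

# Group 0 is the whole match, group 1 the outer (), group 2 the inner (s)
m = re.search(r'(\btest(s)??\b)', 'two tests')
print(m.group(0), m.group(1), m.group(2))

# \g<0> refers to the entire match, so no wrapping group is needed
result = re.sub(r'\btest(s)??\b', r"<b><font color='blue'>\g<0></font></b>", s)
print(result)
```

When the match is just test, group 2 did not take part in the match and m.group(2) would be None.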

Answered By: furas
