Replace a regex pattern in a string with another regex pattern in Python
Question:
Is there a way to replace a regex pattern in a string with another regex pattern? I tried this but it didn’t work as intended:
s = 'This is a test. There are two tests'
re.sub(r'\btest(s)??\b', "<b><font color='blue'>\btest(s)??\b</font></b>", s)
The output was:
"This is a <b><font color='blue'>\x08test(s)??\x08</font></b>. There are two <b><font color='blue'>\x08test(s)??\x08</font></b>"
Instead of the desired result of enclosing the keywords test and tests with HTML tags:
"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"
And if there was a workaround, how could I apply that to a text column in a dataframe?
Thanks in advance.
Answers:
You can pass a function as the replacement argument of re.sub. It is called with each match object and returns the replacement text.
import re

def replacer(match):
    # match[0] is the whole matched text ('test' or 'tests')
    return f"<b><font color='blue'>{match[0]}</font></b>"

s = 'This is a test. There are two tests'
ss = re.sub(r'\btest(s)??\b', replacer, s)
print(ss)
This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>
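To see why the original attempt produced x08 characters, here is a small sketch (the names PATTERN, broken and fixed are only illustrative). It assumes the standard behaviour of Python string literals: without the r prefix, "\b" is a backspace character, and the replacement argument is inserted as text rather than interpreted as a pattern.

```python
import re

PATTERN = r'\btest(s)??\b'
s = 'This is a test. There are two tests'

# Without the r prefix, "\b" is a single backspace character (chr(8)),
# which is where the x08 in the question's output comes from.
print("\b" == "\x08")   # True
print(len(r"\b"))       # 2: a backslash followed by the letter b

# The replacement is inserted as text, so "test(s)??" shows up literally.
broken = re.sub(PATTERN, "<b>\btest(s)??\b</b>", s)
print(repr(broken))

# A callable replacement receives the match object and returns the new text.
fixed = re.sub(PATTERN, lambda m: f"<b>{m[0]}</b>", s)
print(fixed)
```

The callable form avoids the problem entirely because the matched text is read from the match object instead of being spelled out in the replacement string.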
If you want the replacement to contain the text that was matched in the original string, put the regex in () to capture it, and then use \1 in the replacement to insert the captured text.
re.sub(r'(\btest(s)??\b)', r"<b><font color='blue'>\1</font></b>", s)
BTW: the replacement string also needs the r prefix so that \ is treated as a normal character.
Result:
"This is a <b><font color='blue'>test</font></b>. There are two <b><font color='blue'>tests</font></b>"
If you use more than one pair of (), each pair captures a separate element, and each element gets its own number: \1, \2, etc.
For example
re.sub(r'(.*) (.*)', r'\2 \1', 'first second')
gives:
'second first'
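The numbering can be checked directly on a match object. The sketch below is only an illustration (m and m2 are made-up names); it relies on the fact that groups are numbered by the position of their opening parenthesis, and it also shows (?:...), which groups without capturing.

```python
import re

# Groups are numbered left to right by their opening parenthesis.
m = re.search(r'(\btest(s)??\b)', 'There are two tests')
print(m.group(0))  # tests  (whole match)
print(m.group(1))  # tests  (outer parentheses)
print(m.group(2))  # s      (inner parentheses)

# (?:...) groups without capturing, so only one numbered group remains.
m2 = re.search(r'(\btest(?:s)?\b)', 'There are two tests')
print(m2.groups())  # ('tests',)
```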
Note that in the pattern (\btest(s)??\b) the inner (s) is also captured; it is group number 2.
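For the dataframe part of the question, one possible approach is pandas' Series.str.replace with regex=True, which accepts a callable replacement just like re.sub. This is a sketch under assumptions: the frame, the column name text and the new column highlighted are hypothetical, and pandas must be installed.

```python
import pandas as pd

# Hypothetical data; the column name "text" is an assumption.
df = pd.DataFrame({'text': ['This is a test.',
                            'There are two tests',
                            'Nothing here']})

def replacer(match):
    # Wrap the whole matched keyword in the HTML tags from the question.
    return f"<b><font color='blue'>{match[0]}</font></b>"

# regex=True makes pandas treat the first argument as a regular expression.
df['highlighted'] = df['text'].str.replace(r'\btest(s)??\b', replacer, regex=True)
print(df['highlighted'].tolist())
```

Rows without a match are left unchanged, so the same call can be applied to the whole column.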