How to use Python regex to replace using a captured group?
Question:
Suppose I want to change "the blue dog and blue cat wore blue hats" to "the gray dog and gray cat wore blue hats".
With sed I could accomplish this as follows:
$ echo 'the blue dog and blue cat wore blue hats' | sed 's/blue \(dog\|cat\)/gray \1/g'
How can I do a similar replacement in Python? I’ve tried:
>>> import re
>>> s = "the blue dog and blue cat wore blue hats"
>>> p = re.compile(r"blue (dog|cat)")
>>> p.sub('gray \1',s)
'the gray \x01 and gray \x01 wore blue hats'
Answers:
You need to escape your backslash:
p.sub('gray \\1', s)
Alternatively, you can use a raw string, as you already did for the regex:
p.sub(r'gray \1', s)
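A quick way to see why the original attempt fails: in an ordinary string literal, '\1' is an octal escape for the single character chr(1), so re.sub never sees a backslash at all. The sketch below reuses the question's pattern and sentence to reproduce the failure and confirm that both the escaped and the raw-string forms give the intended result.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# In a normal literal, '\1' is the octal escape for chr(1), not backslash + '1'.
assert '\1' == '\x01'
# Both of these spell the two characters: a backslash followed by '1'.
assert 'gray \\1' == r'gray \1'

print(repr(p.sub('gray \1', s)))  # → 'the gray \x01 and gray \x01 wore blue hats'
print(p.sub('gray \\1', s))       # → the gray dog and gray cat wore blue hats
print(p.sub(r'gray \1', s))       # → the gray dog and gray cat wore blue hats
```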
Try this:
p.sub(r'gray \g<1>', s)
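\g<1> refers to the same group as \1; where it helps is when a digit has to follow the group reference directly. The sketch below uses a hypothetical replacement that appends "2" to the animal name: written as \12 it would be read as a reference to group 12, which this pattern does not have, so re.sub raises re.error, whereas \g<1>2 works. The raw string matters here too, since '\g' is not a recognized escape in a plain literal and recent Python versions warn about it.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# \g<1> is the same group reference as \1 ...
assert p.sub(r'gray \g<1>', s) == p.sub(r'gray \1', s)

# ... but it stays unambiguous when a digit follows the reference.
print(p.sub(r'\g<1>2', s))  # → the dog2 and cat2 wore blue hats

# Written as \12, the replacement asks for group 12, which does not exist.
try:
    p.sub(r'\12', s)
except re.error as exc:
    print("re.error:", exc)
```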
I was looking for a similar answer but wanted to use named groups in the replacement, so I thought I'd add the code for others:
p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'gray \g<animal>', s)
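A small sketch of the named-group version on the question's sentence. It also shows that a named group keeps its number, so \1 still works, and that the same name reads the group from a match object.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r'blue (?P<animal>dog|cat)')

# \g<animal> refers to the group by name in the replacement template.
print(p.sub(r'gray \g<animal>', s))  # → the gray dog and gray cat wore blue hats

# A named group is still numbered, so \1 gives the same result.
assert p.sub(r'gray \1', s) == p.sub(r'gray \g<animal>', s)

# The same name reads the captured text from each match object.
print([m.group('animal') for m in p.finditer(s)])  # → ['dog', 'cat']
```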
Off topic, for numbered capture groups:
#!/usr/bin/env python
import re

# '\\2' and '\\1' reach re.sub as \2 and \1 (second and first group).
print(re.sub(
    pattern=r'(\d)(\w+)',
    repl='word: \\2, digit: \\1',
    string='1asdf'
))
word: asdf, digit: 1
Python uses a literal backslash plus a one-based index for numbered capture-group replacements, as shown in this example. So \1, entered as '\\1', references the first capture group (\d), and \2 the second captured group.
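To make the one-based numbering concrete, here is a small sketch on the same input: group 1 is (\d), group 2 is (\w+), and the whole match is available as \g<0>. A plain \0 does not do this, because in a replacement template it is read as an octal escape for the NUL character. The square brackets around the whole match are only there for illustration.

```python
import re

# Group numbers start at 1 and follow the order of the opening parentheses.
m = re.match(r'(\d)(\w+)', '1asdf')
assert m.group(1) == '1' and m.group(2) == 'asdf'

# \g<0> inserts the entire match; \2 and \1 insert the numbered groups.
print(re.sub(r'(\d)(\w+)', r'[\g<0>] word: \2, digit: \1', '1asdf'))
# → [1asdf] word: asdf, digit: 1

# \0 in a template is an octal escape (NUL), not the whole match.
assert re.sub(r'(\d)(\w+)', r'\0', '1asdf') == '\x00'
```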