How to replace only part of the match with Python re.sub
Question:
I need to match two cases with one regular expression and do a replacement:
‘long.file.name.jpg’ -> ‘long.file.name_suff.jpg’
‘long.file.name_a.jpg’ -> ‘long.file.name_suff.jpg’
I’m trying the following:
re.sub(r'(_a)?\.[^.]*$', '_suff.', "long.file.name.jpg")
But this cuts off the extension ‘.jpg’, and I get
long.file.name_suff. instead of long.file.name_suff.jpg
I understand that this is because of the [^.]*$ part, but I can’t leave it out, because
I have to find either the last occurrence of ‘_a’ or the last ‘.’ to replace.
Is there a way to replace only part of the match?
Answers:
Put a capture group around the part that you want to preserve, and then include a reference to that capture group within your replacement text.
re.sub(r'(_a)?\.([^.]*)$', r'_suff.\2', "long.file.name.jpg")
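As a quick check of this approach against both inputs from the question, here is a minimal sketch. The helper name `add_suffix` and its `suffix` parameter are made up for illustration and are not part of the original answer.

```python
import re

def add_suffix(filename, suffix="_suff"):
    # Hypothetical helper, for illustration only.
    # Group 1 is the optional "_a"; group 2 is the extension without its dot.
    # The replacement writes the dot and group 2 back, so the extension survives.
    return re.sub(r'(_a)?\.([^.]*)$', suffix + r'.\2', filename)

print(add_suffix("long.file.name.jpg"))    # long.file.name_suff.jpg
print(add_suffix("long.file.name_a.jpg"))  # long.file.name_suff.jpg
```

Note that the dots are escaped in the pattern; an unescaped `.` would match any character, not just the literal dot before the extension.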
Just put the expression for the extension into a group, capture it and reference the match in the replacement:
re.sub(r'(?:_a)?(\.[^.]*)$', r'_suff\1', "long.file.name.jpg")
Additionally, using the non-capturing group (?:…) keeps re from storing a group that you never use.
re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")
?: starts a non-capturing group, so (?:_a) matches the _a but does not number it as a group; the question mark that follows makes it optional.
In plain English, this says: match the final .<anything>, whether or not it is preceded by _a.
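To make the difference in group numbering concrete, the sketch below compares what `groups()` returns for the capturing and non-capturing variants. The variable names are chosen here for illustration only.

```python
import re

name = "long.file.name_a.jpg"

# With a capturing group, "_a" becomes group 1 and the extension group 2.
capturing = re.search(r'(_a)?\.([^.]*)$', name)
# With (?:...), "_a" is still matched but not numbered, so the extension is group 1.
non_capturing = re.search(r'(?:_a)?\.([^.]*)$', name)

print(capturing.groups())      # ('_a', 'jpg')
print(non_capturing.groups())  # ('jpg',)
```

This is why the replacement string refers to `\2` in the capturing version and to `\1` in the non-capturing one.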
Another way to do this would be to use a lookbehind. I mention it because lookbehinds are very useful, yet I did not know about them during 15 years of writing regular expressions.
You can do it by excluding parts of the pattern from the replacement. In other words, you can tell the regex module: “match this whole pattern, but replace only a piece of it”.
re.sub(r'(?<=long\.file\.name)(_a)?(?=\.([^.]*)$)', r'_suff', "long.file.name.jpg")
>>> 'long.file.name_suff.jpg'
The long.file.name and .jpg parts take part in matching, but because they sit inside lookaround assertions they are excluded from the replacement.
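The lookbehind above hard-codes the file name. A possible variant, sketched here as an assumption rather than as the answerer's method, drops the lookbehind and keeps only the lookahead, so it works for any base name. The helper name `add_suffix_lookahead` is hypothetical.

```python
import re

def add_suffix_lookahead(filename, suffix="_suff"):
    # Hypothetical helper, for illustration only.
    # Only the optional "_a" is consumed; the lookahead checks that the final
    # ".ext" follows, without making it part of the match.
    # count=1 matters: since Python 3.7 an empty match directly after the
    # "_a" match is also replaced, which would insert the suffix twice.
    return re.sub(r'(?:_a)?(?=\.[^.]*$)', suffix, filename, count=1)

print(add_suffix_lookahead("long.file.name.jpg"))    # long.file.name_suff.jpg
print(add_suffix_lookahead("long.file.name_a.jpg"))  # long.file.name_suff.jpg
```

A name without any dot is returned unchanged, because the lookahead never succeeds.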
print(re.sub('name(_a)?','name_suff','long.file.name_a.jpg'))
# long.file.name_suff.jpg
print(re.sub('name(_a)?','name_suff','long.file.name.jpg'))
# long.file.name_suff.jpg
I wanted to use capture groups to replace a specific part of a string to help me parse it later. Consider the example below:
s = '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'
re.sub(r'(<address>\s.*?)(<br>)(.*?</address>)', r'\1 -- \3', s)
# '<td> <address> 110 SOLANA ROAD, SUITE 102 -- PONTE VEDRA BEACH, FL32082 </address> </td>'
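The same replacement can also be written with named groups, which some find easier to read than numbered references. This is a sketch; the group names `street` and `city` are invented here for illustration.

```python
import re

s = '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'

# Name the two parts to keep; the <br> between them is matched but not captured.
pattern = r'(?P<street><address>\s.*?)<br>(?P<city>.*?</address>)'
# \g<name> refers back to a named group in the replacement string.
result = re.sub(pattern, r'\g<street> -- \g<city>', s)
print(result)
```

The `\g<name>` form also avoids ambiguity when a group reference is directly followed by a digit in the replacement text.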