Making letters uppercase using re.sub in python?
Question:
In many programming languages, the following
find foo([a-z]+)bar
and replace with GOO\U\1GAR
will result in the entire match being made uppercase. I can’t seem to find the equivalent in python; does it exist?
Answers:
You could use some variation of this:
import re

s = 'foohellobar'
def replfunc(m):
    # Keep groups 1 and 3 as matched; uppercase only group 2.
    return m.group(1) + m.group(2).upper() + m.group(3)

re.sub('(foo)([a-z]+)(bar)', replfunc, s)
gives the output:
'fooHELLObar'
You can pass a function to re.sub() that will allow you to do this. Here is an example:
def upper_repl(match):
    return 'GOO' + match.group(1).upper() + 'GAR'
And an example of using it:
>>> re.sub(r'foo([a-z]+)bar', upper_repl, 'foobazbar')
'GOOBAZGAR'
Unfortunately this \U\1 syntax could never work in Python because \U in a string literal indicates the beginning of a 32-bit hex escape sequence. For example, "\U0001f4a9" == "💩".
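As a quick check, here is a minimal sketch of what happens if the Perl-style template is passed to re.sub() as a raw string, which sidesteps the string-literal escape. It assumes Python 3.7 or later, where the re module rejects unknown letter escapes in a replacement template; the variable names are only illustrative.

```python
import re

pattern = r"foo([a-z]+)bar"

try:
    # A raw string avoids the string-literal \U escape, but the re module
    # still does not know \U as a replacement-template escape.
    re.sub(pattern, r"GOO\U\1GAR", "foobazbar")
    outcome = "accepted"
except re.error as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

So the template is refused either way, which is why a replacement function is the usual route.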
However, there are easy alternatives to Perl’s case conversion escapes available by using a replacement function. In re.sub(pattern, repl, string, count=0, flags=0) the replacement repl is usually a string, but it can also be a callable. If it is a callable, it’s passed the Match object and must return the replacement string to be used.
So, for the example given in the question, this is possible:
>>> string = "fooquuxbar"
>>> pattern = "foo([a-z]+)bar"
>>> re.sub(pattern, lambda m: f"GOO{m.group(1).upper()}GAR", string)
'GOOQUUXGAR'
Here is a table of other string methods which might be useful for similar case conversions.
Modifier | Description | Example | Python callable to use |
---|---|---|---|
\U | Uppercase | foo BAR -> FOO BAR | str.upper |
\L | Lowercase | foo BAR -> foo bar | str.lower or str.casefold |
\I | Initial capital | foo BAR -> Foo Bar | str.title |
\F | First capital | foo BAR -> Foo bar | str.capitalize |
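To make the table concrete, here is a small sketch that applies each of those string methods to a captured group. The CASE_FUNCS mapping, the sub_with_case helper, and the angle-bracket pattern are hypothetical names made up for this illustration; they are not part of the re module.

```python
import re

# Hypothetical mapping from the Perl-style modifier letters to str methods.
CASE_FUNCS = {
    "U": str.upper,
    "L": str.lower,
    "I": str.title,
    "F": str.capitalize,
}

def sub_with_case(pattern, modifier, text):
    """Replace each match with group 1 run through the chosen case function."""
    func = CASE_FUNCS[modifier]
    return re.sub(pattern, lambda m: func(m.group(1)), text)

# Illustrative input: convert whatever sits between angle brackets.
for modifier in CASE_FUNCS:
    print(modifier, sub_with_case(r"<([^>]+)>", modifier, "say <foo BAR> now"))
```

Each call replaces the whole match (brackets included) with the converted group, so the four outputs differ only in the casing of "foo BAR".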
If you already have a replacement string (template), you may not be keen on swapping it out for the verbosity of m.group(1)+...+m.group(2)+...+m.group(3)… Sometimes it’s nice to have a tidy little string.
You can use the Match object’s expand() method to evaluate a template for the match in the same manner as sub(), allowing you to retain as much of your original template as possible. You can then call upper() on the relevant pieces.
re.sub(r'foo([a-z]+)bar', lambda m: 'GOO' + m.expand(r'\1GAR').upper(), 'foobazbar')
While this is not particularly useful in the example above, and it does not help when different parts of the replacement need different casing, it may be more convenient for longer expressions with a greater number of captured groups, such as a MAC address censoring regex, where you just want to ensure the full replacement is uppercased.
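As an illustration of the MAC address case, here is a minimal sketch. The pattern, the template, and the censor_mac name are assumptions chosen for this example: it keeps the first three octets, masks the last three, and uppercases the whole expanded template in one step.

```python
import re

# Assumed format: six colon-separated hex octets, matched case-insensitively.
MAC_RE = re.compile(
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})(?::[0-9a-f]{2}){3}",
    re.IGNORECASE,
)

# The replacement stays a tidy template string; only the callable wraps it.
TEMPLATE = r"\1:\2:\3:xx:xx:xx"

def censor_mac(text):
    """Mask the last three octets of each MAC address and uppercase the result."""
    return MAC_RE.sub(lambda m: m.expand(TEMPLATE).upper(), text)

print(censor_mac("device aa:bb:cc:dd:ee:ff online"))
```

Because upper() is applied to the expanded template as a whole, the literal "xx" placeholders are uppercased along with the captured octets.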
For those coming across this on Google…
You can also use re.sub to match repeating patterns. For example, you can convert a string with spaces to camelCase:
import re

def to_camelcase(string):
    string = string[0].lower() + string[1:]  # lowercase the first character
    return re.sub(
        r'[\s]+(?P<first>[a-z])',  # match whitespace followed by a lowercase letter
        lambda m: m.group('first').upper(),  # drop the whitespace, uppercase the letter
        string)

to_camelcase('String to convert')  # --> 'stringToConvert'