Making letters uppercase using re.sub in python?

Question:

In many programming languages, the following

find foo([a-z]+)bar and replace with GOO\U\1GAR

will result in the entire match being made uppercase. I can’t seem to find the equivalent in Python; does it exist?

Asked By: Jordan Reiter


Answers:

You could use some variation of this:

import re

s = 'foohellobar'
def replfunc(m):
    # Keep groups 1 and 3 as matched; uppercase only the middle group.
    return m.group(1) + m.group(2).upper() + m.group(3)
re.sub(r'(foo)([a-z]+)(bar)', replfunc, s)

gives the output:

'fooHELLObar'
Answered By: highBandWidth

You can pass a function to re.sub() as the replacement, which lets you do this. Here is an example:

 def upper_repl(match):
     return 'GOO' + match.group(1).upper() + 'GAR'

And an example of using it:

 >>> import re
 >>> re.sub(r'foo([a-z]+)bar', upper_repl, 'foobazbar')
 'GOOBAZGAR'
Answered By: Andrew Clark

Unfortunately this \U\1 syntax could never work in Python because \U in a string literal indicates the beginning of a 32-bit hex escape sequence. For example, "\U0001f4a9" == "💩".
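
A quick way to check both points is the sketch below. It assumes Python 3.7 or later, where an unknown letter escape in a replacement template raises re.error; the variable name template_rejected is only for illustration.

```python
import re

# In an ordinary string literal, \U starts an 8-hex-digit code point escape.
assert "\U0001f4a9" == chr(0x1F4A9)

# In a raw string the backslash survives, but re.sub() does not recognize
# \U as a replacement escape and (on Python 3.7+) rejects the template.
try:
    re.sub(r"foo([a-z]+)bar", r"GOO\U\1GAR", "foobazbar")
    template_rejected = False
except re.error:
    template_rejected = True

print(template_rejected)
```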

However, there is an easy alternative to Perl’s case conversion escapes: a replacement function. In re.sub(pattern, repl, string, count=0, flags=0), the replacement repl is usually a string, but it can also be a callable. If it is a callable, it is passed the Match object and must return the replacement string to use.

So, for the example given in the question, this is possible:

>>> import re
>>> string = "fooquuxbar"
>>> pattern = r"foo([a-z]+)bar"
>>> re.sub(pattern, lambda m: f"GOO{m.group(1).upper()}GAR", string)
'GOOQUUXGAR'

Here is a table of other string methods which might be useful for similar case conversions.

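| Modifier | Description     | Example            | Python callable to use    |
|----------|-----------------|--------------------|---------------------------|
| \U       | Uppercase       | foo BAR -> FOO BAR | str.upper                 |
| \L       | Lowercase       | foo BAR -> foo bar | str.lower or str.casefold |
| \I       | Initial capital | foo BAR -> Foo Bar | str.title                 |
| \F       | First capital   | foo BAR -> Foo bar | str.capitalize            |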
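
As a rough sketch of how these callables plug into re.sub(), the snippet below wraps each one in a replacement function. The helper apply_case, the bracket pattern, and the sample text are all made up for illustration and are not part of the re module.

```python
import re

def apply_case(case_func):
    """Hypothetical helper: build a repl callable that applies
    case_func (any str -> str callable) to capture group 1."""
    return lambda match: case_func(match.group(1))

# Illustrative pattern: text inside square brackets, captured as group 1.
pattern = r"\[([^\]]+)\]"
text = "say [foo BAR]"

print(re.sub(pattern, apply_case(str.upper), text))       # say FOO BAR
print(re.sub(pattern, apply_case(str.lower), text))       # say foo bar
print(re.sub(pattern, apply_case(str.title), text))       # say Foo Bar
print(re.sub(pattern, apply_case(str.capitalize), text))  # say Foo bar
```

Because the whole match is replaced by the callable's return value, the brackets disappear and only the converted group remains.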
Answered By: wim

If you already have a replacement string (template), you may not be keen on swapping it out with the verbosity of m.group(1)+...+m.group(2)+...+m.group(3)… Sometimes it’s nice to have a tidy little string.

You can use the Match object’s expand() method to evaluate a template for the match in the same manner as sub(), allowing you to retain as much of your original template as possible. You can then call upper() on the relevant pieces.

re.sub(r'foo([a-z]+)bar', lambda m: 'GOO' + m.expand(r'\1GAR').upper(), 'foobazbar')

While this is not particularly useful in the example above, and does not help in more complex cases, it can be more convenient for longer expressions with many captured groups, such as a MAC address censoring regex, where you just want to ensure the full replacement is uppercase (or not).
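
To make the MAC address case concrete, here is a minimal sketch. The pattern, the template, and the censor function are assumptions chosen for illustration; a real censoring regex may look quite different.

```python
import re

# Hypothetical pattern: six colon-separated hex pairs, each captured.
MAC = re.compile(
    r"([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2}):"
    r"([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})"
)

# The template keeps the first three pairs (groups 1-3) and masks the rest.
TEMPLATE = r"\1:\2:\3:xx:xx:xx"

def censor(text):
    # expand() fills in the template for each match; upper() then
    # normalizes the case of the whole replacement, mask included.
    return MAC.sub(lambda m: m.expand(TEMPLATE).upper(), text)

print(censor("eth0 is 00:1a:2b:3c:4d:5e"))  # eth0 is 00:1A:2B:XX:XX:XX
```

The template stays a single tidy string, and the case conversion is applied once to the expanded result instead of to each group separately.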

Answered By: it4qfixmqy

For those coming across this on Google…

You can also use re.sub to match repeating patterns. For example, you can convert a string with spaces to camelCase:

import re

def to_camelcase(string):
  string = string[0].lower() + string[1:]  # lowercase the first character
  return re.sub(
    r'\s+(?P<first>[a-z])',                # match whitespace followed by a lowercase letter
    lambda m: m.group('first').upper(),    # drop the whitespace, uppercase the letter
    string)

to_camelcase('String to convert')          # --> 'stringToConvert'
Answered By: Michael Delgado