Why doesn't the ignorecase flag (re.I) work in re.sub()?

Question:

From pydoc:

re.sub = sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it’s passed the match object and must return
a replacement string to be used.

example code:

import re
print re.sub('class', 'function', 'Class object', re.I)

No replacement is made unless I change pattern to ‘Class’.

Documentation doesn’t mention anything about this limitation, so I assume I may be doing something wrong.

What’s the case here?

Asked By: theta


Answers:

Seems to me that you should be doing:

import re
print(re.sub('class', 'function', 'Class object', flags=re.I))

Without this, the re.I argument is passed to the count argument.
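
To make the mix-up concrete, here is a minimal Python 3 sketch. It relies on the fact that re.I has the integer value 2, so the call from the question behaves as if count=2 had been written and no flags were set. The variable names are only illustrative.

```python
import re

# re.I is an integer-like flag whose value is 2.
flag_value = int(re.I)

# What the question's call effectively does: count=2, flags=0.
# The lowercase pattern never matches 'Class', so nothing is replaced.
as_count = re.sub('class', 'function', 'Class object', count=flag_value)

# What was intended: case-insensitive matching via the flags keyword.
as_flags = re.sub('class', 'function', 'Class object', flags=re.I)

print(as_count)  # Class object
print(as_flags)  # function object
```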

Answered By: André Caron

The flags argument is the fifth one – you’re passing the value of re.I as the count argument (an easy mistake to make).

Answered By: ekhumoro

A note for those who still deal with Python 2.6.x or older installations. The Python 2.6 documentation for re gives these signatures:

re.sub(pattern, repl, string[, count])

re.compile(pattern[, flags])

This means you cannot pass flags directly to sub. They can only be used with compile:

regex = re.compile('class', re.I)
regex.sub("function", "Class object")

Answered By: Seppo Erviälä

Just to add to Seppo’s answer: according to http://docs.python.org/2.6/library/re.html, there is still a way to pass flags directly to sub in 2.6, which can be useful if you have to make 2.7 code containing many sub calls compatible with 2.6. To quote the manual:

… if you need to specify regular expression flags, you must use a RE object, or use embedded modifiers in a pattern; for example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'

and

(?iLmsux) (One or more letters from the set ‘i’, ‘L’, ‘m’, ‘s’, ‘u’, ‘x’.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.

In practice, this means

print re.sub("class", "function", "Class object", flags=re.I)

can be rewritten using the inline modifier (?i) as

print re.sub("(?i)class", "function", "Class object")
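
As a quick check that the approaches mentioned in these answers agree, the following Python 3 sketch runs the same substitution three ways: with the flags keyword, with the inline (?i) modifier, and with a compiled pattern. The variable names are made up for illustration.

```python
import re

text = "Class object"

# 1. flags passed by keyword (available where re.sub accepts flags).
with_flag = re.sub("class", "function", text, flags=re.I)

# 2. Inline modifier at the start of the pattern; no flags argument needed.
with_inline = re.sub("(?i)class", "function", text)

# 3. Compiled pattern object carrying the flag.
with_compiled = re.compile("class", re.I).sub("function", text)

print(with_flag)      # function object
print(with_inline)    # function object
print(with_compiled)  # function object
```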

Answered By: Maksym

To avoid mistakes of this kind, the following monkey patching can be used:

import re
re.sub = lambda pattern, repl, string, *, count=0, flags=0, _fun=re.sub: \
    _fun(pattern, repl, string, count=count, flags=flags)

(* is to forbid specifying count, flags as positional arguments. _fun=re.sub is to use the declaration-time re.sub.)

Demo:

$ python
Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.sub(r'\b or \b', ',', 'or x', re.X)
'or x'   # ?!
>>> re.sub = lambda pattern, repl, string, *, count=0, flags=0, _fun=re.sub: \
...     _fun(pattern, repl, string, count=count, flags=flags)
>>> re.sub(r'\b or \b', ',', 'or x', re.X)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <lambda>() takes 3 positional arguments but 4 were given
>>> re.sub(r'\b or \b', ',', 'or x', flags=re.X)
', x'
>>> 
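
If replacing re.sub for the whole process feels too invasive, the same keyword-only idea can be put in a separate wrapper. This is only a sketch: sub_kw is a hypothetical helper name, not part of the re module, and it leaves re.sub itself untouched.

```python
import re

def sub_kw(pattern, repl, string, *, count=0, flags=0):
    """Hypothetical helper: like re.sub, but count and flags are keyword-only."""
    return re.sub(pattern, repl, string, count=count, flags=flags)

# Keyword usage works as in the demo above.
result = sub_kw(r'\b or \b', ',', 'or x', flags=re.X)
print(result)  # , x

# Passing the flag positionally is rejected instead of silently becoming count.
try:
    sub_kw(r'\b or \b', ',', 'or x', re.X)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```
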
Answered By: Kirill Bulygin