How to set ignorecase flag for part of regular expression in Python?

Question:

Is it possible to implement something like this simple Perl script in Python?

#!/usr/bin/perl
my $a = 'Use HELLO1 code';
if ($a =~ /(?i:use)\s+([A-Z0-9]+)\s+(?i:code)/) {
    print "$1\n";
}

The letters of the token in the middle of the string are always uppercase. The letters of the other words can be in any case (USE, use, Use, CODE, code, Code, and so on).

Asked By: Dmitry Nedbaylo


Answers:

According to the docs, this is not possible (in Python versions before 3.6). The inline flag syntax, such as (?i), only lets you set a flag for the whole expression. Therefore, you must either split this into three regexps and apply them one after the other, or handle the case-insensitivity manually: /[uU][sS][eE].../
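To illustrate the manual approach, here is a minimal sketch that builds the [uU][sS][eE] style of pattern from a plain word. The helper names any_case and find_token_manual are hypothetical, and the sketch assumes the keywords contain only ASCII letters.

```python
import re

def any_case(word):
    """Return a pattern matching `word` in any letter case.

    For example, 'use' becomes '[uU][sS][eE]'. Assumes ASCII letters.
    """
    parts = []
    for ch in word:
        if ch.isalpha():
            # One character class per letter: lowercase and uppercase form.
            parts.append('[' + ch.lower() + ch.upper() + ']')
        else:
            # Escape anything else so it is matched literally.
            parts.append(re.escape(ch))
    return ''.join(parts)

# The keywords accept any case; the token in the middle stays case-sensitive.
manual_re = re.compile(any_case('use') + r'\s+([A-Z0-9]+)\s+' + any_case('code'))

def find_token_manual(s):
    """Return the uppercase token, or None if the string does not match."""
    m = manual_re.search(s)
    return m.group(1) if m else None
```

Because no flag is involved, this works on any Python version, at the cost of a less readable pattern.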

Answered By: Aaron Digulla

As far as I could find, the Python regular expression engine does not support partial ignore-case (this applies to versions before 3.6). Here is a workaround: match with a case-insensitive regular expression, then test afterward whether the token is uppercase.

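#! /usr/bin/env python3

import re

# Match case-insensitively first; the token's case is checked separately below.
token_re = re.compile(r'use\s+([a-z0-9]+)\s+code', re.IGNORECASE)

def find_token(s):
    """Return the token if it is all uppercase, otherwise None."""
    m = token_re.search(s)
    if m is not None:
        token = m.group(1)
        # Reject tokens containing lowercase letters. Unlike str.isupper(),
        # this comparison also accepts digit-only tokens such as '123'.
        if token == token.upper():
            return token
    return None

if __name__ == '__main__':
    for s in ['Use HELLO1 code',
              'USE hello1 CODE',
              'this does not match',
             ]:
        print(s, '->', find_token(s))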

Here is the program’s output:

Use HELLO1 code -> HELLO1
USE hello1 CODE -> None
this does not match -> None
Answered By: Christian Oudard

Since Python 3.6 you can set flags inside a group:

(?imsx-imsx:…)

(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or remove the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression.

Thus (?i:use) is now valid syntax. From a Python 3.6 interpreter session:

>>> import re
>>> regex = re.compile(r'(?i:use)\s+([A-Z0-9]+)\s+(?i:code)')
>>> regex.match('Use HELLO1 code')
<_sre.SRE_Match object; span=(0, 15), match='Use HELLO1 code'>
>>> regex.match('use HELLO1 Code')
<_sre.SRE_Match object; span=(0, 15), match='use HELLO1 Code'>
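As a further sketch, assuming Python 3.6 or later, the same group syntax can also remove a flag: the whole pattern can be compiled with re.IGNORECASE and only the token group switched back to case-sensitive with (?-i:...). The names scoped, inverted and extract are hypothetical, chosen for this example.

```python
import re

# Scoped flag: only "use" and "code" ignore case.
scoped = re.compile(r'(?i:use)\s+([A-Z0-9]+)\s+(?i:code)')

# Inverse approach: the whole pattern ignores case, except the token,
# where (?-i:...) turns the flag off again inside the capturing group.
inverted = re.compile(r'use\s+((?-i:[A-Z0-9]+))\s+code', re.IGNORECASE)

def extract(regex, s):
    """Return the captured token, or None if `s` does not match."""
    m = regex.search(s)
    return m.group(1) if m else None

if __name__ == '__main__':
    for s in ['Use HELLO1 code', 'use HELLO1 Code', 'USE hello1 CODE']:
        print(s, '->', extract(scoped, s), extract(inverted, s))
```

Both patterns should return HELLO1 for the first two strings and None for the third, where the token is lowercase.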
Answered By: Thomas Perrot