Case insensitive regular expression without re.compile?
Question:
In Python, I can compile a regular expression to be case-insensitive using re.compile:
>>> s = 'TeSt'
>>> casesensitive = re.compile('test')
>>> ignorecase = re.compile('test', re.IGNORECASE)
>>>
>>> print casesensitive.match(s)
None
>>> print ignorecase.match(s)
<_sre.SRE_Match object at 0x02F0B608>
Is there a way to do the same, but without using re.compile? I can’t find anything like Perl’s i suffix (e.g. m/test/i) in the documentation.
Answers:
You can also perform case-insensitive searches using search/match without the IGNORECASE flag, by putting the inline flag (?i) at the start of the pattern (tested in Python 2.7.3):
re.search(r'(?i)test', 'TeSt').group() ## returns 'TeSt'
re.match(r'(?i)test', 'TeSt').group() ## returns 'TeSt'
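Note that re.search and re.match return None when nothing matches, so chaining .group() raises AttributeError in that case. A minimal sketch of a guard follows; the helper name first_match_ci is hypothetical and not part of the re module:

```python
import re

def first_match_ci(pattern, text):
    """Return the first case-insensitive match of pattern in text, or None."""
    # Prefix the inline flag (?i) instead of passing re.IGNORECASE.
    m = re.search('(?i)' + pattern, text)
    return m.group() if m else None

print(first_match_ci('test', 'A TeSt string'))  # TeSt
print(first_match_ci('test', 'nothing here'))   # None
```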
You can also make the pattern case-insensitive when compiling it:
pattern = re.compile('FIle:/+(.*)', re.IGNORECASE)
# re.IGNORECASE makes matching case-insensitive; its short form is re.I.
# re.match matches only at the start of the string.
# re.search scans the string and returns the first match found anywhere in it.
# re.compile creates a regex object that can be reused for multiple matches.
>>> s = r'TeSt'
>>> print (re.match(s, r'test123', re.I))
<_sre.SRE_Match object; span=(0, 4), match='test'>
# OR
>>> pattern = re.compile(s, re.I)
>>> print(pattern.match(r'test123'))
<_sre.SRE_Match object; span=(0, 4), match='test'>
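The difference between match and search described in the comments above can be seen in a small sketch. The sample string is made up for illustration:

```python
import re

text = 'say TEST twice: test'

# re.match only succeeds if the pattern matches at position 0.
anchored = re.match('test', text, re.I)
print(anchored)            # None

# re.search scans forward and returns the first match anywhere.
found = re.search('test', text, re.I)
print(found.span())        # (4, 8)

# re.findall returns every non-overlapping match.
all_hits = re.findall('test', text, re.I)
print(all_hits)            # ['TEST', 'test']
```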
The case-insensitive marker, (?i), can be incorporated directly into the regex pattern:
>>> import re
>>> s = 'This is one Test, another TEST, and another test.'
>>> re.findall('(?i)test', s)
['Test', 'TEST', 'test']
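If only part of the pattern should ignore case, Python 3.6 and later also accept a scoped form, (?i:...), which limits the flag to the enclosed group. A short sketch, with a sample string chosen for illustration:

```python
import re

s = 'TEST Case, test case, Test Case'

# Only "test" is matched case-insensitively; " Case" must match exactly.
scoped = re.findall(r'(?i:test) Case', s)
print(scoped)  # ['TEST Case', 'Test Case']
```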
In imports
import re
In run time processing:
RE_TEST = r'test'
if re.match(RE_TEST, 'TeSt', re.IGNORECASE):
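It should be mentioned that skipping re.compile can be wasteful. Every time the module-level match function above is called, the pattern has to be looked up again. The re module caches recently compiled patterns, so it is not fully recompiled on each call, but the lookup still costs something in hot code paths. Compiling once and reusing the object is the better practice, as it is in other programming languages.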
In app initialization:
self.RE_TEST = re.compile('test', re.IGNORECASE)
In run time processing:
if self.RE_TEST.match('TeSt'):
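Outside a class, the same idea can be sketched with a module-level constant. The names RE_TEST and count_tests here are illustrative only:

```python
import re

# Compiled once at import time and reused by every call below.
RE_TEST = re.compile('test', re.IGNORECASE)

def count_tests(lines):
    """Count the lines that start with 'test', ignoring case."""
    return sum(1 for line in lines if RE_TEST.match(line))

print(count_tests(['TeSt one', 'no test here', 'TEST two']))  # 2
```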
To perform case-insensitive operations, supply re.IGNORECASE:
>>> import re
>>> test = 'UPPER TEXT, lower text, Mixed Text'
>>> re.findall('text', test, flags=re.IGNORECASE)
['TEXT', 'text', 'Text']
And if we want to replace text while matching the case of the original:
>>> def matchcase(word):
...     def replace(m):
...         text = m.group()
...         if text.isupper():
...             return word.upper()
...         elif text.islower():
...             return word.lower()
...         elif text[0].isupper():
...             return word.capitalize()
...         else:
...             return word
...     return replace
...
>>> re.sub('text', matchcase('word'), test, flags=re.IGNORECASE)
'UPPER WORD, lower word, Mixed Word'
If you would like to replace text while keeping the original casing of each match, that is possible too.
For example, to highlight every "test" in the string “test asdasd TEST asd tEst asdasd”:
import re
sentence = "test asdasd TEST asd tEst asdasd"
result = re.sub(
    '(test)',
    r'<b>\1</b>',  # \1 refers to the first capture group.
    sentence,
    flags=re.IGNORECASE)
print(result)
This prints:
<b>test</b> asdasd <b>TEST</b> asd <b>tEst</b> asdasd
For case-insensitive regular expressions (regex), there are two ways to do this in your code:
- flags=re.IGNORECASE
Regx3GList = re.search(r"(WCDMA:)((\d*)(,?))*", txt, re.IGNORECASE)
- The case-insensitive marker (?i)
Regx3GList = re.search(r"(?i)(WCDMA:)((\d*)(,?))*", txt)
(?i) at the start of the pattern matches the remainder of the pattern with the i (insensitive) modifier in effect: a case-insensitive match (ignores case of [a-zA-Z]).
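The two forms should behave the same. A minimal check, assuming the pattern is meant to use \d for digits and using a made-up value for txt, which the original snippet does not show:

```python
import re

# Hypothetical sample input; the original snippet does not define txt.
txt = 'wcdma:1,2,3'

pattern = r'(WCDMA:)((\d*)(,?))*'

# Way 1: pass the flag as an argument.
with_flag = re.search(pattern, txt, re.IGNORECASE)
# Way 2: put the inline marker at the start of the pattern.
with_marker = re.search('(?i)' + pattern, txt)

print(with_flag.group(1))                          # wcdma:
print(with_flag.group(0) == with_marker.group(0))  # True
```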
>>> import pandas as pd
>>> s = pd.DataFrame({ 'a': ["TeSt"] })
>>> r = s.replace(to_replace=r'(?i)test', value=r'TEST', regex=True)
>>> print(r)
      a
0  TEST