Python regular expression "\1"
Question:
Can anyone tell me what "\1" means in the following regular expression in Python?
re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
Answers:
The first \1 means the first group, i.e. the first bracketed expression (\b[a-z]+).
From the docs, for \number:
"Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group)"
In your case it is looking for a repeated “word” (well, block of lower case letters).
The second \1 is the replacement to use in case of a match, so a repeated word will be replaced by a single word.
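To see both uses of \1 at work, here is a minimal sketch that runs the call from the question. The variable names are only illustrative.

```python
import re

# Group 1 captures a lowercase word; " \1" then requires a space
# followed by exactly the same text that group 1 captured.
pattern = r'(\b[a-z]+) \1'

# In the replacement string, \1 stands for whatever group 1 captured,
# so the doubled word collapses to a single copy.
deduped = re.sub(pattern, r'\1', 'cat in the the hat')
print(deduped)  # cat in the hat
```

Note the raw-string prefix r'...': without it, Python would turn '\1' into the control character with code 1 before the re module ever sees it.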
From the Python docs for the re module:
\number
Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.
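The cases in that docs passage can be checked directly. This is a small sketch; the test strings beyond the docs' own examples ('a1', 'aa', 'a\x01') are made up for illustration.

```python
import re

# The docs' examples: the space between the group and \1 matters.
docs_examples = {
    text: bool(re.fullmatch(r'(.+) \1', text))
    for text in ('the the', '55 55', 'thethe')
}
print(docs_examples)  # {'the the': True, '55 55': True, 'thethe': False}

# A leading 0 makes the escape an octal character code, not a group
# reference: \061 is octal 61, i.e. the character '1'.
octal_matches_char = bool(re.fullmatch(r'(a)\061', 'a1'))
octal_is_not_backref = not re.fullmatch(r'(a)\061', 'aa')

# Inside [...] a numeric escape is a character too: [\1] matches the
# control character with code 1, not the text captured by group 1.
class_matches_char = bool(re.fullmatch(r'(a)[\1]', 'a\x01'))
class_is_not_backref = not re.fullmatch(r'(a)[\1]', 'aa')

print(octal_matches_char, octal_is_not_backref)  # True True
print(class_matches_char, class_is_not_backref)  # True True
```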
Your example is basically the same as what is explained in the docs.
\1 is a backreference.
It matches whatever matched in your brackets, in this case "the".
You are basically saying:
- match the empty string at the beginning of a word (\b)
- match alphabetical characters from a-z, one or more times
- match a space, then the term in brackets again
cat in (the) the hat
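The steps above can be inspected with re.search, which returns the first place where all three conditions hold. A minimal sketch:

```python
import re

m = re.search(r'(\b[a-z]+) \1', 'cat in the the hat')

print(m.group(0))  # the the   (the whole match: word, space, same word)
print(m.group(1))  # the       (what the bracketed group captured)
print(m.span())    # (7, 14)   (position of the whole match in the string)
print(m.span(1))   # (7, 10)   (position of the captured word)
```

"cat" and "in" are tried first, but neither is followed by a space and a copy of itself, so the search moves on to the first "the".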
\1 is equivalent to re.search(...).group(1), the first parentheses-delimited expression inside of the regex.
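One way to see that equivalence, as a sketch: the replacement template r'\1' gives the same result as a callback that returns group(1) of each match object. The r'\g<1>' spelling is the re module's unambiguous form of the same reference.

```python
import re

text = 'cat in the the hat'
pattern = r'(\b[a-z]+) \1'

with_template = re.sub(pattern, r'\1', text)
# The callback receives each match object and returns the replacement text.
with_callback = re.sub(pattern, lambda m: m.group(1), text)
# \g<1> avoids ambiguity when a digit follows, e.g. r'\g<1>0' versus r'\10'.
with_g_syntax = re.sub(pattern, r'\g<1>', text)

print(with_template == with_callback == with_g_syntax)  # True
```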
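Fun fact: it's also part of the reason that regular expressions are significantly slower in Python and other programming languages than CS theory requires. Backreferences cannot be handled by a plain finite automaton, so engines that support them generally fall back on backtracking.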
Example
The following code uses a Python regex to find a digit repeated four times in a row in the given string:
import re
result = re.search(r'(\d)\1{3}', '54222267890')
print(result.group())
This gives the output:
2222
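Here (\d) captures one digit and \1{3} demands three more copies of it, four in total. As a hedged variation, not part of the original answer, \1+ finds runs of any length of two or more. The second input string is made up for illustration.

```python
import re

# The example above: one digit, then exactly three more copies of it.
result = re.search(r'(\d)\1{3}', '54222267890')
print(result.group(), result.group(1), result.span())  # 2222 2 (2, 6)

# Variation: every run of two or more identical digits.
runs = [m.group() for m in re.finditer(r'(\d)\1+', '1122233334')]
print(runs)  # ['11', '222', '3333']
```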
re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
word | next-word | IsMatched() | replace with word |
---|---|---|---|
cat | in | No | NA |
in | the | No | NA |
the | the | Yes | the |
the | hat | No | NA |
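The table can be reproduced with a plain-Python sketch that compares each word with the one after it. This only mirrors the table's view of things; it is not how the regex engine scans the string, and the variable names are made up.

```python
words = 'cat in the the hat'.split()

rows = []
# Pair every word with its successor, as in the table's first two columns.
for word, next_word in zip(words, words[1:]):
    matched = word == next_word
    rows.append((word, next_word,
                 'Yes' if matched else 'No',
                 word if matched else 'NA'))

for row in rows:
    print(' | '.join(row))
```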