Capturing group with findall?
Question:
How can I access captured groups if I do findall(r'regex(with)capturing.goes.here')?
I know I can do it through finditer, but I don't want to iterate.
Answers:
Use groups freely. The matches will be returned as a list of group-tuples:
>>> re.findall('(1(23))45', '12345')
[('123', '23')]
If you want the full match to be included, just enclose the entire regex in a group:
>>> re.findall('(1(23)45)', '12345')
[('12345', '23')]
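If the inner parentheses are only there for grouping, a variation on the above is to make them non-capturing with (?:...). This is a small sketch, not part of the original answer: with a single capturing group left, findall returns plain strings instead of tuples.

```python
import re

# The outer group captures the full match; the inner (?:...) groups without capturing.
full_only = re.findall(r"(1(?:23)45)", "12345 and 12345")

# With exactly one capturing group, findall returns a flat list of strings.
print(full_only)  # ['12345', '12345']
```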
findall just returns the captured groups:
>>> re.findall('abc(de)fg(123)', 'abcdefg123 and again abcdefg123')
[('de', '123'), ('de', '123')]
Relevant doc excerpt:
Return all non-overlapping matches of
pattern in string, as a list of
strings. The string is scanned
left-to-right, and matches are
returned in the order found. If one or
more groups are present in the
pattern, return a list of groups; this
will be a list of tuples if the
pattern has more than one group. Empty
matches are included in the result
unless they touch the beginning of
another match.
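To make the excerpt concrete, here is a short sketch of the three result shapes it describes. The variable names are only illustrative, and the sample text is reused from the example above.

```python
import re

text = "abcdefg123 and again abcdefg123"

# No groups: each item is the whole match (a string).
no_groups = re.findall(r"abcdefg\d+", text)

# One group: each item is that group's text (still a string).
one_group = re.findall(r"abc(de)fg\d+", text)

# Two or more groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"abc(de)fg(\d+)", text)

print(no_groups)   # ['abcdefg123', 'abcdefg123']
print(one_group)   # ['de', 'de']
print(two_groups)  # [('de', '123'), ('de', '123')]
```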
Several ways are possible:
>>> import re
>>> r = re.compile(r"'(\d+)'")
>>> result = r.findall("'1', '2', '345'")
>>> result
['1', '2', '345']
>>> result[0]
'1'
>>> for item in result:
... print(item)
...
1
2
345
>>>
import re
string = 'Perotto, Pier Giorgio'
names = re.findall(r'''
(?P<first>[-\w ]+),\s #first name
(?P<last> [-\w ]+) #last name
''',string, re.X|re.M)
print(names)
returns
[('Perotto', 'Pier Giorgio')]
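re.M would make sense if your string is multiline; it has no effect here because the pattern uses no ^ or $ anchors. You also need VERBOSE mode (re.X) for the regex I've written, because the pattern is spread over several lines and contains whitespace and # comments that should be ignored.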
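Note that findall returns positional tuples even when the groups are named. If you want the names back, one possible sketch (assuming a list comprehension over finditer is acceptable, which the asker may not prefer) is to call groupdict() on each match. The names pattern and named are only illustrative.

```python
import re

string = "Perotto, Pier Giorgio"
pattern = re.compile(r"""
    (?P<first>[-\w ]+),\s   # text before the comma
    (?P<last>[-\w ]+)       # text after the comma
""", re.X)

# findall drops the group names and returns positional tuples.
print(pattern.findall(string))  # [('Perotto', 'Pier Giorgio')]

# finditer yields match objects, so groupdict() can map names to values.
named = [m.groupdict() for m in pattern.finditer(string)]
print(named)  # [{'first': 'Perotto', 'last': 'Pier Giorgio'}]
```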