How to get the group name of a regular expression match in Python?

Question:

The question is very basic, but I do not know how to find out the group name from a match. Let me explain with code:

import re
a = list(re.finditer('(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))

How do I get the name of the group that produced the a[0].group(0) match? Assume that the number of named patterns can be larger.

The example is simplified to learn the basics.

I could invert a[0].groupdict(), but that would be slow.

Asked By: Chameleon

Answers:

You can get this information from the compiled expression:

>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
mappingproxy({'name': 1, 'number': 2})

This uses the Pattern.groupindex attribute (documented as RegexObject.groupindex in Python 2):

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
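As a minimal sketch of how this mapping could answer the question without inverting groupdict() for every match: invert groupindex once, then look up each match's lastindex. The helper name matched_group_name and the sample text are hypothetical, and the approach assumes the named groups are top-level alternatives, so that lastindex points at the group that matched.

```python
import re

# Pattern from the question: letters-only names or runs of digits.
pattern = re.compile(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)')

# Invert groupindex once: group number -> group name.
index_to_name = {index: name for name, index in pattern.groupindex.items()}


def matched_group_name(match):
    """Return the name of the last matched group, or None if it is unnamed."""
    return index_to_name.get(match.lastindex)


names = [matched_group_name(m) for m in pattern.finditer('Ala ma 123 kota')]
print(names)
```

The inversion cost is paid once per compiled pattern, so the per-match work is a single dictionary lookup regardless of how many named groups the pattern defines.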

If you only have access to the match object, you can get to the pattern with the Match.re attribute (MatchObject.re in Python 2):

>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<re.Match object; span=(0, 3), match='Ala'>
>>> a[0].re.groupindex
mappingproxy({'name': 1, 'number': 2})

If all you want to know is which group matched, look at the values; None means that a group was not used in the match:

>>> a[0].groupdict()
{'name': 'Ala', 'number': None}

The number group was not used to match anything here, which is why its value is None.

You can then find the names of the groups that took part in the match with:

names_used = [name for name, value in matchobj.groupdict().items() if value is not None]

or, if only one group can ever match, you can use Match.lastgroup:

name_used = matchobj.lastgroup
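To illustrate why the "only one group can ever match" condition matters, here is a hedged sketch with a nested pattern. The date/year/month/word pattern and the sample text are made up for illustration and are not from the question. With nested named groups, groupdict() reports every group that participated, while lastgroup reports only the outermost group that closed last.

```python
import re

# Hypothetical pattern: a date with nested named parts, or a plain word.
pattern = re.compile(
    r'(?P<date>(?P<year>\d{4})-(?P<month>\d{2}))|(?P<word>[^\W\d_]+)'
)


def names_used(match):
    """Return the names of all groups that participated in the match."""
    return [name for name, value in match.groupdict().items() if value is not None]


# Collect (matched text, lastgroup, set of participating group names).
results = [
    (m.group(0), m.lastgroup, set(names_used(m)))
    for m in pattern.finditer('released 2024-05')
]
for text, last, used in results:
    print(text, last, sorted(used))
```

For flat alternations like the one in the question the two approaches agree, and lastgroup is the cheaper of the two.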

As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w. You'll never see number used where name can match first. Reverse the pattern to avoid this:

>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
... 
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
... 
name
number

but take into account that words starting with digits will still confuse things for your simple case:

>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
... 
name 'word42'
number '42'
name 'word'
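One possible way to reduce this confusion, offered as a sketch rather than as part of the original answer, is to require a word boundary after the digits so that a number only matches when it is not followed by more word characters. Whether '42word' should then count as a name is an assumption about the intended tokenization; the sample text is hypothetical.

```python
import re

# \b after \d+ rejects digit runs that continue into letters or underscores.
pattern = re.compile(r'(?P<number>\d+\b)|(?P<name>\w+)')

tokens = [(m.lastgroup, m.group(0)) for m in pattern.finditer('word42 42word 42')]
for kind, text in tokens:
    print(kind, repr(text))
```

With this pattern, '42word' is kept together as a single name token instead of being split into a number and a name.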
Answered By: Martijn Pieters

First of all, your regular expression is syntactically wrong: you should write it as r'(?P<name>\w+)|(?P<number>\d+)'. Moreover, even this regular expression does not work, since the special sequence \w matches all alphanumeric characters and hence also all characters matched by \d.
You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w.
You can then get the name of the matching group by using the lastgroup attribute of the match objects, i.e.:
[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]
producing:
['name', 'name', 'number', 'name']

Answered By: davidedb
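import re

# Raw string so that \s reaches the regex engine unchanged.
name_pattern = r"(((\s+)?)((?P<HeadCount>[0-9]{1,2})(?P<LastName>[A-Z]{1,})((([/]{1,})?)((?P<FirstName>[A-Z]{1,})?)){0,}){1,})"

name_text = "1GILL/HAROONCONSTANTSHER 1HAROON/ANILAMS"
for match in re.finditer(name_pattern, name_text):
    # Indexing a match by group name requires Python 3.6 or later.
    print(match["LastName"])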
Answered By: Muthu Kusalavan

Hello, I need to extract the highlighted data from several files using Python. How do I do that?

Answered By: LUCAS SILVA