re.findall which returns a dict of named capturing groups?

Question

Inspired by a now-deleted question: given a regex with named groups, is there a method like findall that returns a list of dicts keyed by the named capturing groups, instead of a list of tuples?

Given:

>>> import re
>>> text = "bob sue jon richard harry"
>>> pat = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')
>>> pat.findall(text)
[('bob', 'sue'), ('jon', 'richard')]

Should instead give:

[{'name': 'bob', 'name2': 'sue'}, {'name': 'jon', 'name2': 'richard'}]

Asked By: beerbajay


Answer 1

There’s no built-in method for doing this, but the expected result can be achieved by using list comprehensions.

[dict([[k, i if isinstance(i, str) else i[v-1]] for k,v in pat.groupindex.items()]) for i in pat.findall(text)]

With friendly formatting:

>>> [
...     dict([
...         [k, i if isinstance(i, str) else i[v-1]]
...         for k,v in pat.groupindex.items()
...     ])
...     for i in pat.findall(text)
... ]

We construct a list using a list comprehension that iterates over the result of findall. That result is either a list of strings or a list of tuples: a pattern with 0 or 1 capturing groups yields a list of str, while a pattern with 2 or more yields a list of tuples.

For each item in the result we construct a dict from another list comprehension which is generated from the groupindex field of the compiled pattern, which looks like:

>>> pat.groupindex
{'name2': 2, 'name': 1}

A two-element list is constructed for each entry in groupindex. If the item from findall is a tuple, the group number from groupindex (1-based, hence v-1) selects the correct element; otherwise the item itself is assigned to the only named group.

[k, i if isinstance(i, str) else i[v-1]]

Finally, a dict is constructed from the list of lists of strings.

Note that groupindex contains only named groups, so non-named capturing groups will be omitted from the resulting dict.

And the result:

>>> [dict([[k, i if isinstance(i, str) else i[v-1]] for k,v in pat.groupindex.items()]) for i in pat.findall(text)]
[{'name2': 'sue', 'name': 'bob'}, {'name2': 'richard', 'name': 'jon'}]
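The note above about unnamed groups can be checked with a small sketch. The pattern here is a hypothetical variant (not from the question) that places an unnamed capturing group between two named ones, so findall returns 3-tuples while groupindex lists only the two named groups:

```python
import re

# Hypothetical pattern for illustration: the middle group (\s+) is unnamed.
pat = re.compile(r'(?P<first>[a-z]+)(\s+)(?P<second>[a-z]+)')
text = "bob sue jon richard harry"

# findall returns one tuple per match, with an entry for every capturing group.
rows = pat.findall(text)

# groupindex maps only the named groups to their 1-based group numbers.
index = dict(pat.groupindex)

# Use each group number (minus one) to pick the matching tuple element.
dicts = [{name: row[num - 1] for name, num in index.items()} for row in rows]

print(rows)
print(index)
print(dicts)
```

The whitespace captured by the unnamed group appears in each tuple but never reaches the resulting dicts.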

Answered By: beerbajay

Answer 2

Using Pattern.finditer() then Match.groupdict():

>>> import re
>>> s = "bob sue jon richard harry"
>>> r = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')
>>> [m.groupdict() for m in r.finditer(s)]
[{'name2': 'sue', 'name': 'bob'}, {'name2': 'richard', 'name': 'jon'}]
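If this is needed in several places, the finditer/groupdict approach can be wrapped in a small helper. This is a minimal sketch: the name findall_dicts and the second pattern are made up for illustration and are not part of the re module. The default argument is passed straight to Match.groupdict(), which uses it for named groups that did not participate in a match.

```python
import re

def findall_dicts(pattern, text, default=None):
    """Return one dict of named groups per non-overlapping match.

    `pattern` may be a pattern string or an already compiled pattern.
    """
    compiled = re.compile(pattern)
    return [m.groupdict(default) for m in compiled.finditer(text)]

names = findall_dicts(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)',
                      "bob sue jon richard harry")

# Hypothetical pattern with an optional named group: when "=digits" is
# absent, the 'value' group is reported as the default.
pairs = findall_dicts(r'(?P<key>[a-z]+)(?:=(?P<value>\d+))?', "a=1 b")
pairs_blank = findall_dicts(r'(?P<key>[a-z]+)(?:=(?P<value>\d+))?', "a=1 b",
                            default='')

print(names)
print(pairs)
print(pairs_blank)
```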

Answered By: Nolen Royalty

Answer 3

You could switch to finditer and call groupdict() on each match:

>>> import re
>>> text = "bob sue jon richard harry"
>>> pat = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')
>>> for m in pat.finditer(text):
...     print(m.groupdict())
... 
{'name2': 'sue', 'name': 'bob'}
{'name2': 'richard', 'name': 'jon'}

Answered By: iruvar

Answer 4

If you are using match:

r = re.match(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)', text)
r.groupdict()

See the Python re module documentation for Match.groupdict().
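Unlike findall or finditer, match only attempts a match at the start of the string and returns a single match object, or None on failure. A minimal sketch of guarding against that, using the question's text and a second input string made up for illustration:

```python
import re

text = "bob sue jon richard harry"
pat = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')

# match() only tries at position 0, so at most the first pair is returned.
m = pat.match(text)
first = m.groupdict() if m else {}

# With leading whitespace, match() fails and returns None ...
no_match = pat.match("  bob sue")

# ... while search() scans forward to the first position that matches.
found = pat.search("  bob sue")

print(first)
print(no_match)
print(found.groupdict())
```

Calling groupdict() directly on the result of match without the None check raises AttributeError when the pattern does not match.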

Answered By: Lucas B
