Output regex in desired format

Question:

For the following code,

import re
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

The regex matches any word with at least 3 vowels in it.
The expected output format is

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')] 

(the second component runs from the last vowel to the last character of the word)

but I get

['contains', 'dangerous', 'criminals']

How can I make it output in my expected format?

Asked By: william007

Answers:

There are two options:

  1. Match all the words, then transform the result with another regex (e.g. with a list comprehension):

    last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
    words = p.findall(…)
    print([(w, *last_vowel.findall(w)) for w in words])
    
  2. Change your regex to capture the word and the last vowel in separate capturing groups:

    #                  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv-- first group
    p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
    #                                       ^^^^^^^^^^^^-- second group
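
As a rough check that the two options agree, here is a minimal self-contained sketch run against the sentence from the question. The names `text`, `at_least_three`, `two_groups`, `option_1` and `option_2` are illustrative only and do not come from the answer.

```python
import re

text = 'The group contains some of the most dangerous criminals in the country.'

# Option 1: match the words first, then extract the tail that starts at the last vowel.
at_least_three = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
option_1 = [(w, *last_vowel.findall(w)) for w in at_least_three.findall(text)]

# Option 2: a single pattern with a capturing group nested inside another.
two_groups = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
option_2 = two_groups.findall(text)

print(option_1)
print(option_2)
```

Both lines should print `[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]`. Option 1 keeps the original pattern untouched at the cost of a second pass over each word; option 2 does everything in one pass.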
    
Answered By: knittl

You can use a single regex with 2 capture groups, where group 2 is nested inside group 1 and runs from the last vowel to the last character of the word.

Then re.findall will return a list of tuples of the 2 capture group values.

\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))
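
To illustrate why the number of capture groups matters, the following sketch compares what `re.findall` returns for zero, one, and two groups. The short sample string and the variable names are assumptions made for this illustration.

```python
import re

text = 'some dangerous criminals'

# No capturing group: findall returns the whole matches as strings.
no_group = re.findall(r'\b(?:\w*[aeiou]){3}\w*', text, flags=re.I)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r'\b(?:\w*[aeiou]){2}\w*([aeiou]\w*)', text, flags=re.I)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', text, flags=re.I)

print(no_group)    # ['dangerous', 'criminals']
print(one_group)   # ['us', 'als']
print(two_groups)  # [('dangerous', 'us'), ('criminals', 'als')]
```

This is why the pattern in the question, which has only a non-capturing group, produces a flat list of words.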

Explanation

  • \b A word boundary
  • ( Capture group 1
    • (?:\w*[aeiou]){2} Repeat 2 times matching optional word chars and a vowel
    • \w* Match optional word chars
    • ([aeiou]\w*) Capture group 2, match the 3rd vowel (which ends up being the last vowel in the word, because the preceding quantifiers are greedy) and optional word chars
  • ) Close group 1
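
The breakdown above can also be written into the pattern itself with `re.VERBOSE` (`re.X`), which ignores whitespace and `#` comments inside the pattern. This is only a sketch: the names `pattern`, `text` and `pairs` are made up for the example, and `finditer` is used instead of `findall` to show the groups by number.

```python
import re

# Same pattern as above, annotated in verbose mode.
pattern = re.compile(r'''
    \b                      # word boundary
    (                       # group 1: the whole word
        (?:\w*[aeiou]){2}   # twice: optional word chars followed by a vowel
        \w*                 # optional word chars
        ([aeiou]\w*)        # group 2: a vowel and the rest of the word
    )                       # close group 1
''', re.I | re.X)

text = 'The group contains some of the most dangerous criminals in the country.'

# m.group(1) is the word, m.group(2) the tail starting at the last vowel.
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(pairs)
```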

Example

import re
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

Output

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
Answered By: The fourth bird