Output regex in desired format

Question

For the following code,

import re
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

The regex matches any word with at least 3 vowels in it.
The expected output format is

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]

(the second component runs from the last vowel to the last character of the word)

but I get

['contains', 'dangerous', 'criminals']

How can I make it output in my expected format?

Asked By: william007

Answer 1

There are two options:

Match all the words, then transform the result with another regex (e.g. with a list comprehension):

last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
words = p.findall(…)
print([(w, *last_vowel.findall(w)) for w in words])

Change your regex to capture the word, and the part of it that starts at the last vowel, in separate capturing groups:

#                  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv- first group
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))',re.I)
#                                       ^^^^^^^^^^^^-- second group
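
For reference, both options can be run end to end as in the sketch below. The sample sentence is the one from the question; the names three_vowels, word_and_tail, option_1 and option_2 are illustrative and not part of the original answer.

```python
import re

text = 'The group contains some of the most dangerous criminals in the country.'

# Option 1: match the words first, then extract the tail with a second regex.
three_vowels = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
option_1 = [(w, *last_vowel.findall(w)) for w in three_vowels.findall(text)]

# Option 2: a single regex with two capturing groups; findall returns tuples.
word_and_tail = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
option_2 = word_and_tail.findall(text)

print(option_1)  # [('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
print(option_2)  # same list
```

Option 1 keeps the original pattern untouched at the cost of a second pass over each word; option 2 does everything in one pass.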

Answered By: knittl

Answer 2

You can use a single regex with 2 capture groups, where group 2 is nested inside group 1 and runs from the last vowel to the last character of the word.

Then re.findall will return a list of tuples of the 2 capture group values.

\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))

Explanation

\b A word boundary
( Capture group 1
- (?:\w*[aeiou]){2} Repeat 2 times matching optional word chars and a vowel
- \w* Match optional word chars
- ([aeiou]\w*) Capture group 2, match the 3rd vowel and optional word chars
) Close group 1
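
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch of one possible layout; the variable name pattern and the short test string are illustrative.

```python
import re

# The pattern from this answer, spread over several lines with re.VERBOSE
# so that each part carries its own comment.
pattern = re.compile(r'''
    \b                      # a word boundary
    (                       # group 1: the whole word
        (?:\w*[aeiou]){2}   #   twice: optional word chars, then a vowel
        \w*                 #   optional word chars
        ([aeiou]\w*)        #   group 2: the 3rd vowel and the rest of the word
    )                       # close group 1
''', re.I | re.VERBOSE)

m = pattern.search('dangerous criminals')
print(m.group(1), m.group(2))  # dangerous us
```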

Example

import re
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)

print(p.findall('The group contains some of the most dangerous criminals in the country.'))

Output

[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
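
The difference from the output in the question comes down to how re.findall shapes its result. With no capturing group it returns the whole matches, with one group it returns only that group's text, and with two or more groups it returns tuples. A small sketch on a shortened sample string (the variable names are illustrative):

```python
import re

sample = 'dangerous criminals'

# No capturing group: a list of whole matches.
no_groups = re.findall(r'\b(?:\w*[aeiou]){3}\w*', sample, re.I)

# One capturing group: a list of that group's text only.
one_group = re.findall(r'\b(?:\w*[aeiou]){2}\w*([aeiou]\w*)', sample, re.I)

# Two capturing groups: a list of (group 1, group 2) tuples.
two_groups = re.findall(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', sample, re.I)

print(no_groups)   # ['dangerous', 'criminals']
print(one_group)   # ['us', 'als']
print(two_groups)  # [('dangerous', 'us'), ('criminals', 'als')]
```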

Answered By: The fourth bird
