Output regex in desired format
Question:
For the following code,
import re
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
print(p.findall('The group contains some of the most dangerous criminals in the country.'))
The regex matches any word with at least 3 vowels in it.
The expected output format is
[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
(the second component should run from the last vowel to the last character of the word)
But I get
['contains', 'dangerous', 'criminals']
How can I make it produce output in my expected format?
Answers:
There are two options:
- Match all the words, then transform the result with another regex (e.g. with a list comprehension):
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)
words = p.findall(…)
print([(w, *last_vowel.findall(w)) for w in words])
- Change your regex to capture the word and the last vowel in separate capturing groups:
#                  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv- first group
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
#                                       ^^^^^^^^^^^^-- second group
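Putting option 1 together with the sentence from the question gives a self-contained script. The variable names `text` and `pairs` are illustrative, and the `…` in the snippet above is assumed here to be the question's sentence.

```python
import re

text = 'The group contains some of the most dangerous criminals in the country.'

# Step 1: the original pattern returns whole words with at least 3 vowels
p = re.compile(r'\b(?:\w*[aeiou]){3}\w*', re.I)
words = p.findall(text)

# Step 2: the single group captures from the last vowel to the end of the word;
# the leading greedy \w* backtracks only as far as the last vowel
last_vowel = re.compile(r'\w*([aeiou]\w*)$', re.I)

# findall with one group returns a one-element list, unpacked into the tuple
pairs = [(w, *last_vowel.findall(w)) for w in words]
print(pairs)  # [('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
```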
You can use a single regex with 2 capture groups, where group 2 is nested inside group 1 and runs from the last vowel to the last character of the word.
Because the pattern has more than one capture group, re.findall then returns a list of tuples holding the 2 capture group values.
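The return type of re.findall depends on how many capturing groups the pattern has, which is why the original pattern (whose only group is non-capturing) gave plain strings. A small sketch on a made-up string:

```python
import re

s = 'cat dog'

# No capturing groups (a (?:...) group does not count): whole matches as strings
whole = re.findall(r'(?:\w)\w+', s)
# Exactly one capturing group: only that group's text for each match
one = re.findall(r'\w(\w+)', s)
# Two or more capturing groups: one tuple of all group values per match
two = re.findall(r'(\w)(\w+)', s)

print(whole)  # ['cat', 'dog']
print(one)    # ['at', 'og']
print(two)    # [('c', 'at'), ('d', 'og')]
```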
\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))
Explanation
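\b
A word boundary
(
Capture group 1
(?:\w*[aeiou]){2}
Repeat 2 times: optional word chars followed by a vowel
\w*
Match optional word chars (greedy, so it gives back characters only as far as the last vowel)
([aeiou]\w*)
Capture group 2: match the last vowel (the 3rd or a later one) and the optional word chars after it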
)
Close group 1
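The same pattern can also be written with re.VERBOSE, so each token sits next to its explanation; this is an equivalent rewrite for readability, not part of the original answer. Running it on "dangerous", which has four vowels, shows that group 2 starts at the last vowel rather than the third one.

```python
import re

# re.VERBOSE ignores whitespace and # comments outside character classes
p = re.compile(r'''
    \b                      # a word boundary
    (                       # capture group 1: the whole word
        (?:\w*[aeiou]){2}   # twice: optional word chars, then a vowel
        \w*                 # optional word chars, greedy, backs off to the last vowel
        ([aeiou]\w*)        # capture group 2: the last vowel and the rest of the word
    )                       # close group 1
''', re.I | re.VERBOSE)

text = 'The group contains some of the most dangerous criminals in the country.'

print(p.findall('dangerous'))  # [('dangerous', 'us')]
print(p.findall(text))  # [('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]
```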
Example
import re
p = re.compile(r'\b((?:\w*[aeiou]){2}\w*([aeiou]\w*))', re.I)
print(p.findall('The group contains some of the most dangerous criminals in the country.'))
Output
[('contains', 'ins'), ('dangerous', 'us'), ('criminals', 'als')]