Trying to understand the difference in what matches and the resulting output for findall vs finditer
Question:
- Using findall:
import re
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+('[a-z])?[a-z]*", target_string)
print(result)
# result: ['', '', "'s", '', '', '', '']
- Using finditer:
import re
target_string = "please sir, that's obviously a clip-on."
result = re.finditer(r"[a-z]+('[a-z])?[a-z]*", target_string)
matched = []
for match_obj in result:
    matched.append(match_obj.group())
print(matched)
# result: ['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
How do these two methods match patterns, and why is there a difference in the resulting output? Please explain.
I tried to read the docs but am still confused about how findall and finditer work.
Answers:
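With findall, when the pattern contains a capturing group, the result is the text of that group rather than the whole match. Here that group is ('[a-z]), which yields an empty string for every word without an apostrophe. finditer yields match objects instead, and match_obj.group() returns the whole match.
If you want the full match from findall, turn the group into a non-capturing one, (?:'[a-z]):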
target_string = "please sir, that's obviously a clip-on."
result = re.findall(r"[a-z]+(?:'[a-z])?[a-z]*", target_string)
print(result)
Output:
['please', 'sir', "that's", 'obviously', 'a', 'clip', 'on']
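To see where the difference comes from, it can help to rebuild findall's output from finditer by hand. This is only a minimal sketch using the question's pattern; the names pattern, full_matches and group_one are illustrative and not from the original post.

```python
import re

target_string = "please sir, that's obviously a clip-on."
pattern = r"[a-z]+('[a-z])?[a-z]*"

full_matches = []
group_one = []
for m in re.finditer(pattern, target_string):
    # group() with no argument is the whole match (same as group(0))
    full_matches.append(m.group())
    # group(1) is None when the optional group did not take part in the
    # match; findall reports that case as an empty string instead
    group_one.append(m.group(1) or "")

print(full_matches)
print(group_one)
# With a single capturing group, findall returns the same list as group_one
print(group_one == re.findall(pattern, target_string))
```

So both functions find the same seven matches; they differ only in what they hand back for each one.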
Note that if you have multiple capturing groups, findall will return a list of tuples, one tuple per match:
re.findall(r"([a-z]+('[a-z])?[a-z]*)", target_string)
[('please', ''), ('sir', ''), ("that's", "'s"), ('obviously', ''), ('a', ''), ('clip', ''), ('on', '')]
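The tuple form can also be reproduced with finditer, which may make the relationship clearer. A small sketch, assuming the same two-group pattern as above (the variable names are illustrative):

```python
import re

target_string = "please sir, that's obviously a clip-on."
two_groups = r"([a-z]+('[a-z])?[a-z]*)"

# groups() returns every capturing group of one match as a tuple;
# default="" replaces None for groups that did not participate,
# which is how findall reports them
tuples_from_finditer = [
    m.groups(default="") for m in re.finditer(two_groups, target_string)
]

print(tuples_from_finditer)
print(tuples_from_finditer == re.findall(two_groups, target_string))
```

Wrapping the whole pattern in an outer group, as here, is one way to get both the full match and the inner group from findall in a single call.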