Python re.search

Question:

I have a string variable containing

string = "123hello456world789"

string contain no spacess. I want to write a regex such that prints only words containing(a-z)
I tried a simple regex

pat = "([a-z]+){1,}"
match = re.search(pat, word, re.DEBUG)

match object contains only the word Hello and the word World is not matched.

When is used re.findall() I could get both Hello and World.

My question is why we can’t do this with re.search()?

How do this with re.search()?

Asked By: Krishna M

||

Answers:

re.search() finds the pattern once in the string, documenation:

Scan through string looking for a location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.

In order to match every occurrence, you need re.findall(), documentation:

Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group. Empty matches are included in the result
unless they touch the beginning of another match.

Example:

>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']

UPDATE:

Due to your duplicate question (as discussed at this link) I have added my other answer here as well:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[]  # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # lets lowercase
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

As you can see, the reason why you were failing on the first sample you provided is because of the uppercase, you can simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.

Answered By: Inbar Rose

@InbarRose answer shows why re.search works that way, but if you want match objects rather than just the string outputs from re.findall, use re.finditer

>>> for match in re.finditer(pat, string):
...     print match.groups()
...
('hello',)
('world',)
>>>

Or alternatively if you wanted a list

>>> list(re.finditer(pat, string))
[<_sre.SRE_Match object at 0x022DB320>, <_sre.SRE_Match object at 0x022DB660>]

It’s also generally a bad idea to use string as a variable name given that it’s a common module.

Answered By: Peter Gibson
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.