Python – Using regex to find multiple matches and print them out

Question:

I need to find content of forms from HTML source file, I did some searching and found very good method to do that, but the problem is that it prints out only first found, how can I loop through it and output all form contents, not just first one?

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matchObj = re.search('<form>(.*?)</form>', line, re.S)
print matchObj.group(1)
# Output: Form 1
# I need it to output every form content he found, not just first one...
Asked By: Stan

||

Answers:

Using regexes for this purpose is the wrong approach. Since you are using python you have a really awesome library available to extract parts from HTML documents: BeautifulSoup.

Answered By: ThiefMaster

Do not use regular expressions to parse HTML.

But if you ever need to find all regexp matches in a string, use the findall function.

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)

# Output: ['Form 1', 'Form 2']
Answered By: Petr Viktorin

Instead of using re.search use re.findall it will return you all matches in a List. Or you could also use re.finditer (which i like most to use) it will return an Iterator Object and you can just use it to iterate over all found matches.

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
for match in re.finditer('<form>(.*?)</form>', line, re.S):
    print match.group(1)
Answered By: Aamir Rind
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.