Matching partial ids in BeautifulSoup

Question:

I’m using BeautifulSoup. I need to find every <div> tag whose id looks like post-#, where # is a number.

For example:

<div id="post-45">...</div>
<div id="post-334">...</div>

I have tried:

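from bs4 import BeautifulSoup

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')
print(soupHandler.findAll('div', id='post-*'))  # prints []: 'post-*' is compared as a literal string, not a wildcard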

How can I filter this?

Asked By: Max Frai


Answers:

You can pass a function to findAll:

>>> print(soupHandler.findAll('div', id=lambda x: x and x.startswith('post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
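A named function can be easier to read than a lambda, and it can also insist that the part after "post-" is numeric. This is a minimal sketch, assuming bs4 with the built-in 'html.parser'; is_post_id is a hypothetical helper name. The guard for None mirrors the `x and ...` in the lambda, since a tag without an id has no value to test.

```python
from bs4 import BeautifulSoup

html = ('<div id="post-45">...</div> <div id="post-334">...</div> '
        '<div id="post-abc">...</div> <div>...</div>')
soup = BeautifulSoup(html, 'html.parser')


def is_post_id(value):
    """Return True for ids of the form 'post-<digits>' (hypothetical helper)."""
    # value may be None for tags that have no id attribute
    if value is None or not value.startswith('post-'):
        return False
    suffix = value[len('post-'):]
    # ''.isdigit() is False, so a bare 'post-' is rejected too
    return suffix.isdigit()


matches = soup.find_all('div', id=is_post_id)
ids = [tag['id'] for tag in matches]
print(ids)  # ['post-45', 'post-334']
```

The lambda in the answer accepts 'post-abc'; the stricter check rejects it.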

Or a regular expression:

>>> import re
>>> print(soupHandler.findAll('div', id=re.compile('^post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
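Why the ^ matters: the BeautifulSoup documentation says a regular expression filter is applied with its search() method, so an unanchored pattern can match anywhere inside the id. The sketch below mimics that check with only the standard re module; the sample ids are made up for illustration.

```python
import re

ids = ['post-45', 'post-334', 'repost-7', 'header']

unanchored = re.compile('post-')
anchored = re.compile('^post-')

# search() scans the whole string, so 'repost-7' matches the unanchored pattern
loose = [i for i in ids if unanchored.search(i)]
# ^ pins the match to the start of the string
strict = [i for i in ids if anchored.search(i)]

print(loose)   # ['post-45', 'post-334', 'repost-7']
print(strict)  # ['post-45', 'post-334']
```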
Answered By: Mark Byers

soupHandler.findAll('div', id=re.compile(r"^post-\d+$"))

looks right to me.
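The $ anchor forces the match to end at the end of the id, so whatever sits between ^post- and $ decides which ids pass. A quick check with the standard re module (the sample ids are made up) shows that ^post-$ only accepts the bare string "post-", while ^post-\d+$ accepts post- followed by digits and nothing else:

```python
import re

ids = ['post-', 'post-45', 'post-334', 'post-45-reply']

literal_only = re.compile(r'^post-$')      # nothing allowed after the hyphen
digits_only = re.compile(r'^post-\d+$')    # one or more digits, then end of string

literal_matches = [i for i in ids if literal_only.search(i)]
digit_matches = [i for i in ids if digits_only.search(i)]

print(literal_matches)  # ['post-']
print(digit_matches)    # ['post-45', 'post-334']
```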

Answered By: Auston

Since he is asking to match “post-#somenumber#”, it’s better to be more precise with

import re
[...]
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
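Because the pattern pins down the numeric part, a capture group can also pull the number out of each matching id. This is a minimal sketch, assuming bs4 with 'html.parser'; the capture group, the trailing $ and the post_numbers name are illustrative additions, not part of the answer.

```python
import re
from bs4 import BeautifulSoup

html = ('<div id="post-45">...</div> <div id="post-334">...</div> '
        '<div id="post-x">...</div>')
soup = BeautifulSoup(html, 'html.parser')

# (\d+) captures the digits; $ rejects ids with anything after them
post_id = re.compile(r'^post-(\d+)$')

post_numbers = []
for div in soup.find_all('div', id=post_id):
    # find_all has already filtered on the same pattern, so match() succeeds here
    post_numbers.append(int(post_id.match(div['id']).group(1)))

print(post_numbers)  # [45, 334]
```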
Answered By: xiamx

This works for me:

from bs4 import BeautifulSoup
import re

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')

for match in soupHandler.find_all('div', id=re.compile("post-")):
    print(match.get('id'))

Output:
post-45
post-334
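A different technique, not used in any of the answers: recent versions of bs4 (4.7 and later, via the soupsieve package) support CSS selectors through select(), and the [attr^="value"] selector matches attributes that start with a prefix. A minimal sketch, assuming such a bs4 version; note that, like the "post-" regex, it does not check that the rest of the id is numeric.

```python
from bs4 import BeautifulSoup

html = ('<div id="post-45">...</div> <div id="post-334">...</div> '
        '<span id="post-9">...</span>')
soup = BeautifulSoup(html, 'html.parser')

# div[id^="post-"]: <div> elements whose id starts with "post-"
divs = soup.select('div[id^="post-"]')
ids = [div['id'] for div in divs]
print(ids)  # ['post-45', 'post-334']
```

The <span> is skipped because the selector names the div tag explicitly.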