Beautiful Soup: find elements whose class “contains” a string, or use regex?
Question:
My class names are constantly changing, for example:
listing-col-line-3-11 dpt 41
listing-col-block-1-22 dpt 41
listing-col-line-4-13 CWK 12
Normally I could do:
for EachPart in soup.find_all("div", {"class" : "ClassNamesHere"}):
    print(EachPart.get_text())
There are far too many class names to list them all individually, so that approach is out.
I know Python doesn’t have the “.contains” I would normally use, but it does have “in”. However, I haven’t been able to work out how to incorporate that.
I’m hoping there’s a way to do this with regex. Again, my Python syntax is letting me down; I’ve been trying variations on:
import re

regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all(regex):
    print(EachPart.get_text())
But that doesn’t seem to be doing the trick.
Answers:
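Pass the regex as the value of the class attribute instead of as the first argument. The first argument of find_all() is matched against tag names, not classes, which is why the attempt in the question finds nothing. You can try this for loop:
import re

regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all("div", {"class" : regex}):
    print(EachPart.get_text())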
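A self-contained sketch of this answer, using the sample class names from the question as made-up HTML (the fourth div with class something-else is an assumption added to show a non-match, and html.parser is just one parser choice). It contrasts the two calls: a regex passed as the first argument is tested against tag names such as "div", while a regex passed for "class" is tested against each CSS class. Because Beautiful Soup uses the regex’s search() method, the leading and trailing .* are not needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup based on the class names in the question
html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
<div class="something-else">D</div>
"""

soup = BeautifulSoup(html, "html.parser")
regex = re.compile("listing-col-")  # search() semantics, so no .* needed

# As the first argument, the regex is matched against tag *names* ("div"),
# so nothing matches.
by_name = soup.find_all(regex)

# As the value for "class", it is matched against each individual CSS class.
by_class = [div.get_text() for div in soup.find_all("div", {"class": regex})]

print(by_name)   # []
print(by_class)  # ['A', 'B', 'C']
```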
BeautifulSoup also supports CSS selectors, which let you select elements based on the content of particular attributes. This includes the *= selector, which means “contains”.
The following returns all div elements whose class attribute contains the text ‘listing-col-’:
for EachPart in soup.select('div[class*="listing-col-"]'):
    print(EachPart.get_text())
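A runnable sketch of the CSS selector approach on the same assumed sample HTML. For attribute selectors, the class values are compared as one space-joined string, so *= matches the substring anywhere in the attribute. The ^= (“starts with”) variant is shown only as an illustration; it works here because the listing-col- class happens to come first in every class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup based on the class names in the question
html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
<div class="something-else">D</div>
"""

soup = BeautifulSoup(html, "html.parser")

# *= : the class attribute contains "listing-col-" anywhere
contains = [div.get_text() for div in soup.select('div[class*="listing-col-"]')]

# ^= : the class attribute *starts with* "listing-col-"
# (depends on that class being listed first in the attribute)
starts_with = [div.get_text() for div in soup.select('div[class^="listing-col-"]')]

print(contains)     # ['A', 'B', 'C']
print(starts_with)  # ['A', 'B', 'C']
```

Note that *= is a plain substring test, so it would also match a class such as old-listing-col-x; the regex approach has the same property unless the pattern is anchored (for example ^listing-col-).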
You could avoid regex entirely by using partial matching with gazpacho:
Input:
html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
"""
Partial matching code:
from gazpacho import Soup
soup = Soup(html)
divs = soup.find("div", {"class": "listing-col-"}, partial=True)
[div.text for div in divs]
Output:
['A', 'B', 'C']
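The question also asked whether Python’s in operator could be used instead of regex. Beautiful Soup accepts a function as the filter for class_, and tries it against each individual class value. A minimal sketch, on the same assumed sample HTML, with has_listing_col as a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup based on the class names in the question
html = """
<div class="listing-col-line-3-11 dpt 41">A</div>
<div class="listing-col-block-1-22 dpt 41">B</div>
<div class="listing-col-line-4-13 CWK 12">C</div>
<div class="something-else">D</div>
"""

def has_listing_col(css_class):
    # May be called with None for tags that have no class attribute,
    # so guard against that before using "in".
    return css_class is not None and "listing-col-" in css_class

soup = BeautifulSoup(html, "html.parser")
texts = [div.get_text() for div in soup.find_all("div", class_=has_listing_col)]

print(texts)  # ['A', 'B', 'C']
```

This needs no regex and no extra library, and the function can hold any matching rule you like (for example css_class.startswith("listing-col-")).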