BeautifulSoup: extracting both "td" objects without a class (class_=None or False) and those with a specific class

Question:

I am trying to scrape a website that has td elements. Some of them have no class, and I can extract those with

object.find_all("td", class_=None)

And others have a class called sem_dados, which I can extract using

object.find_all("td", class_="sem_dados")

The main issue is that I can’t do both at the same time. For instance,

object.find_all("td", class_=[None, "sem_dados"])

will not return the td elements that have no class. This seems to be a problem with how None or False behaves inside a list, since

object.find_all("td", class_=[None])

returns an empty list.

Does anyone know how to change the syntax so I can match both in a single call? The order of the extracted elements is important. I could reorder them manually, but I believe there must be a syntax that does what I want.

I have tried many different syntaxes but still couldn’t get anything working.

Answers:

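You can pass a custom function (here a lambda) as the class_ filter. BeautifulSoup calls it with the tag’s class value, which is None when the tag has no class attribute: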

from bs4 import BeautifulSoup

html_doc = '''
<td class="sem_dados">I want this 1</td>
<td class="other">I don't want this</td>
<td>I want this 2</td>'''

soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.find_all('td', class_=lambda c: not c or 'sem_dados' == c))

Prints:

[<td class="sem_dados">I want this 1</td>, <td>I want this 2</td>]
Answered By: Andrej Kesely
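
As an alternative sketch, the same selection can be written as a tag-level filter passed directly to find_all, which inspects the whole tag instead of relying on how class_ treats None. The helper name wanted_td is hypothetical and the sample HTML is the one from the answer above. Since find_all walks the tree once, the results come back in document order, which is what the question asks for.

```python
from bs4 import BeautifulSoup

html_doc = '''
<td class="sem_dados">I want this 1</td>
<td class="other">I don't want this</td>
<td>I want this 2</td>'''

soup = BeautifulSoup(html_doc, "html.parser")


def wanted_td(tag):
    """Match <td> tags with no class, or with sem_dados among their classes."""
    if tag.name != "td":
        return False
    # tag.get("class") is a list of class names, or None when the attribute is absent
    classes = tag.get("class")
    return not classes or "sem_dados" in classes


cells = soup.find_all(wanted_td)
print([td.get_text() for td in cells])
```

If the soupsieve package is available (it is normally installed together with beautifulsoup4), a CSS selector such as soup.select("td:not([class]), td.sem_dados") should give a similar result, though it is not verified here against the original site.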