Validate HTML with BeautifulSoup

Question:

I use BeautifulSoup 3.2.1 to parse a lot of HTML files translated with eTranslation.

I found that soup = BeautifulSoup(html_file, "html.parser") sometimes cuts off a section of my HTML file. This seems to be related to invalid tags or other problems in the HTML.

I also found that soup = BeautifulSoup(html_file, "lxml") works better for this kind of badly written HTML.

Is there a way to detect which HTML file is invalid using BeautifulSoup?

I imagine something like this:

if valid(html_file):
    soup = BeautifulSoup(html_file, "html.parser")
else:
    soup = BeautifulSoup(html_file, "lxml")
Asked By: GhitaB


Answers:

I solved it by always using the lxml parser.
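
For anyone who still wants something like the valid() check imagined in the question, here is a minimal sketch. As far as I know BeautifulSoup is a parser rather than a validator and offers no such check, so this uses Python's standard html.parser module instead. The names TagBalanceChecker, looks_well_formed and pick_parser are hypothetical, made up for this illustration. It is only a strict tag-balance heuristic, not real HTML validation: it also flags legal HTML that omits optional closing tags such as </p> or </li>, which simply sends those files to lxml.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records mismatched or unclosed tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of elements that are currently open
        self.problems = []    # descriptions of closing tags that did not match

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append("unexpected </%s>" % tag)


def looks_well_formed(html_text):
    """Return True only if every opened tag is closed in matching order."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return not checker.problems and not checker.open_tags


def pick_parser(html_text):
    """Return the parser name to pass to BeautifulSoup (assumed policy)."""
    return "html.parser" if looks_well_formed(html_text) else "lxml"
```

The string returned by pick_parser(html_text) could then be passed as the second argument to BeautifulSoup, mirroring the if/else in the question. In practice, always using lxml is simpler and avoids parsing each file twice.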

Answered By: GhitaB