Is there a way to stop BeautifulSoup from removing leading spaces?

Question:

I am using BeautifulSoup in a project and noticed that it removes leading spaces.
For example:

from bs4 import BeautifulSoup
sample = " Test"
soup = BeautifulSoup(sample, features="lxml")
[s.extract() for s in soup(["style", "script", "[document]", "head", "title"])]
print(soup.getText(strip=False))

This prints "Test" rather than " Test".

I tried passing strip=False, but it did not help, and I cannot find any discussion of this behavior anywhere. This is an MWE; the real goal is to take HTML-formatted input and print the plain text.

Asked By: bbernicker


Answers:

To keep the leading whitespace, you can use html.parser instead of lxml as your parser:

soup = BeautifulSoup(sample, 'html.parser')

See the BeautifulSoup documentation on using different parsers:

But if the document is not perfectly-formed, different parsers will
give different results…
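
As a minimal sketch of how this could fit the question's goal (HTML in, plain text out), assuming html.parser behaves as described above: the helper name html_to_text is made up for illustration and is not part of BeautifulSoup, and the list of tags to drop is taken from the question.

```python
from bs4 import BeautifulSoup


def html_to_text(markup: str) -> str:
    """Return the text content of `markup` (hypothetical helper).

    Uses the built-in html.parser instead of lxml, on the assumption
    that it leaves leading whitespace in the input untouched.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # Remove elements whose contents should not show up in the plain text.
    for tag in soup(["style", "script", "head", "title"]):
        tag.extract()
    return soup.get_text()


# repr() makes any leading space visible in the printed output.
print(repr(html_to_text(" Test")))
```

If the input is a full, messy HTML page, the two parsers may also differ in other ways, so it is worth comparing the output on a few real documents before switching.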

Answered By: MendelG