BeautifulSoup is HTML escaping strings which have escaped characters

Question:

I am reading a string from file:

a = '''<script>closedSign: '<img src="/static/images/drop-down.png" style="margin-top: -3px;"  />'</script>'''

Now, when I run

BeautifulSoup(a)

I get:

<script>closedSign: '&lt;img src="/static/images/drop-down.png" style="margin-top: -3px;"   /&gt;'</script>

Thus, <img is being HTML-escaped into &lt;img.

How can I avoid this?

Asked By: Jamal


Answers:

Look at the “Entity Conversion” section of the Beautiful Soup Documentation.

soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
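
If the escaped output has already been produced, one rough fallback is to unescape the entities afterwards using only the Python 3 standard library. This is a sketch of a workaround, not part of Beautiful Soup's API: the helper name unescape_script_bodies and the sample string are assumptions made for illustration, and the regular expression only handles simple, non-nested script tags.

```python
import html
import re

# Hypothetical sample: the escaped output shown in the question.
escaped = (
    "<script>closedSign: '&lt;img src=\"/static/images/drop-down.png\" "
    "style=\"margin-top: -3px;\" /&gt;'</script>"
)

# Matches an opening <script ...> tag, its body, and the closing tag.
SCRIPT_RE = re.compile(r"(<script\b[^>]*>)(.*?)(</script>)",
                       re.IGNORECASE | re.DOTALL)


def unescape_script_bodies(markup):
    """Unescape HTML entities inside <script>...</script> bodies only."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + html.unescape(body) + closing
    return SCRIPT_RE.sub(fix, markup)


restored = unescape_script_bodies(escaped)
print(restored)
```

Note that this also unescapes any entities that were intentionally escaped inside the original script, so it is only safe when the script body was known to contain raw markup.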

Answered By: Paulo Scardine

Use BeautifulSoup 3.2.0 instead of 3.2.1 to fix this problem.

Answered By: Jamal

If the page is rendered with JavaScript after loading, you can wait for the render to finish, as in this answer.

Answered By: Efraim