BeautifulSoup is HTML escaping strings which have escaped characters
Question:
I am reading a string from file:
a = '''<script>closedSign: '<img src="/static/images/drop-down.png" style="margin-top: -3px;" />'</script>'''
Now, when I run
BeautifulSoup(a)
I get:
<script>closedSign: '&lt;img src="/static/images/drop-down.png" style="margin-top: -3px;" />'</script>
Thus, <img is being HTML-escaped into &lt;img
How can I avoid this?
Answers:
Look at the “Entity Conversion” section of the Beautiful Soup Documentation.
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
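If changing how the document is parsed is not an option, one workaround is to undo the escaping on the serialized output. The following is only a minimal sketch using the Python 3 standard library (html.unescape and re), not a BeautifulSoup feature; the helper name unescape_script_bodies is made up for illustration, and it assumes the escaped text sits inside plain <script>...</script> blocks.

```python
import html
import re

# Matches an opening <script ...> tag, its body, and the closing tag.
SCRIPT_RE = re.compile(r"(<script\b[^>]*>)(.*?)(</script>)",
                       re.DOTALL | re.IGNORECASE)


def unescape_script_bodies(markup: str) -> str:
    """Hypothetical helper: turn entities such as &lt; back into literal
    characters, but only inside <script> blocks of serialized markup."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + html.unescape(body) + closing
    return SCRIPT_RE.sub(fix, markup)


# Escaped output similar to the one described in the question.
escaped = (
    "<script>closedSign: '&lt;img src=\"/static/images/drop-down.png\" "
    "style=\"margin-top: -3px;\" /&gt;'</script>"
)
restored = unescape_script_bodies(escaped)
print(restored)
```

Text outside script blocks is left alone, so entities that are legitimately escaped in the rest of the page stay escaped.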
Use BeautifulSoup 3.2.0 instead of 3.2.1 to fix this problem.
If the page is rendered by JavaScript after loading, you can wait for the rendering to finish, as in this answer.