Beautiful Soup escaping in HTML

Question

I'm trying to read lines from a file and put them into HTML using Beautiful Soup.
Each line is appended to a list, and in a for loop I concatenate them into one string, adding '\n' at the end of every line.
For example:

lines = ['a', 'b', 'c', 'd']
string = ''
for line in lines:
    string = string + line + '\n'

Then, using Beautiful Soup, I added the string to the HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('simple.html'), 'html.parser')
sentences = soup.new_tag('p')
sentences.string = string
soup.body.div.append(sentences)

Then I noticed that '\n' does not break lines in the rendered HTML, so I changed it a bit:

sentences.string = string.replace('\n', '<br>')

But in the HTML it appears escaped, as &lt;br&gt;

How can I convert these escaped characters back to normal so that the line actually breaks?

Asked By: maz32

Answer 1

Try

import html

sentences.string = html.unescape(string + '<br>')
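
One way to check the result is to print the tag after the assignment. The sketch below uses an in-memory document as a stand-in for simple.html (an assumption, not from the question) and sample text of its own. It suggests that Beautiful Soup stores whatever is assigned to .string as plain text, so a literal <br> is escaped again when the tree is serialized:

```python
import html

from bs4 import BeautifulSoup

# Stand-in for simple.html (assumption) so the example is self-contained
soup = BeautifulSoup("<html><body><div></div></body></html>", "html.parser")

sentences = soup.new_tag("p")
# html.unescape turns the entities back into the literal text "apple<br>banana"
text = html.unescape("apple&lt;br&gt;banana")
# .string stores its value as a text node, not as markup
sentences.string = text
soup.body.div.append(sentences)

rendered = str(sentences)
print(rendered)  # <p>apple&lt;br&gt;banana</p>

# No real <br> element exists inside the paragraph
print(sentences.find("br"))  # None
```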

Answered By: Jeffrey Lim

Answer 2

Instead of building a single string and escaping the HTML for the <br> tag, use the .append method to add each line, followed by soup.new_tag('br'):

from bs4 import BeautifulSoup

lines = ["apple", "banana", "cats", "dogs"]
soup = BeautifulSoup(open('simple.html'), 'html.parser')
sentences = soup.new_tag('p')

for i, line in enumerate(lines):
    sentences.append(line)
    # Don't add <br> after the last line
    if i < len(lines)-1:
        sentences.append(soup.new_tag('br'))

soup.body.div.append(sentences)

print(soup)

"""
<html>
<body>
<div><p>apple<br/>banana<br/>cats<br/>dogs</p></div>
</body>
</html>
"""

The for loop could be modified to use a plain index instead of enumerate, or, if you prefer, to drop the check that prevents a <br> from being added after the last line.
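
As one possible variant of the loop above, the sketch below drops enumerate and instead adds a <br> before every line except the first. The in-memory html_doc is an assumed stand-in for simple.html so that the example runs on its own:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of simple.html
html_doc = "<html><body><div></div></body></html>"
lines = ["apple", "banana", "cats", "dogs"]

soup = BeautifulSoup(html_doc, "html.parser")
sentences = soup.new_tag("p")

for line in lines:
    # The paragraph already has content, so separate it from the next line
    if sentences.contents:
        sentences.append(soup.new_tag("br"))
    sentences.append(line)

soup.body.div.append(sentences)
print(soup)
# <html><body><div><p>apple<br/>banana<br/>cats<br/>dogs</p></div></body></html>
```

A new <br> tag is created on each pass because appending the same tag object a second time would move it rather than copy it.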

Answered By: nigh_anxiety
