Beautiful Soup escaping in HTML
Question:
I'm trying to read lines from a file and put them into an HTML document using Beautiful Soup.
Each line is appended to a list, and in a for loop I concatenate the lines into one string, adding '\n' at the end of every line.
For example:
lines = ['a', 'b', 'c', 'd']
string = ''
for line in lines:
    string = string + line + '\n'
Then, using Beautiful Soup, I added the string to the HTML:
soup = BeautifulSoup(open('simple.html'), 'html.parser')
sentences = soup.new_tag('p')
sentences.string = string
soup.body.div.append(sentences)
Then I noticed that '\n' does not break the lines in the rendered page, so I changed it a bit:
sentences.string = string.replace('\n', '<br>')
But in the HTML it appears as the escaped text &lt;br&gt;.
How can I convert these escaped characters back to normal so that the lines break?
Answers:
Try
import html
sentences.string = html.unescape(string + '<br>')
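As a point of reference, here is a minimal sketch of what the standard-library html module does on plain strings. The sample text is made up for illustration. html.unescape only converts entities such as &lt; back into characters, so a string that contains no entities comes back unchanged.

```python
import html

# Escaping turns markup characters into entities (sample text is illustrative).
escaped = html.escape("apple<br>banana")
print(escaped)   # apple&lt;br&gt;banana

# Unescaping reverses that on the plain string.
restored = html.unescape(escaped)
print(restored)  # apple<br>banana

# A string without entities is returned as-is.
unchanged = html.unescape("apple<br>")
print(unchanged)  # apple<br>
```

This only shows the string-level round trip. It does not change how Beautiful Soup serializes text assigned to .string.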
Instead of building a single string and trying to unescape the <br> tag, use the .append method to add each line followed by soup.new_tag('br'):
from bs4 import BeautifulSoup
lines = ["apple", "banana", "cats", "dogs"]
soup = BeautifulSoup(open('simple.html'), 'html.parser')
sentences = soup.new_tag('p')
for i, line in enumerate(lines):
    sentences.append(line)
    # Don't add <br> after the last line
    if i < len(lines) - 1:
        sentences.append(soup.new_tag('br'))
soup.body.div.append(sentences)
print(soup)
"""
<html>
<body>
<div><p>apple<br/>banana<br/>cats<br/>dogs</p></div>
</body>
</html>
"""
The for loop could be modified to use just an index instead of enumerate, or to drop the check that prevents a <br> from being added at the end, per your own preference.
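To make the difference concrete, here is a small self-contained sketch that contrasts assigning text to .string with appending real <br> tags. The helper name lines_to_paragraph and the inline HTML (standing in for simple.html) are assumptions for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup


def lines_to_paragraph(soup, lines):
    """Hypothetical helper: build a <p> whose lines are separated by <br> tags."""
    paragraph = soup.new_tag("p")
    for index, line in enumerate(lines):
        # Put a <br> before every line except the first, so none trails at the end.
        if index > 0:
            paragraph.append(soup.new_tag("br"))
        # Lines read from a file usually end in a newline; drop it.
        paragraph.append(line.rstrip("\n"))
    return paragraph


# Inline markup stands in for simple.html (an assumption for this sketch).
soup = BeautifulSoup("<html><body><div></div></body></html>", "html.parser")

# Text assigned to .string is treated as text, so markup in it is escaped on output.
escaped = soup.new_tag("p")
escaped.string = "apple<br>banana"
print(escaped)  # <p>apple&lt;br&gt;banana</p>

# Appending Tag objects produces real <br/> elements.
paragraph = lines_to_paragraph(soup, ["apple\n", "banana\n"])
soup.body.div.append(paragraph)
print(soup.body.div)  # <div><p>apple<br/>banana</p></div>
```

Placing the <br> before each line after the first is just another way to write the "no trailing <br>" check from the answer above.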