How to change string content of a node that has child elements?

Question:

I’m trying to write a Python script using BeautifulSoup that replaces all the text on a page with something else.

So far it’s going well, but I run into trouble whenever I encounter a node that contains both a string and another node.

As an example, here is some sample HTML:

   <div>
        abc
        <p>xyz</p>
   </div>

What I want to do is change the "abc" part of the HTML without affecting the remaining content of the node.

As you probably already know, element.string in BeautifulSoup only works on nodes that have exactly one child. In this example the <div> node has more than one child (the text and the <p> tag), so element.string is None, and calling anything on it fails at runtime with an AttributeError about NoneType.
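A minimal sketch of that behavior, assuming a compact version of the sample HTML (no whitespace between nodes) so that the <div> has exactly two children:

```python
from bs4 import BeautifulSoup

# Compact form of the sample markup: <div> holds a text node and a <p> tag.
html_doc = "<div>abc<p>xyz</p></div>"
soup = BeautifulSoup(html_doc, "html.parser")

# <p> has exactly one child (a text node), so .string returns that text.
p_string = soup.p.string

# <div> has two children, so .string cannot pick one and returns None.
div_string = soup.div.string

print(repr(p_string), repr(div_string))
```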

Is there a way to go around using the string attribute and changing the text portion of a node in this specific scenario?

Asked By: Senff1389


Answers:

You can access the individual children of the <div> tag through the .contents property and then use .replace_with() to put new text in place of the first one:

from bs4 import BeautifulSoup

html_doc = '''
<div>
    abc
    <p>xyz</p>
</div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

soup.div.contents[0].replace_with('\n    Hello World\n    ')
print(soup)

Prints:

<div>
    Hello World
    <p>xyz</p>
</div>
Answered By: Andrej Kesely
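
Indexing .contents[0] assumes the text is the first child of the tag. If that is not guaranteed, one possible variant is to loop over the tag's direct children and replace only the text nodes. The sketch below is an illustration under that assumption, not part of the answer above; replace_direct_text is a hypothetical helper name, not a BeautifulSoup function, and the compact HTML is used only to keep the output easy to read.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def replace_direct_text(tag, new_text):
    """Replace every non-blank text node that is a direct child of `tag`."""
    # Iterate over a copy, because replace_with() modifies the child list.
    for child in list(tag.children):
        # Child tags such as <p> are skipped; whitespace-only text is left alone.
        if isinstance(child, NavigableString) and child.strip():
            child.replace_with(new_text)


html_doc = "<div>abc<p>xyz</p></div>"
soup = BeautifulSoup(html_doc, "html.parser")

replace_direct_text(soup.div, "Hello World")
result = str(soup)
print(result)
```

Because only direct children are examined, the text inside <p> is left unchanged. Note that HTML comments are also NavigableString subclasses in BeautifulSoup, so a page containing comments would likely need an extra check to skip them.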