Strip the first (top-level) tag in BeautifulSoup

Question:

I create a soup:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p></div>", "html.parser")

I want to strip the first top-level tag, whatever its name, to reveal its contents:

<p>My paragraph <a>My link</a></p>

with all of its children intact. So I don’t want to find and replace by tag name, e.g. with soup.find("div"); I want to do this positionally.

How can this be done?
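A minimal sketch of the target result, assuming the document's first top-level element is a tag. soup.find(recursive=False) picks the first direct child tag of the document regardless of its name, and decode_contents() renders that tag's children as a string. This only reads the inner HTML; it leaves soup unmodified rather than stripping the tag in place.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p></div>", "html.parser")

# recursive=False limits the search to direct children of the document,
# so this is the first top-level tag, whatever its name
first = soup.find(recursive=False)

# Render only the tag's children as a string; soup itself is unchanged
inner = first.decode_contents()
print(inner)
```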

Asked By: osolmaz


Answers:

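Maybe you can use its children:

soup.findChildren()[1]  # <p>My paragraph <a>My link</a></p>

findChildren() (an older alias of find_all()) returns every tag in the document in order, so soup.findChildren()[0] is the div element itself and index 1 is its first child. Note that this only looks up that one child; it does not remove the div, and any other children of the div are not included.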
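To see why index 1 is the first child, here is a small sketch that prints the tag order. It uses find_all(), the current name for what findChildren() returns; the sample HTML is the question's own.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p></div>", "html.parser")

# find_all() with no arguments returns every tag, depth-first in document order
tags = soup.find_all()
print([t.name for t in tags])

# Index 0 is the div; index 1 is the div's first child tag
print(tags[1])
```

Because the traversal is depth-first, a second child of the div would appear after all of the first child's descendants, so larger indices do not map neatly onto the div's children.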

Answered By: AliBZ

Use the built-in .unwrap() method, which replaces a tag with its children:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p><p>hello again</p></div>","html.parser")

soup.contents[0].unwrap()

print(soup)
print(len(soup.contents))

Result:

<p>My paragraph <a>My link</a></p><p>hello again</p>
2

Answered By: Robᵩ
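A sketch that generalises this answer, assuming the document may begin with whitespace or another string, in which case soup.contents[0] is not a tag. The helper name strip_first_top_level_tag and the sample HTML are hypothetical; the helper selects the first top-level Tag by type and then calls .unwrap() on it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def strip_first_top_level_tag(soup):
    """Hypothetical helper: unwrap the first top-level tag, whatever its name.

    Top-level strings such as leading whitespace are skipped. Returns the
    removed tag, or None when the document has no top-level tag.
    """
    first = next((c for c in soup.contents if isinstance(c, Tag)), None)
    if first is None:
        return None
    # unwrap() puts the tag's children in its place and returns the tag
    return first.unwrap()


html = "\n<section><p>My paragraph <a>My link</a></p><p>hello again</p></section>"
soup = BeautifulSoup(html, "html.parser")
removed = strip_first_top_level_tag(soup)
print(removed.name)
print(soup)
```

The isinstance check is the design choice here: with a newline before the first tag, soup.contents[0] would be a text node rather than the tag you want to strip.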