Identify a unique tag using BeautifulSoup

Question

BeautifulSoup treats two tags as identical if they both contain the exact same content, even when the two tags are not the same DOM node.

Example:

from bs4 import BeautifulSoup
x = '<div class="a"><span>hello</span></div><div class="b"><span>hello</span></div>'
page = BeautifulSoup(x, 'html.parser')

spans = page.select('span')

print(spans[0] == spans[1])  # prints True

The way I have managed to get around this is to account for their parents as well, e.g.:

spans = page.select('span')

print(spans[0] == spans[1] and list(spans[0].parents) == list(spans[1].parents))  # prints False

However, this method – when used on a normal HTML page with many nested DOM nodes – is often an order of magnitude slower than just comparing spans[0] to spans[1] without the parents.

My question is: is there a more efficient way to determine, via Beautiful Soup, whether two nodes are truly the same one?

Asked By: jayp

Answer 1

You can use id():

print(id(spans[0]) == id(spans[1]))

Prints:

False

Or use the is operator, which compares object identity directly:

print(spans[0] is spans[1])
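
The same distinction matters when looking a node up in a list, because list.index and the in operator fall back to ==. Below is a minimal sketch, assuming the same markup as in the question; index_by_identity is a hypothetical helper written for this illustration, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

html = '<div class="a"><span>hello</span></div><div class="b"><span>hello</span></div>'
page = BeautifulSoup(html, 'html.parser')
spans = page.select('span')


def index_by_identity(nodes, target):
    """Hypothetical helper: position of `target` in `nodes`, matched with `is`."""
    for i, node in enumerate(nodes):
        if node is target:
            return i
    return -1


second = spans[1]

# list.index compares with ==, so it stops at the first span with equal content
equal_index = spans.index(second)

# the identity-based lookup finds the actual node
identity_index = index_by_identity(spans, second)

print(equal_index, identity_index)
```

The identity check is a single pointer comparison, so it avoids both the content comparison done by == and the walk up the parent chain. Prefer is over comparing id() values when you can, since an id() is only guaranteed to be unique while the object is still alive.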

Answered By: Andrej Kesely
