Identify a unique tag using BeautifulSoup

Question:

BeautifulSoup treats two tags as equal (==) if they both have the exact same content, even when the two tags are not the same DOM node.

Example:

from bs4 import BeautifulSoup
x = '<div class="a"><span>hello</span></div><div class="b"><span>hello</span></div>'
page = BeautifulSoup(x, 'html.parser')

spans = page.select('span')

print(spans[0] == spans[1])  # prints True

The way I have managed to get around this is to account for their parents as well, e.g.:

spans = page.select('span')

print(spans[0] == spans[1] and list(spans[0].parents) == list(spans[1].parents))  # prints False

However, this method – when used on a normal HTML page with many nested DOM nodes – is often an order of magnitude slower than just comparing spans[0] to spans[1] without the parents.

My question is: is there a more efficient way to determine, via Beautiful Soup, whether two nodes are truly the same one?

Asked By: jayp


Answers:

You can use id():

print(id(spans[0]) == id(spans[1]))

Prints:

False

Or use the is operator:

print(spans[0] is spans[1])
Answered By: Andrej Kesely
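
To illustrate the difference between equality and identity discussed above, here is a minimal sketch that reuses the markup from the question. The helper index_of_node is a hypothetical name introduced only for this example; it is not part of Beautiful Soup. It assumes that select() hands back the Tag objects stored in the parsed tree, so identity checks stay valid for as long as the tree itself is not rebuilt.

```python
from bs4 import BeautifulSoup

html = '<div class="a"><span>hello</span></div><div class="b"><span>hello</span></div>'
page = BeautifulSoup(html, 'html.parser')

spans = page.select('span')
first, second = spans

# == compares tag name, attributes and contents, so look-alike tags are equal.
equal_by_value = first == second

# `is` compares object identity, which is a constant-time check.
same_node = first is second

# Selecting again yields the same underlying Tag object for the first span.
same_after_reselect = page.select('span')[0] is first


def index_of_node(nodes, target):
    """Hypothetical helper: position of `target` in `nodes` by identity, or -1."""
    for i, node in enumerate(nodes):
        if node is target:
            return i
    return -1


print(equal_by_value, same_node, same_after_reselect)
print(spans.index(second))           # list.index falls back to ==
print(index_of_node(spans, second))  # identity-based lookup
```

Note that list methods such as index() and the in operator rely on ==, so they can match a look-alike tag rather than the node you passed in. An explicit is loop like the one above sidesteps that.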