Python: Using xpath locally / on a specific element

Question:

I’m trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I’ll capture links which I don’t want.

For example:

import lxml.html
tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

The problem is that this applies the expression to the whole document. I located the element I want, for example:

tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5]  # for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

But that seems to run the query against the whole document as well, since I am still capturing links outside of the table. This page says that “When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute)”. So, is what I am using an absolute expression, and do I need to make it relative? Is that it?

Basically, how can I go about filtering only elements that exist inside of this table?

Asked By: pvt pns


Answers:

Your XPath expression starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e.:

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
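To see the difference side by side, here is a minimal sketch. The HTML snippet and the variable names (HTML, expr, absolute_hrefs, relative_hrefs) are made up for illustration and are not from the question; it assumes lxml is installed.

```python
import lxml.html

# Hypothetical sample document: one matching link outside the table, one inside.
HTML = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)
table = root.xpath("//table")[0]

expr = "//a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute expression: evaluated against the root tree,
# even though xpath() is called on the table element.
absolute_hrefs = [a.get("href") for a in table.xpath(expr)]

# Relative expression: the leading dot anchors the search at the table element.
relative_hrefs = [a.get("href") for a in table.xpath("." + expr)]

print(absolute_hrefs)
print(relative_hrefs)
```

In this sample the absolute form should return both links, while the relative form should return only the one inside the table.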
Answered By: phihag

Another option would be to ask directly for elements inside your table.
For instance:

tree = lxml.html.parse(some_response)
links = tree.xpath("//table[**criteria**]//a[contains(@href, 'http://www.example.com/filter/')]")

The **criteria** predicate is needed if there are several tables in the page. Possible criteria include filtering on the table id or class. For instance:

links = tree.xpath("//table[@id='my_table_id']//a[contains(@href, 'http://www.example.com/filter/')]")
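As a rough check of this approach, the sketch below runs the id-based expression against a small document with two tables. The HTML, the second table's id (other_table) and the variable names are assumptions made up for the example.

```python
import lxml.html

# Hypothetical sample document with two tables, each holding a matching link.
HTML = """
<html><body>
  <table id="other_table">
    <tr><td><a href="http://www.example.com/filter/other">other</a></td></tr>
  </table>
  <table id="my_table_id">
    <tr><td><a href="http://www.example.com/filter/wanted">wanted</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(HTML)

# The [@id='my_table_id'] predicate restricts the search to a single table,
# so the expression can stay absolute and be run from the document root.
links = root.xpath(
    "//table[@id='my_table_id']"
    "//a[contains(@href, 'http://www.example.com/filter/')]"
)
hrefs = [a.get("href") for a in links]
print(hrefs)
```

This keeps everything in one expression, at the cost of depending on the table having a stable id or class to select on.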
Answered By: Pablo Guerrero