Retrieving only XML Tag Names in Scrapy

Question:

The Short:

How can I retrieve only tag names with .xpath() in Scrapy?

The Long:

I am using a scrapy.Spider and calling response.selector.remove_namespaces() in the parse() function to keep things simple.

I am trying to do something like this, but with Scrapy:

Iterate on XML tags and get elements' xpath in Python

However, I can’t seem to figure out how to retrieve only the name of the tags. What is the .xpath() command to grab just the tag names?

Asked By: SC4RECROW


Answers:

As far as I am aware, Scrapy's Selector class has no dedicated method for extracting just the tag name.

That said, every selector has an re() method, so you can extract the tag name from the element's serialized markup with a regular expression.

For example:

for selector in response.xpath("//*"):
    # The first "<name" in the serialized element is its own opening tag.
    print(selector.re_first(r'<(\w+)'))
Answered By: Alexander