Scrapy: get the rel attribute values of <a> tags
Question:
I have <a> tags like these:
<a rel="sponsored" href="https://cheese.example.com/Appenzeller_cheese">Appenzeller</a>
or
<a rel="ugc" href="https://cheese.example.com/Appenzeller_cheese">Appenzeller</a>
The rel attribute can hold one or more of the following values:
rel="sponsored"
or
rel="ugc"
or
rel="ugc nofollow noreferrer"
Apparently, Scrapy only supports the following value (just "nofollow"):
<a rel="nofollow" href="https://cheese.example.com/Appenzeller_cheese">Appenzeller</a>
How can I get the other values (such as ugc, noreferrer and …) with the help of Link Extractors?
Answers:
You can’t do this with Link Extractors.
Use the lxml library instead (from lxml import etree): parse the tag yourself and read its rel attribute.
Like: etree.fromstring(tag)
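A minimal sketch of that approach, assuming each tag is available as a well-formed string. The helper name rel_values is hypothetical and not part of Scrapy or lxml. The fallback to the standard library's ElementTree is also an assumption added here so the snippet runs without lxml installed; it is not part of the original suggestion.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser, which offers the same
    # fromstring()/get() calls used below.
    import xml.etree.ElementTree as etree


def rel_values(tag):
    """Return the whitespace-separated rel tokens of a single <a> tag (hypothetical helper)."""
    element = etree.fromstring(tag)
    # rel is a space-separated token list; a missing attribute yields [].
    return element.get("rel", "").split()


tag = (
    '<a rel="ugc nofollow noreferrer" '
    'href="https://cheese.example.com/Appenzeller_cheese">Appenzeller</a>'
)
print(rel_values(tag))
```

Note that etree.fromstring is an XML parser, so it will raise an error on markup that is not well-formed.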