Scrapy returns an empty .json file

Question:

I am trying to get data from a website.

Everything seems to be correct (the XPath was tested in the shell):

>>> scrapy shell "https://stopcovid19.fr/"

>>> for cat in response.xpath("//ul[@class='level0 submenu']/li/a"):
    {
        'name': cat.xpath("./span/text()").get(),
        'link': cat.xpath("./@href").get(),
    }

Here is the code:

import scrapy

class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'categories'
    start_urls = ['https://stopcovid19.fr']

    def parse(self, response):

        for cat in response.xpath("//ul[@class='level0 submenu']/li/a"):
            yield {
                'name': cat.xpath("./span/text()").get(),
                'link': cat.xpath("./@href").get(),
            }

But when I try to export the results to a JSON file with the following command, the file is empty.

scrapy crawl categories -O categories.json

Could you help me? Sorry in advance, this is my first program…

Asked By: Adrien82


Answers:

You forgot to use the contains() function in your XPath:

//ul[contains(@class, 'level0 submenu')]

Try it like this:

for cat in response.xpath("//ul[contains(@class, 'level0 submenu')]/li/a"):
    ...

So the spider looks like this:

import scrapy


class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'categories'
    start_urls = ['https://stopcovid19.fr']

    def parse(self, response, **kwargs):
        for cat in response.xpath("//ul[contains(@class, 'level0 submenu')]/li/a"):
            yield {
                'name': cat.xpath("./span/text()").get(),
                'link': cat.xpath("./@href").get(),
            }

And run the spider like this (note that with Scrapy 2.0+, -o appends to an existing file, while -O overwrites it):

scrapy crawl categories -o file.json

++++
EDIT
++++
The code was running well, but the spider was not saved in the right file… Thanks for your help!!

Answered By: Vova