Scrapy: one item with multiple parsing functions

Question:

I am using Scrapy with Python to scrape a website, and I am having trouble filling the item that I have created.

The products are properly scraped and everything is working well as long as the info is located within the response.xpath mentioned in the for loop.

‘trend’ and ‘number’ are properly added to the Item using ItemLoader.

However, the date of the product is not inside the rows selected by the response.xpath shown below. It is in the page title, which I can reach with response.css('title').


import scrapy
import datetime
from trends.items import Trend_item
from scrapy.loader import ItemLoader

# Initiate the spider

class trendspiders(scrapy.Spider):
    name = 'milk'
    start_urls = ['https://thewebsiteforthebestmilk/ireland/2022-03-16/7/']

    def parse(self, response):
        # One loader per table row; each row becomes one item
        for milk_unique in response.xpath('/html/body/main/div/div[2]/div[1]/section[1]/div/div[3]/table/tbody/tr'):
            l = ItemLoader(item=Trend_item(), selector=milk_unique, response=response)
            l.add_css('trend', 'a::text')
            l.add_css('number', 'span.small.text-muted::text')
            yield l.load_item()

How can I add the ‘date’ (found in response.css('title')) to my item, please?

I have tried adding l.add_css('date', "response.css('title')") in the for loop, but it returns an error.

Should I create a new parsing function? If so, how do I send the info to the same Item?

I hope I’ve made myself clear.

Thank you very much for your help,

Asked By: silkywork


Answers:

If response.css('title').get() gives you the answer you need, why not use the same CSS with add_css:

l.add_css('date', 'title')

Also, .add_css('date', "response.css('title')") is invalid because the second argument must be a valid CSS selector string, not a Python expression wrapped in quotes.

Answered By: Upendra

Since the date is outside of the selector you are using for each row, what you should do is extract that first before your for loop, since it doesn’t need to be updated on each iteration.

Then with your item loader you can just use l.add_value to load it with the rest of the fields.

For example:

import scrapy
from scrapy.loader import ItemLoader
from trends.items import Trend_item

class trendspiders(scrapy.Spider):
    name = 'trends'
    start_urls = ['https://getdaytrends.com/ireland/2022-03-16/7/']

    def parse(self, response):
        # The title is page-level, so read it once from the full response
        date_str = response.xpath("//title/text()").get()
        for trend_unique in response.xpath('/html/body/main/div/div[2]/div[1]/section[1]/div/div[3]/table/tbody/tr'):
            l = ItemLoader(item=Trend_item(), selector=trend_unique, response=response)
            l.add_css('trend', 'a::text')
            l.add_css('number', 'span.small.text-muted::text')
            # add_value takes an already extracted value, not a selector
            l.add_value('date', date_str)
            yield l.load_item()

Answered By: Alexander