How to do a task after scraping all the pages of a website using Scrapy (Python)
Question:
I want to perform a task after my scraper has scraped all the anchors on a website's home page. However, the print statement executes before parse_detail has finished processing all of the pages.
Any help would be appreciated. Thanks in advance.
def parse_site(self, response):
    next_links = response.css('a::attr(href)').getall()
    for next_link in next_links:
        yield response.follow(next_link, callback=self.parse_detail)
    print("Task after completion of all pages")

def parse_detail(self, response):
    print("@@@@@@@@@@@@@@@@@GETTING HERE################")
    all_content = response.xpath('//body').extract()
    print("###############")
    print(response.url)
Answers:
Scrapy schedules requests asynchronously, so parse_site returns as soon as the follow-up requests are queued, long before their responses arrive; that is why your print statement runs first. Instead, you can add the method closed to your spider, which Scrapy will call once the spider is done. Note that you cannot yield any more items from this method. See the Scrapy docs.
def closed(self, reason):
    # Do something here.
    pass
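
For context, here is a minimal sketch of how closed() fits alongside the callbacks from the question; the spider name and start URL are placeholders, not part of the original code:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"  # hypothetical spider name
    start_urls = ["https://example.com"]  # hypothetical start URL

    def parse(self, response):
        # Queue a request for every anchor on the home page.
        for next_link in response.css('a::attr(href)').getall():
            yield response.follow(next_link, callback=self.parse_detail)

    def parse_detail(self, response):
        # Runs once per followed page, whenever its response arrives.
        print(response.url)

    def closed(self, reason):
        # Called exactly once, after every scheduled request has finished.
        # reason is a string such as "finished"; no items can be yielded here.
        print("Task after completion of all pages:", reason)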