cannot import name 'PageCoroutine' from 'scrapy_playwright.page'

Question

I am trying to use Scrapy and Playwright to scrape dynamic web pages. I installed Scrapy and Playwright, but when I run my spider I get this error:

ImportError: cannot import name 'PageCoroutine' from 'scrapy_playwright.page' (C:\Ali\DataCamp\Web Scraping in Python\Scrapy\venv\lib\site-packages\scrapy_playwright\page.py)

This is my code (it is only a test spider):

import scrapy
from scrapy_playwright.page import PageCoroutine

class PwspiderSpider(scrapy.Spider):
    name = 'pwspider'
    
    def start_requests(self):
        yield scrapy.Request("https://shoppable-campaign-demo.netlify.app/#/", meta=dict(playwright=True, playwright_include_page=True, playwright_page_coroutine=[PageCoroutine('wait_for_selector', 'div#productListing')]))

    async def parse(self, response):
        yield {'text': response.text}

I even added the DOWNLOAD_HANDLERS and the TWISTED_REACTOR in the settings file.
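
For readers following along, the two settings mentioned here are usually configured roughly as below. This is a sketch based on the scrapy-playwright README, not the asker's actual settings.py, which was not posted.

```python
# settings.py (sketch; the asker's real file was not shown)

# Route HTTP and HTTPS downloads through the scrapy-playwright handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

These settings are unrelated to the ImportError itself, which is raised at import time before Scrapy reads them.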

Asked By: Ali_Khaled

Answer 1

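PageCoroutine is deprecated and has been removed from recent scrapy-playwright releases, which is why the import fails. Import PageMethod from scrapy_playwright.page instead, and pass the list under the playwright_page_methods meta key.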
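
If it is unclear which of the two names an installed version exposes, a quick check like the following can help. It is only a diagnostic sketch: the helper name available_page_helper is made up for this example and is not part of scrapy-playwright.

```python
import importlib


def available_page_helper():
    """Return the name of the page-action class this install exposes, or None."""
    try:
        page = importlib.import_module("scrapy_playwright.page")
    except ImportError:
        # scrapy-playwright (or one of its dependencies) is not importable here.
        return None
    # Prefer the current name, then fall back to the old one.
    for name in ("PageMethod", "PageCoroutine"):
        if hasattr(page, name):
            return name
    return None


print(available_page_helper())
```

If this prints PageMethod, the import in the question has to be changed as described above.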

Working code as an example:

import scrapy
from scrapy_playwright.page import PageMethod


class TestSpider(scrapy.Spider):
    name = "test"

    def start_requests(self):
        yield scrapy.Request(
            url="https://shoppable-campaign-demo.netlify.app/#/",
            callback=self.parse,
            meta={
                "playwright": True,
                # Wait until the product cards have rendered before the response is returned.
                "playwright_page_methods": [
                    PageMethod("wait_for_selector", ".card-body"),
                ],
            },
        )

    def parse(self, response):
        # Each product sits in an element with class "card-body".
        for product in response.xpath('//*[@class="card-body"]'):
            yield {
                "title": product.xpath('.//*[@class="card-title"]/text()').get(),
            }

Output:

{'title': 'Oxford Loafers'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Ankle-length Slack'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'White Baseball Cap'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Triangle Bikini Top'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Short Blazer'}
2022-11-05 20:40:40 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-05 20:40:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 235,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 39851,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 41.370211,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 11, 5, 14, 40, 40, 261151),
 'item_scraped_count': 5,

Answered By: Fazlul
