scrapy

Why does my scraper return empty result set when I run it?

Why does my scraper return empty result set when I run it? Question: When I run this code block, it runs successfully but returns an empty result for the name field. Please help me check: what am I missing? Here is my code block: import scrapy class TruckspiderSpider(scrapy.Spider): name = 'truckspider' allowed_domains = ['www.quicktransportsolutions.com'] start_urls = ['https://www.quicktransportsolutions.com/carrier/usa-trucking-companies.php'] …

Total answers: 2
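
The excerpt cuts off before the parse method, but a frequent cause of one empty field is a selector that searches from the document root inside a per-row loop. A runnable stand-in using only the standard library, with a made-up table in place of the carrier listing page (the same relative-path rule applies to Scrapy's response.xpath with ".//"):

```python
import xml.etree.ElementTree as ET

# Made-up table standing in for the carrier listing page.
html = """<table>
  <tr><td class="name">Acme Trucking</td><td class="city">Dallas</td></tr>
  <tr><td class="name">Best Freight</td><td class="city">Reno</td></tr>
</table>"""
root = ET.fromstring(html)

names = []
for row in root.findall(".//tr"):
    # ".//" selects relative to this row; a leading "//" in real XPath would
    # re-select from the document root on every pass and can produce empty
    # or repeated fields.
    cell = row.find(".//td[@class='name']")
    if cell is not None:
        names.append(cell.text)
print(names)  # ['Acme Trucking', 'Best Freight']
```

Checking the selector interactively in `scrapy shell` against the real page is the quickest way to confirm which side is wrong.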

Scrapy one item with multiple parsing functions

Scrapy one item with multiple parsing functions Question: I am using Scrapy with Python to scrape a website, and I am facing some difficulties filling the item that I have created. The products are scraped properly and everything works well as long as the info is located within the response.xpath mentioned in the for …

Total answers: 2
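
When an item's fields live on more than one page, Scrapy's usual pattern is to pass the partially filled item to the next callback via the request's cb_kwargs (or request.meta on older versions). A sketch of that data flow, with plain dicts and functions standing in for scrapy.Request and the spider callbacks so it runs without Scrapy installed:

```python
# Sketch of Scrapy's cb_kwargs pattern for filling one item across callbacks.

def parse(response):
    # First callback: fill the fields available on the listing page, then
    # hand the partial item to the detail-page callback via cb_kwargs.
    item = {"title": response["title"]}
    return {"url": response["detail_url"], "callback": parse_detail,
            "cb_kwargs": {"item": item}}

def parse_detail(response, item):
    # Second callback: complete the item with detail-page fields and yield it.
    item["price"] = response["price"]
    return item

listing = {"title": "Widget", "detail_url": "https://example.com/widget"}
request = parse(listing)
detail = {"price": "9.99"}
final_item = request["callback"](detail, **request["cb_kwargs"])
print(final_item)  # {'title': 'Widget', 'price': '9.99'}
```

In a real spider the first callback would `yield scrapy.Request(url, callback=self.parse_detail, cb_kwargs={"item": item})` and the second would `yield item`.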

Python3 error No module named 'attrs' when running Scrapy in Ubuntu terminal

Python3 error No module named 'attrs' when running Scrapy in Ubuntu terminal Question: I am new to Python. I installed Scrapy on Ubuntu Linux. When I run the Scrapy shell I get this error: File "/home/user/.local/lib/python3.10/site-packages/scrapy/downloadermiddlewares/retry.py", line 25, in <module> from twisted.web.client import ResponseFailed File "/home/user/.local/lib/python3.10/site-packages/twisted/web/client.py", line 24, in <module> from twisted.internet.endpoints import HostnameEndpoint, wrapClientTLS File …

Total answers: 2
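
The traceback points at the `attrs` package that Twisted imports; the `attrs` import name was only added in attrs 21.3.0, so the usual fix is `pip install --upgrade attrs` (or reinstalling Twisted and Scrapy into the same interpreter). A quick diagnostic to confirm which module is actually missing, without importing anything heavy:

```python
import importlib.util

def module_status(names):
    # find_spec returns None when a module cannot be found, without importing it
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, ok in module_status(["attrs", "twisted", "scrapy"]).items():
    print(name, "OK" if ok else "MISSING")
```

Running this with the same `python3` that runs Scrapy also catches the common case of the packages being installed into a different interpreter than the one on PATH.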

Looking for alternative of scrapy.Selector in Selenium project

Looking for alternative of scrapy.Selector in Selenium project Question: Right now I am using scrapy.Selector to extract data from driver.page_source (Selenium). I am looking for another way of doing this without loading the Scrapy library, and I don't want to use the driver.find_elements method. import selenium, scrapy from scrapy import Selector driver.get(link) page_source = driver.page_source selector = Selector(text=page_source) links = selector.xpath('//a[contains(@class, "jcs-JobTitle")]/@href').extract() next_page …

Total answers: 1
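
The usual lighter-weight choices are lxml or parsel (the library scrapy.Selector wraps), either of which accepts the same XPath. The standard library alone can also do the job. A sketch with html.parser and a made-up snippet standing in for page_source; the class name jcs-JobTitle comes from the question:

```python
from html.parser import HTMLParser

class JobLinkExtractor(HTMLParser):
    """Collects href values of <a> tags whose class contains 'jcs-JobTitle'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "jcs-JobTitle" in (attrs.get("class") or ""):
            self.links.append(attrs.get("href"))

# Made-up snippet standing in for driver.page_source.
page_source = '<a class="jcs-JobTitle x" href="/job/1">One</a><a href="/other">Two</a>'
extractor = JobLinkExtractor()
extractor.feed(page_source)
print(extractor.links)  # ['/job/1']
```

With parsel installed, the near drop-in replacement is `parsel.Selector(text=page_source).xpath(...)`, which keeps the existing XPath unchanged.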

Scrapy Crawl (referer: None)

Scrapy Crawl (referer: None) Question: I am new to Scrapy and Python. I am scraping data from Aliexpress.com with the Playwright method and it returns (referer: None). Here is my code: class AliSpider(scrapy.Spider): name = "aliex" def start_requests(self): # GET request search_value = 'phones' yield scrapy.Request(f"https://www.aliexpress.com/premium/{search_value}.html?spm=a2g0o.productlist.1000002.0&initiative_id=SB_20230118063054&dida=y", meta=dict( playwright=True, playwright_include_page=True, playwright_page_coroutines=[ PageMethod('wait_for_selector', '.list--gallery--34TropR') …

Total answers: 1
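
By itself, "(referer: None)" is not an error: it is part of Scrapy's normal DEBUG crawl log for a request that has no referer, which is always true of start requests. If the page content is missing, the meta keys are worth checking: recent scrapy-playwright releases use `playwright_page_methods` (with PageMethod imported from scrapy_playwright.page) rather than the older `playwright_page_coroutines`. A stand-in for the meta dict, with a plain tuple as a placeholder for PageMethod so this runs without scrapy-playwright installed:

```python
# Tuples stand in for PageMethod(...) so this sketch runs without scrapy-playwright.
meta = {
    "playwright": True,
    "playwright_include_page": True,
    "playwright_page_methods": [
        ("wait_for_selector", ".list--gallery--34TropR"),
    ],
}
print(sorted(meta))
```

In a real spider this dict is passed as `meta=` to `scrapy.Request`, with the tuple replaced by `PageMethod("wait_for_selector", ".list--gallery--34TropR")`.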

Scrapy Crawl (referer: None) ['partial']

Scrapy Crawl (referer: None) ['partial'] Question: I am new to Scrapy and Python. I am trying to scrape data from www.freepatentsonline.com. Here is my code: class FreePatentSpider(scrapy.Spider): name = 'freepatent' allowed_domains = ['freepatentsonline.com'] search_value = 'laptop' start_urls = [f'https://www.freepatentsonline.com/result.html?sort=relevance&srch=top&query_txt={search_value}&submit=&patents_us=on'] user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' def request_header(self): …

Total answers: 1
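
"(referer: None)" is normal for a start request, and the ['partial'] flag generally means the server closed the connection before delivering the full body it announced; the response is often still usable. Separately, Scrapy never calls a method named request_header() on its own: the `user_agent` class attribute already shown is honored by Scrapy's UserAgentMiddleware, and any other headers go on the request itself. A runnable stand-in building the same request data without Scrapy installed:

```python
search_value = "laptop"  # value from the question
url = (
    "https://www.freepatentsonline.com/result.html"
    f"?sort=relevance&srch=top&query_txt={search_value}&submit=&patents_us=on"
)
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
    )
}
# In a real spider: yield scrapy.Request(url, headers=headers, callback=self.parse)
request = {"url": url, "headers": headers}
print(request["url"])
```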

yield scrapy.Request does not invoke parse function on each iteration

yield scrapy.Request does not invoke parse function on each iteration Question: In my code I have two functions inside the Scrapy class. start_requests takes data from an Excel workbook and assigns a value to the plate_num_xlsx variable. def start_requests(self): df=pd.read_excel('data.xlsx') columnA_values=df['PLATE'] for row in columnA_values: global plate_num_xlsx plate_num_xlsx=row print("+",plate_num_xlsx) base_url =f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=" url=base_url yield scrapy.Request(url,callback=self.parse) But on each iteration it …

Total answers: 1
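
A common cause here is that Scrapy schedules requests asynchronously, so the module-level global is overwritten by the loop long before any callback runs and every callback sees only the last plate. Binding the per-iteration value to its own request (Scrapy's cb_kwargs) avoids that; `dont_filter=True` on the request additionally disables duplicate filtering if any URLs repeat. A stand-in with a shortened query string and dicts in place of scrapy.Request:

```python
BASE = ("https://dvlaregistrations.dvla.gov.uk/search/results.html"
        "?search={}&action=index&searched=true&language=en")

def build_requests(plates):
    for plate in plates:
        # Each request carries its own plate via cb_kwargs, so the callback
        # sees the value it was scheduled with, not whatever a shared global
        # holds by the time Scrapy gets around to calling it.
        yield {"url": BASE.format(plate), "cb_kwargs": {"plate_num_xlsx": plate}}

requests = list(build_requests(["AB12CDE", "XY34ZZZ"]))
print([r["cb_kwargs"]["plate_num_xlsx"] for r in requests])  # ['AB12CDE', 'XY34ZZZ']
```

In the real spider: `yield scrapy.Request(url, callback=self.parse, cb_kwargs={"plate_num_xlsx": row}, dont_filter=True)`, with `parse(self, response, plate_num_xlsx)` receiving the value.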

Scrapy Unable to scrape API

Scrapy Unable to scrape API Question: I am trying to crawl an API using Scrapy from this link. The API request I was trying to get solves all my issues, but I am not able to load the response in JSON form and I cannot proceed further. Though the code seems long, the …

Total answers: 2
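
Since Scrapy 2.2 a callback can call `response.json()` directly; on older versions, `json.loads(response.text)` does the same. A runnable stand-in with a made-up body in place of the API's response text:

```python
import json

# Made-up stand-in for response.text from the API.
body = '{"results": [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]}'
data = json.loads(body)  # in a Scrapy callback: data = response.json()
ids = [item["id"] for item in data["results"]]
print(ids)  # [1, 2]
```

If `json.loads` raises on the real response, the body is not JSON at all (often an HTML error or bot-check page), which is worth inspecting with `print(response.text[:500])` before parsing.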

scrapy python project does not export data to excel with pandas

scrapy python project does not export data to excel with pandas Question: My script is below. First it reads the plate_num_xlsx value from the Excel file data.xlsx successfully, then it asks Scrapy to scrape data from the url. At each parse() invocation, I am taking the values parsed into the item and then trying to export them to Excel with pandas. if …

Total answers: 2
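
A likely cause is that writing the file inside every parse() call rewrites it from scratch each time, leaving only the last item (or nothing if the last write fails). The usual pattern is to collect items while parsing and write a single file once at the end, in Scrapy via the spider's `closed()` hook or, simpler still, the built-in FEEDS export setting. A sketch of the accumulate-then-write-once shape, using csv from the standard library in place of pandas so it runs anywhere:

```python
import csv
import io

items = []  # shared buffer filled by every parse() call

def parse(item):
    # Collect only; writing the file here would overwrite it on every call.
    items.append(item)

for plate in ("AB12CDE", "XY34ZZZ"):  # stand-in for the scraped results
    parse({"plate": plate, "price": "250"})

# Write once, after all callbacks have run (in Scrapy: the closed() hook).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["plate", "price"])
writer.writeheader()
writer.writerows(items)
print(buf.getvalue().splitlines()[0])  # plate,price
```

With pandas, the same idea is `pd.DataFrame(items).to_excel("output.xlsx")` called exactly once in `closed()`.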

python scrapy yield request TypeError on url parameter with f-string

python scrapy yield request TypeError on url parameter with f-string Question: I am trying to get data from an Excel column and then start scraping by concatenating the value taken from Excel to the url. The script raises a TypeError: raise TypeError(f"Request url must be str, got {type(url).__name__}") Below is my script: import scrapy from scrapy.crawler import CrawlerProcess import pandas …

Total answers: 1
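
That exception means the object handed to scrapy.Request was not a str. An f-string itself always produces a str, so the non-string usually sneaks in another way: passing the raw cell, a whole pandas Series, or a variable that is still None. Values read from Excel via pandas are often ints, floats, or NaN rather than strings, so casting defensively before building each url is the usual fix. A stand-in with a made-up list in place of the Excel column:

```python
import math

# Made-up stand-in for df['PLATE']: pandas can yield numbers or NaN, not strings.
column_values = ["AB12CDE", 1234, float("nan")]

urls = []
for value in column_values:
    if isinstance(value, float) and math.isnan(value):
        continue  # skip empty cells instead of building a broken URL
    # str(value) guarantees the url passed to scrapy.Request is a str
    urls.append(
        f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={str(value)}"
    )

print(len(urls), all(isinstance(u, str) for u in urls))  # 2 True
```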