scrapy

python Scrapy framework adding to my code proxy

python Scrapy framework adding to my code proxy Question: I am trying a new feature for myself: adding a proxy port to my Python scraper code. I took a free proxy from this site, and looked for an answer on SO. With help of user @dskrypa I changed in my code meta={'proxy': '103.42.162.50:8080'} Now it gives an error …

Total answers: 1
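A frequent cause of the error described above is passing a bare `host:port` where Scrapy's `HttpProxyMiddleware` expects a full proxy URL with a scheme. A minimal sketch of a guard for that (the helper name is mine, not from the question):

```python
def normalize_proxy(raw: str) -> str:
    """Prepend a scheme when one is missing: Scrapy's HttpProxyMiddleware
    expects a full proxy URL in meta={'proxy': ...}, and a bare host:port
    is a common cause of errors like the one described."""
    if "://" not in raw:
        return "http://" + raw
    return raw

# Usage inside a spider (sketch):
#   yield scrapy.Request(url, callback=self.parse,
#                        meta={"proxy": normalize_proxy("103.42.162.50:8080")})
```

Free proxies also die quickly, so a connection error can occur even with a correctly formed proxy URL.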

Unable to get data from Scrapy API

Unable to get data from Scrapy API Question: I’m facing some unexpected issues while trying to get data from an API request. I found that it is throwing a "500" error along with this error message. I’m trying to scrape this URL "https://www.machinerytrader.com/listings/for-sale/excavators/1031" but I have no idea what I’m actually missing here. raise JSONDecodeError("Expecting …

Total answers: 1
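The combination of a 500 status and `JSONDecodeError("Expecting value …")` usually means the server returned an HTML error page where JSON was expected. A small sketch of the defensive pattern (the function name is a placeholder):

```python
import json

def parse_api_body(status: int, body: str):
    """A 500 response typically carries an HTML error page, not JSON,
    which is exactly when json.loads raises JSONDecodeError("Expecting
    value ..."). Check the status before attempting to decode."""
    if status != 200:
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

For a site like this, a 500 from the API often also signals that the server rejected the request itself (missing headers, cookies, or bot detection), so inspecting the response body is worthwhile before decoding.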

Web Scraping Got Empty Array Values

Web Scraping Got Empty Array Values Question: Hello all, I was trying to scrape a stock website to get stock sector-wise info on this website. I just hit scrapy shell in the terminal to see if the data is achievable for this table. In the terminal, this was my command after I ran scrapy shell "https://nepsealpha.com/": response.xpath("//table[@id='fixTable']//tbody//tr") …

Total answers: 1
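Two common reasons for an empty result here are JavaScript-rendered tables (Scrapy does not execute JS) and XPaths that include `tbody`. Browsers insert `<tbody>` into the DOM even when the raw HTML lacks it, so an XPath copied from DevTools can match nothing against what Scrapy actually downloaded. A stdlib illustration of the `tbody` pitfall (the sample HTML is made up):

```python
import xml.etree.ElementTree as ET

# What the server sends often has no <tbody>: browsers add it to the DOM,
# but Scrapy only sees the raw HTML. An XPath that insists on tbody then
# matches nothing, producing the empty array from the question.
raw_html = "<table id='fixTable'><tr><td>NABIL</td></tr></table>"
table = ET.fromstring(raw_html)

with_tbody = table.findall("./tbody/tr")   # empty against the raw source
without_tbody = table.findall("./tr")      # matches the row that is there
```

Checking `view(response)` in `scrapy shell` shows what Scrapy really received and distinguishes the two causes.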

Scrapy – recursive function as callback for pagination

Scrapy – recursive function as callback for pagination Question: I’m running into some difficulties with a Scrapy spider. Function parse() is not working as it should. It receives a response for a url with a search keyword and then for each listing in the page follows the url to fill the Scrapy Data item. It …

Total answers: 1
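A typical bug with recursive pagination callbacks is calling `self.parse(response)` directly, which just returns an unconsumed generator, instead of yielding a new `Request` whose callback is `self.parse`. A toy, Scrapy-free model of the intended control flow (pages are lists here; in Scrapy the recursion happens via `yield response.follow(next_url, callback=self.parse)`):

```python
def parse_pages(pages, page=0):
    """Toy model of a recursive pagination callback: yield the items on
    the current page, then 'follow' the next page. In a real spider the
    recursive step must be a yielded Request (e.g.
    yield response.follow(next_url, callback=self.parse)), not a direct
    call to self.parse, or the generator is never consumed."""
    for listing in pages[page]:              # one item per listing
        yield {"listing": listing}
    if page + 1 < len(pages):                # follow the next page, if any
        yield from parse_pages(pages, page + 1)
```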

Scrapy – only returns yield from 1 url of list

Scrapy – only returns yield from 1 url of list Question: I’m crawling a website which has many countries, ex: amazon.com, .mx, .fr, .de, .es,… (the website is not actually amazon) I’ve made a url list of each base url and call parse with each one: def start_requests(self): for url in self.start_urls: yield scrapy.Request(url, callback=self.parse) …

Total answers: 1
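One common cause of "only one start URL yields items" is an `allowed_domains` list covering only one country's domain: Scrapy's offsite middleware then silently drops the requests to the other sites. Whether that is the cause here is an assumption, but deriving the domains from the start URLs rules it out:

```python
from urllib.parse import urlsplit

def allowed_domains_from(start_urls):
    """If allowed_domains names only one country's domain, the offsite
    middleware drops requests to the others without raising an error, so
    only one URL of the list ever produces items. Deriving the set from
    start_urls keeps the two in sync."""
    return sorted({urlsplit(url).netloc for url in start_urls})
```

The debug log line `Filtered offsite request to ...` confirms this diagnosis when it appears.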

How to get all data when "show more" button clicked with scrapy-playwright

How to get all data when "show more" button clicked with scrapy-playwright Question: Currently, I’ve had trouble getting all data on this page: https://www.espn.com/nba/stats/player/_/season/2023/seasontype/2 so if I scrape right now it only gets 50 of the rows. This is not what I want; what I want is to scrape all data, to show all table data …

Total answers: 1
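The page loads 50 rows at a time and appends more on each "Show More" click, so the scraper has to keep requesting chunks until none are left. A Scrapy-free model of that loop (with scrapy-playwright the analogous step is a `playwright_page_methods` entry that clicks the button, e.g. `PageMethod("click", "a.loadMore")` — the selector is a placeholder, not taken from the page):

```python
def load_all_rows(fetch_chunk):
    """Model of clicking 'Show More' until it disappears: keep asking for
    the next chunk of rows until an empty chunk signals the end."""
    rows = []
    while True:
        chunk = fetch_chunk(offset=len(rows))
        if not chunk:
            return rows
        rows.extend(chunk)
```

With scrapy-playwright the loop runs against the live page object, clicking and waiting for new rows before extracting the full table.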

how to use scrapy package with Jupyter Notebook

how to use scrapy package with Jupyter Notebook Question: I’m trying to learn web scraping/crawling and trying to apply the below code in a Jupyter Notebook, but it didn’t show any output. So can anyone help me and guide me on how to use the scrapy package in a Jupyter notebook? The code:- import scrapy from scrapy.linkextractors import …

Total answers: 1
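Pasting a spider class into a notebook cell produces no output because defining a spider doesn't run it, and running `CrawlerProcess` inside Jupyter tends to fail on a second attempt with `ReactorNotRestartable` (Twisted's reactor can't be restarted in the same kernel). A simple workaround is launching Scrapy in a fresh process; a sketch that builds the CLI invocation (filenames are placeholders):

```python
def spider_command(script="my_spider.py", output="items.json"):
    """Build the CLI invocation for running a spider from a notebook cell,
    e.g. via subprocess.run(spider_command()) or the ! shell escape.
    A fresh process sidesteps Twisted's non-restartable reactor.
    -O overwrites the output file (Scrapy >= 2.4); -o appends."""
    return ["scrapy", "runspider", script, "-O", output]
```

Equivalently, `!scrapy runspider my_spider.py -O items.json` in a cell, then load `items.json` for analysis.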

How do you add environment variables to a python scrapy project? dotenv didn't work

How do you add environment variables to a python scrapy project? dotenv didn't work Question: I’m having trouble incorporating an IP address into a format string in my Python Scrapy project. I was trying to use python-dotenv to store sensitive information, such as server IPs, in a .env file and load it into my project, …

Total answers: 1
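A common reason dotenv "didn't work" is reading the variable before `load_dotenv()` ran, so `os.getenv` returns `None` and the format string breaks. A minimal sketch of the intended pattern (`SERVER_IP` and the URL shape are placeholders; in a real project `load_dotenv()` goes at the top of `settings.py`):

```python
import os

def server_url(port=8000):
    """Read the server IP from the environment and build the URL. With
    python-dotenv, load_dotenv() must run before any os.getenv call;
    a default keeps the sketch runnable without a .env file."""
    ip = os.getenv("SERVER_IP", "127.0.0.1")
    return f"http://{ip}:{port}"
```

Checking `os.getenv("SERVER_IP")` right after `load_dotenv()` quickly shows whether the `.env` file was found at all (its path is resolved relative to the working directory, not the spider file).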

How can I make Scrapy follow the links in order

How can I make Scrapy follow the links in order Question: I’m doing a small scrape project and everything is working fine, but I’m having a problem with the order of links since Scrapy is asynchronous. The rankings["Men’s Pound-for-Pound"] is a list of links which I expect to be followed in order, so the …

Total answers: 1
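Because Scrapy schedules requests concurrently, responses arrive out of order by default. One standard approach is assigning descending priorities so earlier links are scheduled first; a sketch of the pairing (the helper name is mine):

```python
def prioritized(links):
    """Pair each link with a Scrapy priority: higher values are scheduled
    sooner, so earlier links get larger priorities. In the spider:
    for url, p in prioritized(links):
        yield scrapy.Request(url, priority=p, callback=self.parse)"""
    n = len(links)
    return [(url, n - i) for i, url in enumerate(links)]
```

Priorities alone order scheduling, not completion; for a strictly sequential crawl, `CONCURRENT_REQUESTS = 1` (or chaining each request from the previous callback) is also needed.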

How to get the proxy used for each request in scrapy logs?

How to get the proxy used for each request in scrapy logs? Question: I am using a custom proxy middleware for rotating proxies and I would like to get a log for the proxy used for each request: packetstream_proxies = [ settings.get("PS_PROXY_USA"), settings.get("PS_PROXY_CA"), settings.get("PS_PROXY_IT"), settings.get("PS_PROXY_GLOBAL"), ] unlimited_proxies = [ settings.get("UNLIMITED_PROXY_1"), settings.get("UNLIMITED_PROXY_2"), settings.get("UNLIMITED_PROXY_3"), settings.get("UNLIMITED_PROXY_4"), settings.get("UNLIMITED_PROXY_5"), settings.get("UNLIMITED_PROXY_6"), …

Total answers: 1
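The straightforward place to log the chosen proxy is the middleware's own `process_request`, right where the proxy is attached. A sketch of such a downloader middleware (the proxy list stands in for the settings-based packetstream/unlimited lists in the question; enable it via `DOWNLOADER_MIDDLEWARES` in `settings.py`):

```python
import logging
import random

logger = logging.getLogger(__name__)

class ProxyLoggingMiddleware:
    """Downloader-middleware sketch that records which proxy each request
    was assigned, one log line per request."""

    def __init__(self, proxies):
        self.proxies = proxies

    def process_request(self, request, spider):
        proxy = random.choice(self.proxies)
        request.meta["proxy"] = proxy
        logger.info("Using proxy %s for %s", proxy, request.url)
        return None  # let the request continue through the middleware chain
```

Alternatively, the callback can log `response.request.meta.get("proxy")`, which also records the proxy actually used after any retries.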