web-crawler

Selenium finding and updating class with web app

Question: Some time ago I set up a Python script using Selenium to crawl a website. I have tried to adapt the code to get data from a new webpage but have run into trouble. The issue seems to be that the page is an app …

Total answers: 1

Python: BeautifulSoup select_one cannot find the tag

Question: English is my second language, so please excuse my poor English. The following is simple code that gets tag info using requests and bs4. The problem is that this code returns None. import requests from bs4 import BeautifulSoup url = 'http://ch1.skbroadband.com/content/view?parent_no=24&content_no=57&p_no=154494' web = requests.get(url,headers={'User-Agent':'Mozilla/5.0'}) source …

Total answers: 1

Scrapy – recursive function as callback for pagination

Question: I’m running into some difficulties with a Scrapy spider. Function parse() is not working as it should. It receives a response for a url with a search keyword and then for each listing in the page follows the url to fill the Scrapy Data item. It …

Total answers: 1

In Python, Downloading Png and Jpg images

Question: I am writing a script to download images from a certain website. The website contains JPG and PNG images. I expected the code to run normally, but the PNG images download very slowly while the JPG images are quick. img_data = …

Total answers: 1

Save file dialog when trying to download file in selenium chrome

Question: I am using Selenium Grid in Docker. My nodes are created from the selenium/node-chrome:4.8.0 image and my hub from the selenium/hub:4.8.0 image. When I try to download a file with the code below, Google Chrome shows a dialog asking for the download path. from selenium …

Total answers: 1

How to use the Scrapy package with Jupyter Notebook

Question: I’m trying to learn web scraping/crawling and to run the code below in a Jupyter Notebook, but it doesn’t show any output. Can anyone help and guide me on how to use the Scrapy package in a Jupyter notebook? The code: import scrapy from scrapy.linkextractors import …

Total answers: 1

How to change a parsed text into integer or remove decimal points?

Question: How do I convert text into an integer? The "reference_price" parsed from a webpage is "123.45" and its type is Text. However, I would like to change this to an integer like "123". ###### Parsing tbody = table.tbody for i, tr in enumerate(tbody.find_all('tr')): …

Total answers: 2

How to scrape an entire web page's data without physically scrolling?

Question: I am using the following code to extract information from this webpage, but it only fetches the first 18 rows of information. How can I ensure that I am loading all 2063 rows of information? from bs4 import BeautifulSoup from selenium import webdriver from selenium.webdriver.common.keys …

Total answers: 2

Add the spider's name to each line of log

Question: I am looking for a way to prefix each log produced by Scrapy with the name of the spider that generated it. Until now, I was launching each spider synchronously in a loop, so it was easy to track which spider generated which log. But …

Total answers: 2