screen-scraping

Explain sum(int(td.text) for td in soup.select('td:last-child')[1:])

Question: I came across this piece of code while solving a problem. I cannot understand how the last line of code before the print works. Please explain. import re import urllib.request from bs4 import BeautifulSoup # url = 'http://py4e-data.dr-chuck.net/comments_42.html' url = 'http://py4e-data.dr-chuck.net/comments_228869.html' soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser') s …

Total answers: 2
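
As a rough illustration of what the expression in this question does: soup.select('td:last-child') returns every td that is the last child of its parent row, [1:] skips the first match (typically the header cell), int(td.text) converts each remaining cell's text to a number, and sum() adds them up. The sketch below reproduces that logic with the standard library's html.parser as a stand-in for BeautifulSoup, so it runs without extra packages. The sample table and the LastCellCollector class are hypothetical and are not taken from the original question.

```python
from html.parser import HTMLParser

# Hypothetical sample table: a header row followed by rows whose
# last cell holds a number (an assumption about the page's shape).
SAMPLE_HTML = """
<table>
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Ann</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">3</span></td></tr>
</table>
"""


class LastCellCollector(HTMLParser):
    """Collect the text of the last <td> in every <tr>.

    Stand-in for soup.select('td:last-child'); illustrative only.
    """

    def __init__(self):
        super().__init__()
        self.last_cells = []   # text of the last cell of each row
        self._row = None       # cell texts of the row being parsed
        self._in_td = False    # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._row.append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.last_cells.append(self._row[-1])
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (such as <span>) still belongs to the cell.
        if self._in_td and self._row:
            self._row[-1] += data


collector = LastCellCollector()
collector.feed(SAMPLE_HTML)
last_cells = collector.last_cells  # ['Comments', '97', '3']

# Same shape as the original expression: skip the header cell, convert, add.
total = sum(int(text) for text in last_cells[1:])
print(total)  # 100
```

Without the [1:] slice, int('Comments') would raise a ValueError, which is why the header cell is skipped before summing.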

AttributeError: 'NoneType' object has no attribute 'text' when scraping Ebay product titles

Question: Following this tutorial to create an Ebay Price Tracker with Python, I am encountering an AttributeError: 'NoneType' object has no attribute 'text' when trying to get the title of a product from a search results page in Ebay. The class is the …

Total answers: 1

Proxy requests always slow

Question: I need to make many requests to one URL, but after ~20 requests I get a 429 Too Many Requests response. So my plan was to use proxy requests. I have tried three things: a Tor proxy using Python, free proxy lists, and ScraperApi. But all of them (even the ScraperApi trial) are unbelievably slow, like …

Total answers: 2

Instagram web scraping with selenium Python problem

Question: I have a problem with scraping all pictures from an Instagram profile. I scroll the page to the bottom and then find all "a" tags, but I always get only the last 30 links to pictures. I think the driver doesn't see the full content of the page. #scroll scrolldown = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var …

Total answers: 1

Scraping: How to exclude a specific tag with bs4

Question: I hope you're well. Do you know how I can exclude a specific tag when scraping? # Retrieve the ingredients try: ingredientsdiv = soup.find("div", class_="c-recipe-ingredients") ingredientsbloc = ingredientsdiv.find("ul", class_="c-recipe-ingredients__list") ingredients = [re.findall(r'^(?:(\d+)\s([^\W\d_]*))?(.*)', item.text.replace("\n", "").strip()) for item in ingredientsbloc.find_all("li", {"class": ""})] except Exception as e: ingredients = …

Total answers: 2

Scraping YouTube links from a webpage

Question: I've been trying to scrape YouTube links from a webpage, but nothing has worked. This is a picture of what I've been trying to scrape. This is the code I tried most recently: youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink") And this is the link to the website the YouTube …

Total answers: 3

Scrapy-splash not rendering dynamic content from a certain react-driven site

Question: I am curious to see whether Splash can get the dynamic job content from this page: https://nreca.csod.com/ux/ats/careersite/4/home?c=nreca#/requisition/182. In order for Splash to receive the URL fragment, you have to use a SplashRequest. In order for it to handle the JS cookies I …

Total answers: 1

Scrapy Python Set up User Agent

Question: I tried to override the user-agent of my crawlspider by adding an extra line to the project configuration file. Here is the code: [settings] default = myproject.settings USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36" [deploy] #url = http://localhost:6800/ project = myproject But …

Total answers: 2

unable to call firefox from selenium in python on AWS machine

Question: I am trying to use Selenium from Python to scrape some dynamic pages that use JavaScript. However, I cannot launch Firefox after following the Selenium instructions on the PyPI page (http://pypi.python.org/pypi/selenium). I installed Firefox on AWS Ubuntu 12.04. The error message I got …

Total answers: 4

scrape websites with infinite scrolling

Question: I have written many scrapers, but I am not really sure how to handle infinite scrollers. These days most websites, e.g. Facebook and Pinterest, have infinite scrollers. Asked by: add-semi-colons. Answers: Most sites that have infinite scrolling do (as Lattyware notes) have a proper API as well, and …

Total answers: 3