web-crawler

No error just DEBUG: Crawled (200) and referer: None)

Question: I was trying to scrape some data from a Korean web page but failed to do so. No data is scraped at all, even though the XPath query works fine in the browser filter. Here is my Python snippet. Thank you for your help. p.s. …

Total answers: 1
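A common cause of this symptom (a 200 response but nothing scraped) is that the HTML Scrapy downloads differs from the DOM the browser renders, for example when the content is injected by JavaScript or the encoding is misdetected. Below is a minimal sketch for checking this; the spider name, URL, and XPath are placeholders, since the original snippet is not shown in the excerpt.

```python
import scrapy

class KoreanPageSpider(scrapy.Spider):
    name = "korean_page"                       # hypothetical name
    start_urls = ["https://example.com/page"]  # placeholder URL

    def parse(self, response):
        # Log the encoding Scrapy detected; Korean pages are sometimes
        # served as EUC-KR, which can break naive text handling.
        self.logger.info("Encoding: %s", response.encoding)

        rows = response.xpath("//table//tr")   # hypothetical XPath
        if not rows:
            # Dump the HTML Scrapy actually received and test the XPath
            # against it, not against the browser's rendered DOM.
            with open("debug.html", "wb") as f:
                f.write(response.body)
        for row in rows:
            yield {"text": row.xpath("string(.)").get()}
```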

Crawl by many keywords in python

Question: I'm doing a project about crawling websites under specific keywords in Python. My code (below) can only handle one keyword at once. How do I fix it to handle many keywords, like keywordlist = ["worm", "inflammation", "fever"]? I want to print every result after the search. Thank you for any …

Total answers: 1
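One way to handle several keywords is to issue one request per keyword from start_requests and pass the keyword along to the callback. A sketch under assumed URL and selector patterns:

```python
import scrapy

class KeywordSpider(scrapy.Spider):
    name = "keyword_search"  # hypothetical name
    keywordlist = ["worm", "inflammation", "fever"]

    def start_requests(self):
        # One search request per keyword; the URL pattern is a placeholder.
        for keyword in self.keywordlist:
            url = f"https://example.com/search?q={keyword}"
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={"keyword": keyword})

    def parse(self, response, keyword):
        # Emit every result together with the keyword that produced it.
        for title in response.css("h3::text").getall():  # hypothetical selector
            yield {"keyword": keyword, "title": title}
```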

python selenium clicking a list object

Question: I am trying to click the 1 Min button on this site. Below is my Python code: url = 'https://www.investing.com/technical/technical-analysis' driver.get(url) events = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "section#leftColumn"))) print("Required elements found") events.find_element(By.XPATH, "//a[text()='1 Min']").click() I am getting the following error: events.find_element(By.XPATH, "//a[text()='1 Min']").click() AttributeError: 'list' object has no attribute 'find_element' What can I change …

Total answers: 2
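EC.presence_of_all_elements_located returns a list of elements, which is why events.find_element(...) raises AttributeError: 'list' object has no attribute 'find_element'. Either index into the list or wait for a single element, as in this sketch:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.investing.com/technical/technical-analysis")

# Wait for a single element rather than a list of elements.
section = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "section#leftColumn"))
)
print("Required elements found")

# Search within that section (".//a") and click the 1 Min tab.
section.find_element(By.XPATH, ".//a[text()='1 Min']").click()
```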

Having trouble getting next page in Scrapy

Question: I am learning to use Scrapy and am building a simple crawler to reinforce what I am learning. I am attempting to get the next page link but am having trouble. Can anyone point me in the right direction for getting the next page link, which is …

Total answers: 1
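The usual Scrapy pattern for pagination is to extract the next-page href and yield response.follow for it. A sketch with placeholder URL and selectors, since the original spider is not shown:

```python
import scrapy

class PagedSpider(scrapy.Spider):
    name = "paged"                                # hypothetical name
    start_urls = ["https://example.com/listing"]  # placeholder URL

    def parse(self, response):
        # Scrape items on the current page.
        for item in response.css("div.item"):          # hypothetical selector
            yield {"title": item.css("h2::text").get()}

        # Follow the "next page" link if it exists; the selector depends on the site.
        next_page = response.css("a.next-page::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```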

403 response when using scrapy python

Question: I am trying to learn Scrapy and crawl a website, but I am getting a 403 response when crawling. This is my spider: import scrapy from scrapy.loader import ItemLoader from itemloaders.processors import TakeFirst, MapCompose from w3lib.html import remove_tags def remove_currency(value): return value.replace('£', '').strip() class WhiskyscraperItem(scrapy.Item): # …

Total answers: 1
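A 403 from a site that loads fine in the browser often means the default Scrapy user agent is being rejected. A minimal sketch that sends a browser-like User-Agent; whether this is enough depends on the site, since some also require cookies or block automated traffic in other ways:

```python
import scrapy

class WhiskySpider(scrapy.Spider):
    name = "whisky"                              # hypothetical name
    start_urls = ["https://example.com/whisky"]  # placeholder URL

    # Override the default user agent for this spider only.
    custom_settings = {
        "USER_AGENT": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
        ),
    }

    def parse(self, response):
        # If the user agent was the problem, this should now log a 200.
        self.logger.info("Status: %s", response.status)
```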

Selenium can't download correct file in headless mode

Question: Even after implementing the enable_download_headless(driver, path) that was suggested in the following thread, the downloaded file is incorrect. While the non-headless version can always download the file from the site correctly, the headless version downloads a "chargeinfo.xhtml" excerpt, which is the last extension …

Total answers: 1
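One commonly suggested approach for headless downloads is to set the download preferences on ChromeOptions and use Chrome's newer headless mode, which handles downloads much closer to the regular headed browser than the old mode did. A sketch, with a placeholder download path and URL:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

download_dir = "/tmp/downloads"  # placeholder path

options = Options()
# The "new" headless mode is available in recent Chrome versions.
options.add_argument("--headless=new")
options.add_experimental_option("prefs", {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=options)
driver.get("https://example.com/download-page")  # placeholder URL
```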

Not making list of urls in scrapy spider

Question: I have created a Scrapy spider that has to crawl the whole webpage and extract the URLs. Now I have to remove the social media URLs, and for that I want to make a list of the URLs, but somehow it's not working. When I try to …

Total answers: 1
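A sketch of collecting the links inside the callback and filtering out social media domains before yielding them; the selectors and domain list are assumptions, since the original spider is only partially shown:

```python
import scrapy

SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "instagram.com", "linkedin.com")

class UrlSpider(scrapy.Spider):
    name = "urls"                        # hypothetical name
    start_urls = ["https://example.com"]  # placeholder URL

    def parse(self, response):
        # Build and filter the list inside the callback, then yield one
        # item per remaining URL.
        urls = response.css("a::attr(href)").getall()
        filtered = [u for u in urls if not any(d in u for d in SOCIAL_DOMAINS)]
        for url in filtered:
            yield {"url": response.urljoin(url)}
```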

Unable to crawl full website HTML content using python selenium or request library

Question: I am trying to crawl the site "https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003" but I am only getting the header and a few body responses; I am unable to get the full paragraph content and page links. from selenium import webdriver from selenium.webdriver.chrome.options import Options options = Options() options.headless = True options.add_argument("--window-size=1920,1200") …

Total answers: 1
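Pages like this typically fill most of their body with JavaScript after the initial HTML arrives, so page_source read immediately after get() only contains the skeleton. One option is to wait explicitly for the dynamic content before reading the HTML; the locator below is a guess, since the excerpt does not show which elements are needed:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1200")

driver = webdriver.Chrome(options=options)
driver.get(
    "https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/"
    "12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003"
)

# Wait until the JavaScript-rendered content is present before reading the HTML.
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.TAG_NAME, "article"))  # hypothetical locator
)
html = driver.page_source
```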

How to scrape all ufc fighters and not repeat only the first fighter?

Question: I am making a program to scrape UFC fighters' names and info using BeautifulSoup. I am using a for-loop that iterates through the divs holding this info and scrapes specific pieces of information. The issue I am having is that when printing the data, only the …

Total answers: 2
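When only the first fighter repeats, the usual culprit is calling soup.find(...) inside the loop instead of searching within the current div, so every iteration matches the first occurrence in the whole document. A sketch with placeholder URL and class names:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL and class names; the real ones are not shown in the excerpt.
resp = requests.get("https://example.com/ufc-fighters")
soup = BeautifulSoup(resp.text, "html.parser")

for card in soup.find_all("div", class_="fighter-card"):
    # Search inside the current card (card.find), not the whole soup;
    # soup.find inside the loop always returns the first fighter on the page.
    name = card.find("span", class_="name")
    record = card.find("span", class_="record")
    print(name.get_text(strip=True),
          record.get_text(strip=True) if record else "")
```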

Cannot get correct href value when crawling sqlite website using BeautifulSoup in python

Question: I tried to get the SQLite download link on the SQLite download webpage using BeautifulSoup. I can see the correct href value when inspecting the webpage in Chrome (screenshot of the webpage). However, I cannot get the href value using …

Total answers: 2
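On the SQLite download page the visible href values appear to be filled in by JavaScript, so the static HTML that BeautifulSoup parses only contains placeholders. One workaround, sketched below under the assumption that the real file paths still appear somewhere in the raw page source, is to search the downloaded HTML directly:

```python
import re
import requests

# Fetch the raw HTML rather than the browser-rendered DOM.
html = requests.get("https://www.sqlite.org/download.html").text

# Hypothetical pattern: look for amalgamation zip paths such as
# "2023/sqlite-amalgamation-3410200.zip" in the page source.
matches = re.findall(r"\d{4}/sqlite-amalgamation-\d+\.zip", html)
for rel_path in matches:
    print("https://www.sqlite.org/" + rel_path)
```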