mechanize

Scrape the absolute URL instead of a relative path in python

Question: I'm trying to get all the hrefs from an HTML page and store them in a list for future processing, such as this: Example URL: www.example-page-xl.com <body> <section> <a href="/helloworld/index.php"> Hello World </a> </section> </body> I'm using the following code to list the …

Total answers: 4
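The question above turns on resolving a relative href against the URL of the page it came from. A minimal sketch of that idea using only the standard library follows; it assumes Python 3, an `http://` scheme for the example URL (the excerpt gives none), and the hypothetical names `LinkCollector` and `absolute_links`. It is one possible approach, not necessarily the one given in the answers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects every <a href=...> value, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs alone and resolves relative ones.
                self.links.append(urljoin(self.base_url, value))


def absolute_links(html, base_url):
    """Return all anchor hrefs in `html` as absolute URLs (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Assumed scheme: the excerpt only shows "www.example-page-xl.com".
page = '<body><section><a href="/helloworld/index.php"> Hello World </a></section></body>'
print(absolute_links(page, "http://www.example-page-xl.com/"))
```

Passing the URL the page was actually fetched from as `base_url` matters, because paths without a leading slash resolve relative to that page's directory.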

brute force script python and mechanize

Question: I'm trying to brute force the Facebook login page with a Python script; however, whenever I run the code I get the errors below. My code is: br = mechanize.Browser() br.set_handle_equiv(True) br.set_handle_redirect(True) br.set_handle_referer(True) br.set_handle_robots(False) br.addheaders = [('User-agent', 'Firefox')] print "enter target email" email = raw_input('>>>') print "continue at …

Total answers: 1

Having Trouble With Python Login Bot

Question: I am currently trying to create a Python bot using mechanize that scrapes my account for a school project; however, I am having trouble logging in to this website: https://marketwatch.com/login import mechanize loginurl = "https://marketwatch.com/login" user = raw_input("enter user") passcode = raw_input("enter passcode") browser = mechanize.Browser() browser.set_handle_robots(False) browser.open(loginurl) …

Total answers: 1

HTTP Error 999: Request denied

Question: I am trying to scrape some web pages from LinkedIn using BeautifulSoup and I keep getting the error "HTTP Error 999: Request denied". Is there a way to avoid this error? If you look at my code, I have tried mechanize and urllib2 and both are giving me the …

Total answers: 3

adding directory to sys.path /PYTHONPATH

Question: I am trying to import a module from a particular directory. The problem is that if I use sys.path.append(mod_directory) to append the path and then open the Python interpreter, the directory mod_directory gets added to the end of the list sys.path. If I export the PYTHONPATH variable before opening …

Total answers: 5
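The ordering issue described in the question above comes from `sys.path` being searched front to back, so an appended directory is consulted last. A minimal sketch of putting a directory at the front instead is shown here; `prepend_to_sys_path` and the directory path are hypothetical names chosen for illustration, and this is only one way to influence import order.

```python
import sys


def prepend_to_sys_path(directory):
    """Move or insert `directory` at the front of sys.path (hypothetical helper)."""
    # Drop any existing entry so the directory is not listed twice.
    if directory in sys.path:
        sys.path.remove(directory)
    # Index 0 is searched first, so modules here shadow same-named ones later on.
    sys.path.insert(0, directory)


# Placeholder path standing in for the question's mod_directory.
prepend_to_sys_path("/tmp/mod_directory")
print(sys.path[0])
```

The change lasts only for the current interpreter process; a new interpreter session starts from its default search path again.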

How to handle IncompleteRead: in python

Question: I am trying to fetch some data from a website. However, it raises an IncompleteRead error. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be due to a server error (A chunked …

Total answers: 8
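For the IncompleteRead question above, one common workaround is to catch the exception and keep whatever bytes did arrive. The sketch below assumes Python 3, where the exception lives in `http.client` (the question appears to use Python 2, where it is `httplib.IncompleteRead`); `read_tolerant` is a hypothetical helper name, and the returned data may be truncated.

```python
from http.client import IncompleteRead


def read_tolerant(response):
    """Read a response body, falling back to the partial data on IncompleteRead.

    `response` is any object with a read() method, for example the object
    returned by urllib.request.urlopen().
    """
    try:
        return response.read()
    except IncompleteRead as exc:
        # exc.partial holds the bytes received before the connection ended.
        return exc.partial
```

Whether truncated data is acceptable depends on the use case; retrying the request is the usual alternative when a complete body is required.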

Unable to import a module that is definitely installed

Question: After installing mechanize, I don't seem to be able to import it. I have tried installing from pip, easy_install, and via python setup.py install from this repo: https://github.com/abielr/mechanize. All of this to no avail, as each time I enter my Python interactive shell I get: Python …

Total answers: 39

BeautifulSoup and Amazon.co.uk

Question: I am trying to parse Amazon to compile a list of prices, as part of a bigger project relating to statistics. However, I am stumped. I was wondering if anyone could review my code and tell me where I went wrong? #!/usr/bin/python # -*- coding: utf-8 -*- import mechanize from bs4 …

Total answers: 1

Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Question: Is there a way to get around the following? httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales, not sure why they would …

Total answers: 8
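The 403 in the question above is raised when the client consults the site's robots.txt and finds the requested path disallowed. A minimal sketch of the same kind of check with the standard library's `urllib.robotparser` is shown below; the rules, the user agent `ExampleBot`, and the URLs are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be fetched from the site.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(EXAMPLE_RULES, "ExampleBot", "http://example.com/public/page"))
print(is_allowed(EXAMPLE_RULES, "ExampleBot", "http://example.com/private/page"))
```

Checking the rules up front like this makes it clear which paths a site has asked crawlers to stay away from before any request is sent.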

Python Mechanize select a form with no name

Question: I am attempting to have mechanize select a form from a page, but the form in question has no "name" attribute in the HTML. What should I do? When I try to use br.select_form(name="") I get errors that no form is declared with that …

Total answers: 2