Can't open url with requests and urllib2
Question:
I am trying to access https://www.collinsdictionary.com/browse/english/words-starting-with-a with Python requests, but I get requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)).
I also tried urllib2.urlopen, but the HTML I got differs from what I see in the browser (there is no <ul class="columns2 browse-list">).
What am I doing wrong?
Answers:
With the following code I do get the page you seem to want:
import urllib2

# Python 2: fetch the page and print the raw HTML
page = urllib2.urlopen("https://www.collinsdictionary.com/browse/english/words-starting-with-a")
print page.read()
The output does contain <ul class="columns2 browse-list">.
The website rejects the GET request sent by requests because of the default User-Agent the library uses. Set a custom User-Agent so the request looks like it comes from a browser. The User-Agent below is just an example; to get a more current browser user agent, search Google for "my useragent".
import requests

# Send a browser-like User-Agent instead of the default one set by requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}
r = requests.get("https://www.collinsdictionary.com/browse/english/words-starting-with-a", headers=headers)
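To check that a fetched page really contains the list from the question, the same header idea can be sketched with only the standard library. This is a minimal sketch, assuming Python 3 (urllib.request instead of urllib2) and assuming the site still responds as described above; build_request, has_browse_list and fetch are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Example browser User-Agent, copied from the answer above; any current one should work.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36")

URL = "https://www.collinsdictionary.com/browse/english/words-starting-with-a"


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request carrying a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


class BrowseListFinder(HTMLParser):
    """Sets .found once a <ul> with classes 'columns2' and 'browse-list' is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and {"columns2", "browse-list"} <= set(classes):
            self.found = True


def has_browse_list(html):
    """Return True if the HTML contains the <ul class="columns2 browse-list"> element."""
    finder = BrowseListFinder()
    finder.feed(html)
    return finder.found


def fetch(url=URL):
    """Hypothetical helper: download the page and decode it as UTF-8."""
    with urlopen(build_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (performs a real network request):
#   print(has_browse_list(fetch()))
```

Parsing the class attribute instead of searching for the literal string keeps the check working if the site reorders the class names or adds more of them.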