Why do I get a "Connection aborted" error when trying to crawl a specific website?

Question:

I wrote a web crawler in Python 2.7, but one specific site cannot be downloaded, although it can be viewed in a browser.

My code is as follows:

# -*- coding: utf-8 -*-

import requests

# OK
url = 'http://blog.ithome.com.tw/'
url = 'http://7club.ithome.com.tw/'
url = 'https://member.ithome.com.tw/'
url = 'http://ithome.com.tw/'
url = 'http://weekly.ithome.com.tw'

# NOT OK
url = 'http://download.ithome.com.tw'
url = 'http://apphome.ithome.com.tw/'
url = 'http://ithelp.ithome.com.tw/'

try:
    response = requests.get(url)
    print 'OK!'
    print 'response.status_code: %s' %(response.status_code)

except Exception, e:
    print 'NOT OK!'
    print 'Error: %s' %(e)
print 'DONE!'
print 'response.status_code: %s' %(response.status_code)

Each time I run it, I get this error:

C:\Python27\python.exe "E:/python crawler/test_ConnectionFailed.py"
NOT OK!
Error: ('Connection aborted.', BadStatusLine("''",))
DONE!
Traceback (most recent call last):
  File "E:/python crawler/test_ConnectionFailed.py", line 29, in <module>
    print 'response.status_code: %s' %(response.status_code)
NameError: name 'response' is not defined

Process finished with exit code 1
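(Note that the second traceback is a separate, self-inflicted error: `response` is only assigned inside the `try` block, so when `requests.get` raises, the final `print` refers to a name that was never bound. A minimal sketch of the safer pattern, shown with Python 3 print syntax, where `ConnectionError` merely stands in for the real network failure:)

```python
def fetch_status(fail):
    """Stand-in for the crawler: return the status code, or None on failure."""
    try:
        if fail:
            # Simulates requests raising ('Connection aborted.', BadStatusLine)
            raise ConnectionError('Connection aborted.')
        status = 200  # stand-in for requests.get(url).status_code
    except Exception as e:
        print('NOT OK! Error: %s' % e)
        return None  # never touch `status` on this path
    print('response.status_code: %s' % status)
    return status

fetch_status(fail=True)   # handles the error; no NameError afterwards
fetch_status(fail=False)
```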

Why is this happening and how can I fix it?

SOLVED! I switched to a different proxy program, and now it works.

Asked By: oner ptkh


Answers:

The hostname could not be resolved for those domains; running a normal ping against them yields the result below.

Command to run (ping takes a bare hostname, not a URL):

ping download.ithome.com.tw

Result:

The host could not be resolved

With no response from the server, there is no status line, which in the normal case would contain the status code.
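You can check name resolution from Python yourself before crawling. A minimal sketch using only the standard library (`socket.gaierror` is raised when a hostname cannot be resolved; `urllib.parse` is the Python 3 location of Python 2's `urlparse` module):

```python
import socket
from urllib.parse import urlparse

def resolves(url):
    """Return True if the URL's hostname resolves to an IP address."""
    host = urlparse(url).hostname
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

print(resolves('http://download.ithome.com.tw'))
```

A URL that fails this check will also fail in requests, typically with a connection error rather than an HTTP status code.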

Answered By: cafebabe1991

I found that the urllib2 library works better than requests for this.

import urllib2

def get_page(url):
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    return response.read()

url = "http://blog.ithome.com.tw/"
print get_page(url)

Answered By: Hans
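For reference, the Python 3 equivalent of the snippet above (a sketch: in Python 3, urllib2 was split into urllib.request and urllib.error):

```python
import urllib.request

def get_page(url):
    # urllib2.Request / urllib2.urlopen became urllib.request.* in Python 3
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request) as response:
        return response.read()  # returns bytes, as in the Python 2 version

# Usage against the site from the question:
# print(get_page('http://blog.ithome.com.tw/'))
```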