Python HTTPConnectionPool Failed to establish a new connection: [Errno 11004] getaddrinfo failed

Question

I am wondering whether my requests are being blocked by the website and whether I need to set up a proxy. I first tried closing the HTTP connection, but that failed. I also tried testing my code, but now it produces no output. Maybe everything will be OK if I use a proxy?
Here is the code:

import requests
from urllib.parse import urlencode
import json
from bs4 import BeautifulSoup
import re
from html.parser import HTMLParser
from multiprocessing import Pool
from requests.exceptions import RequestException
import time


def get_page_index(offset, keyword):
    #headers = {'User-Agent':'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'}
    data = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': 20,
        'cur_tab': 1
    }
    url = 'http://www.toutiao.com/search_content/?' + urlencode(data)
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

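def parse_page_index(html):
    # get_page_index() returns None when the request fails, so stop early in that case
    if not html:
        return
    data = json.loads(html)
    if data and 'data' in data.keys():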
        for item in data.get('data'):
            url = item.get('article_url')
            if url and len(url) < 100:
                yield url

def get_page_detail(url):
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

def parse_page_detail(html):
    soup = BeautifulSoup(html, 'lxml')
    title = soup.select('title')[0].get_text()
    pattern = re.compile(r'articleInfo: (.*?)},', re.S)
    pattern_abstract = re.compile(r'abstract: (.*?)\.', re.S)
    res = re.search(pattern, html)
    res_abstract = re.search(pattern_abstract, html)
    if res and res_abstract:
        data = res.group(1).replace(r".replace(/<br />|n|r/ig, '')", "") + '}'
        abstract = res_abstract.group(1).replace(r"'", "")
        content = re.search(r'content: (.*?),', data).group(1)
        source = re.search(r'source: (.*?),', data).group(1)
        time_pattern = re.compile(r'time: (.*?)}', re.S)
        date = re.search(time_pattern, data).group(1)
        date_today = time.strftime('%Y-%m-%d')
        img = re.findall(r'src=&quot;(.*?)&quot', content)
        if date[1:11] == date_today and len(content) > 50 and img:
            return {
                'title': title,
                'content': content,
                'source': source,
                'date': date,
                'abstract': abstract,
                'img': img[0]
            }

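def main(offset):
    # HTMLParser.unescape() was removed in Python 3.9; html.unescape() replaces it.
    # Imported by name here because the local variable `html` shadows the module.
    from html import unescape
    flag = 1
    html = get_page_index(offset, '光伏')  # search keyword: "photovoltaic"
    for url in parse_page_index(html):
        html = get_page_detail(url)
        if html:
            data = parse_page_detail(html)
            if data:
                data['content'] = unescape(data.get('content'))
                print(data)
                print(data.get('img'))
                flag += 1
                if flag == 5:
                    break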



if __name__ == '__main__':
    pool = Pool()
    pool.map(main, [i*20 for i in range(10)])

And here is the error:

HTTPConnectionPool(host='tech.jinghua.cn', port=80): Max retries exceeded with url: /zixun/20160720/f191549.shtml (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000048523C8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

By the way, when I first tested my code, everything worked fine.
Thanks in advance!

Asked By: cwl


Answer 1

It seems to me you're hitting the connection limit of the HTTPConnectionPool, since you run several worker processes at the same time.

Try one of the following:

Increase the request timeout (in seconds): requests.get('url', timeout=5)
Close the response with response.close(): instead of returning response.text directly, assign the text to a variable, close the response, and then return the variable.
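
Both suggestions can be combined in one small helper. This is only a sketch, not the asker's code: `fetch_text` is a hypothetical name, and `session` is assumed to be a `requests.Session` (or any object with a compatible `get` method).

```python
def fetch_text(session, url, timeout=5):
    """Return the body of `url` as text, or None for a non-200 status.

    `session` is assumed to be a requests.Session; `timeout` is in seconds.
    """
    response = session.get(url, timeout=timeout)
    try:
        response.encoding = 'utf-8'
        if response.status_code == 200:
            # Read the text into a variable before the response is closed.
            text = response.text
            return text
        return None
    finally:
        # Release the connection whether or not the status was 200.
        response.close()
```

With requests installed, it could be called as `fetch_text(requests.Session(), url)`. Network errors still surface as `requests.exceptions.RequestException`, so the caller needs the same try/except as in the question.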

Answered By: BA.

Answer 2

When I faced this issue, I had the following problems:

– The requests Python module was unable to fetch any URL, although I could browse the site in a browser and download the page with wget or curl.
– pip install was also not working and used to fail with the following error:

Failed to establish a new connection: [Errno 11004] getaddrinfo failed

A certain site had blocked me, so I tried ForceBindIP to make my Python modules use another network interface, and then I removed it. That probably messed up my network configuration: the requests module and even the plain socket module were stuck and could not fetch any URL.

So I followed the network configuration reset described at the URL below, and now I am good.

network configuration reset
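
A quick way to tell a name-resolution failure apart from a blocked request is to call the resolver directly, since "getaddrinfo failed" means the host name could not be turned into an address. The sketch below uses only the standard library; `can_resolve` and its `resolver` parameter are hypothetical names introduced for illustration.

```python
import socket


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address."""
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror as exc:
        # On Windows, errno 11001 and 11004 are both raised as socket.gaierror.
        print(f"{host}: name resolution failed ({exc})")
        return False
```

If `can_resolve('tech.jinghua.cn')` returns False while a browser on the same machine can open the site, the local network configuration is a more likely culprit than the scraping code.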

Answered By: GreyCells

Answer 3

In case it helps someone else, I faced this same error message:

Client-Request-ID=long-string Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool(host='table.table.core.windows.net', port=443): Max retries exceeded with url: /service(PartitionKey='requests',RowKey='9999') (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D920ADA970>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')).

…when trying to retrieve a record from Azure Table Storage using

table_service.get_entity(table_name, partition_key, row_key).

My issue:

I had the table_name incorrectly defined.

Answered By: ericOnline

Answer 4

My URL was malformed: there was no slash after ".com", so the next part of the URL was joined to the host name.
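
This kind of mistake is easy to spot by printing the host name the URL actually contains, because that is the string handed to getaddrinfo. A minimal sketch with the standard library; `host_of` is a hypothetical helper and the example URLs are made up.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the host name part of `url`, or None if it has none."""
    return urlsplit(url).hostname


# Correct URL: the path starts after the slash.
print(host_of("http://www.example.com/search?q=1"))
# Missing slash: "search" is glued onto the host name, which cannot be resolved.
print(host_of("http://www.example.comsearch?q=1"))
```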

Answered By: Артём Олейник

Answer 5

Sometimes it's due to a VPN connection. I had the same problem and couldn't even install the requests package via pip. I turned off my VPN and voilà, I managed to install it and also to make requests. The [Errno 11004] error was gone.

Answered By: L'Artiste
