Error installing NLTK supporting packages: nltk.download()

Question:

I have installed the nltk package. I am now trying to download the supporting packages using nltk.download(), but I get this error:

[Errno 11001] getaddrinfo

My machine / software details are:

OS: Windows 8.1
Python: 3.3.4
NLTK Package: 3.0

Below are the commands I ran in Python:

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
True
>>> nltk.download("all")
[nltk_data] Error loading all: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
False


It looks like it is going to http://nltk.github.com/nltk_data/, whereas it should ideally get the data from http://www.nltk.org/nltk_data/.

On another machine, when we type http://nltk.github.com/nltk_data/ in the browser, it redirects to http://www.nltk.org/nltk_data/. I do not understand why the redirection is not happening on my laptop.

I feel that this might be the issue.

Kindly help.

I have added a screenshot of the command prompt.


Regards,
Bonson

Asked By: Bonson


Answers:

I found the solution. In my case, when the NLTK downloader started, its server index was set to http://nltk.github.com/nltk_data/

This needs to be changed to http://nltk.org/nltk_data/

You can change it in the NLTK Downloader window via File -> Change Server Index.
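
The same change can probably be made without opening the downloader window. This is only a minimal sketch: it assumes the Downloader class in nltk.downloader accepts a server_index_url argument, and that the index URL from this answer is still served. The helper name download_with_index is hypothetical.

```python
# URL taken from the answer above; the NLTK project may have moved it since.
INDEX_URL = "http://nltk.org/nltk_data/"


def download_with_index(package, index_url=INDEX_URL):
    """Download one NLTK package using an explicit server index URL."""
    # Imported here so this module still loads where NLTK is not installed.
    from nltk.downloader import Downloader

    downloader = Downloader(server_index_url=index_url)
    # download() returns True on success and False on failure.
    return downloader.download(package)


# Example call (needs NLTK and network access):
# download_with_index("stopwords")
```

If the call returns False, the index URL itself may be unreachable from your network, in which case changing the index will not help.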

Regards,
Bonson

Answered By: Bonson

Setting the HTTP and HTTPS proxy in environment variables resolved the issue for me:

set http_proxy=http://IPN:PWD@ipaddress:port
set https_proxy=https://IPN:PWD@ipaddress:port

Ask your network or admin team for this proxy IP address.
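
If you would rather not change the system environment, the same variables can be set for a single Python session. A minimal sketch, assuming one proxy handles both HTTP and HTTPS traffic; the proxy address below is a placeholder and use_proxy is a hypothetical helper name:

```python
import os
import urllib.request


def use_proxy(proxy_url):
    """Set proxy environment variables for the current Python process only."""
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    # urllib reads these variables; getproxies() shows what it will use.
    return urllib.request.getproxies()


# Placeholder address: replace it with the values from your network team.
proxies = use_proxy("http://proxy.example.invalid:8080")
print(proxies.get("http"), proxies.get("https"))
```

Run this before calling nltk.download() in the same session, since the setting disappears when the interpreter exits.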

Answered By: jawad.shaik shaik

The error might be caused by the proxy configured on the system. See my answer to the following question:

Error in downloading NLTK data: [Errno 11004] getaddrinfo failed

Answered By: Ranjeet

I got this error because of network restrictions. Here is how I solved it:

I browsed to http://www.nltk.org/nltk_data/ and downloaded the required corpora from the corresponding links.

Then I placed the downloaded files under the C:/ folder path on Windows (or another relevant directory such as C:/ProgramData/Anaconda3), using the same folder structure as in https://github.com/nltk/nltk_data/tree/gh-pages/packages
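
The unpacking step can be scripted. A rough sketch, assuming the packages were downloaded as zip files; install_package and the paths in the example call are hypothetical, and the category name ("corpora", "tokenizers", and so on) should match the folder the package sits in on the GitHub page above:

```python
import zipfile
from pathlib import Path


def install_package(zip_path, nltk_data_dir, category):
    """Unpack a manually downloaded zip into <nltk_data_dir>/<category>/."""
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (hypothetical paths):
# install_package("Downloads/stopwords.zip", "C:/nltk_data", "corpora")
```

NLTK searches the directories listed in nltk.data.path, plus any set in the NLTK_DATA environment variable, so the data directory you choose needs to be one of those.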

Answered By: Avijit Das

Try the code below. It downloaded the package as expected.

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

It looks like the download link was failing before, and this SSL workaround (which disables certificate verification) fixed it.

Note: this was on a Mac.

Answered By: Swarit Agarwal

I was facing this issue in my Jupyter notebook as well. The code snippet below, from another Stack Overflow answer, helped. Posting it in case it helps someone else:

import socket
socket.getaddrinfo('localhost', 8080)

Ref: "getaddrinfo failed", what does that mean?
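
To see whether name resolution itself is the problem, the same call can be wrapped in a small check. A minimal sketch; can_resolve is a hypothetical helper name, and the host names to test are up to you:

```python
import socket


def can_resolve(host, port=80):
    """Return True if the host name resolves, False if the lookup fails."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # The failure that Windows reports as "[Errno 11001] getaddrinfo failed".
        return False


# Example calls (the second needs working DNS):
# can_resolve("localhost")
# can_resolve("www.nltk.org")
```

If localhost resolves but the NLTK host does not, the cause is more likely DNS or a proxy on the network than NLTK itself.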

Answered By: Sneha Valabailu

We can also download the packages from the Python prompt or from within notebooks by configuring a proxy as follows. The scheme can be http or https depending on your proxy settings.

import nltk
nltk.set_proxy('http://username:password@ipaddress:port')

Answered By: Arun

I was facing the same problem. Initially I was using broadband (Jio Fiber), which may have blocked the download (for security reasons). When I switched to mobile internet (through a SIM card), the download succeeded and the issue was resolved.

Try the code below to download stopwords, or change the package name as needed:

import nltk

nltk.download('stopwords')

from nltk.corpus import stopwords

stopwords.words('english')

Answered By: Mohit Yadav