Error installing NLTK supporting packages: nltk.download()
Question:
I have installed the nltk package. I am now trying to download the supporting packages using nltk.download(), but I get this error:
[Errno 11001] getaddrinfo failed
My machine / software details are:
OS: Windows 8.1
Python: 3.3.4
NLTK Package: 3.0
These are the commands I ran in Python:
Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
True
>>> nltk.download("all")
[nltk_data] Error loading all: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
False
It looks like it is going to http://nltk.github.com/nltk_data/, whereas ideally it should get the data from http://www.nltk.org/nltk_data/.
On another machine, typing http://nltk.github.com/nltk_data/ into the browser redirects to http://www.nltk.org/nltk_data/. I don't understand why the redirection is not happening on my laptop.
I suspect this might be the issue.
Kindly help.
I have added a screenshot of the command prompt.
Regards,
Bonson
Answers:
I found the solution. In my case, when the NLTK downloader started, its server index was set to http://nltk.github.com/nltk_data/
This needs to be changed to http://nltk.org/nltk_data/
You can change it in the NLTK Downloader window via File -> Change Server Index.
Regards,
Bonson
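The same change can probably be made without the GUI. Here is a minimal sketch, assuming your NLTK version exposes nltk.downloader.Downloader with a server_index_url argument. The helper name download_from_index is made up for illustration, and the default URL is simply the one given in the answer above, which may have moved since then.

```python
def download_from_index(package_id, index_url="http://nltk.org/nltk_data/"):
    """Download one NLTK package using an explicit server index URL."""
    # Imported lazily so this helper can be defined even where NLTK is absent.
    from nltk.downloader import Downloader

    # Assumption: Downloader accepts the index URL via server_index_url.
    downloader = Downloader(server_index_url=index_url)
    # download() reports success or failure as a boolean.
    return downloader.download(package_id)


# Usage (needs network access):
# download_from_index("stopwords")
```

If the call still fails with getaddrinfo, the index URL is not the problem and the DNS or proxy setup is the more likely cause.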
Setting the HTTP and HTTPS proxies in environment variables resolved the issue for me:
set http_proxy=http://IPN:PWD@ipaddress:port
set https_proxy=https://IPN:PWD@ipaddress:port
Here IPN:PWD are the proxy login and password. Ask your network or admin team for the proxy address.
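If you would rather not set these in the shell, the same variables can be set from inside Python. This is a minimal sketch using only the standard library. configure_proxy is a hypothetical helper name, and the address in the usage comment is a placeholder.

```python
import os
import urllib.request


def configure_proxy(proxy_url):
    """Set the proxy environment variables for the current process."""
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    # getproxies() reports the proxies urllib picks up from the environment.
    return urllib.request.getproxies()


# Usage (placeholder address, replace with the one from your admin team):
# configure_proxy("http://proxy.example.com:8080")
```

As far as I can tell, urllib reads these settings when it builds its default opener, so it is safest to call this before the first nltk.download() in the session.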
The error might be caused by the proxy configured on your system. See the answer I posted on this related question:
Error in downloading NLTK data: [Errno 11004] getaddrinfo failed
I got this error because of network restrictions. Here is how I solved it:
I browsed http://www.nltk.org/nltk_data/ and downloaded the required corpora from the corresponding links.
Then I placed the downloaded files under the C:/ folder path on Windows (or any other relevant directory, such as C:/ProgramData/Anaconda3), using the same folder structure as in https://github.com/nltk/nltk_data/tree/gh-pages/packages
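The manual placement can also be scripted. Here is a rough sketch using only the standard library, assuming the package was downloaded as a zip file and that the category folder name (for example corpora) matches the one under packages/ in the repository linked above. install_package and all paths shown are hypothetical.

```python
import zipfile
from pathlib import Path


def install_package(zip_path, nltk_data_dir, category):
    """Extract a downloaded package zip into <nltk_data_dir>/<category>/."""
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Everything in the archive is unpacked below the category folder.
        archive.extractall(target)
    return target


# Usage (hypothetical paths):
# install_package("stopwords.zip", "path/to/nltk_data", "corpora")
```

NLTK only finds the files if the nltk_data directory is one of the locations it searches, so pick the directory accordingly.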
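Try the code below. It downloaded the packages as expected:
import nltk
import ssl
# Workaround: disable HTTPS certificate verification for this process only.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Older Python versions do not verify HTTPS certificates by default.
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
nltk.download()
It looks like the download link was failing before, and the SSL workaround fixed it.
Note: I was on a Mac.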
I was facing this issue in my Jupyter notebook as well. The code snippet below, from another Stack Overflow answer, helped. Posting it in case it helps someone else:
import socket
socket.getaddrinfo('localhost', 8080)
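Since the error comes from getaddrinfo, it can help to check directly whether the hosts involved resolve on your machine. This is a small diagnostic sketch, standard library only. can_resolve is a hypothetical helper, and the host names in the usage comment are the ones mentioned in the question.

```python
import socket


def can_resolve(host, port=80):
    """Return True if the host name resolves, False if getaddrinfo fails."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # This is the failure that surfaces as "getaddrinfo failed".
        return False
    return True


# Usage:
# for host in ("nltk.github.com", "www.nltk.org"):
#     print(host, can_resolve(host))
```

If a host fails here but opens fine in a browser on another network, the problem is more likely DNS or proxy configuration than NLTK itself.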
We can also download the packages from the Python prompt or from within a notebook with the following configuration. The proxy URL can be http or https, depending on your proxy settings.
import nltk
nltk.set_proxy('http://username:password@ipaddress:port')
I was facing the same problem. Initially I was on broadband (Jio Fiber), which may have blocked the download (possibly for security reasons). When I switched to mobile internet (through a SIM card), the download succeeded and my issue was resolved.
Try the code below to download the stopwords, or adapt it to the package you need:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words('english')
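To avoid hitting the network on every run, you can first check whether the resource is already installed. This is a sketch that assumes nltk.data.find raises LookupError when a resource is missing. ensure_resource is a hypothetical helper name.

```python
def ensure_resource(resource_path, package_id):
    """Download package_id only if resource_path is not already installed."""
    # Imported lazily so the helper can be defined before NLTK is set up.
    import nltk

    try:
        nltk.data.find(resource_path)
    except LookupError:
        # Not found locally: fall back to the downloader (needs network access).
        return nltk.download(package_id)
    return True


# Usage:
# ensure_resource("corpora/stopwords", "stopwords")
```

This is handy in notebooks, where the download cell tends to get re-run and would otherwise fail whenever the network is unavailable.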