downloading error using nltk.download()
Question:
I am experimenting NLTK package using Python. I tried to downloaded NLTK using nltk.download()
. I got this kind of error message. How to solve this problem? Thanks.
The system I used is Ubuntu installed under VMware. The IDE is Spyder.
After using nltk.download('all')
, it can download some packages, but it gets error message when downloading oanc_masc
Answers:
To download a particular dataset/models, use the nltk.download()
function, e.g. if you are looking to download the punkt
sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you’re unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of “popular” resources.
Ensure that you’ve the latest version of NLTK
because it’s always improving and constantly maintain:
$ pip install --upgrade nltk
EDITED
In case anyone is avoiding errors from downloading larger datasets from nltk
, from https://stackoverflow.com/a/38135306/610569
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index to treat panlex_lite as it's already installed.
>>> dler.download('popular')
And if anyone wants to find nltk_data
directory, see https://stackoverflow.com/a/36383314/610569
And to config nltk_data
path, see https://stackoverflow.com/a/22987374/610569
From command line, after importing nltk, try
nltk.download('popular', halt_on_error=False)
After an error it will ask to retry broken package, just decline with n and it will continue with proper packages.
I had this error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource: import nltk nltk.download('punkt')
When I tried to solve by writing:
import nltk
nltk.download()
my computer shut downs suddenly and anaconda also closed. When I try to open it always shows an error.
I solved the problem by writing:
import nltk
nltk.download('punkt')
a) in OSX either run
sudo /Applications/Python 3.6/Install Certificates.command
b) switch to admin user (the one you have set up with administrator privileges)
and type at command line:
/Applications/Python 3.6/Install Certificates.command
Notes:
- “” are necessary because they escape blank characters in file names.
- This procedure worked if you have python 3.6 installed, otherwise
change it in order to match your install python version… for this
execute:
ls /Applications
and look at the python directory name you have there.
An easy(hard) way to get over this error is to do the process manually. Just go to the website https://www.nltk.org/nltk_data/
and download the required zip file and extract the contents.
In Windows, go to user/AppData/local/Programs/Python/Python(version)/lib
and create a folder nltk_data. Then create the respective folder. As an example, for ‘punkt’ create the folder tokenizers
and add the folder ‘punkt’ inside the extracted folder to it. This info is mostly given by the terminal itself.
Run your program. Cheers!
EDIT 1: Of course, downloading all files can be time-consuming, but it’s the only option if the "urlopen error" persists.
EDIT 2 It is also mostly your router or network at fault that you are not able to download nltk files. Try changing your network and that should help.
I am experimenting NLTK package using Python. I tried to downloaded NLTK using nltk.download()
. I got this kind of error message. How to solve this problem? Thanks.
The system I used is Ubuntu installed under VMware. The IDE is Spyder.
After using nltk.download('all')
, it can download some packages, but it gets error message when downloading oanc_masc
To download a particular dataset/models, use the nltk.download()
function, e.g. if you are looking to download the punkt
sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you’re unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of “popular” resources.
Ensure that you’ve the latest version of NLTK
because it’s always improving and constantly maintain:
$ pip install --upgrade nltk
EDITED
In case anyone is avoiding errors from downloading larger datasets from nltk
, from https://stackoverflow.com/a/38135306/610569
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index to treat panlex_lite as it's already installed.
>>> dler.download('popular')
And if anyone wants to find nltk_data
directory, see https://stackoverflow.com/a/36383314/610569
And to config nltk_data
path, see https://stackoverflow.com/a/22987374/610569
From command line, after importing nltk, try
nltk.download('popular', halt_on_error=False)
After an error it will ask to retry broken package, just decline with n and it will continue with proper packages.
I had this error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource: import nltk nltk.download('punkt')
When I tried to solve by writing:
import nltk
nltk.download()
my computer shut downs suddenly and anaconda also closed. When I try to open it always shows an error.
I solved the problem by writing:
import nltk
nltk.download('punkt')
a) in OSX either run
sudo /Applications/Python 3.6/Install Certificates.command
b) switch to admin user (the one you have set up with administrator privileges)
and type at command line:
/Applications/Python 3.6/Install Certificates.command
Notes:
- “” are necessary because they escape blank characters in file names.
- This procedure worked if you have python 3.6 installed, otherwise
change it in order to match your install python version… for this
execute:
ls /Applications
and look at the python directory name you have there.
An easy(hard) way to get over this error is to do the process manually. Just go to the website https://www.nltk.org/nltk_data/
and download the required zip file and extract the contents.
In Windows, go to user/AppData/local/Programs/Python/Python(version)/lib
and create a folder nltk_data. Then create the respective folder. As an example, for ‘punkt’ create the folder tokenizers
and add the folder ‘punkt’ inside the extracted folder to it. This info is mostly given by the terminal itself.
Run your program. Cheers!
EDIT 1: Of course, downloading all files can be time-consuming, but it’s the only option if the "urlopen error" persists.
EDIT 2 It is also mostly your router or network at fault that you are not able to download nltk files. Try changing your network and that should help.