I am having trouble downloading nltk's punkt tokenizer
Question:
I’m trying to download punkt, but I’m getting the following error…
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [SSL] unknown error
[nltk_data]     (_ssl.c:590)>
False
>>>
Can someone please help? I’ve been trying for days…
Answers:
I guess the downloader script is broken. As a temporary workaround, you can manually download the punkt tokenizer from here and then place the unzipped folder in the corresponding location. The default folders for each OS are:
- Windows:
C:\nltk_data\tokenizers
- OSX:
/usr/local/share/nltk_data/tokenizers
- Unix:
/usr/share/nltk_data/tokenizers
I am not sure but you may find this post helpful.
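If you are not sure whether an nltk_data folder already exists on your machine, a quick sketch like the following checks the usual default locations (the candidate paths are assumed from the list above; adjust them for your OS and username):

```python
import os

# Candidate default nltk_data locations (assumed from the list above).
candidates = [
    os.path.expanduser("~/nltk_data/tokenizers"),  # per-user location
    "/usr/local/share/nltk_data/tokenizers",       # macOS
    "/usr/share/nltk_data/tokenizers",             # Unix
]

# Report which of the candidate folders actually exist.
for path in candidates:
    status = "exists" if os.path.isdir(path) else "missing"
    print(f"{path}: {status}")
```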
Though this is an old question, I had the same issue on my mac today. The solution here helped me solve it.
Edit:
Run the following command on macOS before running nltk.download():
/Applications/Python PYTHON_VERSION_HERE/Install Certificates.command
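If you can’t run that command (for example, your Python didn’t come from the python.org installer), a commonly used alternative workaround is to swap Python’s default HTTPS context for an unverified one before calling the downloader. This is a sketch, not a recommendation: it disables certificate verification for the whole session, so only use it for the download itself.

```python
import ssl

# Point the default HTTPS context at an unverified one so the
# downloader can connect despite the SSL error. This disables
# certificate verification for the session -- use with care.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Older Pythons don't verify certificates by default; nothing to change.
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Then, in the same session:
# import nltk
# nltk.download('punkt')
```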
Here are detailed instructions to install punkt manually if nltk.download() doesn’t work for you.
Context: I tried to use nltk.word_tokenize() and it threw the error:
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
Searched in:
- 'C:\Users\username/nltk_data'
- 'C:\Users\username\anaconda3\envs\conda-env\nltk_data'
Solution: to download the package manually.
Step 1: Look up the corresponding corpus at http://www.nltk.org/nltk_data/. For example, it’s Punkt Tokenizer Models in this case; click download and store it in one of the folders mentioned above (if the nltk_data folder does not exist, create one). For me, I picked 'C:\Users\username/nltk_data'.
Step 2: Notice that it said "Attempted to load tokenizers/punkt/english.pickle"; that means you must recreate the same folder structure. I created a "tokenizers" folder inside "nltk_data", then copied the unzipped content into it and made sure the file path "C:/Users/username/nltk_data/tokenizers/punkt/english.pickle" was valid.
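After copying the files, a quick sanity check confirms the pickle sits exactly where NLTK will look for it. The base directory below is an assumption (the per-user nltk_data folder); adjust it to wherever you placed yours:

```python
import os

# Assumed install location: the per-user nltk_data folder.
expected = os.path.join(
    os.path.expanduser("~"), "nltk_data",
    "tokenizers", "punkt", "english.pickle",
)

# If this prints "NOT found", the folder structure doesn't match
# what the LookupError said NLTK attempted to load.
print(expected, "found" if os.path.isfile(expected) else "NOT found")
```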