I am having trouble downloading nltk's punkt tokenizer

Question:

I’m trying to download punkt, but I’m getting the following error…

>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Error loading punkt: <urlopen error [SSL] unknown error
[nltk_data]     (_ssl.c:590)>
False
>>> 

Can someone please help? I’ve been trying for days…

Asked By: Blind_Lizard


Answers:

I suspect the downloader script is broken. As a temporary workaround, you can manually download the punkt tokenizer from here and then place the unzipped folder in the corresponding location. The default folders for each OS are:

  • Windows: C:\nltk_data\tokenizers
  • OSX: /usr/local/share/nltk_data/tokenizers
  • Unix: /usr/share/nltk_data/tokenizers

I am not sure but you may find this post helpful.
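
To double-check a manual install, you can point NLTK at the folder and confirm the resource resolves. A minimal sketch (the Windows path below is only an example; use whichever folder you chose):

import nltk

# Example path only: add your nltk_data folder to the search path if it
# is not one of the defaults listed above
nltk.data.path.append(r"C:\nltk_data")

# Raises LookupError if punkt still cannot be found
nltk.data.find("tokenizers/punkt")

print(nltk.word_tokenize("If this prints a token list, punkt is installed."))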

Answered By: Wasi Ahmad

Though this is an old question, I had the same issue on my Mac today. The solution here helped me solve it.

Edit:

Run the following command on OSX before running nltk.download():

/Applications/Python PYTHON_VERSION_HERE/Install Certificates.command
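
If that certificates command is not available for your install, a commonly used (but less secure) alternative workaround for the same SSL error is to disable certificate verification just for the download. A rough sketch of that general approach:

import ssl
import nltk

# Fall back to an unverified HTTPS context so the downloader can connect.
# This skips certificate checks, so treat it as a last resort.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass  # very old Pythons do not verify HTTPS certificates anyway
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download('punkt')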
Answered By: DjangoNoob

Here are detailed instructions for installing punkt manually if nltk.download() doesn’t work for you.

Context: I tried to use nltk.word_tokenize() and it threw this error:

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\Users\username/nltk_data'
    - 'C:\Users\username\anaconda3\envs\conda-env\nltk_data'

Solution: download the package manually.

Step 1: Look up the corresponding corpus at http://www.nltk.org/nltk_data/. For example, it’s Punkt Tokenizer Models in this case; click download and store it in one of the folders mentioned above (if the nltk_data folder does not exist, create it). In my case, I picked ‘C:\Users\username/nltk_data’.

Step 2: Notice that it said "Attempted to load tokenizers/punkt/english.pickle", which means you must recreate that folder structure. I created a "tokenizers" folder inside "nltk_data", then copied the unzipped content into it and made sure the file path "C:/Users/username/nltk_data/tokenizers/punkt/english.pickle" is valid.
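
As a rough Python sketch of those two steps (the location of the downloaded punkt.zip and the nltk_data path are assumptions; adjust them to your machine):

import os
import zipfile
import nltk

# Assumed locations: change these to where you saved punkt.zip and where
# you want nltk_data to live
nltk_data_dir = os.path.expanduser("~/nltk_data")
tokenizers_dir = os.path.join(nltk_data_dir, "tokenizers")
os.makedirs(tokenizers_dir, exist_ok=True)

# Extracting punkt.zip here creates the required tokenizers/punkt/... layout
with zipfile.ZipFile("punkt.zip") as zf:
    zf.extractall(tokenizers_dir)

# Confirm the pickle named in the error message is now discoverable
nltk.data.find("tokenizers/punkt/english.pickle")
print(nltk.word_tokenize("Manual installation works."))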

Answered By: Hồ Xuân Vinh