spaCy and spaCy models in setup.py

Question:

In my project I have spaCy as a dependency in my setup.py, but I also want to add a default model.

My attempt so far has been:

install_requires=['spacy', 'en_core_web_sm'],
dependency_links=['https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm'],

inside my setup.py, but both a regular pip install of my package and a pip install --process-dependency-links return:

pip._internal.exceptions.DistributionNotFound: No matching distribution found for en_core_web_sm (from mypackage==0.1)

I found this GitHub issue from AllenAI with the same problem and no solution.

Note that if I pip install the URL of the model directly, it works fine, but I want it to be installed as a dependency when my package is installed with pip install.

Asked By: w4nderlust


Answers:

Not sure if this works for you, but in setup.py you might try:

import os
os.system('python -m spacy download en')

after calling setuptools.setup(...)
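
A minimal sketch of the same idea, assuming spaCy is already importable in the environment that runs setup.py: it runs the download through subprocess with sys.executable, so the model is installed for the interpreter running the script, and it fails loudly if the download fails. The helper names spacy_download_cmd and download_model are hypothetical. Keep in mind that this code only runs when setup.py itself is executed (for example, a source install); pip does not run setup.py when installing a prebuilt wheel.

```python
import subprocess
import sys
from typing import List


def spacy_download_cmd(model: str) -> List[str]:
    """Build the command that downloads a spaCy model package.

    sys.executable is the interpreter running this script, so the model is
    installed into the same environment, unlike a bare "python" found on PATH.
    """
    return [sys.executable, "-m", "spacy", "download", model]


def download_model(model: str = "en_core_web_sm") -> None:
    # check_call raises CalledProcessError if the download fails,
    # whereas os.system only returns the exit status
    subprocess.check_call(spacy_download_cmd(model))


# In setup.py, after setuptools.setup(...):
# download_model()
```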

edit:

According to the spaCy docs, it looks like you can now add spaCy models to your requirements.txt via URL as well. You should then be able to import the model as a module wherever it is required:

import en_core_web_sm
nlp = en_core_web_sm.load()

Ref: https://spacy.io/usage/models
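
Because an installed model is an ordinary Python package, it can also be imported by name at runtime. The sketch below uses a hypothetical helper, import_model_package, that returns None instead of raising when the package is missing, so the caller can decide what to do (for example, print install instructions).

```python
import importlib
from types import ModuleType
from typing import Optional


def import_model_package(name: str = "en_core_web_sm") -> Optional[ModuleType]:
    """Import an installed model package by name, or return None if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


model_pkg = import_model_package()
if model_pkg is not None:
    # Model packages expose load(), as in en_core_web_sm.load() above
    nlp = model_pkg.load()
```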

Answered By: Wes Doyle

You can use pip’s recent support for PEP 508 URL requirements:

install_requires=[
    'spacy',
    'en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz',
],
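
If you pin several models, the URL can be assembled from the model name and version. This sketch assumes every release archive follows the same layout as the en_core_web_sm-2.0.0 URL above; the helper spacy_model_requirement and the GITHUB_RELEASES constant are hypothetical, and the model version you choose must still be compatible with the spaCy version that gets installed.

```python
# Base URL taken from the release URL used in the answer above
GITHUB_RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def spacy_model_requirement(name: str, version: str) -> str:
    """Return a PEP 508 direct-reference requirement ("name @ url") for a model archive."""
    archive = f"{name}-{version}"
    return f"{name} @ {GITHUB_RELEASES}/{archive}/{archive}.tar.gz"


install_requires = [
    "spacy",
    spacy_model_requirement("en_core_web_sm", "2.0.0"),
]
```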

Note that this requires you to build your project with up-to-date versions of setuptools and wheel (at least v0.32.0 for wheel; not sure about setuptools), and your users will only be able to install your project if they’re using at least version 18.1 of pip.

More importantly, though, this is not a viable solution if you intend to distribute your package on PyPI; quoting pip’s release notes:

As a security measure, pip will raise an exception when installing packages from PyPI if those packages depend on packages not also hosted on PyPI. In the future, PyPI will block uploading packages with such external URL dependencies directly.

Answered By: jwodder

Here is my workaround for a PyPI-installable package (edited slightly for clarity):

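import spacy
from sys import stderr

try:
    # 'en' is a spaCy v2 shortcut; spaCy v3 needs the full name, e.g. 'en_core_web_sm'
    nlp = spacy.load('en')
except OSError:
    print('Downloading language model for the spaCy POS tagger\n'
          "(don't worry, this will only happen once)", file=stderr)
    from spacy.cli import download
    download('en')
    nlp = spacy.load('en')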

It’s cumbersome, but at least it works without having to involve the user. I’m trying to convince the spaCy team to package the most important model files for PyPI.
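
The same load, download, retry pattern can be factored into a small helper. In this sketch (the name load_or_download is hypothetical) the loader and downloader are passed in, so the logic can be exercised without spaCy; it relies on spacy.load raising OSError for a missing model, which is the behaviour the answer above depends on.

```python
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_download(
    name: str,
    load: Callable[[str], T],
    download: Callable[[str], None],
) -> T:
    """Try load(name); if it raises OSError, call download(name) and retry once."""
    try:
        return load(name)
    except OSError:
        print(
            f"Downloading spaCy model {name!r} (this should only happen once)",
            file=sys.stderr,
        )
        download(name)
        return load(name)


# With spaCy installed, this could be used as:
# import spacy
# from spacy.cli import download
# nlp = load_or_download("en_core_web_sm", spacy.load, download)
```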

Answered By: Christian Siefkes

Here’s an example of a slightly more "pythonic" way to handle spaCy model downloads, based on https://github.com/explosion/spaCy/issues/4592#issuecomment-704373657:

import spacy
import spacy.cli

# Model package names use underscores, e.g. en_core_web_sm
spacy_model: str = "en_core_web_sm"

if not spacy.util.is_package(spacy_model):
    spacy.cli.download(spacy_model)

nlp: spacy.Language = spacy.load(spacy_model)
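
If you only want to check whether the model distribution is installed, without importing spaCy or touching the network, the standard library's importlib.metadata offers a similar check. This is a sketch in the same spirit as spacy.util.is_package, not its actual implementation (which may differ between spaCy versions); is_distribution_installed is a hypothetical name, and Python 3.8+ is assumed.

```python
from importlib import metadata


def is_distribution_installed(name: str) -> bool:
    """Return True if a distribution called `name` is installed in this environment."""
    try:
        metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return False
    return True


if not is_distribution_installed("en_core_web_sm"):
    print("en_core_web_sm is not installed; spacy.cli.download('en_core_web_sm') would fetch it")
```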

Answered By: Paco