Specifying a Hugging Face model as a project dependency

Question:

Is it possible to install Hugging Face models as project dependencies?

Currently the model is downloaded automatically by the SentenceTransformer library, which means that in a Docker container it is downloaded again every time the container starts.

This is the model I am trying to use: https://huggingface.co/sentence-transformers/all-mpnet-base-v2

I have tried specifying the URL as a dependency in my pyproject.toml:

all-mpnet-base-v2 = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}

The first error said the package name was incorrect and should be train-script, so I renamed the dependency accordingly, although I'm not sure that is right. Now I have:

train-script = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}

However, now I get the following error:

 Package operations: 1 install, 0 updates, 0 removals

   • Installing train-script (0.0.0 bd44305)

   EnvCommandError

   Command ['/srv/.venv/bin/pip', 'install', '--no-deps', '-U', '/srv/.venv/src/train-script'] errored with the following return code 1, and output:
   ERROR: Directory '/srv/.venv/src/train-script' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.

   [notice] A new release of pip available: 22.2.2 -> 22.3.1
   [notice] To update, run: pip install --upgrade pip


   at /usr/local/lib/python3.10/site-packages/poetry/utils/env.py:1183 in _run
       1179│                 output = subprocess.check_output(
       1180│                     cmd, stderr=subprocess.STDOUT, **kwargs
       1181│                 )
       1182│         except CalledProcessError as e:
     → 1183│             raise EnvCommandError(e, input=input_)
       1184│
       1185│         return decode(output)
       1186│
       1187│     def execute(self, bin, *args, **kwargs):

Is this possible? If not, is there a recommended way to bake the model download into a Docker image so it doesn’t need to be downloaded each time?

Asked By: rbhalla


Answers:

I was not able to find a native way to do this with project dependency files, so I did it with a multi-stage Dockerfile.

First I clone the model in a download stage, then copy it into the appropriate folder under /root/.cache/torch/ in the final image.

Here is an example:

FROM python:3.10.3 as model-download-stage

# git-lfs is required to pull the large model weight files from the Hub
RUN apt update && apt install git-lfs -y
RUN git lfs install

RUN git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2 /tmp/model
# Drop the git metadata so it isn't copied into the final image
RUN rm -rf /tmp/model/.git

FROM python:3.10.3

COPY --from=model-download-stage /tmp/model /root/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2
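The destination in the COPY line follows the cache layout used by older sentence-transformers releases: models live under ~/.cache/torch/sentence_transformers/, with the "/" in the Hub model id replaced by "_". A minimal sketch of that naming convention (the helper function is my own, not part of the library, and newer releases use the Hugging Face Hub cache instead):

```python
import os

def sentence_transformers_cache_path(model_id: str) -> str:
    # Older sentence-transformers versions cache a model under
    # ~/.cache/torch/sentence_transformers/<model_id with "/" -> "_">
    cache_root = os.path.join(
        os.path.expanduser("~"), ".cache", "torch", "sentence_transformers"
    )
    return os.path.join(cache_root, model_id.replace("/", "_"))

print(sentence_transformers_cache_path("sentence-transformers/all-mpnet-base-v2"))
```

If the directory name doesn't match what your installed library version expects, it will ignore the baked-in copy and download again, so it's worth checking the path against your version.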
Answered By: rbhalla
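If you would rather not depend on the cache path at all, another sketch is to let the library populate its own cache during the image build. This assumes network access at build time and is an alternative to the answer above, not part of it:

```dockerfile
FROM python:3.10.3

# Install the library at build time (pin a version in a real build so the
# cache layout doesn't change between rebuilds).
RUN pip install sentence-transformers

# Download the model during the build; it lands in the library's default
# cache inside the image and is reused at runtime without a network call.
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
```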