How to cache pip packages within Azure Pipelines
Question:
Although this source provides a lot of information on caching within Azure Pipelines, it is not clear how to cache pip packages for a Python project.
How should one go about caching pip packages in an Azure Pipelines build?
According to this, the pip cache may be enabled by default in the future. As far as I know, that is not yet the case.
Answers:
I used the pre-commit documentation as inspiration:
- https://pre-commit.com/#azure-pipelines-example
- https://github.com/asottile/azure-pipeline-templates/blob/master/job--pre-commit.yml
and configured the following Python pipeline with Anaconda:
pool:
  vmImage: 'ubuntu-latest'
variables:
  CONDA_ENV: foobar-env
  CONDA_HOME: /usr/share/miniconda/envs/$(CONDA_ENV)/
steps:
- script: echo "##vso[task.prependpath]$CONDA/bin"
  displayName: Add conda to PATH
- task: Cache@2
  displayName: Use cached Anaconda environment
  inputs:
    key: conda | environment.yml
    path: $(CONDA_HOME)
    cacheHitVar: CONDA_CACHE_RESTORED
- script: conda env create --file environment.yml
  displayName: Create Anaconda environment (if not restored from cache)
  condition: eq(variables.CONDA_CACHE_RESTORED, 'false')
- script: |
    source activate $(CONDA_ENV)
    pytest
  displayName: Run unit tests
To cache a standard pip install use this:
variables:
  # Pipeline variables are automatically exported to scripts as environment
  # variables with upper-cased names, so this becomes PIP_CACHE_DIR and
  # overrides pip's default cache dir
  - name: pip_cache_dir
    value: $(Pipeline.Workspace)/.pip
steps:
- task: Cache@2
  inputs:
    key: 'pip | "$(Agent.OS)" | requirements.txt'
    restoreKeys: |
      pip | "$(Agent.OS)"
    path: $(pip_cache_dir)
  displayName: Cache pip
- script: |
    pip install -r requirements.txt
  displayName: "pip install"
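One way to confirm that the override above takes effect is to print the directory pip actually uses. The step below is a minimal sketch and not part of the original answer. It assumes a Linux or macOS agent (the script uses shell syntax), a pip version that provides the `pip cache dir` subcommand (pip 20.1 or newer), and that the `pip_cache_dir` variable reaches the script as the upper-cased environment variable `PIP_CACHE_DIR`.

```yaml
steps:
# Assumption: pip >= 20.1, which added the `pip cache` subcommand.
# Assumption: a bash-like shell, so $PIP_CACHE_DIR expands as expected.
- script: |
    echo "PIP_CACHE_DIR=$PIP_CACHE_DIR"
    pip cache dir
  displayName: Show pip cache directory
```

If both lines print the path under `$(Pipeline.Workspace)/.pip`, the `Cache@2` task and pip are pointing at the same directory.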
I wasn’t very happy with the standard pip cache implementation mentioned in the official documentation. You still install your dependencies normally on every run, which means pip performs a lot of checks that take time. Pip will eventually find the cached builds (*.whl, *.tar.gz), but the whole process is slow. You can opt to use venv or conda instead, but for me that led to buggy situations with unexpected behaviour. What I ended up doing instead was running pip download and pip install separately:
variables:
  pipDownloadDir: $(Pipeline.Workspace)/.pip
steps:
- task: Cache@2
  displayName: Load cache
  inputs:
    key: 'pip | "$(Agent.OS)" | requirements.txt'
    path: $(pipDownloadDir)
    cacheHitVar: cacheRestored
- script: pip download -r requirements.txt --dest=$(pipDownloadDir)
  displayName: "Download requirements"
  condition: eq(variables.cacheRestored, 'false')
- script: pip install -r requirements.txt --no-index --find-links=$(pipDownloadDir)
  displayName: "Install requirements"
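Because `pip download` fetches files that match the interpreter running it, a cache keyed only on the OS and `requirements.txt` could in principle be restored for a different Python version. The variant below is a sketch under assumptions, not the answer's own configuration: the `pythonVersion` variable, its value, and the `UsePythonVersion@0` step are additions for illustration.

```yaml
variables:
  pythonVersion: '3.11'   # hypothetical value; use the version you build with
  pipDownloadDir: $(Pipeline.Workspace)/.pip

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: $(pythonVersion)
- task: Cache@2
  displayName: Load cache
  inputs:
    # The quoted Python version is a literal key segment, so changing it
    # selects a different cache entry.
    key: 'pip | "$(Agent.OS)" | "$(pythonVersion)" | requirements.txt'
    path: $(pipDownloadDir)
    cacheHitVar: cacheRestored
- script: pip download -r requirements.txt --dest=$(pipDownloadDir)
  displayName: "Download requirements"
  condition: eq(variables.cacheRestored, 'false')
- script: pip install -r requirements.txt --no-index --find-links=$(pipDownloadDir)
  displayName: "Install requirements"
```

With this key, bumping `pythonVersion` forces a fresh download rather than reusing files fetched for another interpreter.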