Does it make sense to use Conda + Poetry for a Machine Learning project? Allow me to share my (novice) understanding and please correct or enlighten me:
As far as I understand, Conda and Poetry have different purposes, but their functionality partly overlaps:
My idea is to use both, compartmentalizing their roles: let Conda be the environment manager and Poetry the package manager. My reasoning is that (it sounds like) Conda is best at managing environments and can compile and install non-Python packages, especially CUDA drivers (for GPU capability), while Poetry is more powerful than Conda as a Python package manager.
I’ve managed to make this work fairly easily by using Poetry within a Conda environment. The trick is to not use Poetry to manage the Python environment: I’m not using commands like `poetry shell` or `poetry run`, only `poetry install` etc. (after activating the Conda environment).
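Concretely, the workflow comes down to a few commands; a sketch, assuming the Conda environment is named `N` as in the `environment.yml` below:

```shell
# Create the Conda environment (Python, CUDA toolkit, cuDNN) from environment.yml
conda env create -f environment.yml

# Activate it so that Poetry picks up this environment's Python interpreter
conda activate N

# Let Poetry install the Python dependencies into the active Conda environment
poetry install
```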
For full disclosure, my `environment.yml` file (for Conda) looks like this:

```yaml
name: N
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.9
  - cudatoolkit
  - cudnn
```
and my `pyproject.toml` file (for Poetry) looks like this:

```toml
[tool.poetry]
name = "N"
authors = ["B"]

[tool.poetry.dependencies]
python = "3.9"
torch = "^1.10.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
To be honest, one of the reasons I proceeded this way is that I was struggling to install CUDA (for GPU support) without Conda.
Does this project design look reasonable to you?
I have experience with a Conda + Poetry setup, and it’s been working fine. The great majority of my dependencies are specified in `pyproject.toml`, but when there’s something that’s unavailable in PyPI, or installing it with Conda is easier, I add it to `environment.yml`. Moreover, Conda is used as a virtual environment manager, which works well with Poetry: there is no need to use `poetry run` or `poetry shell`; it is enough to activate the right Conda environment.
A few tricks that I found useful:

- Add Poetry itself to `environment.yml`, so that you get Poetry installed when you run `conda create`, along with Python and other non-PyPI dependencies.
- Use `conda-lock`, which gives you lock files for Conda dependencies, just like you have `poetry.lock` for Poetry dependencies.
- Consider using `mamba`, which is generally compatible with `conda` but is better at resolving conflicts, and is also much faster. An additional benefit is that all users of your setup will use the same package resolver, independent from the locally installed version of Conda.
- If a package must be installed with Conda, add it to `environment.yml`, and after it’s installed, add an entry with the same version specification to Poetry’s `pyproject.toml` (with a `~` before the version number). This will let Poetry know that the package is there and should not be upgraded.
- To make sure a package comes from a specific channel, one solution is channel pinning (like the `pytorch::pytorch` entry below), and another solution is to enable strict channel priority. Unfortunately, in Conda 4.x there is no way to enable this option through `environment.yml`.
- By default, user site-packages can leak into `sys.path`, which may cause lack of reproducibility if the user has installed Python packages outside Conda environments. One possible solution is to make sure that the `PYTHONNOUSERSITE` environment variable is set to `True` (or to any other non-empty value).
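You can verify the effect: when `PYTHONNOUSERSITE` is set, Python reports that user site-packages are disabled via the `no_user_site` flag (this is standard CPython behaviour):

```shell
# Any non-empty value of PYTHONNOUSERSITE disables user site-packages
export PYTHONNOUSERSITE=True

# sys.flags.no_user_site is 1 when user site-packages are excluded from sys.path
python3 -c "import sys; print(sys.flags.no_user_site)"  # prints 1
```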
An example `environment.yml`:

```yaml
name: my_project_env
channels:
  - pytorch
  - conda-forge
  # We want to have a reproducible setup, so we don't want default channels,
  # which may be different for different users. All required channels should
  # be listed explicitly here.
  - nodefaults
dependencies:
  - python=3.10.*  # or don't specify the version and use the latest stable Python
  - mamba
  - pip  # pip must be mentioned explicitly, or conda-lock will fail
  - poetry=1.*  # or 1.1.*, or no version at all -- as you want
  - tensorflow=2.8.0
  - pytorch::pytorch=1.11.0
  - pytorch::torchaudio=0.11.0
  - pytorch::torchvision=0.12.0

# Non-standard section listing target platforms for conda-lock:
platforms:
  - linux-64
```
And `virtual-packages.yml` (may be used e.g. when we want `conda-lock` to generate CUDA-enabled lock files even on platforms without CUDA):

```yaml
subdirs:
  linux-64:
    packages:
      __cuda: 11.5
```
You can avoid playing with the bootstrap env and simplify the example below if you have `poetry` already installed outside your target environment.
```shell
# Create a bootstrap env
conda create -p /tmp/bootstrap -c conda-forge mamba conda-lock poetry='1.*'
conda activate /tmp/bootstrap

# Create Conda lock file(s) from environment.yml
conda-lock -k explicit --conda mamba

# Set up Poetry
poetry init --python=~3.10  # version spec should match the one from environment.yml
# Fix package versions installed by Conda to prevent upgrades
poetry add --lock tensorflow=2.8.0 torch=1.11.0 torchaudio=0.11.0 torchvision=0.12.0
# Add conda-lock (and other packages, as needed) to pyproject.toml and poetry.lock
poetry add --lock conda-lock

# Remove the bootstrap env
conda deactivate
rm -rf /tmp/bootstrap

# Add Conda spec and lock files
git add environment.yml virtual-packages.yml conda-linux-64.lock
# Add Poetry spec and lock files
git add pyproject.toml poetry.lock
git commit
```
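After the `poetry add --lock` commands, the Conda-installed packages end up pinned in `pyproject.toml` roughly like this (a sketch; the exact version specifiers depend on how you add them, but they should match the versions in `environment.yml` so that Poetry won’t try to upgrade Conda-installed packages):

```toml
[tool.poetry.dependencies]
python = "~3.10"
tensorflow = "2.8.0"
torch = "1.11.0"
torchaudio = "0.11.0"
torchvision = "0.12.0"
```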
The above setup may seem complex, but it can be used in a fairly simple way.
```shell
conda create --name my_project_env --file conda-linux-64.lock
conda activate my_project_env
poetry install
```
To update the environment later:

```shell
conda activate my_project_env

# Re-generate Conda lock file(s) based on environment.yml
conda-lock -k explicit --conda mamba
# Update Conda packages based on re-generated lock file
mamba update --file conda-linux-64.lock
# Update Poetry packages and re-generate poetry.lock
poetry update
```
To anyone using @michau’s answer but having issues including Poetry in the `environment.yml`: currently, Poetry versions 1.2 or greater aren’t available from conda-forge. You can still include Poetry ≥1.2 in the `.yml` by installing it through pip as an alternative:

```yaml
dependencies:
  - python=3.9.*
  - mamba
  - pip
  - pip:
      - "poetry>=1.2"
```