dbx execute install from azure artifacts / private pypi

Question:

I would like to use dbx execute to run a task/job on an Azure Databricks cluster.
However, I cannot get it to install my code.

More details on the situation:

  • Project A, which has a setup.py, depends on Project B.
  • Project B is also Python-based and is released as an Azure DevOps artifact.
  • I can successfully install A on an Azure Databricks cluster with an init script that git clones both projects and then runs pip install -e on Project B and Project A.
  • It also works when the init script creates a pip.conf file that configures a token for my artifacts feed.
  • So dbx deploy/launch works fine, because my clusters use the init script.
  • However, dbx execute always fails, telling me that it cannot find and install Project B.

Does anyone know how to configure the pip that is used during the dbx execute installation process? It seems to ignore any configuration that was set by the init scripts.

I searched through a lot of documentation, such as https://docs.databricks.com/libraries/index.html and
https://dbx.readthedocs.io/en/latest/reference/deployment/#advanced-package-dependency-management, but with no luck.

When I look into the dbx package, there does not seem to be an option to set any pip.conf 🙁
https://github.com/databrickslabs/dbx/blob/main/dbx/commands/execute.py

Asked By: thompson

||

Answers:

I also raised an issue in the dbx GitHub repo:
https://github.com/databrickslabs/dbx/issues/669
They pointed me to this link,

https://dbx.readthedocs.io/en/latest/guides/general/dependency_management/?h=custom+rep#installing-python-packages-from-custom-pypi-repos

which explains how to do it.

In short: overwrite the global pip.conf at /etc/pip.conf in your init.sh:

#!/bin/bash

# Write the global pip configuration; the quoted 'EOF' keeps the text literal.
cat > /etc/pip.conf <<'EOF'
[global]
index-url=https://pypi.org/simple
extra-index-url=https://my.custom.pypi.example.com/simple/
EOF

To make it work with Azure DevOps, I created an Azure DevOps personal access token and adapted extra-index-url to look like this:

https://<anyname>:<token_with_read_package_permissions>@pkgs.dev.azure.com/<organisation>/<project>/_packaging/<feedname>/pypi/simple/

Replace all values in <...> with your own values. <anyname> can have any value, as the token is enough for authentication.
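
For reference, the two steps can be combined into a single init script. This is only a minimal sketch and not taken from the dbx docs: it assumes the token reaches the init script as an environment variable (AZURE_DEVOPS_PAT is a made-up name, set for example through the cluster's environment variables), and the organisation, project and feed values as well as the helper function names are placeholders.

```shell
#!/bin/bash
# Sketch of a cluster init script that points pip at an Azure DevOps feed.
# AZURE_DEVOPS_PAT, my-organisation, my-project, my-feed and both function
# names are placeholders for illustration, not part of dbx or Databricks.

# Build the authenticated feed URL. The user name part ("build" here) can be
# any string; the token is what authenticates.
build_feed_url() {
  local token="$1" organisation="$2" project="$3" feed="$4"
  printf 'https://build:%s@pkgs.dev.azure.com/%s/%s/_packaging/%s/pypi/simple/' \
    "$token" "$organisation" "$project" "$feed"
}

# Write a pip.conf that keeps PyPI as the main index and adds the feed.
write_pip_conf() {
  local target="$1" feed_url="$2"
  cat > "$target" <<EOF
[global]
index-url=https://pypi.org/simple
extra-index-url=${feed_url}
EOF
}

# Only touch /etc/pip.conf when a token is actually available.
if [ -n "${AZURE_DEVOPS_PAT:-}" ]; then
  write_pip_conf /etc/pip.conf \
    "$(build_feed_url "$AZURE_DEVOPS_PAT" "my-organisation" "my-project" "my-feed")"
fi
```

Reading the token from the environment keeps it out of the script file itself. Note that it still ends up in plain text in /etc/pip.conf on the cluster, so a token with read-only package permissions is the safer choice.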

Answered By: thompson