How do I install the same pip dependencies locally as are installed in my Cloud Composer Airflow environment on GCP?

Question:

I’m trying to set up a local development environment in VS Code that gives me code completion for the packages Cloud Composer/Apache Airflow uses. So far I’ve succeeded by creating a virtual environment (python -m venv .venv) and installing into it a very minimal requirements.txt file that contains only the Airflow package.

The file looks like this:

apache-airflow==1.10.15

I can install it by activating the virtual environment in VS Code and running pip install -r requirements.txt. After that, VS Code gives me code completion for the BashOperator used in the quickstart DAG from the docs.

I wanted to get more code completion as I followed more tutorials. For example, following the KubernetesPodOperator tutorial (https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator), I get this error, and VS Code doesn’t recognize the import:

Import "airflow.providers.cncf.kubernetes.operators.kubernetes_pod" could not be resolved Pylance(reportMissingImports)
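A quick way to check whether the interpreter VS Code is using can actually see this module is to ask Python directly. The sketch below assumes you run it with the .venv interpreter; the helper name is_importable is hypothetical. In Airflow 1.10.x the airflow.providers.* modules are shipped by separate backport provider packages (for this import, apache-airflow-backport-providers-cncf-kubernetes, which appears in the Composer list), so a False result usually means that package is missing from the environment.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be found by the current interpreter."""
    try:
        # For dotted names, find_spec imports the parent packages first, so a
        # missing top-level package raises ModuleNotFoundError instead of
        # returning None.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Module path taken from the KubernetesPodOperator tutorial import.
    target = "airflow.providers.cncf.kubernetes.operators.kubernetes_pod"
    print(target, "importable:", is_importable(target))
```

If this prints False from the .venv interpreter, Pylance is reporting a genuinely missing package rather than a VS Code configuration problem.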

I figured that a good next step would be to install exactly the same PyPI packages into my virtual environment as are running in the Cloud Composer environment. I used the page https://cloud.google.com/composer/docs/concepts/versioning/composer-versions to see which packages were installed:

[Screenshot: package versions listed in the GCP UI]

So my requirements.txt file then looked like this:

absl-py==1.0.0
alembic==1.5.7
amqp==2.6.1
apache-airflow==1.10.15+composer
apache-airflow-backport-providers-apache-beam==2021.3.13
apache-airflow-backport-providers-cncf-kubernetes==2021.3.3
apache-airflow-backport-providers-google==2022.4.1+composer
apache-beam==2.37.0
apispec==1.3.3
appdirs==1.4.4
argcomplete==1.12.2
astunparse==1.6.3
attrs==20.3.0
Babel==2.9.0
bcrypt==3.2.0
billiard==3.6.3.0
cached-property==1.5.2
cachetools==4.2.1
cattrs==1.1.2
celery==4.4.7
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==6.7
cloudpickle==2.0.0
colorama==0.4.4
colorlog==4.0.2
configparser==3.5.3
crcmod==1.7
croniter==0.3.37
cryptography==3.4.6
defusedxml==0.7.1
dill==0.3.1.1
distlib==0.3.1
dnspython==2.1.0
docopt==0.6.2
docutils==0.16
email-validator==1.1.2
fastavro==1.3.4
fasteners==0.17.3
filelock==3.0.12
Flask==1.1.2
Flask-Admin==1.5.4
Flask-AppBuilder==2.3.4
Flask-Babel==1.0.0
Flask-Bcrypt==0.7.1
Flask-Caching==1.3.3
Flask-JWT-Extended==3.25.1
Flask-Login==0.4.1
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.5.1
flask-swagger==0.2.14
Flask-WTF==0.14.3
flower==0.9.7
funcsigs==1.0.2
future==0.18.2
gast==0.3.3
google-ads==7.0.0
google-api-core==1.31.5
google-api-python-client==1.12.8
google-apitools==0.5.31
google-auth==1.28.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.3
google-cloud-aiplatform==1.12.1
google-cloud-automl==2.7.2
google-cloud-bigquery==1.28.0
google-cloud-bigquery-datatransfer==3.6.1
google-cloud-bigquery-storage==2.6.3
google-cloud-bigtable==1.7.0
google-cloud-build==2.0.0
google-cloud-container==1.0.1
google-cloud-core==1.6.0
google-cloud-datacatalog==3.7.1
google-cloud-dataplex==0.2.1
google-cloud-dataproc==3.3.1
google-cloud-dataproc-metastore==1.5.0
google-cloud-datastore==1.15.3
google-cloud-dlp==1.0.0
google-cloud-kms==2.11.1
google-cloud-language==1.3.0
google-cloud-logging==2.2.0
google-cloud-memcache==1.3.1
google-cloud-monitoring==2.0.0
google-cloud-os-login==2.6.1
google-cloud-pubsub==2.12.0
google-cloud-pubsublite==1.4.1
google-cloud-redis==2.8.0
google-cloud-resource-manager==1.4.1
google-cloud-secret-manager==1.0.0
google-cloud-spanner==1.19.1
google-cloud-speech==1.3.2
google-cloud-storage==1.36.2
google-cloud-tasks==2.8.1
google-cloud-texttospeech==1.0.1
google-cloud-translate==1.7.0
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-cloud-workflows==1.6.1
google-crc32c==1.1.2
google-pasta==0.2.0
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
graphviz==0.16
grpc-google-iam-v1==0.12.3
grpcio==1.44.0
grpcio-gcp==0.2.2
grpcio-status==1.44.0
gunicorn==20.0.4
h5py==2.10.0
hdfs==2.6.0
httplib2==0.17.4
humanize==3.3.0
idna==2.8
importlib-metadata==2.1.1
importlib-resources==1.5.0
iso8601==0.1.14
itsdangerous==1.1.0
Jinja2==2.11.3
json-merge-patch==0.2
jsonschema==3.2.0
Keras-Preprocessing==1.1.2
kombu==4.6.11
kubernetes==11.0.0
lazy-object-proxy==1.4.3
libcst==0.3.17
lockfile==0.12.2
Mako==1.1.4
Markdown==2.6.11
MarkupSafe==1.1.1
marshmallow==2.21.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
mock==2.0.0
monotonic==1.5
mypy-extensions==0.4.3
mysqlclient==1.3.14
natsort==7.1.1
numpy==1.19.5
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.3.0
orjson==3.6.8
overrides==6.1.0
packaging==20.9
pandas==1.1.5
pandas-gbq==0.14.1
pbr==5.8.1
pendulum==1.4.4
pip==20.1.1
pipdeptree==1.0.0
prison==0.1.3
prometheus-client==0.8.0
proto-plus==1.18.1
protobuf==3.15.6
psutil==5.8.0
psycopg2-binary==2.8.6
pyarrow==2.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pydata-google-auth==1.1.0
pydot==1.4.2
Pygments==2.8.1
PyJWT==1.7.1
pymongo==3.11.3
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-daemon==2.3.0
python-dateutil==2.8.1
python-editor==1.0.4
python-http-client==3.3.4
python-nvd3==0.15.0
python-slugify==4.0.1
python3-openid==3.2.0
pytz==2021.1
pytzdata==2020.1
PyYAML==5.4.1
redis==3.5.3
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.4.1
sendgrid==5.6.0
setproctitle==1.2.2
setuptools==57.5.0
six==1.15.0
SQLAlchemy==1.3.20
SQLAlchemy-JSONField==0.9.0
SQLAlchemy-Utils==0.36.8
statsd==3.3.0
tabulate==0.8.9
tenacity==4.12.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.8.1
tensorflow==2.2.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
text-unidecode==1.3
thrift==0.13.0
tornado==5.1.1
typing-extensions==3.7.4.3
typing-inspect==0.6.0
typing-utils==0.1.0
tzlocal==1.5.1
unicodecsv==0.14.1
uritemplate==3.0.1
urllib3==1.26.4
vine==1.3.0
virtualenv==20.4.3
websocket-client==0.58.0
Werkzeug==0.16.1
wheel==0.37.1
wrapt==1.12.1
WTForms==2.3.3
zipp==3.4.1
zope.deprecation==4.4.0

When I ran pip install -r requirements.txt again, I got the following error:

ERROR: Could not find a version that satisfies the requirement apache-airflow==1.10.15+composer (from versions: 1.10.9-bin, 1.8.1, 1.8.2rc1, 1.8.2, 1.9.0, 1.10.0, 1.10.1b1, 1.10.1rc2, 1.10.1, 1.10.2b2, 1.10.2rc1, 1.10.2rc2, 1.10.2rc3, 1.10.2, 1.10.3b1, 1.10.3b2, 1.10.3rc1, 1.10.3rc2, 1.10.3, 1.10.4b2, 1.10.4rc1, 1.10.4rc2, 1.10.4rc3, 1.10.4rc4, 1.10.4rc5, 1.10.4, 1.10.5rc1, 1.10.5, 1.10.6rc1, 1.10.6rc2, 1.10.6, 1.10.7rc1, 1.10.7rc2, 1.10.7rc3, 1.10.7, 1.10.8rc1, 1.10.8, 1.10.9rc1, 1.10.9, 1.10.10rc1, 1.10.10rc2, 1.10.10rc3, 1.10.10rc4, 1.10.10rc5, 1.10.10, 1.10.11rc1, 1.10.11rc2, 1.10.11, 1.10.12rc1, 1.10.12rc2, 1.10.12rc3, 1.10.12rc4, 1.10.12, 1.10.13rc1, 1.10.13, 1.10.14rc1, 1.10.14rc2, 1.10.14rc3, 1.10.14rc4, 1.10.14, 1.10.15rc1, 1.10.15, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0rc1, 2.0.0rc2, 2.0.0rc3, 2.0.0, 2.0.1rc1, 2.0.1rc2, 2.0.1, 2.0.2rc1, 2.0.2, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.1.1rc1, 2.1.1, 2.1.2rc1, 2.1.2, 2.1.3rc1, 2.1.3, 2.1.4rc1, 2.1.4rc2, 2.1.4, 2.2.0b1, 2.2.0b2, 2.2.0rc1, 2.2.0, 2.2.1rc1, 2.2.1rc2, 2.2.1, 2.2.2rc1, 2.2.2rc2, 2.2.2, 2.2.3rc1, 2.2.3rc2, 2.2.3, 2.2.4rc1, 2.2.4, 2.2.5rc1, 2.2.5rc2, 2.2.5rc3, 2.2.5, 2.3.0b1, 2.3.0rc1, 2.3.0rc2, 2.3.0)
ERROR: No matching distribution found for apache-airflow==1.10.15+composer

When I looked at the PyPI website, I noticed that some of the pins with "+composer" in their version in requirements.txt don’t exist on PyPI. For example, apache-airflow==1.10.15+composer and apache-airflow-backport-providers-google==2022.4.1+composer aren’t there. Does this mean those packages aren’t publicly available? I’m relatively new to Python and Airflow, so these are just some ideas I’ve had since running into this issue, and I may be on the wrong track.
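The "+composer" suffix is a PEP 440 local version label. PyPI does not accept uploads with local version labels, which is why pip’s list of available versions stops at the public numbers. A minimal sketch, assuming your requirements use plain name==version pins, that strips those labels (the function name is hypothetical):

```python
import re

# Matches "==<public version>+<local label>" inside a pin such as
# "apache-airflow==1.10.15+composer"; group 1 keeps "==1.10.15".
_LOCAL_LABEL = re.compile(r"(==[^+\s;#]+)\+[A-Za-z0-9.]+")


def strip_local_labels(text: str) -> str:
    """Remove PEP 440 local version labels (e.g. '+composer') from == pins."""
    # The pattern never crosses a newline, so line breaks are preserved.
    return _LOCAL_LABEL.sub(r"\1", text)
```

For example, Path("requirements-local.txt").write_text(strip_local_labels(Path("requirements.txt").read_text())) writes a cleaned copy (both file names are placeholders). Stripping only helps where the public version actually exists: apache-airflow 1.10.15 is on PyPI, but a date-style provider pin like 2022.4.1 is not, so that one still needs a real provider version.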

I’d appreciate any help I can get here in installing these packages into my local virtual environment, or installing some other packages that would achieve my goal of being able to do local development, with code completion, on DAGs.

Here’s the script I used to create my environment for this test, for reference:

#!/bin/bash

gcloud composer environments create my-environment \
    --location us-central1 \
    --image-version composer-1.18.8-airflow-1.10.15 # uses Python 3.8.12

Asked By: Matt Welke


Answers:

The two Cloud Composer dependencies listed on the official versions page that you can’t install from PyPI are apache-airflow and apache-airflow-providers-google (or apache-airflow-backport-providers-google if you are using Cloud Composer 1).


What you need to do is replace these two dependencies with pins that exist on PyPI.

For example, if you are running the composer-2.0.16-airflow-2.2.5 image, which specifies the two dependencies as

apache-airflow==2.2.5+composer
apache-airflow-providers-google==2022.5.18+composer

You need to replace them with

apache-airflow==2.2.5
apache-airflow-providers-google==7.0.0
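The substitution above can be sketched as a small rewrite of requirements.txt. The mapping below holds only the two pin pairs from this example; for another Composer image you would fill in your own pairs. The names COMPOSER_TO_PYPI and replace_pins are hypothetical.

```python
from typing import Dict

# Pin pairs from the composer-2.0.16-airflow-2.2.5 example above.
COMPOSER_TO_PYPI = {
    "apache-airflow==2.2.5+composer": "apache-airflow==2.2.5",
    "apache-airflow-providers-google==2022.5.18+composer": "apache-airflow-providers-google==7.0.0",
}


def replace_pins(text: str, mapping: Dict[str, str]) -> str:
    """Swap Composer-only pins for public PyPI pins; other lines pass through."""
    lines = [mapping.get(line.strip(), line) for line in text.splitlines()]
    return "\n".join(lines) + "\n"
```

An explicit mapping is used instead of a generic rule because, for the Google provider, the public version number is not derivable from the Composer pin.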

If you’re wondering how I came up with the specific version for apache-airflow-providers-google, head to the page that lists the commits included in each release.

Each release on that page shows the date of its latest commit at the top. The version you want is the one whose ‘Latest change’ is the most recent date prior to the date in the Composer pin on the Cloud Composer versions page (in this example, 2022.5.18).
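A sketch of that lookup rule, assuming you copy each provider release’s ‘Latest change’ date from the commits page into a dictionary by hand. No real release dates are included here, and both function names are hypothetical. The comparison treats a change on the pin date itself as included; switch <= to < if you want strictly earlier dates.

```python
from datetime import date
from typing import Dict, Optional


def composer_pin_date(pin_version: str) -> date:
    """Turn a date-style Composer pin such as '2022.5.18+composer' into a date."""
    public = pin_version.split("+", 1)[0]  # drop the '+composer' local label
    year, month, day = (int(part) for part in public.split("."))
    return date(year, month, day)


def latest_release_before(release_dates: Dict[str, date], cutoff: date) -> Optional[str]:
    """Return the version whose 'Latest change' is the most recent on or before cutoff."""
    candidates = [
        (changed, version)
        for version, changed in release_dates.items()
        if changed <= cutoff
    ]
    if not candidates:
        return None
    # Select by date only; the version string is carried along.
    return max(candidates, key=lambda item: item[0])[1]
```

Usage: latest_release_before(release_dates, composer_pin_date("2022.5.18+composer")), with release_dates filled in from the commits page.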



Note that for some Composer versions, the apache-airflow-providers-google dependency is pinned to an explicit provider version (e.g. 6.7.0 or 6.8.0). I’m not sure whether the date-based pin is a mistake or a convention we’re not aware of.

Answered By: Giorgos Myrianthous

The Composer Local Development CLI tool streamlines Apache Airflow DAG development for Cloud Composer 2 by running an Airflow environment locally. This local Airflow environment uses an image of a specific Cloud Composer version.

You can create a local Airflow environment based on an existing Cloud Composer environment. In this case, the local Airflow environment takes the list of installed PyPI packages and environment variable names from your Cloud Composer environment.

https://cloud.google.com/composer/docs/composer-2/run-local-airflow-environments

Answered By: SANN3