Azure ML Pipeline: prevent file upload

Question:

When creating a pipeline with the Python SDK v2 for Azure ML, all contents of my current working directory are uploaded. Can I exclude some files from being uploaded? E.g. I use load_env(".env") to read some credentials, but I don't want that file to be uploaded.

Directory content:

./src
    utilities.py           # contains helper function to get Azure credentials
.env                       # contains credentials
conda.yaml
script.py

A minimal pipeline example:

import mldesigner
import mlflow
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline

from src.utilities import get_credential

credential = get_credential()  # calls `load_env(".env")` locally
ml_client = MLClient(
    credential=credential,
    subscription_id="foo",
    resource_group_name="bar",
    workspace_name="foofoo",
)


@mldesigner.command_component(
    name="testcomponent",
    display_name="Test Component",
    description="Test Component description.",
    environment=dict(
        conda_file="./conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def test_component():
    mlflow.log_metric("metric", 0)


cluster_name = "foobar"


@pipeline(default_compute=cluster_name)
def pipe():
    test_component()


pipeline_job = pipe()

pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)

After running python script.py the pipeline job is created and runs in Azure ML. If I look at the pipeline in the Azure ML UI, inspect Test Component, and open the Code tab, I find all source files, including .env.

How can I prevent this file from being uploaded when creating a pipeline job with the SDK?

Asked By: Ken Jiiii


Answers:

You can use a .gitignore or .amlignore file in your working directory to specify files and directories to exclude. By default, files matched there are not included in the snapshot that is uploaded when you submit the pipeline. If both files exist in the same directory, .amlignore takes precedence.
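A minimal sketch: from the project root, add the pattern to .amlignore (it uses the same pattern syntax as .gitignore):

```shell
# Append an ignore pattern to .amlignore in the snapshot root;
# anything matched here is left out of the uploaded snapshot.
printf '%s\n' '.env' >> .amlignore
```

After resubmitting the job, the Code tab of the component should no longer list .env.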

See the Azure ML documentation on preventing unnecessary files from being included in the snapshot.

or

# Alternatively, write the exclusion from the submission script itself,
# before the job (and its snapshot) is created:

with open(".amlignore", "a") as f:
    f.write(".env\n")  # gitignore-style pattern; matched files are not uploaded


@pipeline(default_compute=cluster_name)
def pipe():
    test_component()


pipeline_job = pipe()

pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
Answered By: Ram-msft