Azure ML Pipeline: prevent specific files from being uploaded
Question:
When creating a pipeline with the Python SDK v2 for Azure ML, all contents of my current working directory are uploaded. Can I exclude some files from being uploaded? E.g. I use load_env(".env") to read some credentials, but I don't want that file to be uploaded.
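For context, a minimal stdlib-only stand-in for such a helper might look like this (a sketch; the asker's actual load_env is not shown in the question, and real projects typically use the python-dotenv package instead):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Illustrative stand-in for the asker's helper: parse KEY=value lines
    from a dotenv-style file into os.environ, skipping blanks and comments."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```
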
Directory content:
```
./src
    utilities.py  # contains a helper function to get Azure credentials
.env              # contains credentials
conda.yaml
script.py
```
A minimal pipeline example:
```python
import mldesigner
import mlflow
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline

from src.utilities import get_credential

credential = get_credential()  # calls `load_env(".env")` locally
ml_client = MLClient(
    credential=credential,
    subscription_id="foo",
    resource_group_name="bar",
    workspace_name="foofoo",
)

@mldesigner.command_component(
    name="testcomponent",
    display_name="Test Component",
    description="Test Component description.",
    environment=dict(
        conda_file="./conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def test_component():
    mlflow.log_metric("metric", 0)

cluster_name = "foobar"

@pipeline(default_compute=cluster_name)
def pipe():
    test_component()

pipeline_job = pipe()
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
```
After running `python script.py`, the pipeline job is created and runs in Azure ML. If I open the pipeline in the Azure ML UI, inspect Test Component, and look at the Code tab, I find all source files, including .env.
How can I prevent this file from being uploaded when creating a pipeline job with the SDK?
Answers:
You can use a .gitignore or .amlignore file in your working directory to specify files and directories to ignore. By default, files matched by these patterns are not included when you run the pipeline.
Here is the documentation on how to prevent unnecessary files.
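A minimal sketch of what such an .amlignore could contain (it uses the same pattern syntax as .gitignore; entries other than .env are illustrative):

```
# .amlignore -- placed in the snapshot root (the working directory)
.env
__pycache__/
*.pyc
```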
Or, alternatively:
```python
import os

# Get all files in the current working directory
all_files = os.listdir()
# Remove the ".env" file from the list of files
all_files.remove(".env")

@pipeline(default_compute=cluster_name, files=all_files)
def pipe():
    test_component()

pipeline_job = pipe()
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
```
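An SDK-independent workaround, sketched with the stdlib only (the name stage_sources is illustrative, not part of the Azure ML SDK): copy everything except the secret files into a staging directory and submit the job from there, so .env never enters the snapshot.

```python
import os
import shutil
import tempfile

# Files to keep out of the uploaded code snapshot.
EXCLUDE = {".env"}

def stage_sources(src_dir=".", exclude=EXCLUDE):
    """Copy src_dir into a fresh temporary directory, skipping excluded
    names, and return the staging directory's path."""
    stage_dir = tempfile.mkdtemp(prefix="aml_snapshot_")
    for name in os.listdir(src_dir):
        if name in exclude:
            continue
        path = os.path.join(src_dir, name)
        if os.path.isdir(path):
            shutil.copytree(path, os.path.join(stage_dir, name))
        else:
            shutil.copy2(path, os.path.join(stage_dir, name))
    return stage_dir
```

Running the submission script with the staged directory as the working directory then uploads only the copied files.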