AWS Batch analog in GCP?

Question:

I was using AWS and am new to GCP. One feature I used heavily was AWS Batch, which automatically creates a VM when the job is submitted and deletes the VM when the job is done. Is there a GCP counterpart? Based on my research, the closest is GCP Dataflow. The GCP Dataflow documentation led me to Apache Beam. But when I walk through the examples here (link), it feels totally different from AWS Batch.

Any suggestions on submitting jobs for batch processing in GCP? My requirement is to simply retrieve data from Google Cloud Storage, analyze the data using a Python script, and then put the result back to Google Cloud Storage. The process can take overnight and I don’t want the VM to be idle when the job is finished but I’m sleeping.

Asked By: Hung-Yi Wu


Answers:

Officially, according to the “Map AWS services to Google Cloud Platform products” page, there is no direct equivalent, but you can combine a few services to get close.

I wasn’t sure whether you are running, or have the option to run, your Python code in Docker. If so, the Kubernetes controls might do the trick. From the GCP docs:

Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren’t required to run your workloads. However, while a node pool can scale to a zero size, the overall cluster size does not scale down to zero nodes (as at least one node is always required to run system Pods).

So, if you are running other managed instances anyway, you can scale the node pool up from and down to zero, but at least one Kubernetes node stays active to run the system Pods.

I’m guessing you are already using something like “Creating API Requests and Handling Responses” to get an ID with which you can verify that the process has started, the instance has been created, and the payload is being processed. You can use the same mechanism to report that the process has completed. That takes care of creating the instance and launching the Python script.

You could use Cloud Pub/Sub to keep track of the job’s state: can you modify your Python script to publish a notification when the task completes? Just as you report that the task was created and the instance launched, you can report that the Python job is complete and then kick off an instance teardown process.
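To make the Pub/Sub idea concrete, here is a minimal sketch of what the end of the Python script could do. It assumes the `google-cloud-pubsub` client library is installed on the VM and that the topic already exists; the project ID, topic name, and message fields are hypothetical placeholders, not something prescribed by GCP.

```python
import json
from datetime import datetime, timezone


def build_completion_message(job_id: str, status: str, output_uri: str) -> bytes:
    """Serialize a small JSON status message (field names are illustrative)."""
    payload = {
        "job_id": job_id,
        "status": status,
        "output_uri": output_uri,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload).encode("utf-8")


def publish_completion(project_id: str, topic_id: str, data: bytes) -> str:
    """Publish the message and return the server-assigned message ID."""
    # Imported lazily so the message builder works without the library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=data)
    return future.result()  # blocks until Pub/Sub has accepted the message
```

At the end of the analysis you would call something like `publish_completion("my-project", "batch-job-events", build_completion_message("job-001", "SUCCEEDED", "gs://my-bucket/results/out.csv"))`. A subscriber to that topic (for example a small service that deletes the instance) can then handle the teardown.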

Another way to cut costs is to use Preemptible VM Instances, which run at roughly half the cost or less and run for a maximum of 24 hours anyway.
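As a rough sketch of launching such an instance from Python, the snippet below just assembles and runs a `gcloud compute instances create` command with the `--preemptible` flag. It assumes the `gcloud` CLI is installed and authenticated; the instance name, zone, machine type, and startup script path are hypothetical placeholders.

```python
import subprocess


def build_create_command(name: str, zone: str, machine_type: str,
                         startup_script_path: str) -> list:
    """Assemble the gcloud command for a preemptible VM that runs a startup script."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        "--preemptible",         # cheaper, but may be stopped at any time
        "--scopes=storage-rw",   # lets the VM read and write Cloud Storage
        f"--metadata-from-file=startup-script={startup_script_path}",
    ]


def launch(command: list) -> None:
    """Run the command; raises CalledProcessError if gcloud fails."""
    subprocess.run(command, check=True)
```

For example, `launch(build_create_command("analysis-vm", "us-central1-a", "n1-standard-4", "startup.sh"))`. Because a preemptible instance can be reclaimed before the job finishes, the script it runs should be safe to restart.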

Hope that helps.

Answered By: Roy Tokeshi

I think a cron job can help you here, and you can implement it with App Engine, Pub/Sub, and Compute Engine. From “Reliable Task Scheduling on Google Compute Engine”:

In distributed systems, such as a network of Google Compute Engine instances, it is challenging to reliably schedule tasks because any individual instance may become unavailable due to autoscaling or network partitioning.

Google App Engine provides a Cron service. Using this service for scheduling and Google Cloud Pub/Sub for distributed messaging, you can build an application to reliably schedule tasks across a fleet of Compute Engine instances.

For a detailed look you can check it here: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine

Answered By: Abdul Rehman

I recommend checking out dsub. It’s an open-source tool initially developed by the Google Genomics teams for doing batch processing on Google Cloud.

Answered By: Paul Billing-Ross

The product that best suits your use case in GCP is Cloud Tasks. We use it for a similar use case: we retrieve files from another HTTP server and, after some processing, store them in Google Cloud Storage.

This GCP documentation describes in full detail how to create tasks and use them.

You schedule your tasks programmatically in Cloud Tasks, and you have to create task handlers (worker services) in App Engine. Some limitations for worker services running in App Engine:

  • the standard environment:

    • Automatic scaling: task processing must finish in 10 minutes.
    • Manual and basic scaling: requests can run up to 24 hours.
  • the flex environment: all types have a 60-minute timeout.
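A minimal sketch of scheduling such a task from Python, assuming the `google-cloud-tasks` client library is installed and a queue already exists. The handler path `/process`, the payload fields, and the project, location, and queue names are hypothetical.

```python
import json


def build_app_engine_task(relative_uri: str, payload: dict) -> dict:
    """Describe a task that sends a JSON body to an App Engine handler."""
    return {
        "app_engine_http_request": {
            "relative_uri": relative_uri,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload).encode("utf-8"),
        }
    }


def enqueue(project_id: str, location: str, queue: str, task: dict) -> str:
    """Create the task in the given queue and return its resource name."""
    # Imported lazily so the task builder works without the library.
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project_id, location, queue)
    task["app_engine_http_request"]["http_method"] = tasks_v2.HttpMethod.POST
    response = client.create_task(request={"parent": parent, "task": task})
    return response.name
```

For example, `enqueue("my-project", "us-central1", "analysis-queue", build_app_engine_task("/process", {"input": "gs://my-bucket/data.csv"}))`. Given the limits above, an overnight job would need a worker on manual or basic scaling in the standard environment.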

Answered By: Ilyas

You can do this using AI Platform Jobs, which can now run arbitrary Docker images:

gcloud ai-platform jobs submit training $JOB_NAME \
       --scale-tier BASIC \
       --region $REGION \
       --master-image-uri gcr.io/$PROJECT_ID/some-image

You can define the master instance type and even additional worker instances if desired. They should consider creating a sibling product without the AI buzzword so people can find this functionality more easily.

Answered By: Cristian Garcia

UPDATE: I have now used this service and I think it’s awesome.

As of July 13, 2022, GCP has its own new fully managed batch processing service (GCP Batch), which seems very akin to AWS Batch.

See the GCP Blog post announcing it at: https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud (with links to docs as well)
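For a rough idea of what a job looks like, here is a sketch that writes a job configuration file for a single containerized task. The field names follow the Batch job JSON format as I understand it from the docs; the image URI, command, machine type, and 12-hour run limit are hypothetical placeholders, so check them against the current documentation before relying on this.

```python
import json


def build_batch_job_config(image_uri: str, commands: list,
                           machine_type: str = "e2-standard-4",
                           max_run_seconds: int = 43200) -> dict:
    """Describe a one-task Batch job that runs a container to completion."""
    return {
        "taskGroups": [{
            "taskCount": 1,
            "taskSpec": {
                "runnables": [
                    {"container": {"imageUri": image_uri, "commands": commands}}
                ],
                "maxRunDuration": f"{max_run_seconds}s",
            },
        }],
        "allocationPolicy": {
            "instances": [{"policy": {"machineType": machine_type}}]
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def write_config(config: dict, path: str) -> None:
    """Save the configuration as JSON for use with the gcloud CLI."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

After writing the file (say `job.json`), the job can be submitted with `gcloud batch jobs submit my-job --location us-central1 --config job.json`. Batch then provisions the VM, runs the container, and removes the VM when the job finishes, which is the AWS Batch behavior the question asks about.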

Answered By: Max Power