Overwrite single file in a Google Cloud Storage bucket, via Python code

Question:

I have a logs.txt file at a certain location on a Compute Engine VM instance. I want to periodically back up (i.e. overwrite) logs.txt to a Google Cloud Storage bucket. Since logs.txt is the result of some preprocessing done inside a Python script, I want to use that same script to upload / copy the file into the Cloud Storage bucket (therefore, the use of cp cannot be considered an option). Both the Compute Engine VM instance and the Cloud Storage bucket live in the same GCP project, so "they see each other". What I am attempting right now, based on this sample code, looks like:

from google.cloud import storage

bucket_name = "my-bucket"
destination_blob_name = "logs.txt"
source_file_name = "logs.txt"  # accessible from this script

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

generation_match_precondition = 0
blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)

print(f"File {source_file_name} uploaded to {destination_blob_name}.")

If gs://my-bucket/logs.txt does not exist, the script works correctly, but if I try to overwrite, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2571, in upload_from_file
    created_json = self._do_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2372, in _do_upload
    response = self._do_multipart_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 1907, in _do_multipart_upload
    response = upload.transmit(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 153, in transmit
    return _request_helpers.wait_and_retry(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
    self._process_response(result)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_upload.py", line 114, in _process_response
    _helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_helpers.py", line 105, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_folder/upload_to_gcs.py", line 76, in <module>
    blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2712, in upload_from_filename
    self.upload_from_file(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2588, in upload_from_file
    _raise_from_invalid_response(exc)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 4455, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.PreconditionFailed: 412 POST https://storage.googleapis.com/upload/storage/v1/b/production-onementor-dt-data/o?uploadType=multipart&ifGenerationMatch=0: {
  "error": {
    "code": 412,
    "message": "At least one of the pre-conditions you specified did not hold.",
    "errors": [
      {
        "message": "At least one of the pre-conditions you specified did not hold.",
        "domain": "global",
        "reason": "conditionNotMet",
        "locationType": "header",
        "location": "If-Match"
      }
    ]
  }
}
: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

I have checked the documentation for upload_from_filename, but it seems there is no flag to "enable overwriting".

How to properly overwrite a file existing in a Google Cloud Storage Bucket, using Python language?

Asked By: David Espinosa


Answers:

It’s because of if_generation_match

As a special case, passing 0 as the value for if_generation_match
makes the operation succeed only if there are no live versions of the
blob.

This is what is meant by the return message "At least one of the pre-conditions you specified did not hold."

You should pass None, or leave out that argument altogether, so the upload is allowed to replace the existing object.
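Dropping the precondition turns the upload into an unconditional overwrite. A minimal sketch of the corrected script, wrapped in a function (bucket and file names are placeholders):

```python
def upload_overwrite(bucket_name: str, source_file_name: str, destination_blob_name: str) -> None:
    """Upload source_file_name to the bucket, overwriting any existing blob."""
    # Imported inside the function so the sketch stays importable
    # even where google-cloud-storage is not installed.
    from google.cloud import storage

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # No if_generation_match: the upload succeeds whether or not the blob exists.
    blob.upload_from_filename(source_file_name)
```

Usage would be e.g. `upload_overwrite("my-bucket", "logs.txt", "logs.txt")`, run with credentials that can both create and delete objects in the bucket.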

Answered By: Emanuel P

You can use the Google Cloud Storage client library for Python to upload your logs.txt file from your Compute Engine VM instance to a Cloud Storage bucket in your project. Here’s how:

  1. Install the google-cloud-storage library on your Compute Engine VM instance using pip:

pip install google-cloud-storage

  2. Create a service account key that has write access to the Cloud Storage bucket. You can do this by going to the Google Cloud Console, selecting the project that contains the Compute Engine VM and the Cloud Storage bucket, and creating a new service account with a role that can write objects. Note that overwriting an existing object requires storage.objects.delete in addition to storage.objects.create, so the "Storage Object Creator" role alone is not enough to replace an object; "Storage Object Admin" (or a custom role with both permissions) is the safer choice. Download the service account key as a JSON file.

  3. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the service account key JSON file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service/account/key.json
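Alternatively, if you would rather not depend on the environment variable, the client library accepts the key file path directly (a sketch; the path is a placeholder):

```python
def make_client(key_path: str):
    """Build a Cloud Storage client from an explicit service-account key file."""
    # Deferred import so this sketch can be defined without the package installed.
    from google.cloud import storage

    # from_service_account_json reads the key file instead of relying on
    # the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    return storage.Client.from_service_account_json(key_path)
```

On a Compute Engine VM you can also skip the key file entirely and let `storage.Client()` pick up the VM's default service account, provided it has the required bucket permissions.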

  4. In your Python script that preprocesses logs.txt, use the following code to upload the file to the Cloud Storage bucket:

    from google.cloud import storage

    bucket_name = "my-bucket"
    source_file_name = "/path/to/logs.txt"
    destination_blob_name = "logs.txt"
    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)

Replace my-bucket with the name of your Cloud Storage bucket, /path/to/logs.txt with the path to your logs.txt file on the Compute Engine VM instance, and logs.txt with the name you want to give the file in the Cloud Storage bucket.
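If several writers might update the same object, the if_generation_match precondition from the question can still be used safely: instead of hard-coding 0, read the blob's current generation first and pass that, so the overwrite succeeds only if nobody else changed the object in between. A hedged sketch of that pattern (not the answer's exact code):

```python
def safe_overwrite(bucket_name: str, source_file_name: str, destination_blob_name: str) -> None:
    """Overwrite a blob only if it has not changed since we last read it."""
    # Deferred import so the sketch is importable without the package installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # get_blob returns None when the object does not exist yet.
    blob = bucket.get_blob(destination_blob_name)
    if blob is None:
        generation = 0  # precondition: the object must not exist
        blob = bucket.blob(destination_blob_name)
    else:
        generation = blob.generation  # precondition: unchanged since we read it
    blob.upload_from_filename(source_file_name, if_generation_match=generation)
```

A concurrent modification between the read and the upload makes the request fail with the same 412 PreconditionFailed seen in the question, which the caller can catch and retry.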

  5. Set up a cron job on the Compute Engine VM instance to run the Python script periodically. You can do this by editing the crontab file with the following command:

    crontab -e

Then add the following line to the file to run the Python script every hour:

0 * * * * /path/to/python /path/to/script.py

Replace /path/to/python with the path to your Python executable and /path/to/script.py with the path to your Python script that uploads logs.txt to the Cloud Storage bucket.