How to move files in Google Cloud Storage from one bucket to another with Python

Question:

Is there an API function that lets us move files in Google Cloud Storage from one bucket to another?

The scenario is that we want Python to move files that have already been read from bucket A to bucket B. I know that gsutil can do this, but I am not sure whether Python supports it.

Thanks.

Asked By: user3769827


Answers:

You can use the GCS client library functions documented at [1] to read from one bucket, write to the other, and then delete the source file.
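
As a rough illustration of the read, write, then delete sequence, here is a minimal sketch. It assumes the newer google-cloud-storage package rather than the App Engine client library linked at [1], and the function name `move_by_streaming` and its parameters are made up for this example.

```python
import shutil


def move_by_streaming(client, src_bucket_name, src_blob_name,
                      dst_bucket_name, dst_blob_name):
    """Read the source object, write it to the destination, then delete the source.

    `client` is assumed to be a google.cloud.storage.Client (or any object
    exposing the same bucket()/blob()/open()/delete() methods).
    """
    src_blob = client.bucket(src_bucket_name).blob(src_blob_name)
    dst_blob = client.bucket(dst_bucket_name).blob(dst_blob_name)

    # Stream the bytes through this process in chunks instead of loading
    # the whole object into memory at once.
    with src_blob.open("rb") as reader, dst_blob.open("wb") as writer:
        shutil.copyfileobj(reader, writer)

    # Only reached if the read/write above did not raise.
    src_blob.delete()
```

With the real library this would be called with `storage.Client()` as the first argument. Note that the data travels through your machine here, so a server-side copy is usually the better choice when it is available.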

You can also use the GCS REST API documented at [2].

Links:

[1] – https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions

[2] – https://developers.google.com/storage/docs/concepts-techniques#overview

Answered By: Paolo P.

Using the google-api-python-client, there is an example on the storage.objects.copy page. After you copy, you can delete the source with storage.objects.delete.

import json

# `client` is an authorized Cloud Storage JSON API service object.
destination_object_resource = {}
req = client.objects().copy(
        sourceBucket=bucket1,
        sourceObject=old_object,
        destinationBucket=bucket2,
        destinationObject=new_object,
        body=destination_object_resource)
resp = req.execute()
print(json.dumps(resp, indent=2))

# Delete the source only after the copy has succeeded.
client.objects().delete(
        bucket=bucket1,
        object=old_object).execute()
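
For large objects, or buckets in different locations or storage classes, my understanding is that the JSON API's storage.objects.rewrite method is the safer choice, since it can take several requests and hands back a rewriteToken until it reports done. The following is only a sketch under that assumption; the function name `move_object` is hypothetical, and `client` is assumed to be the same kind of service object as above.

```python
def move_object(client, src_bucket, src_object, dst_bucket, dst_object):
    """Move one object with objects.rewrite followed by objects.delete.

    `client` is assumed to be a Cloud Storage JSON API service object,
    e.g. one created with googleapiclient.discovery.build("storage", "v1").
    """
    token = None
    while True:
        kwargs = dict(
            sourceBucket=src_bucket,
            sourceObject=src_object,
            destinationBucket=dst_bucket,
            destinationObject=dst_object,
            body={},
        )
        if token:
            # Continue a rewrite that did not finish in the previous call.
            kwargs["rewriteToken"] = token
        resp = client.objects().rewrite(**kwargs).execute()
        if resp.get("done"):
            break
        token = resp["rewriteToken"]

    # Delete the source only once the rewrite has reported completion.
    client.objects().delete(bucket=src_bucket, object=src_object).execute()
    return resp["resource"]
```

The return value is the destination object's resource as reported by the final rewrite response.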
Answered By: jterrace

Here’s a function I use when moving blobs between directories within the same bucket or to a different bucket.

import os

from google.cloud import storage

# Point the client library at a service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_creds.json"

def mv_blob(bucket_name, blob_name, new_bucket_name, new_blob_name):
    """
    Move a file between directories or buckets. It uses GCS's copy
    function and then deletes the blob from the old location.
    
    inputs
    -----
    bucket_name: name of bucket
    blob_name: str, name of file 
        ex. 'data/some_location/file_name'
    new_bucket_name: name of bucket (can be same as original if we're just moving around directories)
    new_blob_name: str, name of file in new directory in target bucket 
        ex. 'data/destination/file_name'
    """
    storage_client = storage.Client()
    source_bucket = storage_client.get_bucket(bucket_name)
    source_blob = source_bucket.blob(blob_name)
    destination_bucket = storage_client.get_bucket(new_bucket_name)

    # copy to new destination
    new_blob = source_bucket.copy_blob(
        source_blob, destination_bucket, new_blob_name)
    # delete from the old location
    source_blob.delete()

    print(f'File moved from {bucket_name}/{blob_name} to {new_bucket_name}/{new_blob_name}')
Answered By: dmlee8

from google.cloud import storage

storage_client = storage.Client()

def GCP_BUCKET_A_TO_B():
    """Copy every blob in Bucket_A_Name to Bucket_B_Name under the same name."""
    source_bucket = storage_client.get_bucket("Bucket_A_Name")
    # Look up the destination bucket once, not once per blob.
    destination_bucket = storage_client.get_bucket("Bucket_B_Name")
    for source_blob in source_bucket.list_blobs(prefix=""):
        new_blob = source_bucket.copy_blob(
            source_blob, destination_bucket, source_blob.name)
        # This only copies; call source_blob.delete() here to complete a move.
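
If you want an actual move (copy plus delete) for everything under a prefix, something along these lines might work. It is a sketch only: the name `move_prefix` and its parameters are my own, and `client` is assumed to be a `google.cloud.storage.Client`.

```python
def move_prefix(client, src_bucket_name, dst_bucket_name, prefix=""):
    """Move every blob whose name starts with `prefix`; return the moved names."""
    src_bucket = client.bucket(src_bucket_name)
    dst_bucket = client.bucket(dst_bucket_name)

    moved = []
    # Take a snapshot of the listing before changing the bucket.
    for blob in list(client.list_blobs(src_bucket_name, prefix=prefix)):
        src_bucket.copy_blob(blob, dst_bucket, blob.name)
        blob.delete()  # only reached if copy_blob did not raise
        moved.append(blob.name)
    return moved
```

Because each blob is deleted right after its own copy succeeds, a failure partway through leaves the remaining blobs untouched in the source bucket, so the call can simply be repeated.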
Answered By: Sarthak'Khare

I just wanted to point out another possible approach: calling gsutil through the subprocess module.

The advantages of using gsutil like that:

  • You don’t have to deal with individual blobs
  • gsutil’s implementation of the move, and especially rsync, will probably be much better and more resilient than what we write ourselves.

The disadvantages:

  • You can’t deal with individual blobs easily
  • It’s hacky and generally a library is preferable to executing shell commands

Example:

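import subprocess

def move(source_uri: str,
         destination_uri: str) -> None:
    """
    Move file from source_uri to destination_uri.

    :param source_uri: gs://-style URI of the source file/directory
    :param destination_uri: gs://-style URI of the destination file/directory
    :return: None
    """
    # Pass the arguments as a list: without shell=True, POSIX systems treat a
    # single command string as the program name and the call fails.
    cmd = ["gsutil", "-m", "mv", source_uri, destination_uri]
    # check=True raises CalledProcessError if gsutil exits with an error.
    subprocess.run(cmd, check=True)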
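
A slightly more defensive variant could build the argument list separately and validate the URIs first. This is only a sketch: the helper names are hypothetical, and rejecting anything that is not a gs:// URI is my own restriction for the bucket-to-bucket case (gsutil itself accepts other paths).

```python
import subprocess


def build_gsutil_mv(source_uri, destination_uri):
    """Return the argument list for `gsutil -m mv`, rejecting non-gs:// URIs."""
    for uri in (source_uri, destination_uri):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return ["gsutil", "-m", "mv", source_uri, destination_uri]


def move_with_gsutil(source_uri, destination_uri):
    """Run gsutil; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_gsutil_mv(source_uri, destination_uri)
    # An argument list (no shell) keeps spaces or shell metacharacters in
    # object names from being interpreted by a shell.
    subprocess.run(cmd, check=True)
```

Keeping the command construction in its own function means it can be checked without having gsutil installed.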
Answered By: dom