Recursively copy a child directory to the parent in Google Cloud Storage

Question:

I need to recursively move the contents of a sub-folder to a parent folder in Google Cloud Storage. This code works for copying a single file from the sub-folder to the parent.

from pathlib import Path

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket(BUCKET_NAME)

# Source object: parent_dir/sub_folder/filename
source_path = Path(parent_dir, sub_folder, filename).as_posix()
source_blob = bucket.blob(source_path)

# Destination object: parent_dir/filename
dest_path = Path(parent_dir, filename).as_posix()
bucket.copy_blob(source_blob, bucket, dest_path)

but I don't know how to format the call for a whole sub-folder, because if my dest_path is "parent_dir", I get the following error:

google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/storage/v1/b/bucket/o/parent_dir%2Fsubfolder/copyTo/b/geo-storage/o/parent_dir?prettyPrint=false: No such object: geo-storage/parent_dir/subfolder

Note: the recursive copy works with gsutil, but I would prefer to use the blob objects:

os.system(f"gsutil cp -r gs://bucket/parent_dir/subfolder/* gs://bucket/parent_dir")
Asked By: DanGoodrick


Answers:

GCS does not have the concept of a "directory" – just a flat namespace of objects. You can have objects named "foo/a.txt" and "foo/b.txt", but there is no actual thing representing "foo/" – it's just a prefix on the object names.

gsutil uses prefixes on the object names to pretend directories exist, but under the hood it is really just acting on all the individual objects with that prefix.

You need to do the same, with a copy for each object:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

for source_blob in client.list_blobs(BUCKET_NAME, prefix="old/prefix/"):
    # Replace only the leading prefix to build the destination name
    dest_name = source_blob.name.replace("old/prefix/", "new/prefix/", 1)
    # do these in parallel for more speed
    bucket.copy_blob(source_blob, bucket, new_name=dest_name)
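
Since the question asks about a move rather than a copy, and the comment above hints at parallelizing, here is a minimal sketch that copies every blob under the sub-folder prefix up to the parent prefix, then deletes the source, using a thread pool. The bucket name, prefixes, and worker count are placeholder assumptions based on the question, not tested values:

from concurrent.futures import ThreadPoolExecutor

from google.cloud import storage

BUCKET_NAME = "bucket"                    # placeholder bucket name from the question
SRC_PREFIX = "parent_dir/subfolder/"      # "directory" being emptied
DEST_PREFIX = "parent_dir/"               # parent "directory"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

def move_one(source_blob):
    # Rewrite only the leading prefix to get the destination object name
    dest_name = DEST_PREFIX + source_blob.name[len(SRC_PREFIX):]
    bucket.copy_blob(source_blob, bucket, new_name=dest_name)
    # Delete the source object so the copy behaves like a move
    source_blob.delete()

# Materialize the listing first so the iteration is not affected by the writes
blobs = list(client.list_blobs(BUCKET_NAME, prefix=SRC_PREFIX))

with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(move_one, blobs)

This is only an illustration of the copy-then-delete pattern; for very large prefixes you may prefer batching or a Storage Transfer Service job instead.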
Answered By: David