How to copy an entire structure between storage accounts in Python

Question:

My case is the following:

  1. Two Azure Storage Accounts (Source/Destination)
  2. Source Account may contain multiple containers, folders, blobs, etc.
  3. All of the above needs to be copied exactly in the same structure to the DESTINATION account.
  4. If any elements already exist in the destination account and are older than those in the source storage account, they need to be overwritten.

What I made so far:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobLeaseClient, BlobPrefix, ContentSettings

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "your SOURCE connection string"
DESTINATION_CONNECTION_STRING = "your DESTINATION connection string"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()
    
    #source_blobs = source_blob_service_client.list_blobs(source_container.name)

    # Iterate through each blob in the current source container
    for source_blob in source_blobs:
        
        # Check if the blob already exists in the destination container
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob)
        if not destination_blob.exists() or source_blob.last_modified > destination_blob.get_blob_properties().last_modified:
            # Copy the blob to the destination container (with the same directory structure as in the source)
            #source_blob_client = BlobClient.from_blob_url(source_blob.url)
            source_blob_client = BlobClient.from_blob_url(source_blob.url)
            destination_blob.start_copy_from_url(source_url=source_blob.url)

            print(f"Copied blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")

However, I get an error, AttributeError: 'BlobProperties' object has no attribute 'url', even though I see it being used in this sample https://github.com/Azure-Samples/AzureStorageSnippets/blob/master/blobs/howto/python/blob-devguide-py/blob-devguide-blobs.py and in the documentation https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python#azure-storage-blob-blobclient-start-copy-from-url.

Can someone suggest what I am doing wrong? I opted for Python because of the iterative requirement (going down to the most granular level of each container), which did not seem doable in Synapse via pipeline activities.

Asked By: Thanatos

Answers:

I tried this in my environment and got the results below.

Initially, I got the same error in my environment:

AttributeError: 'BlobProperties' object has no attribute 'url'

The error occurs because list_blobs() yields BlobProperties objects, and BlobProperties has no url attribute. Instead, get a BlobClient for the blob with get_blob_client(container_name, blob_name) and use that client's url as the copy source.

Code:

from azure.storage.blob import BlobServiceClient

# Set the connection string for the source and destination storage accounts
SOURCE_CONNECTION_STRING = "<src_connect_strng>"
DESTINATION_CONNECTION_STRING = "<dest_connect_strng>"

# Create the BlobServiceClient objects for the source and destination storage accounts
source_blob_service_client = BlobServiceClient.from_connection_string(SOURCE_CONNECTION_STRING)
destination_blob_service_client = BlobServiceClient.from_connection_string(DESTINATION_CONNECTION_STRING)

# List all containers in the source storage account
source_containers = source_blob_service_client.list_containers()

# Iterate through each container in the source storage account
for source_container in source_containers:
    print(f"Processing container '{source_container.name}'...")

    # Create a new container in the destination storage account (if it doesn't exist already)
    destination_container = destination_blob_service_client.get_container_client(source_container.name)
    if not destination_container.exists():
        print(f"Creating container '{source_container.name}' in the destination storage account...")
        destination_container.create_container()

    # Get a list of all blobs in the current source container
    source_container_client = source_blob_service_client.get_container_client(source_container.name)
    source_blobs = source_container_client.list_blobs()
    
    # Iterate through each blob in the current source container
    for source_blob in source_blobs:
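
        # list_blobs() yields BlobProperties (no url attribute), so get BlobClient
        # objects for the source and destination blobs by container and blob name
        destination_blob = destination_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob.name)
        source_blob_client = source_blob_service_client.get_blob_client(source_container.name, source_blob.name)
        print(source_blob_client.url)

        # Start a server-side copy; the full blob name (including any folder prefix) is preserved.
        # Note: every blob is copied unconditionally, and the call returns before the copy finishes.
        destination_blob.start_copy_from_url(source_url=source_blob_client.url)
        print(f"Copied blob '{source_blob.name}' to container '{source_container.name}' in the destination storage account.")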

Console:

The above code ran successfully in Synapse and copied the same structure from one storage account to the other.
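The loop in the answer copies every blob unconditionally, while point 4 of the question asks to overwrite only when the destination copy is older. A minimal sketch of that check follows. The helper name needs_copy and the FakeBlobClient class are hypothetical; the fake only mimics the exists() and get_blob_properties() methods of a real BlobClient so the logic can be tried without a storage account.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def needs_copy(source_blob, destination_blob_client):
    """Return True when the destination blob is missing or older than the source.

    source_blob: an item yielded by list_blobs() (only .last_modified is used).
    destination_blob_client: a BlobClient for the same container and blob name.
    """
    if not destination_blob_client.exists():
        return True
    destination_props = destination_blob_client.get_blob_properties()
    return source_blob.last_modified > destination_props.last_modified


class FakeBlobClient:
    """Hypothetical stand-in for BlobClient, used only to try needs_copy offline."""

    def __init__(self, last_modified=None):
        self._last_modified = last_modified

    def exists(self):
        # A fake blob "exists" only if it was given a timestamp
        return self._last_modified is not None

    def get_blob_properties(self):
        return SimpleNamespace(last_modified=self._last_modified)


# Offline demonstration with made-up timestamps and a made-up blob name
older = datetime(2023, 1, 1, tzinfo=timezone.utc)
newer = datetime(2023, 6, 1, tzinfo=timezone.utc)
source_blob = SimpleNamespace(name="folder/data.csv", last_modified=newer)

print(needs_copy(source_blob, FakeBlobClient()))       # destination missing
print(needs_copy(source_blob, FakeBlobClient(older)))  # destination older
print(needs_copy(source_blob, FakeBlobClient(newer)))  # destination up to date
```

In the real loop, the call to destination_blob.start_copy_from_url(...) would be wrapped in "if needs_copy(source_blob, destination_blob):". After a copy, the destination blob's last_modified is the copy time, so an unchanged source blob should be skipped on the next run.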

Portal:
In the portal, I can see that the destination account has the same structure as the source account.
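One caveat, stated as an assumption about your setup: if the source containers are not publicly readable, the destination service generally cannot read a bare blob URL from another account, and the copy is rejected with an authorization error. A common workaround is to append a read-only SAS token for the source to the URL. The sketch below only shows the string handling; the helper name with_sas, the example URL, and the token value are all hypothetical, and the token itself would come from elsewhere (for example the portal, or generate_account_sas in azure.storage.blob).

```python
def with_sas(blob_url, sas_token):
    """Append a SAS token to a blob URL so the copy source can be read.

    blob_url: the url attribute of a source BlobClient.
    sas_token: a SAS query string, with or without a leading '?'.
    """
    token = sas_token.lstrip("?")
    # Use '&' if the URL already carries a query string (e.g. a snapshot parameter)
    separator = "&" if "?" in blob_url else "?"
    return f"{blob_url}{separator}{token}"


# Made-up URL and token, for illustration only
example_url = "https://sourceaccount.blob.core.windows.net/container/folder/data.csv"
print(with_sas(example_url, "?sv=placeholder&sig=placeholder"))
```

In the loop, this would be used as destination_blob.start_copy_from_url(with_sas(source_blob_client.url, sas_token)), where sas_token is a variable you supply.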


Answered By: Venkatesan