How to list all blobs inside of a specific subdirectory in Azure Cloud Storage using Python?

Question:

I worked through the example code from the Azure docs https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

from azure.storage.blob import BlockBlobService
account_name = "x"
account_key = "x"
top_level_container_name = "top_container"

blob_service = BlockBlobService(account_name, account_key)

print("\nList blobs in the container")
generator = blob_service.list_blobs(top_level_container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)

Now I would like to know how to walk my container in a more fine-grained way. My container top_level_container_name has several subdirectories:

  • top_level_container_name/dir1
  • top_level_container_name/dir2
  • etc in that pattern

I would like to be able to list all of the blobs that are inside just one of those directories. For instance

  • dir1/a.jpg
  • dir1/b.jpg
  • etc

How do I get a generator of just the contents of dir1 without having to walk all of the other dirs? (I would also take a list or dictionary)

I tried adding /dir1 to the name of the top_level_container_name so it would be top_level_container_name = "top_container/dir1" but that didn’t work. I get back an error code azure.common.AzureHttpError: The requested URI does not represent any resource on the server. ErrorCode: InvalidUri

The docs do not seem to even have any info on BlockBlobService.list_blobs() https://learn.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python

Update:
list_blobs() comes from https://github.com/Azure/azure-storage-python/blob/ff51954d1b9d11cd7ecd19143c1c0652ef1239cb/azure-storage-blob/azure/storage/blob/baseblobservice.py#L1202

Asked By: aaron


Answers:

Please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/")

This should list the blobs and folders in the dir1 virtual directory.

If you want to list all blobs inside the dir1 virtual directory, please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/", delimiter="")

For more information, please see this link.
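
To make the roles of prefix and delimiter concrete, here is a minimal sketch that imitates the filtering locally on a plain list of names. It is not the SDK's implementation; list_names and the sample names are hypothetical and only illustrate that blob names are flat strings, that a prefix selects names by their leading characters, and that a delimiter groups deeper names into virtual directories.

```python
def list_names(blob_names, prefix="", delimiter=None):
    """Imitate prefix/delimiter listing over a flat list of blob names.

    Hypothetical helper for illustration only; it does not call Azure.
    """
    results = []
    seen_dirs = set()
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue  # outside the requested virtual directory
        if delimiter:
            rest = name[len(prefix):]
            head, sep, _ = rest.partition(delimiter)
            if sep:
                # Deeper names collapse into a single virtual directory entry.
                virtual_dir = prefix + head + delimiter
                if virtual_dir not in seen_dirs:
                    seen_dirs.add(virtual_dir)
                    results.append(virtual_dir)
                continue
        results.append(name)
    return results


names = ["dir1/a.jpg", "dir1/b.jpg", "dir1/sub/c.jpg", "dir2/d.jpg"]

# No delimiter: every name that starts with the prefix, at any depth.
flat = list_names(names, prefix="dir1/")

# With a delimiter: direct children plus one entry per sub-directory.
grouped = list_names(names, prefix="dir1/", delimiter="/")

print(flat)
print(grouped)
```

In this sketch nothing under dir2 is ever returned, which mirrors what the question asks for: only the contents of dir1.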

Answered By: Gaurav Mantri

I was not able to import BlockBlobService. It seems that BlobServiceClient is the new alternative.
I followed the official docs and found this:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

Create a Blob Storage Account client

connect_str = "<connectionstring>"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

Create a container client

container_name = "dummy"
container_client = blob_service_client.get_container_client(container_name)

This will list all blobs in the container that are inside the dir1 folder/directory

blob_list = container_client.list_blobs(name_starts_with="dir1/")
for blob in blob_list:
    print("\t" + blob.name)
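
Since the question says a list would also be acceptable, the loop above could be wrapped in a small helper, roughly as sketched here. blob_names_in_dir is a hypothetical name, and FakeContainerClient is a stand-in so the sketch runs without an Azure account; with the real SDK you would pass the container_client created above instead.

```python
from types import SimpleNamespace


def blob_names_in_dir(container_client, directory):
    """Return the names of all blobs under one virtual directory as a list.

    Hypothetical helper; it only relies on list_blobs(name_starts_with=...).
    """
    # Make sure the prefix ends in "/" so "dir1" does not also match "dir10/".
    prefix = directory.rstrip("/") + "/"
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]


class FakeContainerClient:
    """Stand-in for a container client, used only to run this sketch offline."""

    def __init__(self, names):
        self._names = names

    def list_blobs(self, name_starts_with=None):
        for name in self._names:
            if name_starts_with is None or name.startswith(name_starts_with):
                yield SimpleNamespace(name=name)


fake_client = FakeContainerClient(["dir1/a.jpg", "dir1/b.jpg", "dir10/x.jpg", "dir2/c.jpg"])
dir1_names = blob_names_in_dir(fake_client, "dir1")
print(dir1_names)
```
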
Answered By: Prashant Babber

The module azurebatchload provides for this and more. You can filter on folder or filenames, plus choose to get the result in various formats:

  • list
  • dictionary with extended info
  • pandas dataframe

1. List a whole container with just the filenames as a list.

from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

2. List a whole container with just the filenames as a dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()

3. List a folder in a container.

from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()

4. Get extended information for a folder.

from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()

5. Get extended information for a folder, returned as a pandas dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()

Disclaimer: I am the author of the azurebatchload module.

Answered By: Erfan

To get the blob files inside a directory or subdirectory, pass its path as the prefix (filepath below):

from azure.storage.blob import BlockBlobService
blob_service = BlockBlobService(account_name, account_key)
blobfile = []
generator = blob_service.list_blobs(container_name, prefix="filepath/", delimiter="")
for blob in generator:
    blobname = blob.name.split('/')[-1]
    blobfile.append(blobname)
    print("\t Blob name: " + blob.name)
print(blobfile)

In the code above, use delimiter="/" instead to get the blobs grouped as folders.
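
One detail of the split('/')[-1] step: for a name that ends in "/" (for example a folder entry when a delimiter is used) it yields an empty string. A minimal sketch of one way to handle that, using only the standard library, is below; file_names is a hypothetical helper that works on plain name strings, so it makes no assumptions about the SDK objects.

```python
import posixpath


def file_names(blob_names):
    """Return the last path component of each name, skipping folder-like names.

    Hypothetical helper; blob names always use "/" as separator, so posixpath
    is used rather than os.path.
    """
    names = []
    for blob_name in blob_names:
        base = posixpath.basename(blob_name)
        if base:  # empty when the name ends in "/"
            names.append(base)
    return names


sample = ["filepath/a.jpg", "filepath/sub/b.jpg", "filepath/sub/"]
only_files = file_names(sample)
print(only_files)
```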

Answered By: Palash Mondal

For ContainerClient.list_blobs, the parameter is name_starts_with. The code will look like this:

container.list_blobs(name_starts_with=prefix_value)

Here prefix_value would be "dir1/" for the dir1 directory inside the container.

Please check the documentation:
https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python#azure-storage-blob-containerclient-list-blobs
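
The question also mentions that a dictionary would be fine. A rough sketch of building one from the same call follows, mapping each blob name to its size. blob_sizes_in_dir and fake_container are hypothetical; the sketch assumes each listed item exposes name and size attributes, and the fake object only exists so the example runs without a storage account.

```python
from types import SimpleNamespace


def blob_sizes_in_dir(container, prefix_value):
    """Map blob name -> size for every blob whose name starts with prefix_value.

    Hypothetical helper; assumes the items returned by list_blobs have
    .name and .size attributes.
    """
    return {
        blob.name: blob.size
        for blob in container.list_blobs(name_starts_with=prefix_value)
    }


# Stand-in data and container, used only to run this sketch offline.
_blobs = [
    SimpleNamespace(name="dir1/a.jpg", size=1024),
    SimpleNamespace(name="dir1/b.jpg", size=2048),
    SimpleNamespace(name="dir2/c.jpg", size=512),
]
fake_container = SimpleNamespace(
    list_blobs=lambda name_starts_with=None: (
        b for b in _blobs if b.name.startswith(name_starts_with or "")
    )
)

sizes = blob_sizes_in_dir(fake_container, "dir1/")
print(sizes)
```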

Answered By: user3590035