Include only blob 'name' property in list_blobs() response? – Azure Python SDK

Question:

Currently, I am using the list_blobs() function in the Azure Python SDK to list all of the blobs within a container. However, in terms of the metadata/info of the blobs, I only require the names of the blobs.

My container holds over 1M blobs, and the following code is an inefficient way to get each blob's name: list_blobs() has to retrieve a lot of info about every blob (1M+ in total) in its response, and the whole process takes over 15 minutes to complete:

blobs = container_client.list_blobs()
for blob in blobs:
  print(blob.name)

I am looking to decrease the time it takes to execute this block of code, and I was wondering if there is any way to retrieve all of the blobs in the container using list_blobs(), but only retrieving the ‘name’ property of the blobs, rather than retrieving info about every single property of each blob in the response.

Asked By: pkd


Answers:

I am looking to decrease the time it takes to execute this block of
code, and I was wondering if there is any way to retrieve all of the
blobs in the container using list_blobs(), but only retrieving the
‘name’ property of the blobs, rather than retrieving info about every
single property of each blob in the response.

It is not possible to retrieve only some of the properties of a blob (such as the name). The list_blobs method is an implementation of the List Blobs REST API operation, which does not support server-side projection.

Answered By: Gaurav Mantri

You can use container_client.list_blob_names() for this. It returns an iterator over the names of the blobs in the container.

blobs = container_client.list_blob_names()
for blob_name in blobs:
  print(blob_name)

Or store it in a list:

blob_names = list(container_client.list_blob_names())
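
If you would rather not hold all 1M+ names in memory at once, a small generator wrapper lets you process them as they arrive. This is only a sketch: iter_blob_names and its prefix argument are names I made up for illustration, and it assumes your installed azure-storage-blob version provides list_blob_names() and accepts the name_starts_with keyword.

```python
from typing import Iterator, Optional


def iter_blob_names(container_client, prefix: Optional[str] = None) -> Iterator[str]:
    """Yield blob names one at a time instead of building a full list.

    `container_client` is assumed to be an azure.storage.blob.ContainerClient
    (or any object exposing a compatible list_blob_names() method).
    """
    kwargs = {}
    if prefix is not None:
        # Assumption: name_starts_with limits the listing to this prefix.
        kwargs["name_starts_with"] = prefix
    # The returned iterable is consumed lazily, one name at a time.
    yield from container_client.list_blob_names(**kwargs)


# Example usage (the prefix "logs/" is hypothetical):
# for name in iter_blob_names(container_client, prefix="logs/"):
#     print(name)
```

Narrowing the listing with a prefix, when your blob naming scheme allows it, should also reduce how many names have to be returned in the first place.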

https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python#azure-storage-blob-containerclient-list-blob-names

Answered By: Lisa Weijers