How do I get the size of a boto3 Collection?
Question:
My current approach is to convert the Collection into a list and take its length:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')
size = len(list(bucket.objects.all()))
However, this forces resolution of the whole collection and obviates the benefits of using a Collection in the first place. Is there a better way to do this?
Answers:
There is no way to get the count of keys in a bucket without listing all the objects. This is a limitation of AWS S3 (see https://forums.aws.amazon.com/thread.jspa?messageID=164220).
Iterating over the object summaries only issues list requests and doesn’t download the actual object data, so it should be a relatively inexpensive operation. If you are just discarding the list, you could do:
size = sum(1 for _ in bucket.objects.all())
This gives you the number of objects without constructing a list.
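If you prefer the low-level client over the resource API, the same count can be sketched with a paginator. This is a minimal sketch, not the answer author's code: the helper name count_objects is hypothetical, and it assumes you pass in a client created with boto3.client('s3'). It still lists every key, it just sums page sizes instead of building a list.

```python
def count_objects(client, bucket, prefix=""):
    """Count keys under a prefix by summing the size of each listing page.

    `client` is assumed to be a boto3 S3 client; `count_objects` is a
    hypothetical helper name used only for this illustration.
    """
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page that has no matching keys.
        total += len(page.get("Contents", []))
    return total

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(count_objects(boto3.client("s3"), "my_bucket"))
```

The paginator handles the continuation token for you, so only one page of results is held in memory at a time.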
Borrowing from a similar question, one option to retrieve the complete list of object keys from a bucket + prefix is to use recursion with the list_objects_v2 method.
This method will recursively retrieve the list of object keys, 1000 keys at a time.
Each request to list_objects_v2 uses the StartAfter argument to continue listing keys after the last key from the previous request.
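import boto3

# Placeholder credentials; replace with your own or rely on the default credential chain.
client = boto3.client('s3',
    aws_access_key_id='access_key',
    aws_secret_access_key='secret_key'
)

def get_all_object_keys(bucket, prefix, start_after='', keys=None):
    # Use None rather than a mutable default so separate calls don't share one list.
    if keys is None:
        keys = []
    response = client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        StartAfter=start_after
    )
    # No 'Contents' in the response means there is nothing left to list.
    if 'Contents' not in response:
        return keys
    key_list = response['Contents']
    last_key = key_list[-1]['Key']
    keys.extend(key_list)
    # Recurse, starting after the last key seen so far.
    return get_all_object_keys(bucket, prefix, last_key, keys)

if __name__ == '__main__':
    object_keys = get_all_object_keys('your_bucket', 'prefix/to/files')
    print(len(object_keys))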
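One caveat with the recursive form: each page adds a stack frame, so a prefix with a very large number of keys could run into Python's recursion limit. A loop avoids that. The following is a rough sketch of the same StartAfter technique written iteratively; the function name get_all_object_keys_iter is hypothetical, the client is passed in as an argument, and it collects only the key strings rather than the full entries.

```python
def get_all_object_keys_iter(client, bucket, prefix):
    """Collect every key under a prefix with a loop instead of recursion.

    `client` is assumed to be a boto3 S3 client; the function name is
    hypothetical and used only for this illustration.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        contents = response.get("Contents", [])
        if not contents:
            # An empty page means everything has been listed.
            return keys
        keys.extend(obj["Key"] for obj in contents)
        # Resume the next request after the last key of this page.
        kwargs["StartAfter"] = contents[-1]["Key"]

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   keys = get_all_object_keys_iter(boto3.client("s3"), "your_bucket", "prefix/to/files")
#   print(len(keys))
```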
For my use case, I just needed to know whether the folder is empty or not.
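import boto3

s3 = boto3.client('s3')
response = s3.list_objects(
    Bucket='your-bucket',
    Prefix='path/to/your/folder/',
)
# 'Contents' is absent when nothing matches the prefix, so default to an empty list.
print(len(response.get('Contents', [])))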
This was enough to know whether the folder is empty. Note that a folder created manually in the S3 console is itself stored as an object (a zero-byte key ending in "/"), so it is included in the count. In that case, the S3 "folder" is not empty only if the length shown above is greater than 1.
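The placeholder caveat can be folded into the check itself. Here is a small sketch under a few assumptions: the helper name is_folder_empty is hypothetical, the prefix ends with "/", and the console-created placeholder is the object whose key equals the prefix. Asking for at most two keys is enough, because at most one of them can be the placeholder.

```python
def is_folder_empty(client, bucket, prefix):
    """Return True if no object other than the folder placeholder exists.

    `client` is assumed to be a boto3 S3 client and `prefix` is assumed to
    end with '/'. `is_folder_empty` is a hypothetical helper name.
    """
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=2)
    # Ignore the placeholder object, whose key is exactly the prefix.
    return all(obj["Key"] == prefix for obj in response.get("Contents", []))

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(is_folder_empty(boto3.client("s3"), "your-bucket", "path/to/your/folder/"))
```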