Get count of objects in a specific S3 folder using Boto3

Question:

I'm trying to get a count of the objects in an S3 folder.

Current code

import boto3

bucket = 'some-bucket'
File = 'someLocation/File/'

objs = boto3.client('s3').list_objects_v2(Bucket=bucket, Prefix=File)
fileCount = objs['KeyCount']

This gives me a count of 1 + the actual number of objects in S3.

Maybe it is counting “File” as a key too?

Asked By: ThatComputerGuy


Answers:

“Folders” do not actually exist in Amazon S3. Instead, all objects have their full path as their filename (‘Key’). I think you already know this.

However, it is possible to ‘create’ a folder by creating a zero-length object that has the same name as the folder. This causes the folder to appear in listings and is what happens if folders are created via the management console.

Thus, you could exclude zero-length objects from your count.
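A minimal sketch of that idea; the counting logic is pulled into a plain function (count_real_objects is my name, not a boto3 API) so it can be tried without AWS access, and the bucket/prefix in the usage comment are placeholders:

```python
def count_real_objects(pages):
    # Count keys across list_objects_v2 response pages, skipping
    # zero-length 'folder' placeholder objects.
    return sum(
        1
        for page in pages
        for obj in page.get('Contents', [])
        if obj['Size'] > 0
    )

# Against S3 (bucket and prefix are placeholders):
# import boto3
# paginator = boto3.client('s3').get_paginator('list_objects_v2')
# pages = paginator.paginate(Bucket='some-bucket', Prefix='someLocation/File/')
# print(count_real_objects(pages))
```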

For an example, see: Determine if folder or file key – Boto

Answered By: John Rotenstein

Assuming you want to count the keys in a bucket and don't want to hit the 1000-key limit of a single list_objects_v2 call: the code below worked for me, but I'm wondering if there is a better, faster way to do it. I looked for a packaged function in the boto3 S3 connector, but there isn't one.

# connect to S3 - assuming your credentials are set up and boto3 is installed
import boto3

s3 = boto3.resource('s3')

# list the buckets - useful if you only know what your bucket name starts with
for bucket in s3.buckets.all():
    print(bucket.name)

# get the bucket
bucket = s3.Bucket('my-s3-bucket')

# count the objects with a loop (the resource API paginates automatically)
count_obj = 0
for i in bucket.objects.all():
    count_obj = count_obj + 1
print(count_obj)
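The loop above can also be condensed into a one-line generator expression; a small sketch (count_items is a helper name I'm introducing, not a boto3 API):

```python
def count_items(objects):
    # Count the items yielded by any iterable, e.g. bucket.objects.all(),
    # without building a list in memory.
    return sum(1 for _ in objects)

# With boto3 (assumed bucket name):
# import boto3
# bucket = boto3.resource('s3').Bucket('my-s3-bucket')
# print(count_items(bucket.objects.all()))
```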
Answered By: vagabond

If there are more than 1000 entries, you need to use a paginator, like this:

import boto3

count = 0
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
for result in paginator.paginate(Bucket='your-bucket', Prefix='your-folder/', Delimiter='/'):
    # 'CommonPrefixes' is absent from a page with no sub-prefixes, so default to []
    count += len(result.get('CommonPrefixes', []))
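Note that with Delimiter='/', CommonPrefixes counts the immediate sub-folders, not the objects. To count the objects themselves, sum the Contents entries of each page instead; a sketch with the counting pulled into a plain function (count_keys is my name, not a boto3 API):

```python
def count_keys(pages):
    # Sum the number of 'Contents' entries across all response pages;
    # a page with no matching keys has no 'Contents' at all.
    return sum(len(page.get('Contents', [])) for page in pages)

# Against S3 (assumed bucket and prefix):
# import boto3
# paginator = boto3.client('s3').get_paginator('list_objects_v2')
# print(count_keys(paginator.paginate(Bucket='your-bucket', Prefix='your-folder/')))
```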
Answered By: matt burns

If you have credentials to access the bucket, you can use this simple code. The code below gives you a list of the file keys; a list comprehension is used for readability.

filter is used to restrict the listing to the folder prefix, since there are no real folders in a bucket: folder names are just part of each object's key, as John Rotenstein explained concisely.

import boto3

bucket = "Sample_Bucket"
folder = "Sample_Folder"
s3 = boto3.resource("s3") 
s3_bucket = s3.Bucket(bucket)
files_in_s3 = [f.key.split(folder + "/")[1] for f in s3_bucket.objects.filter(Prefix=folder).all()]
Answered By: Anuj Sharma

The following code worked perfectly for me (note that a single list_objects_v2 call returns at most 1000 keys):

import boto3

def getNumberOfObjectsInBucket(bucketName, prefix):
    count = 0
    response = boto3.client('s3').list_objects_v2(Bucket=bucketName, Prefix=prefix)
    # 'Contents' is absent from the response when the prefix matches nothing
    for obj in response.get('Contents', []):
        if obj['Size'] != 0:  # skip zero-length 'folder' placeholder objects
            # print(obj['Key'])
            count += 1
    return count

Keys with Size == 0 are the folder placeholder objects, if you want to check for those; testing Size != 0, as above, counts only the non-folder keys.
A sample function call:

getNumberOfObjectsInBucket('foo-test-bucket','foo-output/')
Answered By: Ehsan