How to list files from an S3 bucket folder using Python

Question:

I tried to list all files in a bucket. Here is my code:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

It works: I get all the file names. However, when I tried to do the same thing on a folder, the code raised an error:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project/data/') # add the folder name

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

Here is the error:

botocore.exceptions.ParamValidationError: Parameter validation failed:

Invalid bucket name "carlos-cryptocurrency-research-project/data/": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9-]{1,63}$"

I’m sure the folder name is correct. I also tried replacing it with the Amazon Resource Name (ARN) and the S3 URI, but I still get the error.

Asked By: Carlos

Answers:

You can’t specify a prefix/folder in the Bucket constructor; it accepts only the bucket name. Instead, use the client-level API and call list_objects_v2 with a Prefix, something like this:

import boto3

client = boto3.client('s3')

response = client.list_objects_v2(
    Bucket='my_bucket',
    Prefix='data/')

for content in response.get('Contents', []):
    print(content['Key'])

Note that a single list_objects_v2 call returns at most 1000 S3 objects. Use a paginator if the prefix may contain more.
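
For prefixes holding more than 1000 objects, a paginator can follow the continuation tokens for you. The following is a minimal sketch, not a definitive implementation: the helper name iter_keys is hypothetical, and the bucket name and prefix in the usage comment are the placeholders from the example above.

```python
def iter_keys(client, bucket, prefix):
    """Yield every object key under `prefix`, one page at a time.

    `client` is expected to be a boto3 S3 client (boto3.client('s3')).
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches the prefix
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (needs AWS credentials; names are placeholders):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "my_bucket", "data/"):
#       print(key)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching AWS.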

Answered By: jarmod

To list all the files in a specific folder of an S3 bucket, filter the bucket's objects by prefix:

import boto3

s3 = boto3.resource('s3')
myBucket = s3.Bucket('bucketName')

for object_summary in myBucket.objects.filter(Prefix="path/"):
    print(object_summary.key)
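
One thing to watch for: if the folder was created through the S3 console, the listing may also include a zero-byte placeholder object whose key is the prefix itself (for example "path/"). A small sketch for dropping such entries, assuming that any key ending in "/" is a placeholder rather than a real file; the helper name file_keys is hypothetical:

```python
def file_keys(keys):
    """Return only keys that look like files, skipping 'folder' placeholders.

    Assumption: a key ending in '/' is a folder placeholder, not a file.
    """
    return [key for key in keys if not key.endswith("/")]


# Usage with the loop above (needs AWS credentials):
#   keys = (obj.key for obj in myBucket.objects.filter(Prefix="path/"))
#   for key in file_keys(keys):
#       print(key)
```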

Answered By: thrinadhn