Boto3 S3, sort bucket by last modified

Question:

I need to fetch a list of items from S3 using Boto3, but instead of the default sort order I want them returned in reverse order of last-modified time (newest first).

I know you can do it via awscli:

aws s3api list-objects --bucket mybucketfoo --query "reverse(sort_by(Contents,&LastModified))"

and it's doable via the UI console (not sure if this is done client side or server side).

I can't seem to see how to do this in Boto3.

I am currently fetching all the files, and then sorting…but that seems overkill, especially if I only care about the 10 or so most recent files.

The filter system seems to only accept the Prefix for s3, nothing else.

Asked By: nate

||

Answers:

It seems that there is no way to do the sort using boto3. According to the documentation, boto3 only supports these methods for Collections:

all(), filter(**kwargs), page_size(**kwargs), limit(**kwargs)

Hope this helps in some way.
https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.ServiceResource.buckets

Answered By: Juan Diego Garcia

If there are not many objects in the bucket, you can use Python to sort it to your needs.

Define a lambda to get the last modified time:

# LastModified is a datetime, so it can be compared directly
# (strftime('%s') is not supported on every platform)
get_last_modified = lambda obj: obj['LastModified']

Get all objects and sort them by last modified time.

import boto3

s3 = boto3.client('s3')
# a single list_objects_v2 call returns at most 1000 objects
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
[obj['Key'] for obj in sorted(objs, key=get_last_modified)]

If you want to reverse the sort:

[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)]
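
If only the most recent few objects matter, a full sort is not strictly needed. The following is a minimal sketch using heapq.nlargest from the standard library; the function name newest_keys and the sample data are hypothetical, and the list it receives is assumed to have the same shape as the 'Contents' list above. Every object still has to be listed first, since the selection happens client-side.

```python
import heapq
from datetime import datetime, timezone


def newest_keys(objs, n=10):
    """Return the keys of the n most recently modified objects, newest first."""
    newest = heapq.nlargest(n, objs, key=lambda obj: obj['LastModified'])
    return [obj['Key'] for obj in newest]


# Hypothetical stand-in for the 'Contents' list returned by list_objects_v2
sample = [
    {'Key': 'a.txt', 'LastModified': datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {'Key': 'b.txt', 'LastModified': datetime(2020, 3, 1, tzinfo=timezone.utc)},
    {'Key': 'c.txt', 'LastModified': datetime(2020, 2, 1, tzinfo=timezone.utc)},
]
print(newest_keys(sample, n=2))  # ['b.txt', 'c.txt']
```

With real data, the call would look like newest_keys(objs, n=10).
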
Answered By: helloV

I did a small variation of what @helloV posted. It's not 100% optimal, but it gets the job done with the limitations boto3 has as of this time.

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

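# resource objects expose last_modified as an attribute, not a dict key
files = [obj.key for obj in sorted(unsorted, key=lambda obj: obj.last_modified,
    reverse=True)][0:10]  # the ten most recent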
Answered By: nate

import boto3

s3 = boto3.client('s3')
keys = []

kwargs = {'Bucket': 'my_bucket'}
while True:
    resp = s3.list_objects_v2(**kwargs)
    # 'Contents' is absent when the listing is empty
    for obj in resp.get('Contents', []):
        keys.append(obj['Key'])

    try:
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    except KeyError:
        break

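This will get you all the keys, following pagination past the first 1000 results. Note that S3 returns them sorted by key name (for general purpose buckets, in UTF-8 binary order), not by last-modified time, so a sort by date still has to be done client-side.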
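
For reference, the same listing loop can be sketched with boto3's built-in paginator, followed by a client-side sort on LastModified. The function name list_sorted_by_last_modified and its parameters are hypothetical, and client is assumed to be a boto3 S3 client such as boto3.client('s3').

```python
def list_sorted_by_last_modified(client, bucket, prefix='', newest_first=True):
    """List every object under prefix, then sort client-side by LastModified.

    Assumption: client is a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = client.get_paginator('list_objects_v2')
    objs = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page that has no objects
        objs.extend(page.get('Contents', []))
    objs.sort(key=lambda obj: obj['LastModified'], reverse=newest_first)
    return [obj['Key'] for obj in objs]
```

Usage would be along the lines of list_sorted_by_last_modified(boto3.client('s3'), 'my_bucket')[:10] for the ten newest keys. The whole bucket (or prefix) is still listed before sorting.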

Answered By: Israelsofer

import boto3

s3 = boto3.client('s3')

# LastModified is a datetime, so it can be used directly as the sort key
get_last_modified = lambda obj: obj['LastModified']

def sortFindLatest(bucket_name):
    resp = s3.list_objects(Bucket=bucket_name)
    if 'Contents' in resp:
        objs = resp['Contents']
        # oldest first; pass reverse=True for newest first
        files = sorted(objs, key=get_last_modified)
        for key in files:
            file = key['Key']
            # fetch each object in last-modified order
            cx = s3.get_object(Bucket=bucket_name, Key=file)

This works for me to sort by date and time. I am using Python 3 on AWS Lambda. Your mileage may vary. It can be optimized; I purposely kept each step separate. As mentioned in an earlier post, ‘reverse=True’ can be added to change the sort order.

Answered By: Nelson

A simpler approach, using the python3 sorted() function:

import boto3
s3 = boto3.resource('s3')

myBucket = s3.Bucket('name')

def obj_last_modified(myobj):
    return myobj.last_modified

sortedObjects = sorted(myBucket.objects.all(), key=obj_last_modified, reverse=True)

You now have a list sorted newest-first by the ‘last_modified’ attribute of each Object.

Answered By: weegolo

Slight improvement of above:

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
files = my_bucket.objects.filter()
files = [obj.key for obj in sorted(files, key=lambda x: x.last_modified, 
    reverse=True)]
Answered By: zalmane

To get the last modified files in a folder in S3:

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucket_name')
files = my_bucket.objects.filter(Prefix='folder_name/subfolder_name/')
files = [obj.key for obj in sorted(files, key=lambda x: x.last_modified,
    reverse=True)][0:2]

print(files)

The [0:2] slice at the end is what keeps only the two most recently modified files:

files = [obj.key for obj in sorted(files, key=lambda x: x.last_modified,
    reverse=True)][0:2]
Answered By: mellifluous

My answer can be used for last modified, but I thought that if you’ve come to this page, there is a chance that you’d like to be able to sort your files in some other manner. So, to kill two birds with one stone:

In this thread you can find the built-in function sorted. If you read the docs or this article, you will see that you can pass your own key function to decide how objects should be ordered. For example, in my case I had a bunch of files whose names started with a number, potentially followed by a letter. It looked like this:

1.svg
10.svg
100a.svg
11.svg
110.svg
...
2.svg
20b.svg
200.svg
...
10011b.svg
...
etc

I wanted it to be sorted by the number up front – I didn’t care about the letter behind the number, so I wrote this function:

def my_sort(x):
    try:
        # this will take the file name, split over the file type and take just the name, cast it to an int, and return it
        return int(x.split(".")[0])
    # if it couldn't do that
    except ValueError:
        # it will take the file name, split it over the extension, and take the name
        n = x.split(".")[0]
        s = ""
        # then for each character
        for e in n:
            # check to see if it is a digit and append it to a string if it is
            if e.isdigit():
                s += e
            # if it's not a digit, it hit the letter at the end of the name, so return the number collected so far
            else:
                return int(s)

Which means now I can do this:

import boto3
s3r = boto3.resource('s3')
bucket = s3r.Bucket('my_bucket')
# named objs/names rather than os, to avoid shadowing the standard os module
objs = bucket.objects.filter(Prefix="my_prefix/")
names = [o.key.split("/")[-1] for o in objs]
names = sorted(names, key=my_sort)

# do whatever with the sorted data

which will sort my files by the numerical prefix in their name.
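
As a rough alternative, the same kind of key function can be sketched with a regular expression that grabs the leading digits. The name numeric_prefix is hypothetical, and returning -1 for names without a leading number is an assumption made here so that such names sort first instead of raising an error.

```python
import re


def numeric_prefix(name):
    """Sort key: the integer formed by the leading digits of a file name."""
    match = re.match(r'\d+', name)
    # Assumption: names with no leading number get -1 and therefore sort first
    return int(match.group()) if match else -1


names = ['10.svg', '1.svg', '100a.svg', '2.svg', '20b.svg', '11.svg']
print(sorted(names, key=numeric_prefix))
# ['1.svg', '2.svg', '10.svg', '11.svg', '20b.svg', '100a.svg']
```

It can be passed to sorted() in the same way as my_sort above.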

Answered By: Shmack