Getting S3 objects' last modified datetimes with boto
Question:
I’m writing a Python script that uploads files to S3 using the boto library. I only want to upload changed files (which I can check by their "last modified" datetimes), but I can’t find the boto API call that returns the last modified date.
Answers:
Here’s a snippet of Python/boto code that will print the last_modified attribute of all keys in a bucket:
>>> import boto
>>> s3 = boto.connect_s3()
>>> bucket = s3.lookup('mybucket')
>>> for key in bucket:
...     print key.name, key.size, key.last_modified
...
index.html 13738 2012-03-13T03:54:07.000Z
markdown.css 5991 2012-03-06T18:32:43.000Z
>>>
If you’re using Django and django-storages, you can use an unofficial API in the s3boto backend:
>>> from storages.backends.s3boto import _parse_datestring
>>> _parse_datestring("Fri, 20 Jul 2012 16:57:27 GMT")
datetime.datetime(2012, 7, 21, 2, 57, 27)
Unfortunately, as of django-storages 1.1.5, this returns a naive datetime. You need django.utils.timezone to create an aware version:
>>> from django.utils import timezone
>>> naive = _parse_datestring("Fri, 20 Jul 2012 16:57:27 GMT")
>>> timezone.make_aware(naive, timezone.get_current_timezone())
datetime.datetime(2012, 7, 21, 2, 57, 27, tzinfo=<DstTzInfo 'Australia/Brisbane' EST+10:00:00 STD>)
Convert the last_modified attribute to a struct_time as shown below:
import time
for key in bucket.get_all_keys():
    time.strptime(key.last_modified[:19], "%Y-%m-%dT%H:%M:%S")
This gives a time.struct_time(tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday, tm_isdst) tuple for each key in the S3 bucket.
This works (thanks to jdennison above). After getting the key from S3:
import time
from time import mktime
from datetime import datetime

# key.last_modified is an RFC 822 string here, e.g. 'Fri, 20 Jul 2012 16:57:27 GMT'
modified = time.strptime(key.last_modified, '%a, %d %b %Y %H:%M:%S %Z')
# convert the struct_time to a datetime
dt = datetime.fromtimestamp(mktime(modified))
Boto3 returns a datetime object for LastModified when you use the (S3) Object API, so you shouldn’t need to perform any tortuous string manipulation. To compare LastModified to today’s date (Python 3):
import boto3
from datetime import datetime, timezone
today = datetime.now(timezone.utc)
s3 = boto3.client('s3', region_name='eu-west-1')
objects = s3.list_objects(Bucket='my_bucket')
for o in objects["Contents"]:
    # exact equality with "now" would essentially never match; compare calendar dates
    if o["LastModified"].date() == today.date():
        print(o["Key"])
You just need to be aware that LastModified is timezone aware, so any datetime you compare with it must also be timezone aware, hence datetime.now(timezone.utc).
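To see why this matters, here is a minimal demonstration (no AWS access needed) that Python 3 refuses to order a naive datetime against an aware one:

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)  # like boto3's LastModified: has tzinfo
naive = datetime.now()              # no tzinfo attached

try:
    naive < aware
except TypeError:
    print("naive and aware datetimes cannot be ordered")
```

Forgetting the timezone therefore fails loudly with a TypeError rather than giving a wrong answer.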
This applies to the more recent list_objects_v2. The boto3 client returns LastModified as a datetime.datetime; one way to convert it to an epoch timestamp is shown below.
import datetime
from dateutil.tz import tzutc

# Node.js S3 response: '2019-06-17T18:42:57.000Z'
# Python boto3 S3 response: datetime.datetime(2019, 10, 1, 22, 41, 55, tzinfo=tzutc())
''' {'ETag': '"c8ba0ad5003832f63690ea8ff9b66052"',
     'Key': 'SOMEFILE',
     'LastModified': datetime.datetime(2019, 10, 2, 18, 50, 47, tzinfo=tzutc()),
     'Size': 6390623,
     'StorageClass': 'STANDARD'}
'''
last_modified = datetime.datetime(2019, 10, 1, 22, 41, 55, tzinfo=tzutc())
# timestamp() is portable; strftime('%s') is platform-dependent and ignores tzinfo
epoch_seconds = int(last_modified.timestamp())
print(last_modified)
print(epoch_seconds)
Using a Resource, you can get an iterator over all objects and then retrieve the last_modified attribute of each ObjectSummary:
import boto3
s3 = boto3.resource('s3')
bk = s3.Bucket(bucket_name)
[obj.last_modified for obj in bk.objects.all()][:10]
returns
[datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 20, 20, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 20, 8, 30, 2, tzinfo=tzlocal()),
datetime.datetime(2020, 3, 26, 15, 33, 58, tzinfo=tzlocal())]
For just one S3 object, you can use the boto3 client’s head_object() method, which is faster than list_objects_v2() for a single object because less content is returned. The returned value is a datetime, as in all boto3 responses, and therefore easy to process. head_object() also accepts parameters around the object’s modification time (such as IfModifiedSince), which can be leveraged without further calls after a list_objects() result.
import boto3

s3 = boto3.client('s3')
# Bucket and Key must be passed as keyword arguments
response = s3.head_object(Bucket=bucket, Key=key)
datetime_value = response["LastModified"]
import boto3
from boto3.session import Session
session = Session(aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
my_bucket = s3.Bucket(BUCKET_NAME)
for obj in my_bucket.objects.all():
    print('{} | {}'.format(obj.key, obj.last_modified))
You can get a single object’s last modified date like this:
With resource
boto3.resource('s3').Object(<BUCKET_NAME>, <file_path>).last_modified
With client
boto3.client('s3').head_object(Bucket=<BUCKET_NAME>, Key=<file_path>)['LastModified']
You can sort the returned list of objects by the LastModified key:
import boto3
s3_client = boto3.client('s3')
s3_response = s3_client.list_objects(Bucket=BUCKET_NAME)
sorted_contents = sorted(s3_response['Contents'], key=lambda d: d['LastModified'], reverse=True)
sorted_contents[0].get('Key')
You can remove the reverse=True flag to get the earliest modified object instead. You can also sort by the objects’ Size or any other property you want.
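One caveat: list_objects and list_objects_v2 return at most 1,000 keys per call, so on larger buckets you need to paginate before sorting. Here is a sketch of the flatten-and-pick step, written as a pure helper (the function name is my own) that works on pages such as those yielded by boto3’s get_paginator('list_objects_v2'):

```python
def newest_key(pages):
    """Flatten list_objects_v2-style response pages and return the key of the
    most recently modified object, or None if the bucket is empty.

    pages: an iterable of response dicts, e.g. from
    boto3.client('s3').get_paginator('list_objects_v2').paginate(Bucket=...).
    """
    contents = [obj for page in pages for obj in page.get('Contents', [])]
    if not contents:
        return None
    return max(contents, key=lambda o: o['LastModified'])['Key']
```

Because LastModified values are timezone-aware datetimes, max() compares them directly with no string parsing.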
Here is what I used for my Lambda function. Use s3.list_objects_v2(Bucket=your_bucket_name) to list the objects, then read the LastModified key from each item in Contents.
import boto3
import json
import datetime

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = "your-bucket-name"  # must be a string
    try:
        listdata = s3.list_objects_v2(Bucket=bucket)
        contents = listdata['Contents'] if "Contents" in listdata else []
        for item in contents:
            lastmodified = str(item['LastModified'])
            print("lastmodified:", lastmodified)
    except Exception as e:  # the original try block had no matching except
        print("error:", e)
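Tying this back to the original question (upload only changed files): the decision step reduces to comparing the local file’s mtime with S3’s LastModified. The helper below is a sketch of my own, not a boto API; it takes the mtime as epoch seconds (e.g. from os.path.getmtime) and the tz-aware datetime boto3 returns:

```python
from datetime import datetime, timezone

def needs_upload(local_mtime_epoch, remote_last_modified):
    """True if the local file is newer than the S3 copy.

    local_mtime_epoch: epoch seconds, e.g. os.path.getmtime(path)
    remote_last_modified: tz-aware datetime, e.g. head_object(...)['LastModified']
    """
    local_dt = datetime.fromtimestamp(local_mtime_epoch, tz=timezone.utc)
    return local_dt > remote_last_modified
```

In practice you would feed it os.path.getmtime(path) and the LastModified value from head_object() or a list_objects_v2 entry.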