How to delete file(s) from source s3 bucket after lambda successfully copies file(s) to destination s3 bucket?

Question:

I have the following Lambda function, which uses an S3 trigger to copy files from a source to a destination bucket. This is working fine.

import os
import logging
import boto3

LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)

DST_BUCKET = os.environ.get('DST_BUCKET')
REGION = os.environ.get('REGION')

s3 = boto3.resource('s3', region_name=REGION)

def handler(event, context):
    LOGGER.info('Event structure: %s', event)
    LOGGER.info('DST_BUCKET: %s', DST_BUCKET)

    for record in event['Records']:
        src_bucket = record['s3']['bucket']['name']
        src_key = record['s3']['object']['key']

        copy_source = {
            'Bucket': src_bucket,
            'Key': src_key
        }
        LOGGER.info('copy_source: %s', copy_source)
        bucket = s3.Bucket(DST_BUCKET)
        bucket.copy(copy_source, src_key)

    return {
        'status': 'ok'
    }

What I want to do now is modify the code above so that it deletes the file(s) (not the folder) from the source bucket after they have been successfully copied to the destination bucket.

Use case: a user uploads three files, two legit CSV files and one corrupted CSV file. Lambda triggers on the source bucket and begins copying those files. Lambda loops through the files, logging true for each file copied successfully and false (along with the filename) when there was a problem, then deletes the successfully copied files from the source bucket.

I’ve tried various try/except blocks for this, but they end up either deleting the entire folder or hitting timing issues between the buckets, where the file is deleted from the source before the copy to the destination has completed, etc.

I don’t want to do away with the loop above, so that if multiple files are uploaded it still loops through all of them and deletes each one once it has been successfully copied to the other bucket. I’m unsure whether a simple boolean would be sufficient for this use case or whether some other flag is needed. The flag would have to keep track of the specific key, though, so that it knows which copies succeeded and which didn’t. Something like the sketch below is what I have in mind.
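Roughly (just a sketch with placeholder names — copied and failed aren’t in my actual code), the per-key tracking could look like:

copied = []   # (bucket, key) pairs that were copied successfully
failed = []   # keys that raised an error during the copy

for record in event['Records']:
    src_bucket = record['s3']['bucket']['name']
    src_key = record['s3']['object']['key']
    try:
        # ... copy src_key to DST_BUCKET here ...
        copied.append((src_bucket, src_key))
    except Exception:
        failed.append(src_key)

# delete only the objects that were copied successfully
for bucket_name, key in copied:
    s3.Object(bucket_name, key).delete()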

Asked By: stonewalker747


Answers:

Before removing the file from the source bucket, you can verify that it was copied correctly using s3.Object(DST_BUCKET, src_key).load(), which issues a HEAD request against the destination object and raises an exception if it doesn’t exist:

import os
import logging
import boto3

LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)

DST_BUCKET = os.environ.get('DST_BUCKET')
REGION = os.environ.get('REGION')

s3 = boto3.resource('s3', region_name=REGION)

def handler(event, context):
    LOGGER.info(f'Event structure: {event}')
    LOGGER.info(f'DST_BUCKET: {DST_BUCKET}')

    for record in event['Records']:
        src_bucket = record['s3']['bucket']['name']
        src_key = record['s3']['object']['key']

        copy_source = {
            'Bucket': src_bucket,
            'Key': src_key
        }
        LOGGER.info(f'copy_source: {copy_source}')
        bucket = s3.Bucket(DST_BUCKET)
        bucket.copy(copy_source, src_key)

        try:
            # Check that the file exists in the destination bucket
            # (load() issues a HEAD request and raises if the object is missing)
            s3.Object(DST_BUCKET, src_key).load()
            LOGGER.info(f"File {src_key} uploaded to Bucket {DST_BUCKET}")
            # Delete the file from the source bucket
            s3.Object(src_bucket, src_key).delete()
            LOGGER.info(f"File {src_key} deleted from Bucket {src_bucket}")

        except Exception as e:
            return {"error": str(e)}

    return {'status': 'ok'}

I’ve tested it with files in two different regions and it worked great for me.
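One caveat, depending on your filenames (an assumption about your data, not something the question shows): object keys in S3 event notifications are URL-encoded, so a name like my file.csv arrives as my+file.csv. If that can happen, decode the key before passing it to boto3:

from urllib.parse import unquote_plus

# S3 event notifications URL-encode object keys, so decode first
src_key = unquote_plus(record['s3']['object']['key'])

Also note that returning from inside the except block stops processing any remaining records in the same event; if you want per-file results as described in the question, collect the outcome for each key and continue the loop instead of returning early.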

Answered By: Pedro Rocha