Downloading a file from an S3 bucket to the USER'S computer

Question:

Goal

Download a file from an S3 bucket to the user's computer.

Context

I am working on a Python/Flask API for a React app. When the user clicks the Download button on the front end, I want to download the appropriate file to their machine.

What I’ve tried

import boto3

s3 = boto3.resource('s3')
# download_file saves the object to a path on the machine running this code
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

I am currently using some code that finds the path of the downloads folder and then plugs that path into download_file() as the second parameter, along with the key of the file in the bucket that they are trying to download.
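Roughly, a minimal sketch of that approach (get_downloads_path() is a hypothetical stand-in for my path-finding code):

import os

import boto3


def get_downloads_path():
    # Hypothetical helper standing in for the code that locates the
    # downloads folder -- note it inspects whatever machine runs it.
    return os.path.join(os.path.expanduser('~'), 'Downloads')


s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file(
    'hello.txt', os.path.join(get_downloads_path(), 'hello.txt'))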

This worked locally and tests ran fine, but I ran into a problem once it was deployed: the code finds the downloads path of the SERVER and downloads the file there.

Question

What is the best way to approach this? I have researched and cannot find a good solution for downloading a file from the S3 bucket to the user's downloads folder. Any help/advice is greatly appreciated.

Asked By: Aric Liesenfelt


Answers:

You should not need to save the file to the server. You can just download the file into memory, and then build a Response object containing the file.

from flask import Flask, Response
from boto3 import client

app = Flask(__name__)


def get_client():
    # 'id' and 'key' are placeholders; in practice prefer IAM roles or
    # environment-based credentials over hardcoding secrets.
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )


@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    file = s3.get_object(Bucket='blah-test1', Key='blah.txt')
    return Response(
        file['Body'].read(),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )


if __name__ == '__main__':
    app.run(debug=True, port=8800)

This is fine for small files; there won't be any meaningful wait time for the user. With larger files, however, this will hurt the UX: the file has to be downloaded completely to the server before it can be downloaded by the user. To fix this, use the Range keyword argument of the get_object method:

from flask import Flask, Response
from boto3 import client

app = Flask(__name__)


def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )


def get_total_bytes(s3):
    # Look up the object's size by listing the bucket contents.
    result = s3.list_objects(Bucket='blah-test1')
    for item in result['Contents']:
        if item['Key'] == 'blah.txt':
            return item['Size']


def get_object(s3, total_bytes):
    # Objects over 1 MB are streamed in ranged chunks; smaller ones
    # are returned in a single read.
    if total_bytes > 1000000:
        return get_object_range(s3, total_bytes)
    return s3.get_object(Bucket='blah-test1', Key='blah.txt')['Body'].read()


def get_object_range(s3, total_bytes):
    # Fetch the object in 1 MB pieces using HTTP Range requests,
    # yielding each piece as soon as it arrives.
    offset = 0
    while offset < total_bytes:
        end = min(offset + 999999, total_bytes - 1)
        byte_range = 'bytes={offset}-{end}'.format(offset=offset, end=end)
        offset = end + 1
        yield s3.get_object(Bucket='blah-test1', Key='blah.txt', Range=byte_range)['Body'].read()


@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    total_bytes = get_total_bytes(s3)

    return Response(
        get_object(s3, total_bytes),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )


if __name__ == '__main__':
    app.run(debug=True, port=8800)

This will download the file in 1 MB chunks and send them to the user as they are downloaded. Both versions have been tested with a 40 MB .txt file.
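As an alternative sketch (assuming botocore's StreamingBody.iter_chunks(), which current botocore versions provide), the ranged requests can be replaced with a single get_object call whose body is streamed to the Response in pieces; bucket and key names are the same placeholders as above:

from flask import Flask, Response
from boto3 import client

app = Flask(__name__)


@app.route('/blah', methods=['GET'])
def index():
    s3 = client('s3', 'us-east-1')
    obj = s3.get_object(Bucket='blah-test1', Key='blah.txt')
    return Response(
        # iter_chunks reads the body lazily, so the whole file never
        # has to sit in the server's memory at once.
        obj['Body'].iter_chunks(chunk_size=1024 * 1024),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )


if __name__ == '__main__':
    app.run(debug=True, port=8800)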

Answered By: Allie Fitter

A better way to solve this problem is to create a presigned URL. This gives you a temporary URL that is valid for a set amount of time. It also removes your Flask server as a proxy between the user and the AWS S3 bucket, which reduces the download time for the user.

import boto3


def get_attachment_url():
    bucket = 'BUCKET_NAME'
    key = 'FILE_KEY'

    client = boto3.client(
        's3',
        aws_access_key_id=YOUR_AWS_ACCESS_KEY,
        aws_secret_access_key=YOUR_AWS_SECRET_KEY
    )

    # The URL is signed with the credentials above and expires after 60 seconds.
    return client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=60
    )
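For completeness, a minimal sketch of how a Flask route might hand this URL to the React front end (the /download route name is an assumption; the front end would fetch it and navigate to the returned URL):

from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/download', methods=['GET'])
def download():
    # The browser follows the presigned URL directly, so the file
    # bytes never pass through the Flask server.
    return jsonify({'url': get_attachment_url()})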
Answered By: Rajat Soni