Serving large files (with high loads) in Django
Question:
I’ve been using a method for serving downloads, but since it was not secure I decided to change it. (The method was a link to the original file in storage, and the risk was that anyone with the link could download the file!) So I now serve the file through my views, so that only users with permission can download it. However, I’m noticing a high load on the server when there are many simultaneous download requests. Here’s the part of my code that handles downloads for users (consider the file to be an image):
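from django.http import HttpResponse
from PIL import Image

def download_image(request, filename):
    # Decodes the whole image into memory, then re-encodes it as PNG on every request
    image = Image.open("the path to file")
    response = HttpResponse(content_type='image/png')
    response['Content-Disposition'] = 'attachment; filename=%s.png' % filename
    image.save(response, "png")
    return response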
Is there a better way to serve files that keeps them secure while lowering the server-side load?
Thanks in advance 🙂
Answers:
You can use the ‘sendfile’ method as described in this answer.
In practice, you need this (copied from that answer):
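from django.http import HttpResponse
from django.utils.encoding import smart_str

def sendfile_response(file_name, path_to_file):
    # Call this from your view only after checking the user's permissions
    response = HttpResponse(content_type='application/force-download')
    response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(file_name)
    # The web server (mod_xsendfile) sends this file instead of the empty body
    response['X-Sendfile'] = smart_str(path_to_file)
    # It's usually a good idea to set the 'Content-Length' header too.
    # You can also set any other required headers: Cache-Control, etc.
    return response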
This requires Apache’s mod_xsendfile. lighttpd understands the same X-Sendfile header, while nginx offers the equivalent X-Accel-Redirect header.
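With nginx, the header is X-Accel-Redirect, and its value is a URI inside an `internal` location rather than a filesystem path. The sketch below only builds the header values using the standard library; the `/protected/` prefix, the `/srv/files/` directory and the function name `accel_redirect_headers` are hypothetical, and the nginx location it assumes is shown in the comment.

```python
import mimetypes
import posixpath

# Hypothetical nginx location that maps this prefix onto the protected files:
#   location /protected/ {
#       internal;            # reachable only through X-Accel-Redirect
#       alias /srv/files/;   # hypothetical storage directory
#   }
INTERNAL_PREFIX = "/protected/"


def accel_redirect_headers(relative_path, download_name):
    """Return headers asking nginx to send `relative_path` itself.

    Call this only after Django has checked that the user may download the
    file. Assumes a plain ASCII `download_name` without quotes.
    """
    normalized = posixpath.normpath(relative_path)
    # Refuse absolute paths and anything that climbs out of the protected root
    if normalized in (".", "..") or normalized.startswith(("/", "../")):
        raise ValueError("path escapes the protected directory: %r" % relative_path)
    content_type = mimetypes.guess_type(normalized)[0] or "application/octet-stream"
    return {
        "Content-Type": content_type,
        "Content-Disposition": 'attachment; filename="%s"' % download_name,
        # nginx performs an internal redirect to this URI and serves the file
        "X-Accel-Redirect": INTERNAL_PREFIX + normalized,
    }
```

In a Django view you would create an empty HttpResponse after the permission check and copy each pair onto it with `response[name] = value`. Django never reads the file, so the worker is freed right away while nginx streams the download.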
Opening the image with PIL loads it into memory (and saving it re-encodes it as PNG on every request), and this is what drives up the load under heavy use. As Martin posted, the real solution is to serve the file directly.
Here is another approach, which streams your file in chunks without loading it into memory.
import os
import mimetypes
from wsgiref.util import FileWrapper

from django.http import StreamingHttpResponse

def download_file(request):
    the_file = "/some/file/name.png"
    filename = os.path.basename(the_file)
    chunk_size = 8192
    # FileWrapper reads the open file chunk_size bytes at a time, so only one
    # chunk is in memory at once; Django closes it when the response ends
    response = StreamingHttpResponse(
        FileWrapper(
            open(the_file, "rb"),
            chunk_size,
        ),
        # guess_type() returns None for unknown extensions
        content_type=mimetypes.guess_type(the_file)[0] or "application/octet-stream",
    )
    response["Content-Length"] = os.path.getsize(the_file)
    response["Content-Disposition"] = f"attachment; filename={filename}"
    return response
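FileWrapper is simply an iterator that reads fixed-size blocks. The same streaming can be written as a plain generator, which closes the file when iteration finishes (or when Django calls `close()` on the streaming content). A minimal sketch; `iter_file_chunks` is a hypothetical helper name.

```python
def iter_file_chunks(path, chunk_size=8192):
    """Yield the file at `path` in chunk_size pieces, closing it at the end.

    Only one chunk is held in memory at a time, so memory use stays flat
    no matter how large the file is.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                return
            yield chunk
```

In the view above you could pass `iter_file_chunks(the_file)` to StreamingHttpResponse in place of the FileWrapper; a 20000-byte file comes out as chunks of 8192, 8192 and 3616 bytes.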
Unless you will only ever serve a very small number of such requests, any solution that serves your content through Django won’t scale. To scale in the future, you’ll probably want to move content storage and serving to a separate server, and then this approach won’t work.
The recommended way is to serve static content through a lighter server such as nginx. To add security, have Django pass the static server a token, either in a cookie or as a GET parameter.
The token should contain the timestamp, the filename and the user ID, and the Django app should sign it with a secret key.
Next, write a small nginx module that checks the token and verifies that the user really has access to the file. It should also check, using the timestamp, that the token isn’t too old.
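The token scheme can be sketched with the standard library’s `hmac` module. The function names, the newline-separated payload format, the shared `SECRET_KEY` and the 300-second lifetime are assumptions chosen for illustration; the nginx-side check would have to repeat the same HMAC-SHA256 comparison in whatever language that module is written in. If only Python code ever checks the token, Django’s own `django.core.signing.TimestampSigner` covers the same ground.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret shared by the Django app and the file server's checker.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_download_token(user_id: int, filename: str, now: Optional[float] = None) -> str:
    """Sign timestamp, user id and filename into a URL-safe token."""
    timestamp = int(time.time() if now is None else now)
    payload = ("%d\n%d\n%s" % (timestamp, user_id, filename)).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + signature


def verify_download_token(token: str, filename: str, max_age: int = 300,
                          now: Optional[float] = None) -> Optional[int]:
    """Return the user id if the token is genuine, names this file and is
    at most max_age seconds old; otherwise return None."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
        provided = bytes.fromhex(signature)
    except ValueError:  # malformed token (binascii.Error is a ValueError too)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, provided):  # constant-time compare
        return None
    # maxsplit=2 keeps any newline inside the filename intact
    timestamp, user_id, signed_filename = payload.decode("utf-8").split("\n", 2)
    current = time.time() if now is None else now
    if signed_filename != filename or current - int(timestamp) > max_age:
        return None
    return int(user_id)
```

The returned user ID is what the file server would then check against its access rules before sending the file.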
FileWrapper won’t work when GZipMiddleware is installed (Django 1.4 and below):
https://code.djangoproject.com/ticket/6027
If using GZipMiddleware, a practical solution is to write a subclass of FileWrapper like so:
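import mimetypes
import os
from wsgiref.util import FileWrapper

from django.http import HttpResponse

class FixedFileWrapper(FileWrapper):
    def __iter__(self):
        # Rewind on every new iteration, so the content is still there after
        # GZipMiddleware has already read through the file once
        self.filelike.seek(0)
        return self

def download(request):
    my_file = '/some/path/xy.ext'
    # Note: on current Django versions HttpResponse reads an iterator into
    # memory right away; this workaround targets Django 1.4 and below
    response = HttpResponse(FixedFileWrapper(open(my_file, 'rb')), content_type=mimetypes.guess_type(my_file)[0] or 'application/octet-stream')
    response['Content-Length'] = os.path.getsize(my_file)
    response['Content-Disposition'] = "attachment; filename=%s" % os.path.basename(my_file)
    return response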
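To see what the `seek(0)` buys, the sketch below drains each wrapper twice, the way a middleware that reads the body before the handler sends it would. It uses an in-memory `io.BytesIO` instead of a real file, and the `drain` helper is hypothetical.

```python
import io
from wsgiref.util import FileWrapper


class FixedFileWrapper(FileWrapper):
    def __iter__(self):
        self.filelike.seek(0)  # start from the beginning on every iteration
        return self


def drain(wrapper):
    """Consume the wrapper completely, as a middleware reading the body would."""
    return b"".join(wrapper)


plain = FileWrapper(io.BytesIO(b"payload"), 4)
fixed = FixedFileWrapper(io.BytesIO(b"payload"), 4)
first_plain, second_plain = drain(plain), drain(plain)
first_fixed, second_fixed = drain(fixed), drain(fixed)
print(first_plain, second_plain)  # b'payload' b''
print(first_fixed, second_fixed)  # b'payload' b'payload'
```

The plain wrapper is exhausted after the first pass, so the client would receive an empty body; the fixed one rewinds and yields the content again.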
Since Python 2.5, FileWrapper has been part of the standard library as wsgiref.util.FileWrapper, so there’s no need to import it from Django.
It’s better to use FileResponse, a subclass of StreamingHttpResponse optimized for binary files. It uses wsgi.file_wrapper if the WSGI server provides it; otherwise it streams the file out in small chunks.
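import os

from django.http import FileResponse

def download_file(request):
    _file = '/folder/my_file.zip'
    filename = os.path.basename(_file)
    # Pass the open file straight to FileResponse; it streams the file and
    # closes it when the response is done. (django.core.servers.basehttp.FileWrapper
    # no longer exists in current Django, and file() is Python 2 only.)
    response = FileResponse(open(_file, 'rb'), content_type='application/x-zip-compressed')
    # On Django 2.1+ you can instead pass as_attachment=True, filename=filename
    response['Content-Disposition'] = "attachment; filename=%s" % filename
    return response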
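Several snippets in this thread build the header as `attachment; filename=%s`, which breaks when the name contains spaces, semicolons, quotes or non-ASCII characters. A hedged sketch of a safer value: quote plain ASCII names, and fall back to the RFC 6266 `filename*` form with percent-encoded UTF-8 otherwise. `attachment_header` is a hypothetical name; with Django 2.1 and later, `FileResponse(..., as_attachment=True, filename=...)` builds a similar header for you.

```python
from urllib.parse import quote


def attachment_header(filename: str) -> str:
    """Build a Content-Disposition value that survives awkward file names."""
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII: percent-encoded UTF-8 in the extended filename* parameter
        return "attachment; filename*=utf-8''%s" % quote(filename, safe="")
    # ASCII: a quoted string, escaping backslashes and double quotes
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return 'attachment; filename="%s"' % escaped
```

For example, `attachment_header("my file.zip")` gives `attachment; filename="my file.zip"`, and `attachment_header("résumé.pdf")` gives `attachment; filename*=utf-8''r%C3%A9sum%C3%A9.pdf`.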
Here’s another working solution, which builds the file in memory instead of on the filesystem.
import io
import zipfile

from django.http import FileResponse

def download(request):
    # add_to_zip is a list of dictionaries with 2 entries each, built by your
    # own code after the permission check:
    # [{"filename": "foo.jpg", "path": "absolute/path/to/foo.jpg"}, ...]
    buffer = io.BytesIO()
    # Put the files into a zip archive that lives only in memory
    with zipfile.ZipFile(buffer, "w") as zip_obj:
        for entry in add_to_zip:
            # Store each file under its display name inside the archive
            zip_obj.write(entry["path"], arcname=entry["filename"])
    buffer.seek(0)
    # FileResponse accepts the buffer directly; no FileWrapper is needed
    response = FileResponse(buffer, content_type="application/x-zip-compressed")
    response["Content-Length"] = len(buffer.getvalue())
    response["Content-Disposition"] = "attachment; filename=the_zip.zip"
    return response
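The snippet above keeps the whole archive in RAM, so a few concurrent downloads of large archives bring back the memory pressure the question is about. One alternative is `tempfile.SpooledTemporaryFile`, which behaves like the in-memory buffer until it grows past `max_size` and then moves its contents to a temporary file on disk. A sketch under those assumptions; `build_zip` and the 10 MB threshold are hypothetical choices, and `entries` uses the same dictionary shape as `add_to_zip`.

```python
import tempfile
import zipfile


def build_zip(entries, max_in_memory=10 * 1024 * 1024):
    """Zip the given files into a spooled temporary file.

    The archive stays in RAM up to max_in_memory bytes and is transparently
    moved to a temporary file on disk beyond that.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    with zipfile.ZipFile(spool, "w") as zip_obj:
        for entry in entries:
            zip_obj.write(entry["path"], arcname=entry["filename"])
    spool.seek(0)  # rewind so the response starts at the first byte
    return spool
```

The view could then return `FileResponse(build_zip(add_to_zip), content_type="application/x-zip-compressed")`; FileResponse closes the spooled file when the response is done, which should also clean up any temporary file on disk.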