How to encode UTF8 filename for HTTP headers? (Python, Django)

Question:

I have a problem with HTTP headers: they’re encoded in ASCII, and I want to provide a view for downloading files whose names can be non-ASCII.

response['Content-Disposition'] = 'attachment; filename="%s"' % (vo.filename.encode("ASCII","replace"), )

I don’t want to use static file serving, since it has the same issue with non-ASCII file names, and in that case there would also be a problem with the file system and its file name encoding. (I don’t know the target OS.)

I’ve already tried urllib.quote(), but it raises a KeyError exception.

Possibly I’m doing something wrong but maybe it’s impossible.

Asked By: Chris Ciesielski

Answers:

This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), others implement RFC 2231 (Firefox, Opera).

See test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).

Answered By: Julian Reschke

Don’t send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there’s not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
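
A minimal sketch of this URL-based approach, using only the Python standard library. The `/download/42/` prefix and the `download_url` helper are hypothetical names chosen for illustration; the view behind that URL would then send just `Content-Disposition: attachment`.

```python
from urllib.parse import quote, unquote


def download_url(prefix, filename):
    """Append the UTF-8, percent-encoded file name as the last path segment."""
    # safe="" also encodes "/" so the name cannot introduce extra path segments.
    return prefix + quote(filename, safe="")


url = download_url("/download/42/", "My Résumé.pdf")
print(url)

# Decoding the last path segment recovers the original file name.
print(unquote(url.rsplit("/", 1)[-1]))
```

Encoding the slash as well is a design choice made here so that a file name can never be mistaken for additional path segments.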

Answered By: bobince

A hack:

if (Request.UserAgent.Contains("IE"))
{
  // IE will accept URL encoding, but spaces don't need to be encoded, and since they're so common they are left as-is.
  filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}

Answered By: anon

Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename with only ASCII characters, followed by filename* with an RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.
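
To illustrate the encoding step described above (a rough sketch, not a substitute for reading the RFCs): Python's `urllib.parse.quote` percent-encodes the UTF-8 bytes and writes spaces as `%20`, whereas `quote_plus` writes them as `+`, which is the form to avoid here. The `ext_value` helper name is hypothetical.

```python
from urllib.parse import quote, quote_plus


def ext_value(filename):
    """Build the value of filename*: charset, empty language tag, encoded name."""
    # safe="" percent-encodes everything except ASCII letters, digits
    # and "_.-~" (the "~" exemption applies on Python 3.7+).
    return "UTF-8''" + quote(filename, safe="")


name = "My Résumé.pdf"
header = 'attachment; filename="my-resume.pdf"; filename*=' + ext_value(name)
print(header)

# quote_plus is meant for form data and turns spaces into "+"; do not use it here.
print(quote_plus(name))
```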

Answered By: Alan H.

I can say that I’ve had success using the newer (RFC 5987) format of specifying a header encoded with the e-mail form (RFC 2231). I came up with the following solution which is based on code from the django-sendfile project.

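import unicodedata
# Note: urlquote was removed in Django 4.0; urllib.parse.quote is the replacement there.
from django.utils.http import urlquote

def rfc5987_content_disposition(file_name):
    # ASCII fallback: decompose accented characters, then drop anything non-ASCII.
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii', 'ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        # Percent-encode the UTF-8 bytes for the RFC 5987 filename* parameter.
        quoted_name = urlquote(file_name)
        # Double-quoted literal so the two apostrophes stay in the header; inside
        # single quotes Python would treat '' as the boundary between two literals.
        header += "; filename*=UTF-8''{}".format(quoted_name)

    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)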

I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better.

There’s a long-standing ticket in Django’s tracker which acknowledges this, but no patches have been proposed yet as far as I can tell. Unfortunately, this is as close to a robust, tested library as I could find; please let me know if there’s a better solution.

Answered By: Will S

Starting with Django 2.1 (see issue #16470), you can use FileResponse, which will correctly set the Content-Disposition header for attachments. Starting with Django 3.0 (issue #30196) it will also set it correctly for inline files.

For example, to return a file named my_img.jpg with MIME type image/jpeg as an HTTP response:

from django.http import FileResponse
# Inside a view function:
response = FileResponse(open("my_img.jpg", 'rb'), as_attachment=True, content_type="image/jpeg")
return response

Or, if you can’t use FileResponse, you can use the relevant part from FileResponse’s source to set the Content-Disposition header yourself. Here’s what that source currently looks like:

from urllib.parse import quote

disposition = 'attachment' if as_attachment else 'inline'
try:
    filename.encode('ascii')
    file_expr = 'filename="{}"'.format(filename)
except UnicodeEncodeError:
    file_expr = "filename*=utf-8''{}".format(quote(filename))
response.headers['Content-Disposition'] = '{}; {}'.format(disposition, file_expr)
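
For experimenting outside Django, the same logic can be wrapped in a standalone function. This is only a sketch: `content_disposition` is a hypothetical helper name, and unlike FileResponse it returns the header value instead of setting it on a response.

```python
from urllib.parse import quote


def content_disposition(filename, as_attachment=True):
    """Return a Content-Disposition value using the ASCII-or-filename* rule above."""
    disposition = "attachment" if as_attachment else "inline"
    try:
        filename.encode("ascii")
        file_expr = 'filename="{}"'.format(filename)
    except UnicodeEncodeError:
        # Non-ASCII names are sent only as a percent-encoded UTF-8 filename*.
        file_expr = "filename*=utf-8''{}".format(quote(filename))
    return "{}; {}".format(disposition, file_expr)


print(content_disposition("report.pdf"))
print(content_disposition("résumé.pdf", as_attachment=False))
```

Note that with this rule a non-ASCII name is sent without any plain ASCII filename fallback, which differs from the two-parameter form recommended in RFC 6266.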

Answered By: Mark Chackerian

The escape_uri_path function from Django is the solution that worked for me.

Read the Django docs to see which RFC standards are currently specified.

from django.http import HttpResponse
from django.utils.encoding import escape_uri_path

filename = "response.zip"
response = HttpResponse(content_type='application/zip')
response['Content-Disposition'] = f"attachment; filename*=utf-8''{escape_uri_path(filename)}"

Answered By: Joe Web