Set Unicode filename in Flask response header

Question:

I am trying to set the Content-Disposition header to send a file to the client. The file name is Unicode. When I try to set the header, it fails with a UnicodeEncodeError. I tried various combinations of encode and decode but couldn’t get it to work. How can I send a file with a Unicode filename?

destination_file = 'python_report.html'
response.headers['Content-Disposition'] = 'attachment; filename=' + destination_file

  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/server.py", line 495, in send_header
    ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 41-42: ordinal not in range(256)
Asked By: Murtuza Z

Answers:

RFC 2231 section 4 describes how to specify an encoding other than ASCII for a header parameter value. Use the header option filename*=UTF-8''..., where ... is the percent-encoded (URL-encoded) UTF-8 name. You can also include the plain filename option to provide an ASCII fallback for clients that do not understand filename*.
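
As a rough illustration of that header format, here is a minimal sketch that uses only the standard library. The helper name build_content_disposition and the sample file names are made up for this example and are not part of Flask or Werkzeug; the ASCII fallback is passed in by the caller.

```python
from urllib.parse import quote


def build_content_disposition(filename, ascii_fallback):
    """Return an attachment header value carrying both name forms.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the UTF-8 bytes of the name. Letters, digits and
    # characters such as "." and "-" are left as they are.
    encoded = quote(filename, safe="")
    # Plain ASCII option first, then the extended filename* option.
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"


value = build_content_disposition("r\u00e9sum\u00e9.html", "resume.html")
print(value)
# attachment; filename="resume.html"; filename*=UTF-8''r%C3%A9sum%C3%A9.html

# The value is now pure ASCII, so the latin-1 encoding step that raised
# UnicodeEncodeError in the question succeeds.
value.encode("latin-1")
```

Because the non-ASCII characters are percent-encoded before they reach the header, the server never has to encode anything outside latin-1.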

Flask >= 1.0 supports calling send_from_directory and send_file with Unicode filenames. You can use send_from_directory with as_attachment=True and a Unicode filename.

from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/send-python-report")
def send_python_report():
    return send_from_directory("reports", "python_report.html", as_attachment=True)

For security, use send_from_directory rather than send_file whenever the filename comes from user input.


Prior to Flask 1.0, you can construct the header manually using the same process Flask uses.

import unicodedata
from urllib.parse import quote
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route('/send-python-report')
def send_python_report():
    filename = "python_report.html"
    rv = send_from_directory("reports", filename)

    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        simple = unicodedata.normalize("NFKD", filename)
        simple = simple.encode("ascii", "ignore").decode("ascii")
        # safe = RFC 5987 attr-char
        quoted = quote(filename, safe="!#$&+-.^_`|~")
        names = {"filename": simple, "filename*": f"UTF-8''{quoted}"}
    else:
        names = {"filename": filename}

    rv.headers.set("Content-Disposition", "attachment", **names)
    return rv
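
To see what the ASCII fallback in the except branch produces, the normalization step can be tried in isolation. This is only a sketch: ascii_fallback is a hypothetical helper name and the sample file names are invented for the example.

```python
import unicodedata


def ascii_fallback(filename):
    """Approximate a Unicode name with ASCII (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", filename)
    # Dropping everything non-ASCII removes the combining marks and any
    # character that has no ASCII decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fallback("r\u00e9sum\u00e9.html"))  # resume.html
print(ascii_fallback("\u62a5\u544a.html"))      # .html
```

The second call shows a limitation of this approach: characters with no ASCII decomposition are dropped entirely, so the fallback can end up as little more than the extension. Clients that understand filename* still receive the full name.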

Until relatively recently (before 2017), browsers did not consistently support filename*. This page has some metrics on browser support. Notably, IE8 ignores the UTF-8 option and fails completely if the UTF-8 option comes before the ASCII option, so put the plain filename option first.

Answered By: davidism