Flask Mutating Request Path

Question:

Problem Statement

When the request path contains %25, Flask seems to mutate the incoming path, treating %25 as % instead of preserving the original request path. Here are the request and the resulting path variables:

  • Request: GET http://localhost:5000/Files/dir %a/test %25a.txt
  • Flask request.base_url: http://localhost:5000/Files/dir%20%25a/test%20%25a.txt
  • Debug: 127.0.0.1 - - [14/Feb/2023 12:00:49] "GET /Files/dir%20%a/test%20%25a.txt HTTP/1.1" 200 -

Specifically, test %25a.txt seems to be encoded as test%20%25a.txt instead of test%20%2525a.txt.
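
To make the expected and observed encodings concrete, here is a minimal sketch using only Python's standard urllib.parse module (not Flask itself). The variable names are illustrative. It assumes the client is responsible for percent-encoding the literal file name before sending it.

```python
from urllib.parse import quote, unquote

# Literal file name on disk: it really contains the characters %, 2, 5.
filename = "test %25a.txt"

# What a client should put on the wire for that literal name.
encoded = quote(filename)
print(encoded)                       # test%20%2525a.txt

# Decoding once on the server side recovers the literal name.
print(unquote(encoded))              # test %25a.txt

# If the client sends %25 unescaped, a single decode yields a bare percent.
print(unquote("test%20%25a.txt"))    # test %a.txt
```

Under that assumption, the path in the question was sent with only the spaces encoded, so the server reads %25 as an already-encoded percent sign.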

Environment

  • Python 3
  • Ubuntu 20.04
  • Flask 2.2.x

Things Tried

Help Needed

  • Is %25 indeed not allowed in the request path?
  • For apps that allow files to be named with %25, what would be a good way to handle this?
Asked By: PseudoAj

||

Answers:

https://www.rfc-editor.org/rfc/rfc7230 § 2.7 explains
(by way of the generic URI syntax in RFC 3986)
that the path is composed of
pchars,
which (roughly) are unreserved or pct-encoded characters.
Your favorite character, the bare percent sign, definitely does not fall into this
or the similar sub-delims category:

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

So that leaves us with a percent-encoded %25,
which the spec
treats further.

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, …
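
The hazard the spec describes can be reproduced with a short sketch, again using only urllib.parse. The sample string is made up for illustration.

```python
from urllib.parse import unquote

# Wire form of the literal text "100%41": the percent is sent as %25.
wire = "100%2541"

once = unquote(wire)    # correct: decoded exactly once
twice = unquote(once)   # wrong: "%41" is now misread as an encoded "A"

print(once)             # 100%41
print(twice)            # 100A
```

Any layer in the stack that decodes a second time silently changes the data, which is why the literal percent has to arrive as %25 and be decoded exactly once.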

And that is where things went south for you.


Now, one can tilt at windmills until Don Quixote brings the cows home,
but the fact of the matter is that software is made of bugs,
and they can be hard to isolate and get folks to fix.

The usual pragmatic approach to sending a "forbidden" character
such as percent is to disguise it as it makes its way through
a software stack. Here are two common techniques.

  1. Pick a seldom used character, perhaps ~ tilde. Map percent to tilde and vice-versa. Prohibit tilde in pathnames, or use percent-encoded %7E for it.
  2. Base64 encode the pathname, and decode on the other end.
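
Both techniques can be sketched in a few lines of standard-library Python. The function names are hypothetical, and the tilde variant assumes the simpler of the two options above, namely that tilde is prohibited in pathnames.

```python
import base64


def hide_percent(name: str) -> str:
    """Technique 1: replace percent with tilde (tilde is prohibited in names)."""
    if "~" in name:
        raise ValueError("tilde is reserved as the stand-in for percent")
    return name.replace("%", "~")


def restore_percent(segment: str) -> str:
    """Inverse of hide_percent, applied on the receiving end."""
    return segment.replace("~", "%")


def b64_encode_name(name: str) -> str:
    """Technique 2: URL-safe Base64 of the UTF-8 bytes of the name."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")


def b64_decode_name(segment: str) -> str:
    """Inverse of b64_encode_name, applied on the receiving end."""
    return base64.urlsafe_b64decode(segment.encode("ascii")).decode("utf-8")


name = "test %25a.txt"
print(hide_percent(name))                      # test ~25a.txt
print(restore_percent(hide_percent(name)))     # test %25a.txt
print(b64_decode_name(b64_encode_name(name)))  # test %25a.txt
```

The tilde form still needs ordinary URL quoting for characters such as spaces; the Base64 form uses only letters, digits, "-", "_" and "=", so no percent sign appears in it.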

Either technique tends to leave your URLs a bit uglier, a bit less informative,
than they would have been.
A hybrid scheme limits the damage to the names that need it.
Given a pathname p, either it contains a percent or it doesn’t.
Prepend 0 if it doesn’t, and now it survives untouched, in a form that can be grep’d.
Prepend 1 if it does, and then use base64 or whatever.
Strip the leading digit on the other end and process appropriately.
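
A minimal sketch of that prefix scheme follows. The names wrap and unwrap are hypothetical, and URL-safe Base64 is assumed for the "or whatever" step.

```python
import base64


def wrap(name: str) -> str:
    """Prefix "0" for names without a percent, "1" plus Base64 otherwise."""
    if "%" not in name:
        return "0" + name
    encoded = base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")
    return "1" + encoded


def unwrap(segment: str) -> str:
    """Strip the leading digit and undo whatever wrap() did."""
    flag, body = segment[:1], segment[1:]
    if flag == "0":
        return body
    if flag == "1":
        return base64.urlsafe_b64decode(body.encode("ascii")).decode("utf-8")
    raise ValueError(f"unknown prefix: {flag!r}")


print(wrap("report.txt"))             # 0report.txt
print(unwrap(wrap("report.txt")))     # report.txt
print(unwrap(wrap("test %25a.txt")))  # test %25a.txt
```

Names in the "0" form may still contain spaces or other characters that need the usual URL quoting; the scheme only guarantees that a literal percent never travels through the stack in the clear.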

Answered By: J_H