Why doesn't URL decode convert + to space?

Question:

Why are the + characters not converted to spaces?

>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote(url)
'Q=Who+am+I?'
>>>
Asked By: Old Geezer


Answers:

There are two variants: urllib.unquote() and urllib.unquote_plus(). Use the latter:

>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote_plus(url)
'Q=Who am I?'

That’s because there are two variants of URL quoting: one for URL path segments and one for URL query parameters. The latter follows a different specification. See Wikipedia:

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20".

So form data using the application/x-www-form-urlencoded MIME type, whether sent in a GET or a POST request, follows slightly different rules: spaces are encoded as +, whereas elsewhere in a URL (the path, for example) they are encoded as %20. When decoding, you need to pick the matching variant. You have form data (from the query part of the URL), so you need unquote_plus().
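To make the difference concrete, here is a minimal sketch of both variants side by side. It assumes Python 3, where these functions live in urllib.parse rather than in urllib as in the Python 2 sessions above. The sample string and variable names are only for illustration.

```python
# Python 3: quote/unquote and their *_plus variants live in urllib.parse.
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = 'Who am I?'

# Path-style quoting encodes a space as %20.
path_style = quote(text)
# Form-style quoting (application/x-www-form-urlencoded) encodes a space as +.
form_style = quote_plus(text)

print(path_style)   # Who%20am%20I%3F
print(form_style)   # Who+am+I%3F

# unquote() leaves + untouched; unquote_plus() turns it back into a space.
print(unquote(form_style))       # Who+am+I?
print(unquote_plus(form_style))  # Who am I?

# A literal plus sign in form data is itself escaped as %2B,
# so it survives a round trip through the form-style functions.
literal_plus = quote_plus('1+1=2')
print(literal_plus)                # 1%2B1%3D2
print(unquote_plus(literal_plus))  # 1+1=2
```

The last two lines show why the choice of decoder matters: running unquote_plus() on path data that contains a real + would silently turn it into a space.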

If you are parsing a whole query string, you may want to use the urlparse.parse_qs() or urlparse.parse_qsl() functions instead. They not only apply the right unquote*() function, they also split the parameters into a dictionary (parse_qs) or a list of key-value pairs (parse_qsl):

>>> import urlparse
>>> urlparse.parse_qs(url)
{'Q': ['Who am I?']}
>>> urlparse.parse_qsl(url)
[('Q', 'Who am I?')]
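For readers on Python 3, the same functions are available from urllib.parse. The sketch below also shows one way to handle a full URL, where only the query part is form-encoded. The example.com URL and the extra lang parameter are made up for illustration.

```python
# Python 3: the urlparse module was merged into urllib.parse.
from urllib.parse import urlsplit, parse_qs, parse_qsl, unquote

# Hypothetical URL: the path uses %20, the query string uses +.
full_url = 'http://example.com/some%20path?Q=Who+am+I%3F&lang=en'

parts = urlsplit(full_url)

# The path is ordinary percent-encoding, so plain unquote() is the right tool.
path = unquote(parts.path)
print(path)    # /some path

# The query string is form-encoded; parse_qs/parse_qsl decode + as a space.
params = parse_qs(parts.query)
pairs = parse_qsl(parts.query)
print(params)  # {'Q': ['Who am I?'], 'lang': ['en']}
print(pairs)   # [('Q', 'Who am I?'), ('lang', 'en')]
```

parse_qs() maps each name to a list because a parameter can appear more than once, while parse_qsl() keeps the original order and any duplicates as separate pairs.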
Answered By: Martijn Pieters