urllib.quote() throws KeyError

Question:

To encode the URI, I used urllib.quote("schönefeld"), but when the string contains non-ASCII characters, it throws:

KeyError: u'\xe9'
Code: return ''.join(map(quoter, s))

My input strings are köln, brønshøj, schönefeld etc.

Just printing the statements worked when I tried it on Windows (Python 2.7, PyScripter IDE), but on Linux it raises the exception (I guess the platform doesn't matter).

This is what I am trying:

from commands import getstatusoutput
from urllib import quote

queryParams = "schönefeld"
cmdString = "http://baseurl" + quote(queryParams)
print getstatusoutput(cmdString)

Exploring the cause:
In urllib.quote(), the exception is actually thrown at return ''.join(map(quoter, s)).

The code in urllib is:

def quote(s, safe='/'):
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))

The exception comes from ''.join(map(quoter, s)): the quoter function is called for every element of s, and the resulting list is joined with '' (the empty string) and returned.
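The table lookup described above can be modelled in a few lines. This is a simplified Python 3 sketch, not the real urllib source; the names ALWAYS_SAFE, build_quoter and toy_quote are hypothetical. The table is keyed by byte values, so it only works when the input is iterated as bytes. One difference from Python 2: there, ASCII characters of a unicode string still match the byte-string keys, so only non-ASCII characters such as u'\xe9' miss; in this model every str character misses.

```python
# Simplified Python 3 model of Python 2's urllib.quote table lookup (not the real code).
ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                        b'abcdefghijklmnopqrstuvwxyz'
                        b'0123456789_.-')


def build_quoter(safe=b'/'):
    """Return a lookup function mapping each byte value (0-255) to its quoted form.

    `safe` must be a bytes object here, so that its elements are ints.
    """
    safe_bytes = ALWAYS_SAFE | frozenset(safe)
    table = {b: chr(b) if b in safe_bytes else '%{:02X}'.format(b)
             for b in range(256)}
    return table.__getitem__


def toy_quote(data, safe=b'/'):
    quoter = build_quoter(safe)
    # Call quoter once per element of data and join the pieces with ''.
    # Iterating over bytes yields ints 0-255, which are all keys in the table;
    # iterating over a str yields 1-character strings, which are not.
    return ''.join(map(quoter, data))


print(toy_quote('sch\xe9nefeld'.encode('utf-8')))  # sch%C3%A9nefeld
```

Passing the str itself, toy_quote('sch\xe9nefeld'), raises KeyError, which is the same failure mode as feeding unicode to the Python 2 function.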

For the non-ASCII character è, the equivalent value is %E8, which is present in the _safe_map variable. But when I call quote('è'), it looks up the key \xe8. That key does not exist, so the exception is thrown.
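Which percent-escapes a character gets depends on which bytes it is encoded to, and %E8 is specifically the Latin-1 form of è. The sketch below uses Python 3's urllib.parse.quote (not the Python 2.7 urllib.quote from the question), which accepts an encoding argument for str input and defaults to UTF-8.

```python
from urllib.parse import quote

ch = '\xe8'  # the character è (U+00E8)

print(ch.encode('latin-1'))           # b'\xe8'      (one byte)
print(ch.encode('utf-8'))             # b'\xc3\xa8'  (two bytes)
print(quote(ch, encoding='latin-1'))  # %E8
print(quote(ch))                      # %C3%A8 (UTF-8 is the default)
```

So "the key for è" is not a fixed thing: the same character becomes %E8 or %C3%A8 depending on the encoding chosen before quoting.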

So I just added s = [el.upper().replace("\X","%") for el in s] before ''.join(map(quoter, s)), inside a try/except block. Now it works fine.

But I am unsure whether what I have done is the correct approach, or whether it will create other issues.
Also, I have 200+ Linux instances, so deploying this fix to all of them is very difficult.

Asked By: Garfield


Answers:

You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.

Encode the string to bytes first. UTF-8 is often used:

>>> import urllib
>>> urllib.quote(u'sch\xe9nefeld')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
>>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
'sch%C3%A9nefeld'

However, the encoding depends on what the server will accept. It’s best to stick to the encoding the original form was sent with.
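The advice above can be wrapped in a small helper so callers never pass text straight to quote. This is a sketch under assumptions: the name quote_text and its encoding parameter are hypothetical, bytes are assumed to be already encoded correctly, and UTF-8 is only a default that should be changed to whatever the server actually expects. It runs on both Python 2 (urllib.quote) and Python 3 (urllib.parse.quote, which also accepts bytes).

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2


def quote_text(text, encoding='utf-8', safe='/'):
    """Percent-encode text by encoding it to bytes first.

    bytes input (str on Python 2) is assumed to be already encoded and is
    passed through unchanged.
    """
    if not isinstance(text, bytes):
        text = text.encode(encoding)
    return quote(text, safe=safe)


print(quote_text(u'sch\xe9nefeld'))                      # sch%C3%A9nefeld
print(quote_text(u'sch\xe9nefeld', encoding='latin-1'))  # sch%E9nefeld
```

Keeping the encoding choice in one place makes it easy to match the encoding the original form was sent with.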

Answered By: Martijn Pieters

By just converting the string to Unicode, I resolved the issue.

Here is the snippet:

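try:
    # Succeeds only if mystring is pure ASCII; the result is not kept,
    # so in that case mystring stays a byte string
    unicode(mystring, "ascii")
except UnicodeError:
    # Non-ASCII bytes: decode them as UTF-8 into a unicode object
    mystring = unicode(mystring, "utf-8")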
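Decoding to Unicode is only half of the round trip when the string is then quoted: as the traceback in the first answer shows, Python 2's urllib.quote still fails on a unicode object, so the text has to be encoded back to bytes first. The sketch below combines both steps; to_text and quote_param are hypothetical names, and UTF-8 is an assumption about both the input bytes and what the server expects. Because ASCII is a subset of UTF-8, a single UTF-8 decode covers both branches of the snippet above.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2


def to_text(value):
    # Same effect as the try/except above: ASCII bytes decode identically
    # under UTF-8, so one decode handles both cases.
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value


def quote_param(value):
    # Normalize to text first, then encode to UTF-8 bytes for quote().
    return quote(to_text(value).encode('utf-8'))


print(quote_param(b'sch\xc3\xb6nefeld'))  # sch%C3%B6nefeld
print(quote_param(u'sch\xf6nefeld'))      # sch%C3%B6nefeld
```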

A detailed description of the solution can be found at http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm

Answered By: Garfield

I had exactly the same error as @underscore, but in my case the problem was that map(quoter, s) tried to look up the key u'\xe9', which was not in _safe_map. However, '\xe9' was, so I solved the issue by replacing u'\xe9' with '\xe9' in s.
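Replacing u'\xe9' with '\xe9' character by character is the same as encoding the text as Latin-1, so the result only works if the server expects Latin-1, and it breaks for characters above U+00FF. The sketch below uses Python 3's urllib.parse.quote; latin1_quote is a hypothetical helper name.

```python
from urllib.parse import quote


def latin1_quote(text):
    # Mapping each character U+0000..U+00FF to the single byte with the same
    # value (u'\xe9' -> b'\xe9') is exactly what Latin-1 encoding does.
    return quote(text.encode('latin-1'))


print(latin1_quote('sch\xe9nefeld'))  # sch%E9nefeld

try:
    latin1_quote('\u20ac')  # EURO SIGN, U+20AC, has no Latin-1 byte
except UnicodeEncodeError:
    print('U+20AC cannot be encoded as Latin-1')
```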

Moreover, shouldn't the return statement be inside the try/except? I also had to change that to solve the problem completely.

Answered By: Sebastien