Equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()

Question:

Are there any equivalent JavaScript functions for Python’s urllib.parse.quote() and urllib.parse.unquote()?

The closest I’ve come across are encodeURI()/encodeURIComponent() and escape() (and their corresponding un-encoding functions), but they don’t encode/decode the same set of special characters as far as I can tell.

Asked By: Cameron

Answers:

OK, I think I’m going to go with a hybrid custom set of functions:

Encode: Use encodeURIComponent(), then put slashes back in.
Decode: Decode any %hex values found.

Here’s a more complete variant of what I ended up using (it handles Unicode properly, too):

function quoteUrl(url, safe) {
    if (typeof(safe) !== 'string') {
        safe = '/';    // Don't escape slashes by default
    }

    url = encodeURIComponent(url);

    // Unescape characters that were in the safe list
    var toUnencode = [];    // Declared locally so it does not leak into the global scope
    for (var i = safe.length - 1; i >= 0; --i) {
        var encoded = encodeURIComponent(safe[i]);
        if (encoded !== safe.charAt(i)) {    // Ignore safe char if it wasn't escaped
            toUnencode.push(encoded);
        }
    }

    url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent);

    return url;
}


var unquoteUrl = decodeURIComponent;    // Make alias to have symmetric function names

Note that if you don’t need “safe” characters when encoding ('/' by default in Python), then you can just use the built-in encodeURIComponent() and decodeURIComponent() functions directly.

Also, if there are Unicode characters (i.e. characters with codepoint >= 128) in the string, then to maintain compatibility with JavaScript’s encodeURIComponent(), the Python 2 quote_url() would have to be:

def quote_url(url, safe):
    """URL-encodes a string (either str (i.e. ASCII) or unicode);
    uses de-facto UTF-8 encoding to handle Unicode codepoints in given string.
    """
    return urllib.quote(unicode(url).encode('utf-8'), safe)

And unquote_url() would be:

def unquote_url(url):
    """Decodes a URL that was encoded using quote_url.
    Returns a unicode instance.
    """
    return urllib.unquote(url).decode('utf-8')
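
For comparison, here is a minimal sketch of the same pair on Python 3, where urllib.parse already treats str input as UTF-8. The names quote_url and unquote_url are carried over from the Python 2 version above, and the default safe="/" is an assumption chosen to mirror the JavaScript quoteUrl():

```python
from urllib.parse import quote, unquote


def quote_url(url, safe="/"):
    """Percent-encode a str; non-ASCII characters are encoded as UTF-8 bytes."""
    return quote(url, safe=safe, encoding="utf-8")


def unquote_url(url):
    """Decode a string produced by quote_url; returns a str."""
    return unquote(url, encoding="utf-8")


print(quote_url("a b/é"))            # a%20b/%C3%A9
print(unquote_url("a%20b/%C3%A9"))   # a b/é
```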
Answered By: Cameron

Try a regex. Something like this:

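mystring.replace(/[\u0100-\uFFFF]+/g, function (run) { return encodeURIComponent(run); });

That will replace any run of characters above ordinal 255 with its percent-encoded UTF-8 bytes. The replacement is a function so that it is evaluated once per match.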

Answered By: jiggy

Python: urllib.quote

JavaScript: unescape

I haven’t done extensive testing but for my purposes it works most of the time. I guess you have some specific characters that don’t work. Maybe if I use some Asian text or something it will break 🙂

This came up when I googled so I put this in for all the others, if not specifically for the original question.

Answered By: Luke Stanley

JavaScript               |  Python
-----------------------------------
encodeURI(str)           |  urllib.parse.quote(str, safe="~@#$&()*!+=:;,?/'")
-----------------------------------
encodeURIComponent(str)  |  urllib.parse.quote(str, safe="~()*!'")

On Python 3.7+ you can remove ~ from safe=.
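
As a rough sanity check of the table, the two safe= strings can be run over a sample string. The constant names and the sample are made up for illustration. The outputs in the comments are what Python prints, and they should agree with what encodeURI() and encodeURIComponent() return for the same input:

```python
from urllib.parse import quote

# safe= values taken from the table above
ENCODE_URI_SAFE = "~@#$&()*!+=:;,?/'"
ENCODE_URI_COMPONENT_SAFE = "~()*!'"

sample = "a b&c=d/é?x#y"
as_uri = quote(sample, safe=ENCODE_URI_SAFE)
as_component = quote(sample, safe=ENCODE_URI_COMPONENT_SAFE)

print(as_uri)        # a%20b&c=d/%C3%A9?x#y
print(as_component)  # a%20b%26c%3Dd%2F%C3%A9%3Fx%23y
```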

Answered By: mjhm

The requests library is a bit more popular, if you don’t mind the extra dependency:

from requests.utils import quote
quote(str)
Answered By: Milimetric

decodeURIComponent() is similar to unquote

const unquote = decodeURIComponent
const unquote_plus = (s) => decodeURIComponent(s.replace(/\+/g, ' '))

except that Python is much more forgiving. If one of the two characters after a % is not a hex digit (or there are fewer than two characters after a %), JavaScript will throw a URIError: URI malformed error, whereas Python will just leave the % as is.
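
The forgiving behaviour on the Python side can be seen with a few malformed inputs. This is only an illustration, and the sample strings are made up:

```python
from urllib.parse import unquote, unquote_plus

# A trailing % and a non-hex escape are passed through unchanged
lone_percent = unquote("100%")
bad_hex = unquote("%zz%41")
# unquote_plus additionally turns + into a space before decoding
plus_form = unquote_plus("a+b%2B")

print(lone_percent)  # 100%
print(bad_hex)       # %zzA
print(plus_form)     # a b+
```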

encodeURIComponent() is not quite the same as quote. You need to percent-encode a few more characters and un-escape /:

const quoteChar = (c) => '%' + c.charCodeAt(0).toString(16).padStart(2, '0').toUpperCase()
const quote = (s) => encodeURIComponent(s).replace(/[()*!']/g, quoteChar).replace(/%2F/g, '/')

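// Python's quote_plus() defaults to safe='', so / stays encoded here
const quote_plus = (s) => encodeURIComponent(s).replace(/[()*!']/g, quoteChar).replace(/%20/g, '+')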
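
When checking a JavaScript port against Python, it can help to have reference values from Python itself. The sample string below is made up. Note that quote_plus() defaults to safe='', so unlike quote() it also escapes '/':

```python
from urllib.parse import quote, quote_plus

sample = "a b/(c)+d"
quoted = quote(sample)
quoted_plus = quote_plus(sample)

print(quoted)       # a%20b/%28c%29%2Bd
print(quoted_plus)  # a+b%2F%28c%29%2Bd
```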

The characters that Python’s quote doesn’t escape are documented in the Python docs, which say (on Python 3.7+): "Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL." The optional safe parameter, which defaults to '/', specifies additional ASCII characters that should not be quoted.

The characters that JavaScript’s encodeURIComponent doesn’t encode are specified as uriAlpha (upper and lowercase ASCII letters), DecimalDigit and uriMark, which are - _ . ! ~ * ' ( ).
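
The difference between the two sets can be worked out mechanically. The sketch below hard-codes the JavaScript set from the description above and probes Python's set by calling quote() with safe='', so the result depends on the Python version (the comment assumes 3.7+):

```python
import string
from urllib.parse import quote

# Characters encodeURIComponent leaves alone, per the description above
JS_UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

# ASCII characters that quote() leaves alone when nothing extra is marked safe
py_unreserved = {chr(c) for c in range(128) if quote(chr(c), safe="") == chr(c)}

# Characters a JavaScript port has to escape itself to match Python
extra_to_escape = sorted(JS_UNRESERVED - py_unreserved)
print(extra_to_escape)  # ['!', "'", '(', ')', '*'] on Python 3.7+
```

These are the same five characters matched by the /[()*!']/g pattern in the quote function above.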

Answered By: Boris Verkhovskiy

Here are implementations based on an implementation in the GitHub repo purescript-python:

import urllib.parse as urllp
def encodeURI(s): return urllp.quote(s, safe="~@#$&()*!+=:;,.?/'")
def decodeURI(s): return urllp.unquote(s, errors="strict")
def encodeURIComponent(s): return urllp.quote(s, safe="~()*!.'")
def decodeURIComponent(s): return urllp.unquote(s, errors="strict")
Answered By: Timothy C. Quinn

I am passing text files back and forth between Python and JavaScript.

Although urllib.parse.quote (Python side) and decodeURIComponent (JavaScript side) seem to work OK, they may not handle every character correctly.

So I wrote my own function that should be 100% reliable, regardless of the characters in the text file.

On the Python side I use xxd to encode the file. xxd is a Linux utility that converts a binary file to a string of 2 hex digits for each byte. The Python code to encode the file to a string of hex codes is:

import os
mystring = os.popen("xxd -p "+your_file_name_here).read().replace('\n','')

If you want to do the xxd conversion in Python instead of using the external program, you can use these functions. They only work with text files, though. If you need to work with binary, stick with the external xxd program.


def doxxd(s):
  # Two lowercase hex digits per character (assumes every code point is below 256)
  return "".join("%02x" % ord(ch) for ch in s)

def unxxd(x):
  s=""
  #get two chars at a time
  for i in range(0,len(x),2):
    s+=chr(int('0x'+x[i:i+2],16)) 
  return s
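
For binary files, a possible alternative to the external program is to read the file in binary mode and use the built-in bytes.hex() and bytes.fromhex(). This is only a sketch. The helper names file_to_hex and hex_to_bytes are hypothetical, and the output is expected (not verified here) to match xxd -p with the newlines removed:

```python
import os
import tempfile


def file_to_hex(path):
    """Return the file's contents as two lowercase hex digits per byte."""
    with open(path, "rb") as f:
        return f.read().hex()


def hex_to_bytes(hex_string):
    """Inverse of file_to_hex: turn the hex string back into bytes."""
    return bytes.fromhex(hex_string)


# Round trip on a small temporary file that contains non-text bytes
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\xffhi\n")
    tmp_path = tmp.name

encoded = file_to_hex(tmp_path)
os.remove(tmp_path)
print(encoded)  # 00ff68690a
```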

On the JavaScript side this function restores the hex code file back to the original text string:

function unxxd(str) {
  var s = "";
  // Get two hex digits at a time
  for (var i = 0; i < str.length; i += 2) {
    s += String.fromCharCode(parseInt(str.substr(i, 2), 16));
  }
  return s;
}
Answered By: Ken H