Python encode spaces in url only and not other special characters
Question:
I know this question has been asked many times but I can’t seem to find the variation that I’m looking for specifically.
I have a url, lets say its:
https://somethingA/somethingB/somethingC/some spaces here
I want to convert it to:
https://somethingA/somethingB/somethingC/some%20spaces%20here
I know I can do it with the replace
function like below:
url = https://somethingA/somethingB/somethingC/some spaces here
url.replace(' ', '%20')
But I have a feeling that the best practice is probably to use the urllib.parse
library. The problem is that when I use it, it encodes other special characters like :
too.
So if I do:
url = https://somethingA/somethingB/somethingC/some spaces here
urllib.parse.quote(url)
I get:
https%3A//somethingA/somethingB/somethingC/some%20spaces%20here
Notice the :
also gets converted to %3A
. So my question is, is there a way I can achieve the same thing as replace with urllib
? I think I would rather use a tried and tested library that is designed specifically to encode URLs instead of reinventing the wheel, which I may or may not be missing something leading to a security loop hole. Thank you.
Answers:
So quote()
there is built to work on just the path portion of a url. So you need to break things up a bit like this:
from urllib.parse import urlparse
url = 'https://somethingA/somethingB/somethingC/some spaces here'
parts = urlparse(url)
fixed_url = f"{parts.scheme}://{parts.netloc}/{urllib.parse.quote(parts.path)}"
I know this question has been asked many times but I can’t seem to find the variation that I’m looking for specifically.
I have a url, lets say its:
https://somethingA/somethingB/somethingC/some spaces here
I want to convert it to:
https://somethingA/somethingB/somethingC/some%20spaces%20here
I know I can do it with the replace
function like below:
url = https://somethingA/somethingB/somethingC/some spaces here
url.replace(' ', '%20')
But I have a feeling that the best practice is probably to use the urllib.parse
library. The problem is that when I use it, it encodes other special characters like :
too.
So if I do:
url = https://somethingA/somethingB/somethingC/some spaces here
urllib.parse.quote(url)
I get:
https%3A//somethingA/somethingB/somethingC/some%20spaces%20here
Notice the :
also gets converted to %3A
. So my question is, is there a way I can achieve the same thing as replace with urllib
? I think I would rather use a tried and tested library that is designed specifically to encode URLs instead of reinventing the wheel, which I may or may not be missing something leading to a security loop hole. Thank you.
So quote()
there is built to work on just the path portion of a url. So you need to break things up a bit like this:
from urllib.parse import urlparse
url = 'https://somethingA/somethingB/somethingC/some spaces here'
parts = urlparse(url)
fixed_url = f"{parts.scheme}://{parts.netloc}/{urllib.parse.quote(parts.path)}"