Redact and remove password from URL
Question:
I have an URL like this:
https://user:[email protected]/path?key=value#hash
The result should be:
https://user:[email protected]/path?key=value#hash
I could use a regex, but instead I would like to parse the URL a high level data structure, then operate on this data structure, then serializing to a string.
Is this possible with Python?
Answers:
You can use the built in urlparse
to query out the password from a url. It is available in both Python 2 and 3, but under different locations.
Python 2 import urlparse
Python 3 from urllib.parse import urlparse
Example
from urllib.parse import urlparse
parsed = urlparse("https://user:[email protected]/path?key=value#hash")
parsed.password # 'password'
replaced = parsed._replace(netloc="{}:{}@{}".format(parsed.username, "???", parsed.hostname))
replaced.geturl() # 'https://user:[email protected]/path?key=value#hash'
See also this question: Changing hostname in a url
from urllib.parse import urlparse
def redact_url(url: str) -> str:
url_components = urlparse(url)
if url_components.username or url_components.password:
url_components = url_components._replace(
netloc=f"{url_components.username}:???@{url_components.hostname}",
)
return url_components.geturl()
I have an URL like this:
https://user:[email protected]/path?key=value#hash
The result should be:
https://user:[email protected]/path?key=value#hash
I could use a regex, but instead I would like to parse the URL a high level data structure, then operate on this data structure, then serializing to a string.
Is this possible with Python?
You can use the built in urlparse
to query out the password from a url. It is available in both Python 2 and 3, but under different locations.
Python 2 import urlparse
Python 3 from urllib.parse import urlparse
Example
from urllib.parse import urlparse
parsed = urlparse("https://user:[email protected]/path?key=value#hash")
parsed.password # 'password'
replaced = parsed._replace(netloc="{}:{}@{}".format(parsed.username, "???", parsed.hostname))
replaced.geturl() # 'https://user:[email protected]/path?key=value#hash'
See also this question: Changing hostname in a url
from urllib.parse import urlparse
def redact_url(url: str) -> str:
url_components = urlparse(url)
if url_components.username or url_components.password:
url_components = url_components._replace(
netloc=f"{url_components.username}:???@{url_components.hostname}",
)
return url_components.geturl()