Redact and remove password from URL

Question:

I have an URL like this:

https://user:[email protected]/path?key=value#hash

The result should be:

https://user:[email protected]/path?key=value#hash

I could use a regex, but instead I would like to parse the URL a high level data structure, then operate on this data structure, then serializing to a string.

Is this possible with Python?

Asked By: guettli

||

Answers:

You can use the built in urlparse to query out the password from a url. It is available in both Python 2 and 3, but under different locations.

Python 2 import urlparse

Python 3 from urllib.parse import urlparse

Example

from urllib.parse import urlparse

parsed = urlparse("https://user:[email protected]/path?key=value#hash")
parsed.password # 'password'

replaced = parsed._replace(netloc="{}:{}@{}".format(parsed.username, "???", parsed.hostname))
replaced.geturl() # 'https://user:[email protected]/path?key=value#hash'

See also this question: Changing hostname in a url

Answered By: alxwrd
from urllib.parse import urlparse

def redact_url(url: str) -> str:
    url_components = urlparse(url)
    if url_components.username or url_components.password:
        url_components = url_components._replace(
            netloc=f"{url_components.username}:???@{url_components.hostname}",
        )

    return url_components.geturl()
Answered By: scottsome
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.