Strip a specific part from a URL string in Python

Question:

I’m passing through some URLs and I’d like to strip a part of each one that changes dynamically, so I don’t know it beforehand.
An example url is:

https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2

And I’d like to strip the gid=lostchapter part without any of the rest.

How do I do that?

Asked By: haduki


Answers:

We can try doing a regex replacement:

import re

url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
output = re.sub(r'(?<=[?&])gid=lostchapter&?', '', url)
print(output)  # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2

For a more generic replacement, match on the following regex pattern:

(?<=[?&])gid=\w+&?
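
The generic pattern could be wrapped in a small helper. The following is only a sketch under stated assumptions: the name remove_query_param is hypothetical, the value is matched with [^&]* so that values containing characters such as hyphens are also covered, and the URL is assumed to have no #fragment after the query string.

```python
import re


def remove_query_param(url: str, name: str) -> str:
    """Remove every `name=value` pair from the query string of `url`.

    Hypothetical helper for illustration; assumes the URL has no #fragment.
    """
    # (?<=[?&]) only matches right after '?' or '&', so a key such as
    # 'bgid' is left alone; [^&]* consumes the value; &? eats one separator.
    pattern = rf"(?<=[?&]){re.escape(name)}=[^&]*&?"
    stripped = re.sub(pattern, "", url)
    # If the removed parameter was the last one, drop the dangling '&' or '?'.
    return stripped.rstrip("?&")


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(remove_query_param(url, "gid"))
```

The final rstrip handles the case where gid is the last (or only) parameter, which the bare pattern would otherwise leave with a trailing & or ?.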
Answered By: Tim Biegeleisen

Here is a simple way to extract that part:

from urllib.parse import urlparse

urls = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

# Parse the URL and keep only its query string
query = urlparse(urls).query

# Split the query into its parameters and pick the second one
# (this relies on gid always being the second parameter)
url = query.split("&")[1]

# Print the extracted part
print(url)  # Prints "gid=lostchapter"
Answered By: Hansen Idden

You can use urllib to convert the query string into a Python dict and access the desired item:

In [1]: from urllib import parse

In [2]: s = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

In [3]: q = parse.parse_qs(parse.urlsplit(s).query)

In [4]: q
Out[4]:
{'pid': ['2'],
 'gid': ['lostchapter'],
 'lang': ['en_GB'],
 'practice': ['1'],
 'channel': ['desktop'],
 'demo': ['2']}

In [5]: q["gid"]
Out[5]: ['lostchapter']
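
If the goal is to remove gid from the URL rather than read its value, the same parsing approach can be extended to rebuild the URL without that key. This is a minimal sketch, not a definitive implementation: drop_param is a hypothetical helper name, and because urlencode re-encodes the values, percent-encoding in the result may be normalized compared with the input.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_param(url: str, name: str) -> str:
    """Return `url` with every query parameter called `name` removed.

    Hypothetical helper for illustration.
    """
    parts = urlsplit(url)
    # parse_qsl keeps the original order and any repeated keys;
    # keep_blank_values=True preserves parameters such as 'a='.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # SplitResult is a namedtuple, so _replace swaps in the new query string.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


s = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(drop_param(s, "gid"))
```

Unlike a regex or positional split, this does not depend on where gid sits in the query string.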
Answered By: ddejohn

Using string slicing (I’m assuming there will be an ‘&’ after gid=lostchapter)

url = r'https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2'
start = url.find('gid')                # index where "gid" begins
end = start + url[start:].find('&')    # index of the first "&" after "gid"
url = url[start:end]                   # keep only the text between the two
print(url)

Output:

gid=lostchapter

What I’m trying to do here is:

  • find the index of the first occurrence of "gid"
  • find the first "&" after that index
  • take the slice of the URL from "gid" up to (but not including) that "&"
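
The steps above can be sketched as a small function that also copes with the case where no "&" follows the parameter. This is an illustrative sketch: extract_param is a hypothetical name, and it assumes no other key ends in the same letters (a key such as "bgid" would also be found by the plain substring search).

```python
def extract_param(url: str, name: str = "gid") -> str:
    """Return the 'name=value' substring of `url`, or '' if it is absent.

    Hypothetical helper for illustration; uses plain substring search.
    """
    start = url.find(name + "=")
    if start == -1:
        return ""  # parameter not present
    end = url.find("&", start)
    if end == -1:
        end = len(url)  # parameter is the last one, so slice to the end
    return url[start:end]


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(extract_param(url))
```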
Answered By: MUSTANGBOSS8055

Method 1: Using urlparse

from urllib.parse import urlparse
p = urlparse('https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2')
# Keep only the query parameters whose key is gid
param: list[str] = [i for i in p.query.split('&') if i.startswith('gid=')]
print(param)

Output: ['gid=lostchapter']

Method 2: Using Regex

import re
# [^&]* stops at the next '&', so only the gid parameter is matched
param: str = re.search(r'gid=[^&]*', 'https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2').group()

You can tighten the regex pattern to match only the values you expect. As written, it extracts any value up to the next "&".

Answered By: anmol_gorakshakar