How to split on a delimiter in Python while preserving the delimiter

Question:

So what I want to do here is this: I have a file with a list of URL endpoints, and I want to split the links in the file on the slash delimiter, generating the sub-endpoints of each endpoint. For example, given:

https://www.somesite.com/path1/path2/path3

I would want to get this:

https://www.somesite.com/path1/
https://www.somesite.com/path1/path2/
https://www.somesite.com/path1/path2/path3

I know how to achieve this in bash, but not in Python. I tried using the split function, but I couldn't get it to do this. I hope I can get some help here, thank you.

Asked By: FozenOption


Answers:

For a generic "split" that keeps the delimiter, you can use the str.partition method, which splits at the first occurrence of the separator and returns a (head, separator, tail) tuple: https://docs.python.org/3/library/stdtypes.html#str.partition
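A quick illustration of the point above (a sketch, not part of the original answer): str.partition and str.rpartition keep the separator as the middle element of the tuple they return, but they only split once. If you want every delimiter kept, re.split with a capturing group is a different tool, swapped in here: text matched by a capturing group is included in its result list.

```python
import re

url = "https://www.somesite.com/path1/path2/path3"

# str.partition splits at the FIRST "/" only and keeps it as the middle element
head, sep, tail = url.partition("/")
print((head, sep, tail))
# ('https:', '/', '/www.somesite.com/path1/path2/path3')

# str.rpartition splits at the LAST "/" instead
print(url.rpartition("/"))
# ('https://www.somesite.com/path1/path2', '/', 'path3')

# re.split with a capturing group keeps every delimiter in the result list
print(re.split(r"(/)", "path1/path2/path3"))
# ['path1', '/', 'path2', '/', 'path3']
```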

Now, for your specific use case, where you want the full intermediate strings as a list, you can write a bit of code: start with urllib.parse to split off the initial part of the URL (scheme and host) without worrying about corner cases, and then manipulate the path with for, split and join.


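from urllib.parse import urlparse, urlunparse

url = "https://www.somesite.com/path1/path2/path3"

# urlparse returns (scheme, netloc, path, params, query, fragment); index 2 is the path
components = list(urlparse(url))
path = components[2]

# Build cumulative paths "/path1", "/path1/path2", "/path1/path2/path3"
# (an assignment expression inside a comprehension needs Python 3.8+)
path_comps_str = ""
path_comps = [path_comps_str := path_comps_str + f"/{comp}" for comp in path.split("/")[1:]]

all_urls = []
for sub_path in path_comps:
    url_parts = components[:]  # copy, so the parsed components stay untouched
    url_parts[2] = sub_path
    all_urls.append(urlunparse(url_parts))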
Answered By: jsbueno
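A minimal sketch of the same urllib.parse idea wrapped in a function, assuming you want the question's exact output (a trailing / on every intermediate URL). The helper name sub_endpoints is hypothetical, and dropping params, query and fragment from the intermediate URLs is an assumption about what a "sub-endpoint" should be. It relies on the _replace method that the parse result inherits from namedtuple, plus geturl() to rebuild the URL.

```python
from urllib.parse import urlparse


def sub_endpoints(url):
    """Hypothetical helper: return each prefix URL of `url`, ending with `url` itself."""
    parsed = urlparse(url)
    # Keep only non-empty path segments; this drops the empty strings produced
    # by the leading "/" and by any trailing or doubled slashes
    segments = [s for s in parsed.path.split("/") if s]
    results = []
    for i in range(1, len(segments)):
        prefix = "/" + "/".join(segments[:i]) + "/"
        # Assumption: intermediate endpoints drop params, query and fragment
        results.append(parsed._replace(path=prefix, params="", query="", fragment="").geturl())
    # The deepest endpoint is the original URL, returned unchanged
    if segments:
        results.append(url)
    return results


for u in sub_endpoints("https://www.somesite.com/path1/path2/path3"):
    print(u)
```

Because empty segments are filtered out, a trailing slash in the input does not produce a duplicate entry, and a URL with no path at all yields an empty list.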

One option is to split by a /, then slice the result and join back:

>>> url = 'https://www.somesite.com/path1/path2/path3'
>>> parts = url.split('/')
>>> ['/'.join(parts[:p+1]) for p in range(3, len(parts))]
['https://www.somesite.com/path1', 'https://www.somesite.com/path1/path2', 'https://www.somesite.com/path1/path2/path3']
Answered By: gog
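The slice-and-join answer above produces URLs without the trailing /. A small variation, sketched under the assumption that every URL looks like scheme://host/seg1/seg2/... with no trailing slash, query string or fragment, appends / to each intermediate prefix so the output matches the question exactly. The function name sub_endpoints_str is hypothetical.

```python
def sub_endpoints_str(url):
    """Hypothetical helper: plain-string prefixes of `url`, intermediate ones ending in "/"."""
    parts = url.split("/")
    # parts[0] is "https:", parts[1] is "" (between the two slashes) and
    # parts[2] is the host, so the path segments start at index 3
    prefixes = ["/".join(parts[:p + 1]) + "/" for p in range(3, len(parts) - 1)]
    # The last entry is the full URL, without a trailing slash added
    return prefixes + [url]


print(sub_endpoints_str("https://www.somesite.com/path1/path2/path3"))
# ['https://www.somesite.com/path1/', 'https://www.somesite.com/path1/path2/', 'https://www.somesite.com/path1/path2/path3']
```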

Try something like this:

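link = "https://www.somesite.com/path1/path2/path3"
splitted = link.split('/')  # ['https:', '', 'www.somesite.com', 'path1', 'path2', 'path3']
# Rebuild "https://www.somesite.com/" from the scheme (index 0) and the host (index 2)
newLink = splitted[0] + "//" + splitted[2] + "/"
for i in range(3, len(splitted)):
    newLink += splitted[i]
    # add a slash after every path segment except the last one
    if i != len(splitted)-1:
        newLink += "/"
    print(newLink)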

The output is:

https://www.somesite.com/path1/
https://www.somesite.com/path1/path2/
https://www.somesite.com/path1/path2/path3

If you don't need the trailing / on the links, you could write it as:

link = "https://www.somesite.com/path1/path2/path3"
splitted = link.split('/')
newLink = splitted[0] + "//" + splitted[2]
for i in range(3, len(splitted)):
    newLink += "/" + splitted[i]
    print(newLink)
Answered By: Tugamer89
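Since the question starts from a file with a list of URLs, here is a sketch of applying the same split-and-join idea to every line of such a file. The function name expand_lines and the file name urls.txt are hypothetical, io.StringIO stands in for the real file, and skipping prefixes that were already emitted is an assumption (URLs on the same site often share them).

```python
import io


def expand_lines(lines):
    """Hypothetical helper: yield the sub-endpoints of every non-blank line.

    `lines` can be any iterable of strings, such as an open file object.
    """
    seen = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = url.split("/")
        # Intermediate prefixes get a trailing "/", as in the question
        candidates = ["/".join(parts[:p + 1]) + "/" for p in range(3, len(parts) - 1)]
        candidates.append(url)
        for candidate in candidates:
            # Assumption: a prefix shared by several URLs is emitted only once
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


# Stand-in for open("urls.txt"); the file name is hypothetical
sample = io.StringIO(
    "https://www.somesite.com/path1/path2/path3\n"
    "https://www.somesite.com/path1/other\n"
)
for u in expand_lines(sample):
    print(u)
```

With a real file, pass the file object returned by open("urls.txt") instead of the StringIO.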