split URL python

Question:

I have a URL https://muk05119.us-east-1.snowflakecomputing.com and I want to retrieve only muk05119.us-east-1 from this.

Instead of splitting the string and retrieving the above, what is the best way to accomplish this?

Asked By: Mukul Kumar

Answers:

url = 'https://muk05119.us-east-1.snowflakecomputing.com'
'.'.join(url.split('.')[0:2]).split('/')[-1]

Output:

'muk05119.us-east-1'
Answered By: René

Your example is clear by itself, but it’s unclear what rule underlies it. Do you want the first two parts of the domain? All but the last two parts of the domain? Do you want everything before a main domain name and the top level domain (e.g. before .google.com but also before .australia.gov.au)? Or some other rule still?

The first two parts:

from urllib.parse import urlparse

url = 'https://muk05119.us-east-1.snowflakecomputing.com'
netloc = urlparse(url).netloc

print(netloc[:netloc.index('.', netloc.index('.')+1)])

Or:

print('.'.join(netloc.split('.')[:2]))

All but the last two parts:

print('.'.join(netloc.split('.')[:-2]))
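The two rules give the same result for the URL in the question but diverge as soon as the host has a different number of parts. A small sketch to show that; the function names and the second URL are made up for illustration:

```python
from urllib.parse import urlparse

def first_two_labels(url):
    """Return the first two dot-separated parts of the URL's hostname."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[:2])

def all_but_last_two_labels(url):
    """Return every part of the hostname except the last two."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[:-2])

for url in ("https://muk05119.us-east-1.snowflakecomputing.com",
            "https://a.b.c.example.com"):  # second URL is hypothetical
    print(first_two_labels(url), "|", all_but_last_two_labels(url))
# muk05119.us-east-1 | muk05119.us-east-1
# a.b | a.b.c
```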

For everything before the main and top-level domain, have a look at https://pypi.org/project/publicsuffixlist/ and use that with some of the above.
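To illustrate the idea without the external package, here is a rough sketch that uses a tiny hand-written set of suffixes. SAMPLE_SUFFIXES and the function name are assumptions made up for this example; the real public suffix list is far larger and also has wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked sample of public suffixes, for illustration only.
SAMPLE_SUFFIXES = {"com", "au", "com.au", "gov.au"}

def subdomain_before_registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return everything before the main domain name and its public suffix."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Scan from the left so the longest matching suffix is found first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # labels[i:] is the suffix and labels[i - 1] the main domain name,
            # so the subdomain is whatever comes before those.
            return ".".join(labels[:max(i - 1, 0)])
    return ""  # no known suffix matched

print(subdomain_before_registered_domain(
    "https://muk05119.us-east-1.snowflakecomputing.com"))  # muk05119.us-east-1
print(subdomain_before_registered_domain(
    "https://www.australia.gov.au"))                       # www
```

For real use, replace the sample set with a lookup against the full list rather than maintaining suffixes by hand.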

Answered By: Grismar

You can use the built-in urllib.parse module to extract the hostname.

You still have to split the hostname string to get the subdomain, though.

from urllib.parse import urlparse

URL = "https://muk05119.us-east-1.snowflakecomputing.com"
parsed = urlparse(URL)

host = parsed.netloc  # => muk05119.us-east-1.snowflakecomputing.com
subdomain = '.'.join(host.split('.')[:2])
Answered By: pu2x

You can use urlparse

from urllib.parse import urlparse
url = urlparse('https://muk05119.us-east-1.snowflakecomputing.com')
subdomain = url.hostname.split('.')[0] + '.' + url.hostname.split('.')[1]

Here url.hostname.split('.')[x] is the part of the hostname at index x. In your case the first two parts are needed, so indices 0 and 1.
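One detail worth knowing when choosing between hostname and netloc: netloc keeps any user info and port that appear in the URL, while hostname strips them and lowercases the result. A quick sketch, using a made-up variant of the URL with a user, a port and a path added purely for illustration:

```python
from urllib.parse import urlparse

# Hypothetical URL: the user, port and path are added for the example.
parsed = urlparse("https://user@MUK05119.us-east-1.snowflakecomputing.com:443/console")

print(parsed.netloc)    # user@MUK05119.us-east-1.snowflakecomputing.com:443
print(parsed.hostname)  # muk05119.us-east-1.snowflakecomputing.com

from_netloc = ".".join(parsed.netloc.split(".")[:2])
from_hostname = ".".join(parsed.hostname.split(".")[:2])
print(from_netloc)      # user@MUK05119.us-east-1
print(from_hostname)    # muk05119.us-east-1
```

For the plain URL in the question both attributes give the same string, so the difference only matters if the input may carry credentials or a port.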

Documentation: urllib.parse (Python standard library)

Answered By: NicoCaldo