Split a URL into its components in Python
Question:
I have a huge list of URLs that are all like this:
http://www.example.com/site/section1/VAR1/VAR2
Where VAR1 and VAR2 are the dynamic elements of the URL. I want to extract only VAR1 from this URL string. I’ve tried urlparse, but the output looks like this:
ParseResult(scheme='http', netloc='www.example.com', path='/site/section1/VAR1/VAR2', params='', query='', fragment='')
Answers:
Alternatively, you can use the split() method:
>>> url = "http://www.example.com/site/section1/VAR1/VAR2"
>>> url.split("/")[-2:]
['VAR1', 'VAR2']
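If both dynamic parts are needed, the two-element slice can be unpacked straight into two names. This is a minimal sketch using the example URL from the question:

```python
url = "http://www.example.com/site/section1/VAR1/VAR2"

# Take the last two '/'-separated pieces and unpack them.
# Unpacking raises ValueError if the URL contains no '/' at all,
# because the slice then holds only one element.
var1, var2 = url.split("/")[-2:]
print(var1)  # VAR1
print(var2)  # VAR2
```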
As a general rule, the different sections of a URL can be obtained with urlparse. Here you can get the path with urlparse(url).path and then extract the desired variable with split():
>>> from urllib.parse import urlparse  # Python 3 (on Python 2: from urlparse import urlparse)
>>> url = 'http://www.example.com/site/section1/VAR1/VAR2'
>>> urlparse(url)
ParseResult(scheme='http', netloc='www.example.com', path='/site/section1/VAR1/VAR2', params='', query='', fragment='')
>>> urlparse(url).path
'/site/section1/VAR1/VAR2'
>>> urlparse(url).path.split('/')[-2]
'VAR1'
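Working on the parsed path rather than the raw string has a practical benefit: a query string or fragment (which may itself contain '/') cannot shift the segments. The sketch below wraps this in a helper and also strips leading and trailing slashes so that a URL ending in '/' still yields VAR1. The name extract_var1 is hypothetical, and it assumes every URL has at least two path segments, as in the question.

```python
from urllib.parse import urlparse


def extract_var1(url: str) -> str:
    """Return the second-to-last path segment of url (hypothetical helper)."""
    # urlparse() separates the path from any query string or fragment.
    path = urlparse(url).path
    # strip('/') removes leading/trailing slashes, so a trailing '/'
    # does not create an empty last segment.
    segments = path.strip('/').split('/')
    return segments[-2]


print(extract_var1('http://www.example.com/site/section1/VAR1/VAR2'))            # VAR1
print(extract_var1('http://www.example.com/site/section1/VAR1/VAR2/'))           # VAR1
print(extract_var1('http://www.example.com/site/section1/VAR1/VAR2?next=/a/b'))  # VAR1
```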
I would simply try:
url = 'http://www.example.com/site/section1/VAR1/VAR2'
var1 = url.split('/')[-2]  # second-to-last '/'-separated piece, i.e. 'VAR1'
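Since the question mentions a huge list of URLs, the same one-liner can be applied to every entry with a list comprehension. The urls list below is hypothetical sample data standing in for the real list; if memory is a concern, a generator expression (parentheses instead of brackets) avoids building the whole result at once.

```python
# Hypothetical sample input standing in for the "huge list" of URLs.
urls = [
    'http://www.example.com/site/section1/alpha/one',
    'http://www.example.com/site/section1/beta/two',
    'http://www.example.com/site/section1/gamma/three',
]

# One split per URL; [-2] picks the piece just before the last '/'.
var1_values = [u.split('/')[-2] for u in urls]
print(var1_values)  # ['alpha', 'beta', 'gamma']
```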
Check this one: rsplit() works from the end of the string, and its maxsplit argument limits the number of splits, so only the last two '/' separators are used instead of every one in the URL.
Indexing the result then gives the last two parts of the URL:
>>> url.rsplit('/', 2)[1:]
['VAR1', 'VAR2']
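To make the effect of maxsplit visible, this sketch prints the full result of rsplit('/', 2), takes VAR1 as its middle element, and compares it with an rsplit call that has no limit, which splits at every '/' just like split().

```python
url = 'http://www.example.com/site/section1/VAR1/VAR2'

# maxsplit=2: only the two right-most '/' become split points,
# so everything before them stays in one piece.
parts = url.rsplit('/', 2)
print(parts)  # ['http://www.example.com/site/section1', 'VAR1', 'VAR2']

# VAR1 is the middle element.
var1 = parts[1]
print(var1)  # VAR1

# Without maxsplit, rsplit splits at every '/', giving the same list as split().
print(url.rsplit('/'))  # ['http:', '', 'www.example.com', 'site', 'section1', 'VAR1', 'VAR2']
```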