How can I remove the fragment identifier from a URL?
Question:
I have a string containing a link. The link often has the form:
http://www.address.com/something#something
Is there a function in Python that can remove “#something” from a link?
Answers:
Try this:
>>> s="http://www.address.com/something#something"
>>> s1=s.split("#")[0]
>>> s1
'http://www.address.com/something'
Just use split()
>>> foo = "http://www.address.com/something#something"
>>> foo = foo.split('#')[0]
>>> foo
'http://www.address.com/something'
>>>
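A small variation on the split approach, in case it helps: str.partition cuts only at the first "#" and always returns three parts, so it behaves the same whether or not the URL has a fragment. This is just a minimal sketch; the helper name strip_fragment is made up for illustration.

```python
def strip_fragment(url):
    """Return url without its fragment (everything from the first '#' on)."""
    # partition splits at the first '#'; if there is none,
    # base is the whole string and the other two parts are empty.
    base, _sep, _fragment = url.partition("#")
    return base


print(strip_fragment("http://www.address.com/something#something"))
print(strip_fragment("http://www.address.com/something"))
```

Both calls print the address without any fragment. Like split, this treats the URL as a plain string and does no URL parsing.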
For Python 2, use urlparse.urldefrag:
>>> import urlparse
>>> urlparse.urldefrag("http://www.address.com/something#something")
('http://www.address.com/something', 'something')
In Python 3, the urldefrag function is now part of urllib.parse:
from urllib.parse import urldefrag
unfragmented = urldefrag("http://www.address.com/something#something")
Result (a named tuple that compares equal to a plain 2-tuple):
DefragResult(url='http://www.address.com/something', fragment='something')
You can unpack the result and discard the unwanted part like so:
fixed, throwaway = urldefrag(url)
where url is the fragmented address. This is a bit nicer than a split. I have not checked whether it is faster or more efficient, though.
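For completeness, here is a small runnable sketch of the Python 3 version. In Python 3 the value returned by urldefrag is a named tuple, so the parts can also be read by attribute instead of being unpacked. The variable names below are only illustrative, and the URL is the example from the question.

```python
from urllib.parse import urldefrag

url = "http://www.address.com/something#something"  # example URL from the question

result = urldefrag(url)
print(result.url)       # the URL without the fragment
print(result.fragment)  # the fragment text, without the leading '#'

# A URL with no fragment comes back unchanged, with an empty fragment.
plain = urldefrag("http://www.address.com/something")
print(plain.url, repr(plain.fragment))
```

Reading result.url avoids the throwaway variable when only the defragmented address is needed.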