Python regular expression again – match URL

Question:

I have such a regular expression:

 import re
 re.compile(r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&]*)", re.MULTILINE|re.UNICODE)

But that doesn’t match hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class alongside #@% and the rest, but then it would also select something like

Check this out: http://example.com/something/!!!

And I want to avoid that.

Asked By: ThomK


Answers:

Don’t try to write your own regular expression for matching URLs. Use one from someone who has already solved these problems, like this one.

Answered By: kindall

I’ll admit that I’m a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:

((https?):((//)|(\\))+([\w\d:#@%/;$()~_?+-=\.&](#!)?)*)
Answered By: tsm
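A quick check of tsm’s pattern, with the \w\d escapes restored (an assumption: the printed version appears to have lost its backslashes). Each allowed character may be followed by an optional literal #!, so a hashbang is accepted but a bare run of ! is not. For contrast, NAIVE_URL is a hypothetical variant of the question’s pattern with ! simply added to the character class; first_match is likewise just an illustrative helper.

```python
import re

# tsm's pattern (backslashes assumed restored): every allowed character may be
# followed by an optional "#!" pair, so "#!" is kept but "!!!" is not.
HASHBANG_URL = re.compile(r"((https?):((//)|(\\))+([\w\d:#@%/;$()~_?+-=\.&](#!)?)*)")

# Hypothetical comparison: the question's pattern with "!" added to the class.
NAIVE_URL = re.compile(r"((https?):((//)|(\\))+[\w\d:#@%/;$()~_?+-=\.&!]*)")


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


samples = [
    "Open http://example.com/#!/page/1 now",
    "Check this out: http://example.com/something/!!!",
]
for s in samples:
    # Left column: tsm's pattern; right column: the naive variant.
    print(first_match(HASHBANG_URL, s), "|", first_match(NAIVE_URL, s))
```

With the samples above, tsm’s pattern returns http://example.com/#!/page/1 and http://example.com/something/, while the naive variant keeps the trailing !!! in the second case.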

This is a common problem. Use the standard library.

For Python, use urlparse (in Python 3, urllib.parse.urlparse).

Answered By: estani
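A minimal sketch of estani’s suggestion using Python 3’s urllib.parse. Note that urlparse only splits a URL you have already isolated; it does not search running text. The is_http_url helper is a hypothetical illustration, not part of the library.

```python
from urllib.parse import urlparse  # Python 3 location of the old urlparse module

# urlparse splits an already-isolated URL; everything after "#" (including
# a hashbang's "!") ends up in .fragment.
parts = urlparse("https://example.com/app?lang=en#!/page/1")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
# https example.com /app lang=en !/page/1


def is_http_url(candidate):
    """Hypothetical helper: True if candidate has an http(s) scheme and a host."""
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_http_url("http://example.com/#!/x"))  # True
print(is_http_url("example.com"))              # False
```

Because urlparse does not decide where a URL ends inside a sentence, is_http_url("http://example.com/something/!!!") is also True; you still need some way to find the candidate substrings first.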

It may look long, but in practice mine works pretty well. Please try this one:
((http|https)://)?[a-zA-Z0-9./?:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9.&/?:@\-_=#])*

It matches all of the examples below:

http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
[email protected]
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/[email protected]
[email protected].
[email protected]
[email protected]
Answered By: Asad
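A quick check of Asad’s pattern against a few of the listed examples, with the \. and \- escapes restored (an assumption: without them, . matches any character and @-_ becomes a character range from @ to _). find_url is a hypothetical helper that returns the first match found by re.search.

```python
import re

# Asad's pattern with \. and \- restored (assumed lost in extraction).
LOOSE_URL = re.compile(
    r"((http|https)://)?[a-zA-Z0-9./?:@\-_=#]+\.([a-zA-Z]){2,6}"
    r"([a-zA-Z0-9.&/?:@\-_=#])*"
)


def find_url(text):
    """Return the first substring of text that LOOSE_URL matches, or None."""
    m = LOOSE_URL.search(text)
    return m.group(0) if m else None


for sample in ["abc.com", "(www.itmag.com)", "example.com/etcetc?query=aasd&dest=asds"]:
    print(repr(sample), "->", find_url(sample))
```

The looseness has a cost: find_url("notes.txt") also returns "notes.txt", and since ! is in neither character class, find_url("see example.com/#!/page") returns only "example.com/#".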

Based on this link, we can use the validators library.

For example:

import validators

# validators.url() returns True for a valid URL and a falsy failure object otherwise
valid = validators.url('https://codespeedy.com/')
if valid:
    print("URL is valid")
else:
    print("Invalid URL")
Answered By: Alireza Mazochi

This is the most complete pattern I use:

URL_PATTERN = r'[A-Za-z0-9]+://[A-Za-z0-9%-_]+(/[A-Za-z0-9%-_])*(#|\?)[A-Za-z0-9%-_&=]*'
Answered By: Leto Atreides
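A quick check of Leto Atreides’ pattern exactly as printed (assuming nothing was lost from it in extraction). Two things stand out: the (#|\?) group is mandatory, so a URL with no # or ? is not matched at all, and inside brackets %-_ is a range from % to _ that excludes !, so a hashbang fragment is cut short. find_url is a hypothetical helper.

```python
import re

# Pattern as printed. Inside the brackets, %-_ is a character *range* from "%"
# to "_", which also covers ".", "/", "?", "=", "&", digits and uppercase letters,
# but not "#" or "!".
URL_PATTERN = r'[A-Za-z0-9]+://[A-Za-z0-9%-_]+(/[A-Za-z0-9%-_])*(#|\?)[A-Za-z0-9%-_&=]*'


def find_url(text):
    """Return the first substring of text that URL_PATTERN matches, or None."""
    m = re.search(URL_PATTERN, text)
    return m.group(0) if m else None


print(find_url("http://example.com/search?q=regex"))  # http://example.com/search?q=regex
print(find_url("http://example.com/page"))            # None: no "#" or "?"
print(find_url("http://example.com/#!/page"))         # http://example.com/#
```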

I use this to search for all HTTP and HTTPS URLs. It works like a charm.

URL_PATTERN = r"http[s]*\S+"
Answered By: Rafey Rana
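A short run of Rafey Rana’s pattern with the backslash in \S restored (assumed lost in extraction) and written as a raw string so Python does not treat \S as a string escape. It keeps hashbangs, but it also keeps every non-space character that follows, including the trailing !!! and sentence punctuation the question wants to exclude.

```python
import re

# "http", any number of "s", then one or more non-whitespace characters.
URL_PATTERN = r"http[s]*\S+"

text = "Check this out: http://example.com/something/!!! or https://example.org/#!/a."
print(re.findall(URL_PATTERN, text))
# ['http://example.com/something/!!!', 'https://example.org/#!/a.']
```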