Regex for URL without path


I know there are many solutions, articles and libraries for this, but I couldn’t find one that matches my case. I’m trying to write a regex to extract a URL (which represents the website) from a piece of text (a person’s email signature). It has to handle several cases:

  • Could contain http(s):// , or not
  • Could contain www. , or not
  • Could have multiple TLDs such as ""

Here are some examples:

I’ve come up with the following regex:


But there are two main problems with this, because the signature can contain an email address:

  1. It (wrongly) captures the TLDs of emails like this one: [email protected]
  2. It doesn’t capture URLs in the middle of a line, and if I remove the $ sign at the end, it captures the name.surname part of the last example

For (1), I tried a negative lookbehind by adding (?<!@) to the beginning. The problem is that now it captures instead of not matching it at all.
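
A minimal sketch of why a lone (?<!@) tends to shift the match rather than suppress it. The pattern and the address john@example.com are hypothetical stand-ins, not the regex or the data from this question: the lookbehind only rejects the position directly after the @, so the engine simply starts one character later.

```python
import re

# Hypothetical simplified pattern (NOT the regex from the question):
# one domain label, a dot, and a TLD, guarded only by (?<!@).
pattern = re.compile(r"(?<!@)[a-z]+\.[a-z]{2,}")

# Hypothetical sample address.
text = "john@example.com"

match = pattern.search(text)

# The position right after "@" is rejected, but the next position is
# preceded by "e", not "@", so the lookbehind passes there.
print(match.group(0))  # xample.com
```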

Asked By: sagi



I think you could use \b (word boundary) instead of $ (and at the beginning as well), and exclude @ in a negative lookbehind and lookahead:


Edit: exclude the dot (and all non-alphanumeric characters likely to occur in a URL/email address) in your lookarounds to avoid matching name.middlename in [email protected] or in [email protected]. See this answer for the list of characters.
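
For illustration, here is a rough Python sketch of that idea. It is not the exact regex from this answer: the name URL_RE, the helper find_urls, the excluded character set [\w@.\-] and the sample signature are all assumptions made for the example. Excluding \w in both lookarounds does the job of \b, and excluding @, the dot and the hyphen keeps pieces of an email address from matching.

```python
import re

# Assumed character set: word characters plus "@", "." and "-".
# This is an illustrative list, not the one from the linked answer.
URL_RE = re.compile(
    r"(?<![\w@.\-])"                 # must not start inside a word or an email
    r"(?:https?://)?"                # optional scheme
    r"(?:www\.)?"                    # optional www.
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)*"   # one or more host labels
    r"\.[a-z]{2,}"                   # final TLD label
    r"(?![\w@.\-])",                 # must not end inside a word or an email
    re.IGNORECASE,
)


def find_urls(text):
    """Return every URL-like match (scheme and host only, no path)."""
    return URL_RE.findall(text)


# Hypothetical signature line with an email address and two websites.
sample = "John Doe | john.doe@example.com | www.example.co.uk | https://acme.io/contact"

print(find_urls(sample))  # ['www.example.co.uk', 'https://acme.io']
```

One trade-off of putting the dot in the lookahead: a URL followed directly by a sentence-ending period (as in "visit example.com.") is skipped by this sketch, so the excluded set may need tuning for your data.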

Answered By: Tranbi