Return "Error" if no match found by regex

Question:

I have a string:

link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

And I have a function that returns the domain name from that URL or, if none is found, returns '':

def get_domain(url):
    domain_regex = re.compile("://(.*?)/|$")
    return re.findall(domain_regex, str(url))[0].replace('www.', '')

get_domain(link)

returned result:

this_is_my_perfect_url.com

The |$ alternative makes the function return '' if the rest of the regex matches nothing.
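
To make that behavior concrete, here is a small check of what re.findall returns for both kinds of input. It is only a sketch that reuses the pattern and strings from this question; the variable names with_domain and without_domain are illustrative.

```python
import re

# Same pattern as in the question: either "://<anything>/" or the end of the string.
domain_regex = re.compile("://(.*?)/|$")

with_domain = domain_regex.findall(
    "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"
)
without_domain = domain_regex.findall("there_is_no_domain_in_here")

# The "$" branch matches the empty position at the end of the string, and the
# capture group takes no part in that match, so findall reports '' for it.
print(with_domain)     # ['www.this_is_my_perfect_url.com', '']
print(without_domain)  # ['']
```

Because the "$" branch always matches once, the list is never empty, which is why indexing with [0] does not raise an IndexError here.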

Is there a way to implement the default value Error inside the regex, so that I do not have to do any check inside the function?

So if link = "there_is_no_domain_in_here", then the function returns Error instead of ''.

Asked By: milka1117


Answers:

As mentioned in the comments above, you cannot make the regex itself do that for you. You can, however, check whether the value taken from re.findall is empty after the extra formatting is applied. If it is empty, no match was found, so return Error:

import re
link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

def get_domain(url):
    domain_regex = re.compile("://(.*?)/|$")

    # Take the first match (a string) and strip 'www.' from it
    matches = re.findall(domain_regex, str(url))[0].replace('www.', '')

    # Return the match, or 'Error' if the result is an empty string
    return matches or 'Error'

print(get_domain(link))
print(get_domain('there_is_no_domain_in_here'))

The output will be

this_is_my_perfect_url.com
Error
Answered By: Devesh Kumar Singh

Just to put my two cents in: the lazy quantifier (.*?) in combination with an alternation (|$) is very inefficient. You can vastly improve your expression by changing it to:

://[^/]+

Additionally, as of Python 3.8 you could use the walrus operator as in

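import re

def get_domain(your_string):
    if (m := re.search("://[^/]+", your_string)) is not None:
        # found something; m.group(0) is the match, including the leading "://"
        return m.group(0)
    else:
        return "Error"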

And no: with regular expressions alone, you cannot get something out of a string that is not there in the first place.
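
Building on the ://[^/]+ pattern above, a minimal sketch of the complete function might look like the following. The capture group, the stripping of a leading www., and the default parameter are assumptions added for illustration; they are not part of the pattern as written above.

```python
import re

# The "://[^/]+" pattern with a capture group added (an assumption) so that
# the leading "://" is not part of the result.
DOMAIN_REGEX = re.compile(r"://([^/]+)")

def get_domain(url, default="Error"):
    match = DOMAIN_REGEX.search(str(url))
    if match is None:
        # The regex cannot produce a default itself, so the fallback lives here.
        return default
    domain = match.group(1)
    # Strip only a leading "www.", unlike str.replace, which removes it anywhere.
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return domain

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("there_is_no_domain_in_here"))
```

This should print this_is_my_perfect_url.com followed by Error. Unlike the original pattern, it also matches URLs that have no slash after the domain.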

Answered By: Jan

Why not use urlparse to get the domain?

# Python 2:
# from urlparse import urlparse
# Python 3:
from urllib.parse import urlparse


def get_domain(url):
    parsed_uri = urlparse(url)
    domain = parsed_uri.netloc
    return domain or "ERROR"

url = 'there_is_no_domain_in_here'
print(get_domain(url))
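
Two details of urlparse are worth a quick illustration: netloc is only filled in when the string contains //, and it keeps any www. prefix. The sketch below strips that prefix the way the question's function does. The prefix handling and the www.example.com input are assumptions added for illustration, not part of the code above.

```python
from urllib.parse import urlparse

def get_domain(url):
    # netloc is '' when the input has no "//" separator.
    domain = urlparse(str(url)).netloc
    # Strip a leading "www." (an added assumption, mirroring the question).
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return domain or "ERROR"

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("there_is_no_domain_in_here"))
# No scheme and no "//", so urlparse treats the whole string as a path.
print(get_domain("www.example.com/page"))
```

This should print this_is_my_perfect_url.com, then ERROR twice: a scheme-less string such as www.example.com/page is treated as a path, so it falls through to the default.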
Answered By: Stephen