Return "Error" if no match found by regex
Question:
I have a string:
link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"
And I have a function which returns the domain name from that URL or, if it is not found, returns '':
import re

def get_domain(url):
    domain_regex = re.compile("://(.*?)/|$")
    return re.findall(domain_regex, str(url))[0].replace('www.', '')

get_domain(link)
Returned result:
this_is_my_perfect_url.com
The |$ alternative makes the function return '' if the regex matches nothing else.
Is there a way to implement the default value Error inside the regex, so I do not have to do any check inside the function?
So if link = "there_is_no_domain_in_here", the function returns Error instead of ''.
Answers:
As mentioned in the comments above, you cannot set anything in the regex to do that for you. You can, however, check whether the output of re.findall, after the extra formatting is applied, is empty. If it is empty, no match was found, so return Error:
import re

link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

def get_domain(url):
    domain_regex = re.compile("://(.*?)/|$")
    # Get the first regex match and strip 'www.' from it
    matches = re.findall(domain_regex, str(url))[0].replace('www.', '')
    # Return the match, or 'Error' if the result is empty
    return matches or 'Error'

print(get_domain(link))
print(get_domain('there_is_no_domain_in_here'))
The output will be
this_is_my_perfect_url.com
Error
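To see why the `or 'Error'` check is enough here, it may help to look at what re.findall actually returns for this pattern. The snippet below is only an illustrative sketch; the variable names with_domain and without_domain are made up for the demonstration. Because the pattern contains a capture group, findall returns the group's text for each match, and the empty match produced by the |$ alternative shows up as ''.

```python
import re

domain_regex = re.compile("://(.*?)/|$")

# With one capture group, findall returns the group's text for every match.
# The |$ alternative always matches at the end of the string, where the
# group did not take part, so it contributes an empty string.
with_domain = re.findall(
    domain_regex,
    "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo",
)
without_domain = re.findall(domain_regex, "there_is_no_domain_in_here")

print(with_domain)     # ['www.this_is_my_perfect_url.com', '']
print(without_domain)  # ['']
```

So indexing with [0] never raises IndexError, and an empty string is falsy, which is what lets `matches or 'Error'` supply the default.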
Just to put my two cents in: the lazy quantifier (.*?) in combination with an alternation (|$) is quite inefficient. You can improve your expression considerably with:
://[^/]+
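Additionally, as of Python 3.8 you could use the walrus operator, for example inside a function:
def get_domain(your_string):
    if (m := re.search("://[^/]+", your_string)) is not None:
        # found something; m.group(0) is the matched text, including "://"
        return m.group(0)
    else:
        return "Error"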
And no: with regular expressions alone, you cannot get something out of a string that is not there in the first place.
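Putting the pieces of this answer together, a minimal sketch could look like the following. It assumes Python 3.8+ for the walrus operator. The capture group, the name HOST_RE and the default parameter are additions for illustration and are not part of the original pattern. The 'www.' replacement simply mirrors the question's code.

```python
import re

# Capture only the host, so the leading "://" is not part of the result.
# HOST_RE is a hypothetical name chosen for this sketch.
HOST_RE = re.compile(r"://([^/]+)")

def get_domain(url, default="Error"):
    # The walrus operator (Python 3.8+) assigns and tests the match in one step.
    if (m := HOST_RE.search(str(url))) is not None:
        return m.group(1).replace("www.", "")
    # No "://host" part was found, so fall back to the default.
    return default

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("there_is_no_domain_in_here"))
print(get_domain("http://example.com"))
```

Note that this version also handles a URL with no slash after the host (the last call), which the original "://(.*?)/" pattern does not match.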
Why not use urlparse to get the domain?
# Python 2:
# from urlparse import urlparse
# Python 3:
from urllib.parse import urlparse

def get_domain(url):
    parsed_uri = urlparse(url)
    domain = parsed_uri.netloc
    return domain or "ERROR"

url = 'there_is_no_domain_in_here'
print(get_domain(url))
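As a rough sketch of how this behaves on the question's own link, the variant below also strips a leading 'www.' the way the question's function does. The function name get_domain_urlparse and the default parameter are hypothetical. One caveat: urlparse only fills in netloc when the string has a scheme followed by "//" (or starts with "//"), so a bare "www.example.com/path" falls back to the default.

```python
from urllib.parse import urlparse

def get_domain_urlparse(url, default="ERROR"):
    # netloc is empty when the string has no "//" network-location marker.
    netloc = urlparse(str(url)).netloc
    # Strip only a leading "www.", leaving the rest of the host untouched.
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]
    return netloc or default

print(get_domain_urlparse("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain_urlparse("there_is_no_domain_in_here"))
print(get_domain_urlparse("www.example.com/path"))  # no scheme, so netloc is empty
```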