Find email domain in address with regular expressions
Question:
I know I’m an idiot, but I can’t pull the domain out of this email address:
'[email protected]'
My desired output:
'@gmail.com'
My current output:
.
(it’s just a period character)
Here’s my code:
import re
test_string = '[email protected]'
domain = re.search('@*?\.', test_string)
print domain.group()
Here’s what I think my regular expression says ('@*?\.', test_string):
' # begin to define the pattern I'm looking for (also tell python this is a string)
@ # find all patterns beginning with the at symbol ("@")
* # find all characters after ampersand
? # find the last character before the period
\ # breakout (don't use the next character as a wild card, use it as a string character)
. # find the "." character
' # end definition of the pattern I'm looking for (also tell python this is a string)
, test string # run the preceding search on the variable "test_string," i.e., '[email protected]'
I’m basing this off the definitions here:
http://docs.activestate.com/komodo/4.4/regex-intro.html
Also, I searched but other answers were a bit too difficult for me to get my head around.
Help is much appreciated, as usual. Thanks.
My stuff if it matters:
Windows 7 Pro (64 bit)
Python 2.6 (64 bit)
PS. StackOverflow question: My posts don’t include new lines unless I hit “return” twice in between them. For example (these are all on different lines when I’m posting):
@ – find all patterns beginning with the at symbol (“@”)
* – find all characters after ampersand
? – find the last character before the period
\ – breakout (don’t use the next character as a wild card, use it as a string character)
. – find the “.” character
, test string – run the preceding search on the variable “test_string,” i.e., ‘[email protected]’
That’s why I got a blank line b/w every line above. What am I doing wrong? Thx.
Answers:
OK, so why not use split (or partition)?
"@"+'[email protected]'.split("@")[-1]
Or you can use other string methods like find
>>> s="[email protected]"
>>> s[ s.find("@") : ]
'@gmail.com'
>>>
And if you are going to extract the domains of email addresses from some other text:
f = open("file")
for line in f:
    # check each whitespace-separated word on the line for an "@"
    for word in line.split():
        if "@" in word:
            print("@" + word.split("@")[-1])
f.close()
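The partition approach mentioned at the top of this answer is not shown there, so here is a minimal sketch of it. It uses rpartition (the variant of partition that splits on the last separator), the helper name domain_with_at is hypothetical, and the address is a made-up placeholder rather than the one from the question.

```python
def domain_with_at(address):
    """Return '@' plus everything after the last '@', or '' if there is no '@'."""
    # rpartition splits on the LAST '@' and returns (before, separator, after).
    # When the separator is missing it returns ('', '', address).
    _, sep, domain = address.rpartition("@")
    return sep + domain if sep else ""


print(domain_with_at("someone@gmail.com"))  # @gmail.com
print(repr(domain_with_at("no-at-sign")))   # ''
```

Unlike split("@")[-1], this makes the "no @ at all" case explicit instead of returning the whole string.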
Using regular expressions:
>>> re.search('@.*', test_string).group()
'@gmail.com'
A different way:
>>> '@' + test_string.split('@')[1]
'@gmail.com'
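To see why the question's attempt printed only a period, it may help to run the two patterns side by side. This sketch assumes the intended pattern in the question was '@*?\.' (with the period escaped), and it uses a made-up placeholder address.

```python
import re

test_string = "someone@gmail.com"  # placeholder address, not the one from the question

# '@*?' means "zero or more '@' characters, as few as possible", so it is happy
# to match nothing at all. '\.' is a literal period. The search therefore
# succeeds at the first period in the string, and the match is just '.'.
lazy_match = re.search(r"@*?\.", test_string).group()
print(lazy_match)  # .

# '@.*' means "a literal '@' followed by any characters up to the end of the line".
greedy_match = re.search(r"@.*", test_string).group()
print(greedy_match)  # @gmail.com
```

In other words, the * in the original pattern applied to the @ itself rather than to "any character", which is what . stands for.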
Here’s something I think might help
import re
s = 'My name is Conrad, and [email protected] is my email.'
domain = re.search("@[\w.]+", s)
print domain.group()
outputs
@gmail.com
How the regex works:
@ : scan till you see this character.
[\w.] : a set of characters to potentially match, so \w is all alphanumeric characters (plus the underscore), and the trailing period . adds to that set of characters.
+ : one or more of the previous set.
Because this regex is matching the period character and every alphanumeric after an @, it’ll match email domains even in the middle of sentences.
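As a rough illustration of that behaviour, the sketch below runs the pattern with re.findall over a sentence containing two made-up addresses, and then shows one limitation: a hyphen is not in the set, so a hyphenated domain gets cut short. The widened pattern at the end is my own variation, not part of the answer above.

```python
import re

text = "Write to someone@gmail.com or other.person@mail.example.org today."

# '@' followed by one or more word characters (letters, digits, underscore) or periods.
domains = re.findall(r"@[\w.]+", text)
print(domains)  # ['@gmail.com', '@mail.example.org']

# A hyphen is not part of the set, so matching stops in front of it.
truncated = re.search(r"@[\w.]+", "someone@my-site.com").group()
print(truncated)  # @my

# Adding '-' at the end of the set (where it is taken literally) covers that case.
widened = re.search(r"@[\w.-]+", "someone@my-site.com").group()
print(widened)  # @my-site.com
```

Note that an address sitting right before a sentence-ending period would also pick up that period, since . is in the set.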
Just wanted to point out that chrisaycock’s method would also match invalid email addresses of the form
herp@
To ensure you’re only matching a possibly valid email with a domain, you need to alter it slightly so that at least one character must follow the @:
Using regular expressions:
>>> re.search('@.+', test_string).group()
'@gmail.com'
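One consequence of switching from * to + is that re.search can now return None, and calling .group() on None raises an AttributeError. A small sketch of guarding against that, using a hypothetical helper name find_domain and a made-up address:

```python
import re


def find_domain(text):
    """Return the '@...' part of text, or None when nothing follows the '@'."""
    match = re.search(r"@.+", text)
    # re.search returns None when there is no match, so check before calling .group().
    return match.group() if match else None


print(find_domain("someone@gmail.com"))  # @gmail.com
print(find_domain("herp@"))              # None

# For comparison, the '*' version accepts the bare '@'.
bare = re.search(r"@.*", "herp@").group()
print(bare)  # @
```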
Using the below regular expression you can extract any domain like .com or .in.
import re
s = 'my first email is [email protected] second email is enter code [email protected] and third email is [email protected]'
print(re.findall('@+\S+[.in|.com|]',s))
output
['@gmail.com', '@yahoo.in']
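One caveat about the pattern above: square brackets define a character class, so [.in|.com|] matches a single character out of . i n | c o m, not the suffix ".in" or ".com". If the goal is to keep only domains ending in those suffixes, a non-capturing group with alternation is one way to express it. The following is a sketch with made-up addresses, not the original poster's data.

```python
import re

text = "first someone@gmail.com second person@yahoo.in third other@example.org"

# (?:com|in) groups the alternatives without creating a capture group, so
# findall returns the whole match. '\.' is a literal period and '\b' requires
# the suffix to end at a word boundary.
pattern = r"@[\w.-]+\.(?:com|in)\b"
matches = re.findall(pattern, text)
print(matches)  # ['@gmail.com', '@yahoo.in']
```

The third address is skipped because its suffix is neither .com nor .in.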
Here is another method using the index function:
email_addr = '[email protected]'
# Find the location of @ sign
index = email_addr.index("@")
# extract the domain portion starting from the index
email_domain = email_addr[index:]
print(email_domain)
#------------------
# Output:
@gmail.com
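A point worth knowing about index: it raises ValueError when the string has no @, whereas find returns -1. The sketch below (hypothetical helper name, made-up inputs) handles the missing case and shows what slicing with find does instead.

```python
def domain_from_index(email_addr):
    """Return the '@...' part of email_addr, or None if it has no '@'."""
    try:
        index = email_addr.index("@")  # raises ValueError when '@' is absent
    except ValueError:
        return None
    return email_addr[index:]


print(domain_from_index("someone@gmail.com"))  # @gmail.com
print(domain_from_index("no-at-sign"))         # None

# By contrast, str.find returns -1 when '@' is absent, and slicing from -1
# silently yields the last character instead of failing.
s = "no-at-sign"
last_char = s[s.find("@"):]
print(last_char)  # n
```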
You can try using urllib (note that splituser is an undocumented helper and has been deprecated since Python 3.8):
from urllib import parse
email = '[email protected]'
domain = parse.splituser(email)[1]
Output will be
'mydomain.com'
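If relying on an undocumented urllib helper is a concern, a standard-library alternative might look like the sketch below. It assumes the input may be a header-style value such as "Name <address>", which is my own example rather than something from the question; email.utils.parseaddr pulls out the bare address and rpartition takes the part after the last @.

```python
from email.utils import parseaddr

raw = "Conrad <someone@mydomain.com>"  # made-up header-style value

# parseaddr returns a (real name, email address) tuple.
_, address = parseaddr(raw)

# Everything after the last '@' is the domain (without the '@' itself).
domain = address.rpartition("@")[2]
print(domain)  # mydomain.com
```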