Regex to catch email addresses in email header
Question:
I'm trying to parse a To email header with a regex. If there are no <> characters, I want the whole string; otherwise I want what is inside the <> pair.
import re
re_destinatario = re.compile(r'^.*?<?(?P<to>.*)>?')
addresses = [
'XKYDF/ABC (Caixa Corporativa)',
'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.groups())
    print(m.group('to'))
But the regex is wrong:
('XKYDF/ABC (Caixa Corporativa)',)
XKYDF/ABC (Caixa Corporativa)
('Fulano de Tal | Atlantica Beans <[email protected]>',)
Fulano de Tal | Atlantica Beans <[email protected]>
What am I missing?
Answers:
You may use this regex:
<?(?P<to>[^<>]+)>?$
RegEx details:
<? : Match an optional <
(?P<to>[^<>]+) : Named capture group to, matching 1+ characters that are not < or >
>? : Match an optional >
$ : End of string
Code:
import re
re_destinatario = re.compile(r'<?(?P<to>[^<>]+)>?$')
addresses = [
'XKYDF/ABC (Caixa Corporativa)',
'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.group('to'))
Output:
XKYDF/ABC (Caixa Corporativa)
[email protected]
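To see why the original pattern returns the whole string while the anchored one does not, the two can be compared side by side. This is only a sketch: the header value and the variable names original and anchored are made up for illustration.

```python
import re

# The asker's original pattern and the pattern from this answer.
original = re.compile(r'^.*?<?(?P<to>.*)>?')
anchored = re.compile(r'<?(?P<to>[^<>]+)>?$')

# Hypothetical header value, used only for illustration.
header = 'Some Name <user@example.com>'

# In the original, the lazy .*? and the optional <? can both match nothing,
# so the greedy .* inside the group swallows the entire string.
print(original.search(header).group('to'))  # Some Name <user@example.com>

# Excluding < and > from the group and anchoring at $ means the only
# successful match is the one that starts at the last bracketed part.
print(anchored.search(header).group('to'))  # user@example.com
```

Because every piece before the group is optional or lazy, the regex engine never has a reason to consume the opening bracket in the original pattern.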
Instead of making each angle bracket optional on its own, make the whole bracketed part optional:
^.*?(?:<(?P<to>.*)>)?$
Explanation
^ : Start of string
.*? : Match any character, as few as possible
(?: : Non-capturing group, to match the bracketed part as a whole
<(?P<to>.*)> : Match <, then capture any characters in the named group to, then match > (note that .* can also match across < and >)
)? : Close the non-capturing group and make it optional
$ : End of string
For example:
import re
re_destinatario = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')
addresses = [
'XKYDF/ABC (Caixa Corporativa)',
'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    if m:
        if m.group('to'):
            print(m.group('to'))
        else:
            print(m.group())
Output:
XKYDF/ABC (Caixa Corporativa)
[email protected]
If the captured text must not contain angle brackets or a newline, use a negated character class instead of .*:
^.*?(?:<(?P<to>[^<>\n]*)>)?$
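The difference between the two variants only shows up when the header contains more than one pair of angle brackets. The sketch below illustrates that, assuming a made-up header value; the helper extract_to and the names greedy and strict are hypothetical and not part of the answer.

```python
import re

# Variant with .* inside the brackets, and variant with a negated class.
greedy = re.compile(r'^.*?(?:<(?P<to>.*)>)?$')
strict = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')


def extract_to(pattern, header):
    """Return the bracketed part if one was captured, else the whole match."""
    m = pattern.search(header)
    if m is None:
        # Can happen, for example, when the header contains an embedded newline.
        return None
    return m.group('to') or m.group()


# Hypothetical header with two bracketed parts, for illustration only.
header = 'A <b> <c@d.example>'

print(extract_to(greedy, header))  # b> <c@d.example
print(extract_to(strict, header))  # c@d.example

# Without any brackets, both variants fall back to the whole string.
print(extract_to(strict, 'XKYDF/ABC (Caixa Corporativa)'))
```

With .* the capture runs from the first < to the last >, while the negated class only accepts the final bracketed part that reaches the end of the string.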