Regex to catch email addresses in email header

Question:

I’m trying to parse a To email header with a regex. If there are no <> characters, I want the whole string; otherwise, I want what is inside the <> pair.

import re
re_destinatario = re.compile(r'^.*?<?(?P<to>.*)>?')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.groups())
    print(m.group('to'))

But the regex is wrong; it returns the whole string in both cases:

('XKYDF/ABC (Caixa Corporativa)',)
XKYDF/ABC (Caixa Corporativa)
('Fulano de Tal | Atlantica Beans <[email protected]>',)
Fulano de Tal | Atlantica Beans <[email protected]>

What am I missing?

Asked By: Clodoaldo Neto

||

Answers:

You may use this regex:

<?(?P<to>[^<>]+)>?$

RegEx Details:

  • <?: Match an optional <
  • (?P<to>[^<>]+): Named capture group "to" that matches 1+ characters that are neither < nor >
  • >?: Match an optional >
  • $: End of the string
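
As a rough illustration of why the $ anchor and the negated character class matter, the sketch below runs the pattern from the question and the pattern above against the same value. The header string and its example.com address are made up for the demonstration.

```python
import re

# Pattern from the question: the lazy .*? and the optional <? may both match
# nothing, so the greedy .* in the group is free to take the whole string.
original = re.compile(r'^.*?<?(?P<to>.*)>?')

# Pattern from this answer: the group cannot contain < or >, and $ forces
# the match to finish at the end of the string.
fixed = re.compile(r'<?(?P<to>[^<>]+)>?$')

# Hypothetical header value, used only for illustration.
header = 'Some Name <user@example.com>'

original_to = original.search(header).group('to')
fixed_to = fixed.search(header).group('to')

print(original_to)  # Some Name <user@example.com>
print(fixed_to)     # user@example.com
```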

Code:

import re
re_destinatario = re.compile(r'<?(?P<to>[^<>]+)>?$')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.group('to'))

Output:

XKYDF/ABC (Caixa Corporativa)
[email protected]

Answered By: anubhava

Instead of making each angle bracket optional on its own, make the whole bracketed part optional:

^.*?(?:<(?P<to>.*)>)?$

Explanation

  • ^ Start of string
  • .*? Match any character, as few as possible
  • (?: Non capture group to match as a whole part
    • <(?P<to>.*)> Match <, capture any characters in the named group "to", then match > (note that .* is greedy and can also run across < and > characters)
  • )? Close the non capture group and make it optional
  • $ End of string

For example:

import re

re_destinatario = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]

for address in addresses:
    m = re_destinatario.search(address)
    if m:
        if m.group('to'):
            # An address was found between angle brackets
            print(m.group('to'))
        else:
            # No bracketed part: fall back to the whole match
            print(m.group())

Output:

XKYDF/ABC (Caixa Corporativa)
[email protected]

If you don’t want the match to run across angle brackets or a newline, use a negated character class instead of the dot:

^.*?(?:<(?P<to>[^<>\n]*)>)?$
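
The difference between the .* version and the negated-class version shows up when a header holds more than one bracketed address. A small sketch, assuming a made-up header with example.com addresses:

```python
import re

# Greedy dot: the group may swallow inner < and > characters.
greedy = re.compile(r'^.*?(?:<(?P<to>.*)>)?$')
# Negated class: the group stops at any <, > or newline.
strict = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')

# Hypothetical header value with two addresses, for illustration only.
header = 'Ana <ana@example.com>, Bob <bob@example.com>'

greedy_to = greedy.search(header).group('to')
strict_to = strict.search(header).group('to')

print(greedy_to)  # ana@example.com>, Bob <bob@example.com
print(strict_to)  # bob@example.com

# Collecting every bracketed part needs a different call, e.g. findall.
all_to = re.findall(r'<([^<>\n]*)>', header)
print(all_to)     # ['ana@example.com', 'bob@example.com']
```

Because of the $ anchor, the negated-class pattern picks up only the last bracketed address; the findall line is one possible way to get all of them.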

Answered By: The fourth bird