Python split by regular expression

Question

In Python, I am extracting emails from a string like so:

import re

split = re.split(" ", string)
emails = []

pattern = re.compile("^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$")

for bit in split:
    result = pattern.match(bit)

    if result is not None:
        emails.append(bit)

And this works, as long as there is a space in between the emails. But this might not always be the case. For example:

Hello, [email protected]

would return:

[email protected]

but, take the following string:

I know my best friend mailto:[email protected]!

This would return nothing (an empty list). So the question is: how can I make it so that a regex is the delimiter to split? I would want to get

[email protected]

in all cases, regardless of punctuation next to it. Is this possible in Python?

By "splitting by regex" I mean that if the program encounters the pattern in a string, it will extract that part and put it into a list.
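Taking the question literally, re.split does accept a regular expression as the delimiter. The sketch below splits on a run of whitespace or punctuation and then keeps only the pieces that match the question's anchored pattern (with the dot escaped). The delimiter set DELIMS, the helper name emails_by_split and the example.com addresses are hypothetical choices for illustration, not part of the original question.

```python
import re

# The question's anchored pattern, with the dot before the domain suffix escaped.
EMAIL = re.compile(r"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

# Hypothetical delimiter: any run of whitespace or of the punctuation listed.
# "." is deliberately left out because it also occurs inside addresses.
DELIMS = re.compile(r"[\s,;:!?()<>]+")


def emails_by_split(text):
    """Split text on DELIMS, then keep the pieces that are whole addresses."""
    return [piece for piece in DELIMS.split(text) if EMAIL.match(piece)]


# Hypothetical sample addresses on the reserved example.com domain.
print(emails_by_split("Hello, alice@example.com"))
print(emails_by_split("I know my best friend mailto:friend@example.com!"))
```

Because ":" is a delimiter here, the mailto: prefix is split off as its own piece. A sentence-final period, however, stays attached to the last piece, so this approach depends on choosing the separator characters carefully.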

Asked By: user569322


Answer 1

I’d say you’re looking for re.findall:

>>> import re
>>> email_reg = re.compile(r'[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
>>> email_reg.findall('I know my best friend mailto:[email protected]!')
['[email protected]']

Notice that findall can handle more than one email address:

>>> email_reg.findall('Text text [email protected], text text, [email protected]!')
['[email protected]', '[email protected]']
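If you also need to know where each address occurs, re.finditer walks the same matches as findall but yields match objects with start() and end(). A minimal sketch, assuming Answer 1's pattern with the dot escaped; the helper emails_with_spans and the sample addresses are hypothetical.

```python
import re

# Answer 1's pattern, with the dot before the domain suffix escaped.
email_reg = re.compile(r"[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")


def emails_with_spans(text):
    """Return (address, start, end) for every match, left to right."""
    return [(m.group(), m.start(), m.end()) for m in email_reg.finditer(text)]


# Hypothetical sample text using reserved example domains.
text = "Text text alice@example.com, text text, bob@example.org!"
print(email_reg.findall(text))   # just the matched strings
print(emails_with_spans(text))   # the strings plus their positions in text
```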

Answered By: Rik Poggi

Answer 2

Use re.search or re.findall.
You also need to escape your expression properly (. needs to be escaped outside of character classes, not inside) and remove or replace the anchors ^ and $ (for example with \b, a word boundary), e.g.:

r"\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+\b"
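To see why both changes matter, the sketch below compares Answer 2's pattern against the same pattern with the dot left unescaped. The sample strings are hypothetical. The unescaped "." accepts any character, so a space can stand in for the dot; the trailing \b, on the other hand, lets the engine back off a sentence-final period.

```python
import re

# Answer 2's pattern as a raw string, so "\b" (word boundary) and "\."
# (literal dot) reach the regex engine intact.
bounded = re.compile(r"\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+\b")

# The same pattern with the dot left unescaped: "." then matches any character.
unescaped = re.compile(r"\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+\b")

# Hypothetical inputs.
typo = "write to user@host com"
sentence = "Mail me at a@example.com."

print(unescaped.findall(typo))    # ['user@host com']: the space stands in for the dot
print(bounded.findall(typo))      # []: a literal dot is required
print(bounded.findall(sentence))  # ['a@example.com']: \b keeps the final period out
```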

Answered By: Qtax

Answer 3

The problem I see in your regex is your use of ^, which matches the start of the string, and $, which matches the end of the string. If you remove them and run the pattern on your sample test cases, it works:

>>> import re
>>> re.findall(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+", "I know my best friend mailto:[email protected]!")
['[email protected]']
>>> re.findall(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+", "Hello, [email protected]")
['[email protected]']
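One detail worth checking when adapting this back to the question's loop: pattern.match only tries to match at position 0 of the token, so dropping ^ alone is not enough there; search (or findall) is what scans the whole token. A minimal sketch with a hypothetical token such as the one re.split(" ", ...) would produce:

```python
import re

# The question's pattern, anchored with ^ and $ (dot escaped).
anchored = re.compile(r"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
# The same pattern without anchors, as suggested above.
unanchored = re.compile(r"[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

# Hypothetical token produced by splitting on spaces.
token = "mailto:friend@example.com!"

print(anchored.match(token))             # None: ^...$ must span the whole token
print(unanchored.match(token))           # None: match() only tries position 0
print(unanchored.search(token).group())  # 'friend@example.com'
```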

Answered By: Abhijit
