Using regex to find email address and other values in Python String

Question:

I have a string containing data that I need to extract for my main program.
It looks something like this:

string = "[email:[email protected]][days:90]"

From this string I want to extract the data inside each pair of brackets and split it on the colon, so that the word email and the email address are stored separately, like this:

string = "[email:[email protected]]"
... some regex here ...
param_type = "email"
param_value = "[email protected]"

if param_type == 'email':
   ... my code to send an email to param_value ...

The string will contain at most 2 pairs of brackets, one per parameter type, so that I can choose which functions to run:

string = "[email:[email protected]] [days:90]"
...regex to split by bracket group ....
param_type1 = "email"
param1 = "[email protected]"

param_type2 = "days"
param2 = "90"

if param_type1 != "":
   ... email code ...
if param_type2 != "":
   ... run other code for the specified number of days ...

The main program already has default values for these 2 param_types, but I want there to be the option to specify the email address, days, both, or neither. If anything, I mainly need to know how to retrieve the email address as the online examples don’t work for my situation.

Asked By: finman69


Answers:

In this case, you can use a regex to extract what is between the brackets, then split each match on the colon to get the param type and the param, something like:

[s.split(":") for s in re.findall(r"\[(.+?)\]", string)]

So your code would be something like:

import re

string = "[email:[email protected]][days:90]"
# The brackets are escaped so they match literally instead of opening a character class.
type_and_param_pairs = [s.split(":") for s in re.findall(r"\[(.+?)\]", string)]
for param_type, param in type_and_param_pairs:
    if param_type == "email":
        pass  # do something
    elif param_type == "days":
        pass  # do something else
    ...
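
Since the question asks for email, days, both, or neither to be optional, here is a minimal sketch of how the extracted pairs could be merged over defaults. The function name parse_params, the default values, and the address user@example.com are all hypothetical placeholders, not part of the original code.

```python
import re

def parse_params(text, defaults):
    """Return a copy of defaults, overridden by any [type:value] groups in text."""
    params = dict(defaults)
    for chunk in re.findall(r"\[(.+?)\]", text):
        # partition splits on the first colon only, so later colons stay in the value.
        param_type, sep, value = chunk.partition(":")
        # Skip groups without a colon and parameter types we do not know about.
        if sep and param_type in params:
            params[param_type] = value
    return params

# Assumed defaults standing in for the ones the main program already has.
defaults = {"email": "default@example.com", "days": "30"}

print(parse_params("[email:user@example.com][days:90]", defaults))
print(parse_params("[days:7]", defaults))
print(parse_params("", defaults))
```

With this shape, a missing bracket group simply leaves the corresponding default in place, so the calling code does not need separate checks for empty strings.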
Answered By: juanpa.arrivillaga

You could use

\[([^:]*):([^\]]*)\]

That regular expression matches any [attribute:value] substring, with one subexpression (capture group) for the attribute and one for the value.

It searches for a [, then for any number of characters that are not :, then for a :, then for any number of characters that are not ], then for a ].
It encloses the part between [ and : and the part between : and ] in parentheses, so each becomes a capture group.

As a result, findall on this regex returns a list of (attribute, value) tuples, one for each [attribute:value] pair found in the string.

Example:

import re

string = "[email:[email protected]] [days:90]"
pairs = re.findall(r'\[([^:]*):([^\]]*)\]', string)
# pairs = [('email', '[email protected]'), ('days', '90')]
for attr, val in pairs:
    if attr == 'email':
        doSomethingWithEmail(val)
    elif attr == 'days':
        doSomethingWithDays(val)
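
As a rough sketch of the breakdown given above, the same pattern can be spelled out with re.VERBOSE so that each piece carries its own comment. The name PAIR_RE and the address user@example.com are assumptions made for illustration.

```python
import re

# Same pattern as above, written in verbose mode so each part can be annotated.
PAIR_RE = re.compile(r"""
    \[          # a literal opening bracket
    ([^:]*)     # group 1, the attribute: any characters that are not a colon
    :           # the colon separating attribute and value
    ([^\]]*)    # group 2, the value: any characters that are not a closing bracket
    \]          # a literal closing bracket
""", re.VERBOSE)

text = "[email:user@example.com] [days:90]"  # placeholder input
pairs = PAIR_RE.findall(text)
print(pairs)  # [('email', 'user@example.com'), ('days', '90')]
```

Compiling the pattern once is a convenience if the same string format is parsed repeatedly; for a single call, re.findall with the inline pattern behaves the same way.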
Answered By: chrslg