Regular expression to return all characters between two special characters
Question:
How would I go about using regex to return all characters between two brackets?
Here is an example:
foobar['infoNeededHere']ddd
needs to return infoNeededHere
I found a regex to do it between curly brackets but all attempts at making it work with square brackets have failed. Here is that regex: (?<={)[^}]*(?=})
and here is my attempt to hack it
(?<=[)[^}]*(?=])
Final Solution:
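import re
text = "foobar['InfoNeeded'],"  # renamed from str to avoid shadowing the built-in
# \[ and \] match literal brackets; the group captures the text between the quotes
match = re.match(r"^.*\['(.*)'\].*$", text)
print(match.group(1))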
Answers:
^.*\['(.*)'\].*$
will match a line and capture what you want in a group.
You have to escape the [ and ] with a backslash (\).
The documentation at the rubular.com proof link will explain how the expression is formed.
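To make the escaping point concrete, here is a minimal sketch (the variable names are illustrative and not part of the original answer). It compares the escaped pattern with the same pattern written without backslashes, where the brackets are read as a character class and nothing is captured:

```python
import re

text = "foobar['infoNeededHere']ddd"

# Escaped brackets match a literal '[' and ']'; group 1 is the text between the quotes.
escaped = re.compile(r"^.*\['(.*)'\].*$")
match = escaped.match(text)
captured = match.group(1) if match else None
print(captured)

# Without the backslashes, ['(.*)'] is a character class (one of ' ( . * )),
# so the parentheses no longer form a capturing group.
unescaped = re.compile(r"^.*['(.*)'].*$")
print(unescaped.groups)  # number of capturing groups in the pattern
```

The unescaped pattern still matches the line, which is why the mistake is easy to miss: it simply has no group to return.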
If there’s only one of these [.....] tokens per line, then you don’t need to use regular expressions at all:
In [7]: mystring = "Bacon, [eggs], and spam"
In [8]: mystring[ mystring.find("[")+1 : mystring.find("]") ]
Out[8]: 'eggs'
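The slicing trick above assumes both brackets are present. A slightly more defensive sketch, using a hypothetical helper named between_brackets (not from the original answer), returns None when either bracket is missing and only looks for the closing bracket after the opening one:

```python
def between_brackets(text, open_char="[", close_char="]"):
    """Return the text between the first open_char and the next close_char, or None."""
    start = text.find(open_char)
    if start == -1:
        return None  # no opening bracket at all
    end = text.find(close_char, start + 1)
    if end == -1:
        return None  # opening bracket was never closed
    return text[start + 1:end]


print(between_brackets("Bacon, [eggs], and spam"))
print(between_brackets("no brackets here"))
```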
If there’s more than one of these per line, then you’ll need to modify Jarrod’s regex ^.*\['(.*)'\].*$ to match multiple times per line, and to be non-greedy. (Use the .*? quantifier instead of the .* quantifier.)
In [15]: mystring = "[Bacon], [eggs], and [spam]."
In [16]: re.findall(r"\[(.*?)\]",mystring)
Out[16]: ['Bacon', 'eggs', 'spam']
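As an alternative to the non-greedy quantifier, a negated character class can do the same job. The sketch below is one possible variant, not something the original answer proposes. It puts both patterns side by side on the same input:

```python
import re

mystring = "[Bacon], [eggs], and [spam]."

# Non-greedy: take as few characters as possible before the next ']'.
lazy = re.findall(r"\[(.*?)\]", mystring)

# Negated class: take any run of characters that are not ']'.
negated = re.findall(r"\[([^\]]*)\]", mystring)

print(lazy)
print(negated)
```

The negated class also matches across newlines, whereas . does not unless re.DOTALL is set, so the two differ on multi-line input.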
If you’re new to REG(ular) EX(pressions), you can learn about them in the Python docs. Or, if you want a gentler introduction, you can check out the HOWTO. They use Perl-style syntax.
Regex
The expression that you need is .*?\[(.*)\].* and the group that you want is group 1.
– .*? : . matches any character but a newline. * is a meta-character and means 'repeat this 0 or more times'. ? makes the * non-greedy, i.e., .*? will match as few characters as possible before hitting a '['.
– \[ : \ escapes special meta-characters, which in this case is [ . If we didn’t do that, [ would start a character class instead of matching a literal bracket.
– (.*) : Parentheses group whatever is inside them, and you can later retrieve the groups by their numeric IDs or names (if they’re given one).
– \].* : You should know enough by now to know what this means: a literal ] followed by whatever is left on the line.
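The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same expression laid out piece by piece; the names pattern and inner are illustrative:

```python
import re

pattern = re.compile(r"""
    .*?     # as few characters as possible before the first '['
    \[      # a literal opening bracket
    (.*)    # group 1: everything up to the last ']' on the line
    \]      # a literal closing bracket
    .*      # whatever remains on the line
""", re.VERBOSE)

match = pattern.search("foobar['infoNeededHere']ddd")
# re.search returns None when nothing matches, so guard before calling group().
inner = match.group(1) if match else None
print(inner)
```

The captured text still includes the single quotes, because this pattern only anchors on the brackets.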
Implementation
First, import the re module (it ships with Python, but it is not loaded automatically) wherever you want to use the expression.
Then, use re.search(regex_pattern, string_to_be_tested) to search for the pattern in the string to be tested. This will return a match object (or None if nothing matches), which you can store in a temporary variable. You should then call its group() method and pass 1 as an argument (to see the 'Group 1' we captured using parentheses earlier). It should now look like:
>>> import re
>>> pat = r'.*?\[(.*)\].*'  # See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd"
>>> match = re.search(pat, s)
>>> match.group(1)
"'infoNeededHere'"
An Alternative
You can also use findall() to find all the non-overlapping matches by modifying the regex to (?<=\[).+?(?=\]) .
– (?<=\[) : (?<=) is called a look-behind assertion and checks for an expression preceding the actual match.
– .+? : + is just like * except that it matches one or more repetitions. It is made non-greedy by ? .
– (?=\]) : (?=) is a look-ahead assertion and checks for an expression following the match without including it in the match.
Your code should now look like:
>>> import re
>>> pat = r'(?<=\[).+?(?=\])'  # See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd[andHere] [andOverHereToo[]"
>>> re.findall(pat, s)
["'infoNeededHere'", 'andHere', 'andOverHereToo[']
Note: Always use raw Python strings for regex patterns by adding an 'r' before the string (e.g., r'blah blah blah'), so that backslashes reach the regex engine unchanged.
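A small sketch of why the raw-string advice matters, using \b as the example since it means something different to Python and to the regex engine (the sample text and variable names are made up for illustration):

```python
import re

text = "a cat sat"

# Without the r prefix, Python turns "\b" into a backspace character before re sees it,
# so the pattern looks for backspace + "cat" + backspace.
plain_hits = re.findall("\bcat\b", text)

# With the r prefix, the backslash survives and \b means "word boundary" to the regex engine.
raw_hits = re.findall(r"\bcat\b", text)

print(plain_hits)
print(raw_hits)
```

Only the raw-string version finds the word; the plain string silently matches nothing.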
Thanks for reading! I wrote this answer when there were no accepted ones yet, but by the time I finished it, 2 more came up and one got accepted. 🙁