Exact search of a string that has parentheses using regex
Question:
I am new to regexes.
I have the following string: \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop
Out of this string, I want to extract Rivet, and I already have (941) as a string in a variable.
My thought process was like this:
- Find all the (941)s.
- Filter the results by checking whether the (941) is followed by \n, then a word, and ending with \n.
- I made a regex for the 2nd part: \n[\w\s'\d\-/.]+$\n
The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a group. The regex in the 3rd step may be wrong, which I can fix later, but first I need help finding the 2nd (941) so that I can then apply the 3rd step to it.
PS.
- I know I can use Python string methods like find and then loop over the matches, but I wanted to see if this can be done directly using regex only.
- I have tried the following regexes: (?:...), (941){1}, and escaping the parentheses to make them literal characters, like this: \(941\), with no useful results. Maybe I am using them wrong.
Just wanted to know if it is possible to do this using regex. I thought it might be useful for others too, or a good share for future viewers.
Thanks!
Answers:
Assuming:
- You want to avoid matching digit-only entries;
- You want to match a substring made of word characters (thus including possible digits);
Try escaping the variable and using it in the regular expression through an f-string:
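import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
# Escape the parentheses so they match literally instead of forming a group
var2 = re.escape(var1)
# (?!\d+\n) skips a following line made only of digits; (\w+) captures the word
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)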
Prints:
Rivet
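To see why the escaping step matters, here is a minimal sketch comparing the raw and the escaped variable on the sample string. It assumes the separators in the string are newline characters; the names unescaped, escaped_pattern and escaped are only illustrative.

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'

# Unescaped: the parentheses form a capturing group, so only the digits are matched.
unescaped = re.findall(var1, s)

# Escaped: re.escape() backslash-escapes the parentheses so they match literally.
escaped_pattern = re.escape(var1)
escaped = re.findall(escaped_pattern, s)

print(unescaped)        # ['941', '941']
print(escaped_pattern)  # \(941\)
print(escaped)          # ['(941)', '(941)']
```

The unescaped pattern still "finds" something, which is why the problem can be easy to miss: it matches 941 anywhere, with or without the surrounding parentheses.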
If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting it into the regexp.
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
re.findall(rf"(?<=\n{re.escape(num)}\n)[\w\s'\d\-/.]+(?=\n)", s)
This puts (941)\n in a lookbehind, so it's not included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
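As a rough sketch of how the lookbehind approach behaves on the sample (again assuming the separators are newlines): because \s in the character class also matches newlines, a match can run across several lines. The second pattern below is a hypothetical tightening, not part of the answer: it swaps \s for a literal space and adds (?!\d+\n) to skip digit-only entries. The names prefix, broad and narrow are illustrative.

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
prefix = re.escape(num)  # literal form of the variable: \(941\)

# Lookbehind pattern with \s in the class: a match may span newlines.
broad = re.findall(rf"(?<=\n{prefix}\n)[\w\s'\d\-/.]+(?=\n)", s)

# Hypothetical tighter variant: a literal space instead of \s keeps each
# match on one line, and (?!\d+\n) rejects entries made only of digits.
narrow = re.findall(rf"(?<=\n{prefix}\n)(?!\d+\n)[\w '\-/.]+(?=\n)", s)

print(broad)   # ['364\nShackle', 'Rivet\n105']
print(narrow)  # ['Rivet']
```

Whether the tighter class is appropriate depends on the real data; if entries can legitimately contain tabs or other whitespace, the class would need to allow them explicitly.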