regex match returning none
Question:
I know similar questions like this have already been asked on the platform but I checked them and did not find the help I needed.
I have some strings such as:
path = "most popular data structure in OOP lists/5438_133195_9917949_1218833? povid=racking these benchmarks"
path = "activewear/2356_15890_9397775? povid=ApparelNavpopular data structure you to be informed when a regression"
I have a function:
def extract_id(path):
    pattern = re.compile(r"([0-9]+(_[0-9]+)+)", re.IGNORECASE)
    return pattern.match(path)
The expected results are 5438_133195_9917949_1218833 and 2356_15890_9397775. I tested the function online and it seems to produce the expected result, but it’s returning None in my app. What am I doing wrong?
Thanks.
Answers:
match only matches at the beginning of the string, and your id is not at the beginning. What you want is search, which scans the string for the first place the pattern matches. You have to use group to retrieve the matched text from the match object that search returns. You don’t need re.IGNORECASE if you are looking for characters that don’t have a case. You should compile your regex only once. Compiling a pattern that never changes, every time a function is called, is not optimal.
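To make the difference concrete, here is a minimal sketch comparing match, search and fullmatch on a shortened version of one of your strings. The name ID_PATTERN and the shortened path are only for illustration.

```python
import re

# Compiled once at module level; same idea as the pattern in the question,
# without the unneeded capture groups and without re.IGNORECASE.
ID_PATTERN = re.compile(r"[0-9]+(?:_[0-9]+)+")

# Shortened sample path (illustrative).
path = "activewear/2356_15890_9397775?povid=ApparelNav"

# match() only tries at position 0, where the text is "activewear", so it fails.
at_start = ID_PATTERN.match(path)

# search() scans forward until the pattern fits somewhere.
anywhere = ID_PATTERN.search(path)

# fullmatch() requires the whole string to be the id.
whole = ID_PATTERN.fullmatch("2356_15890_9397775")

print(at_start)          # None
print(anywhere.group())  # 2356_15890_9397775
print(whole.group())     # 2356_15890_9397775
```

The first print is the None you are seeing in your app.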
You could simplify your expression to ((\d+_?)+)\?, which will find a repeating sequence of one or more digits that may be followed by an underscore, ultimately ended by a literal question mark.
Example:
import re
from typing import Optional

#do this once
pathid = re.compile(r'((\d+_?)+)\?')

def extract_id(path: str) -> Optional[str]:
    if m := pathid.search(path):  #make sure there is a match
        return m.group(1)  #return match from group 1 `((\d+_?)+)`
    return None  #no match

#use
path = "thingsbefore/5438_133195_9917949_1218833?thingsafter"
result = extract_id(path)

#proof
print(result)  #5438_133195_9917949_1218833
Your id comes after the last / and before the ?. The solution below will likely be faster: it doesn’t search by pattern, it prunes by position.
def extract_id(path: str) -> str:
    #right of the last / to left of the ?
    return path.split('/')[-1].split('?')[0]

#use
path = "thingsbefore/5438_133195_9917949_1218833?thingsafter"
result = extract_id(path)

#proof
print(result)  #5438_133195_9917949_1218833
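One caveat with pruning by position: splitting on / first goes wrong if the part after the ? ever contains a / itself. A possible variant, sketched here under the assumption that the id is always the last path segment before the first ?, cuts the query off first. The function name extract_id_by_position and the second and third inputs are made up for illustration.

```python
def extract_id_by_position(path: str) -> str:
    # Drop everything from the first "?" onward (keeps the whole string if there is none).
    before_query, _, _ = path.partition("?")
    # Keep what follows the last "/" (keeps the whole string if there is none).
    _, _, last_segment = before_query.rpartition("/")
    return last_segment

print(extract_id_by_position("thingsbefore/5438_133195_9917949_1218833?thingsafter"))
# 5438_133195_9917949_1218833

# A "/" after the "?" no longer confuses the result.
print(extract_id_by_position("a/123_456?next=/home"))
# 123_456

# No validation happens: whatever sits in that position is returned as-is.
print(extract_id_by_position("no id here"))
# no id here
```

As the last call shows, this approach never checks that the result looks like an id, so it only fits if you trust the shape of your input.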
You don’t need any capture groups; you can get the match only and return .group() using re.search:
\b\d+(?:_\d+)+\b

\b : A word boundary
\d+ : Match 1+ digits
(?:_\d+)+ : Repeat 1+ times _ and 1+ digits
\b : A word boundary
import re

path = "most popular data structure in OOP lists/5438_133195_9917949_1218833? povid=racking these benchmarks"
pattern = re.compile(r"\b\d+(?:_\d+)+\b")

def extract_id(path):
    return pattern.search(path).group()

print(extract_id(path))
Output
5438_133195_9917949_1218833
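Note that search returns None when nothing matches, so calling .group() on the result directly raises AttributeError for a path without an id. If that can happen in your data, a guarded version might look like the sketch below. The names ID_PATTERN and extract_id_or_none, and the third test string, are only illustrative.

```python
import re
from typing import Optional

# Same pattern as above, compiled once.
ID_PATTERN = re.compile(r"\b\d+(?:_\d+)+\b")

def extract_id_or_none(path: str) -> Optional[str]:
    # search() returns a match object or None; only call group() on a match.
    match = ID_PATTERN.search(path)
    return match.group() if match else None

paths = [
    "most popular data structure in OOP lists/5438_133195_9917949_1218833? povid=racking these benchmarks",
    "activewear/2356_15890_9397775? povid=ApparelNavpopular data structure you to be informed when a regression",
    "no id in this one",
]
results = [extract_id_or_none(p) for p in paths]
print(results)
# ['5438_133195_9917949_1218833', '2356_15890_9397775', None]
```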