How to match my string? Do I need negative lookbehind?
Question:
Let’s assume we have the following string:
This thing costs $5000.
I’m trying to match $5000 with a negative lookbehind:
(?<!([:;]))\$?([0-9]+)
The idea is that there should be no match if a ";" or ":" comes right before the value, e.g. ;$5000 or ;5000.
First string:
This thing costs $5000
or
5000
Desired output:
$5000
or
5000
Second string:
This thing costs ;$5000
or
;5000
Desired output:
None
Answers:
Sometimes, matching what you do want rather than what you don’t want is easier. As far as I can tell, what you’re really looking for is an integer that optionally starts with $. But you still want to capture the $ if it’s there. The ; and : are just red herrings.
import re

values = ['This thing costs $5000.',   # $5000
          'This thing costs ;$5000.',  # None
          'This thing costs 5000.',    # 5000
          'This thing costs ;5000.',   # None
          'This thing costs8000']      # None

# Whitespace, then an optional literal $, then one or more digits.
pattern = r'.*\s(\$?\d+)'

for value in values:
    # If a match is made, we want group 1 from that match.
    if match := re.match(pattern, value):
        print(match.group(1))
    else:
        print(match)
Output:
$5000
None
5000
None
None
See https://regexr.com/6sqbs for an in-depth explanation of my pattern .*\s(\$?\d+)
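For readers who cannot open the link, here is a rough sketch of the pattern .*\s(\$?\d+) written with re.VERBOSE so that each piece carries its own comment. The names PATTERN and extract_price are made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as above, spelled out piece by piece.
PATTERN = re.compile(r"""
    .*        # any characters, as many as possible (greedy)
    \s        # one whitespace character directly before the value
    (         # group 1: the value we want back
      \$?     #   an optional literal dollar sign
      \d+     #   one or more digits
    )
""", re.VERBOSE)


def extract_price(text):
    """Return the captured value, or None when nothing matches (hypothetical helper)."""
    match = PATTERN.match(text)
    return match.group(1) if match else None


print(extract_price('This thing costs $5000.'))   # $5000
print(extract_price('This thing costs ;$5000.'))  # None
print(extract_price('This thing costs8000'))      # None
```

Because .* is greedy, a string that contains several candidates yields the last one; re.findall with a pattern that has no leading .* would be the usual choice if all of them are needed.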
Your implementation is great, but there’s a single flaw: the match can start in the middle of the digits, or right after the $ sign.
Add \d and $ to the negative lookbehind and it’ll work:
(?<![;:\d$])\$?([0-9]+)
Examples:
>>> re.findall(r"(?<![;:\d$])\$?([0-9]+)", "This thing costs ;$5000")
[]
>>> re.findall(r"(?<![;:\d$])\$?([0-9]+)", "This thing costs $5000")
['5000']
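To make the flaw concrete, the sketch below runs both lookbehinds side by side. It is an illustration rather than the exact patterns from the thread: the capture groups are dropped so that re.findall returns the whole match (including the $, which the question asks for), and ORIGINAL, FIXED and find_all are names invented here.

```python
import re

# The question's lookbehind, without capture groups (assumption: groups dropped
# so that findall returns the full match).
ORIGINAL = r"(?<![:;])\$?[0-9]+"
# The lookbehind extended with \d and $, as suggested above.
FIXED = r"(?<![;:\d$])\$?[0-9]+"


def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


# The original pattern skips the position after ';' but then matches one step later.
print(find_all(ORIGINAL, "This thing costs ;$5000"))  # ['5000']
print(find_all(ORIGINAL, "This thing costs ;5000"))   # ['000']

# With \d and $ in the lookbehind, those later starting positions are rejected too.
print(find_all(FIXED, "This thing costs ;$5000"))     # []
print(find_all(FIXED, "This thing costs ;5000"))      # []
print(find_all(FIXED, "This thing costs $5000"))      # ['$5000']
```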
That said, I would suggest matching the number directly instead of dealing with negative lookbehinds, like so:
re.findall(r"\s\$?([0-9]+)", "This thing costs ;$5000")
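One caveat with requiring \s in front: a string that consists of nothing but 5000 or $5000 has no whitespace before the value, so it would not match. A possible variation, not taken from either answer, is the lookbehind (?<!\S), which reads "not preceded by a non-whitespace character" and therefore also succeeds at the very start of the string. The names PRICE and find_prices are hypothetical.

```python
import re

# (?<!\S) passes at the start of the string or right after whitespace.
# This is a swapped-in idiom, offered as a sketch rather than the answer's method.
PRICE = re.compile(r"(?<!\S)\$?[0-9]+")


def find_prices(text):
    """Return all values that start at the beginning of the text or after whitespace."""
    return PRICE.findall(text)


print(find_prices("This thing costs $5000."))  # ['$5000']
print(find_prices("5000"))                     # ['5000']
print(find_prices("This thing costs ;$5000"))  # []
print(find_prices("This thing costs8000"))     # []
```

The pattern has no capture group, so findall returns the full match with the $ included when it is present.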