Python regex for number with or without decimals using a dot or comma as separator?
Question:
I’m just learning regex and now I’m trying to match a number which more or less represents this:
[zero or more numbers][possibly a dot or comma][zero or more numbers]
No dot or comma is also okay. So it should match the following:
1
123
123.
123.4
123.456
.456
123, # From here it's the same but with commas instead of dot separators
123,4
123,456
,456
But it should not match the following:
0.,1
0a,1
0..1
1.1.2
100,000.99 # I know this and the one below are valid in many languages, but I simply want to reject these
100.000,99
So far I’ve come up with [0-9]*[.,][0-9]*, but it doesn’t seem to work so well:
>>> import re
>>> r = re.compile("[0-9]*[.,][0-9]*")
>>> if r.match('0.1.'): print 'it matches!'
...
it matches!
>>> if r.match('0.abc'): print 'it matches!'
...
it matches!
I have the feeling I’m doing two things wrong: I don’t use match correctly AND my regex is not correct. Could anybody enlighten me on what I’m doing wrong? All tips are welcome!
Answers:
You need to make the [.,] part optional by adding ? after that character class, and don’t forget to add anchors: ^ asserts that we are at the start and $ asserts that we are at the end.
^\d*[.,]?\d*$
>>> import re
>>> r = re.compile(r"^\d*[.,]?\d*$")
>>> if r.match('0.1.'): print('it matches!')
...
>>> if r.match('0.abc'): print('it matches!')
...
>>> if r.match('0.'): print('it matches!')
...
it matches!
If you don’t want to allow a single comma or dot then use a lookahead.
^(?=.*?\d)\d*[.,]?\d*$
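As a rough sketch of the lookahead variant, the pattern can be checked against the examples from the question. The names NUMBER and is_number are made up for this illustration and are not part of the answer.

```python
import re

# Anchored pattern; the lookahead (?=.*?\d) demands at least one digit somewhere.
NUMBER = re.compile(r"^(?=.*?\d)\d*[.,]?\d*$")

def is_number(text):
    """Return True if text is digits with at most one '.' or ',' separator."""
    return NUMBER.match(text) is not None

accepted = ["1", "123", "123.", "123.4", ".456", "123,", ",456"]
rejected = ["", ".", ",", "0.,1", "0a,1", "0..1", "1.1.2", "100,000.99"]

for s in accepted + rejected:
    print(repr(s), is_number(s))
```

Without the lookahead, the empty string and a lone separator would be accepted, because every part of the pattern is optional.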
How about:
(?:^|[^\d,.])\d*(?:[,.]\d+)?(?:$|[^\d,.])
If you don’t want to match the empty string:
(?:^|[^\d,.])\d+(?:[,.]\d+)?(?:$|[^\d,.])
The problem is that match accepts a partial match, as long as it starts at the beginning of the string.
One way around this is to end the regex in \Z (optionally $).
\Z
Matches only at the end of the string.
The other is to use re.fullmatch instead.
import re
help(re.match)
#>>> Help on function match in module re:
#>>>
#>>> match(pattern, string, flags=0)
#>>> Try to apply the pattern at the start of the string, returning
#>>> a match object, or None if no match was found.
#>>>
vs
import re
help(re.fullmatch)
#>>> Help on function fullmatch in module re:
#>>>
#>>> fullmatch(pattern, string, flags=0)
#>>> Try to apply the pattern to all of the string, returning
#>>> a match object, or None if no match was found.
#>>>
Note that fullmatch is new in 3.4.
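A minimal sketch of the difference, using the question's character classes with the separator already made optional (the variable name pattern is only for illustration):

```python
import re

pattern = re.compile(r"[0-9]*[.,]?[0-9]*")

# match() only needs a prefix of the string to fit the pattern.
prefix = pattern.match("0.abc")
print(prefix.group())                    # the matched prefix: 0.

# fullmatch() requires the pattern to consume the entire string.
print(pattern.fullmatch("0.abc"))        # None
print(bool(pattern.fullmatch("123,4")))  # True
```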
You should also make the [.,] part optional, so append a ? to that.
'?'
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’.
E.g.
import re
r = re.compile(r"[0-9]*[.,]?[0-9]*\Z")
bool(r.match('0.1.'))
#>>> False
bool(r.match('0.abc'))
#>>> False
bool(r.match('0123'))
#>>> True
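One detail worth knowing when choosing between \Z and $: outside of multiline mode, $ also matches just before a single trailing newline, while \Z does not. A small sketch of that behaviour (the names with_dollar, with_z and line are illustrative only):

```python
import re

with_dollar = re.compile(r"[0-9]*[.,]?[0-9]*$")
with_z = re.compile(r"[0-9]*[.,]?[0-9]*\Z")

line = "123.4\n"  # e.g. a line read from a file, newline not stripped

print(bool(with_dollar.match(line)))  # True: $ tolerates one trailing newline
print(bool(with_z.match(line)))       # False: \Z is strictly the end of the string
```

If the input may carry a trailing newline that should be rejected, \Z or re.fullmatch is the safer choice.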
^(?=.?\d)(?!(.*?\.){2,})[\d.]+$|^(?=.?\d)(?!(.*?,){2,})[\d,]+$
Try this. It validates all the cases.
Some ideas for verifying a non-empty match:
1.) Use a lookahead to check for at least one digit:
^(?=.?\d)\d*[.,]?\d*$
- ^ and $ anchor the match from start to end.
- (?=.?\d) matches if the string starts with a digit, optionally preceded by one character, e.g. ,1 or 1 …
- \d*[.,]?\d* is the allowed sequence: \d* any amount of digits, followed by at most one [.,], then \d* again.
- Note that the first . inside the lookahead is a metacharacter that stands for any character, whereas the one inside the character class [.,] matches a literal .
Instead of the positive lookahead, a negative one could also be used: ^(?!\D*$)\d*[.,]?\d*$
2.) Use 2 different patterns:
^(?:\d+[.,]\d*|[.,]?\d+)$
- (?: starts a non-capture group for the alternation.
- \d+[.,]\d* for matching 1. or 1,1 …
- | OR
- [.,]?\d+ for matching 1 or ,1 …
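The alternation idea can be tried out with a short sketch like the one below; TWO_FORMS, samples and results are names invented for this example.

```python
import re

# Either digits + separator + optional digits, or optional separator + digits.
TWO_FORMS = re.compile(r"^(?:\d+[.,]\d*|[.,]?\d+)$")

samples = ["1", "1.", "1,1", ",1", ".", "", "1.1.2"]
results = {s: bool(TWO_FORMS.match(s)) for s in samples}
print(results)
```

Because both alternatives contain \d+, a lone separator or an empty string cannot match, so no lookahead is needed here.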
Your regex would work fine if you add ^ at the front and $ at the back, so that the pattern has to cover the string from beginning to end, and make the separator optional with {0,1}.
Try this:
^[0-9]*[.,]{0,1}[0-9]*$
import re

checklist = ['1', '123', '123.', '123.4', '123.456', '.456', '123,', '123,4', '123,456', ',456', '0.,1', '0a,1', '0..1', '1.1.2', '100,000.99', '100.000,99', '0.1.', '0.abc']
pat = re.compile(r'^[0-9]*[.,]{0,1}[0-9]*$')
for c in checklist:
    if pat.match(c):
        print('%s : it matches' % c)
    else:
        print('%s : it does not match' % c)
1 : it matches
123 : it matches
123. : it matches
123.4 : it matches
123.456 : it matches
.456 : it matches
123, : it matches
123,4 : it matches
123,456 : it matches
,456 : it matches
0.,1 : it does not match
0a,1 : it does not match
0..1 : it does not match
1.1.2 : it does not match
100,000.99 : it does not match
100.000,99 : it does not match
0.1. : it does not match
0.abc : it does not match
If the two decimal places are mandatory, you could use the following:
^((\d){1,3},*){1,5}\.(\d){2}$
This will match the following:
- 1.00
- 10.00
- 100.00
- 1,000.00
- 10,000.00
- 100,000.00
- 1,000,000.00
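Note that ,* makes the comma optional and repeatable, so this pattern also accepts inputs such as 1234.00 or 1,,000.00. If that is not wanted, a stricter variant that requires exactly three digits after each comma might look like the sketch below. STRICT is not part of the answer, only an illustration, and LOOSE is the answer's pattern for comparison.

```python
import re

# The pattern from the answer: comma is optional and may repeat.
LOOSE = re.compile(r"^((\d){1,3},*){1,5}\.(\d){2}$")
# Hypothetical stricter variant: 1-3 digits, then groups of ",ddd", then ".dd".
STRICT = re.compile(r"^\d{1,3}(?:,\d{3})*\.\d{2}$")

for s in ["1.00", "1,000.00", "1,000,000.00", "1234.00", "1,,000.00"]:
    print(s, bool(LOOSE.match(s)), bool(STRICT.match(s)))
```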
A more generic method could be as follows:
import re
r = re.compile(r"^\d\d*[,]?\d*[,]?\d*[.,]?\d*\d$")
print(bool(r.match('100,000.00')))
This will match the following:
- 100
- 1,000
- 100.00
- 1,000.00
- 1,00,000
- 1,00,000.00
This will not match the following:
- .100
- ..100
- 100.100.00
- ,100
- 100,
- 100.
OK, the regex that I use to check for integers with thousands separators, which may or may not include a decimal part, and for plain numbers without separators, goes like this:
(This is Python 3.10.8 I’m using; I’m not sure which version of regex, though.)
r"^(?:-)?(\d{1,3}(?:(?:\.(?=\d.+,?)|,(?=\d.+\.?))\d{3})*(\.\d+)?|\d+\.\d+|\d+)$"
I hope this helps.