Python regex for number with or without decimals using a dot or comma as separator?

Question:

I’m just learning regex and now I’m trying to match a number which more or less represents this:

[zero or more numbers][possibly a dot or comma][zero or more numbers]

No dot or comma is also okay. So it should match the following:

1
123
123.
123.4
123.456
.456
123,  # From here it's the same but with commas instead of dot separators
123,4
123,456
,456

But it should not match the following:

0.,1
0a,1
0..1
1.1.2
100,000.99  # I know this and the one below are valid in many languages, but I simply want to reject these
100.000,99

So far I’ve come up with [0-9]*[.,][0-9]*, but it doesn’t seem to work so well:

>>> import re
>>> r = re.compile("[0-9]*[.,][0-9]*")
>>> if r.match('0.1.'): print 'it matches!'
...
it matches!
>>> if r.match('0.abc'): print 'it matches!'
...
it matches!

I have the feeling I’m doing two things wrong: I don’t use match correctly AND my regex is not correct. Could anybody enlighten me on what I’m doing wrong? All tips are welcome!

Asked By: kramer65

Answers:

You need to make the [.,] part optional by adding ? after the character class, and don’t forget to add anchors: ^ asserts that we are at the start and $ asserts that we are at the end.

^\d*[.,]?\d*$

>>> import re
>>> r = re.compile(r"^\d*[.,]?\d*$")
>>> if r.match('0.1.'): print 'it matches!'
... 
>>> if r.match('0.abc'): print 'it matches!'
... 
>>> if r.match('0.'): print 'it matches!'
... 
it matches!

If you don’t want a lone comma or dot (or an empty string) to match, add a lookahead that requires at least one digit.

^(?=.*?\d)\d*[.,]?\d*$
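
A minimal sketch, assuming Python 3, that runs the lookahead pattern over the inputs listed in the question plus three edge cases (the empty string and a lone separator). The names NUMBER, should_match and should_reject are illustrative only.

```python
import re

# The lookahead (?=.*?\d) demands at least one digit somewhere in the string.
NUMBER = re.compile(r"^(?=.*?\d)\d*[.,]?\d*$")

should_match = ["1", "123", "123.", "123.4", "123.456", ".456",
                "123,", "123,4", "123,456", ",456"]
should_reject = ["0.,1", "0a,1", "0..1", "1.1.2", "100,000.99",
                 "100.000,99", "", ".", ","]

# Collect the inputs that behave as the question asks.
accepted = [s for s in should_match if NUMBER.match(s)]
rejected = [s for s in should_reject if not NUMBER.match(s)]

print(accepted == should_match, rejected == should_reject)
```

If both values print as True, the pattern agrees with every example in the question.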


Answered By: Avinash Raj

How about:

(?:^|[^\d,.])\d*(?:[,.]\d+)?(?:$|[^\d,.])

If you don’t want empty string:

(?:^|[^\d,.])\d+(?:[,.]\d+)?(?:$|[^\d,.])
Answered By: Toto

The problem is that you are asking for a partial match, as long as it starts at the beginning.
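
To see the partial match directly, here is a small sketch (assuming Python 3) that prints the part of each string that the question's original pattern actually consumed. The variable names are illustrative.

```python
import re

r = re.compile(r"[0-9]*[.,][0-9]*")  # the pattern from the question

# re.match only anchors at the start; it stops wherever the pattern stops.
first = r.match("0.1.").group()
second = r.match("0.abc").group()

print(repr(first))   # the prefix '0.1' matched, the trailing '.' was ignored
print(repr(second))  # the prefix '0.' matched, 'abc' was ignored
```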

One way around this is to end the regex in \Z (optionally $).

\Z Matches only at the end of the string.

and the other is to use re.fullmatch instead.

import re
help(re.match)
#>>> Help on function match in module re:
#>>>
#>>> match(pattern, string, flags=0)
#>>>     Try to apply the pattern at the start of the string, returning
#>>>     a match object, or None if no match was found.
#>>>

vs

import re
help(re.fullmatch)
#>>> Help on function fullmatch in module re:
#>>>
#>>> fullmatch(pattern, string, flags=0)
#>>>     Try to apply the pattern to all of the string, returning
#>>>     a match object, or None if no match was found.
#>>>

Note that fullmatch is new in 3.4.
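
A minimal sketch of the re.fullmatch route, assuming Python 3.4 or later and a pattern whose separator is optional. The helper name is_number is hypothetical.

```python
import re

# No anchors needed: fullmatch itself requires the whole string to match.
pattern = re.compile(r"[0-9]*[.,]?[0-9]*")

def is_number(text):
    """Return True if the entire string fits the pattern."""
    return pattern.fullmatch(text) is not None

print(is_number("0.1."), is_number("0.abc"), is_number("0123"))
```

Because every part of this pattern is optional, is_number("") is also True; a lookahead or a + quantifier is needed if the empty string should be rejected.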

You should also make the [.,] part optional, so append a ? to that.

'?' Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’.

Eg.

import re
r = re.compile(r"[0-9]*[.,]?[0-9]*\Z")

bool(r.match('0.1.'))
#>>> False

bool(r.match('0.abc'))
#>>> False

bool(r.match('0123'))
#>>> True
Answered By: Veedrac

^(?=.?\d)(?!(.*?\.){2,})[\d.]+$|^(?=.?\d)(?!(.*?,){2,})[\d,]+$

Try this. It validates all cases. See the demo.

http://regex101.com/r/hS3dT7/9

Answered By: vks

Some ideas for verifying a non-empty match:

1.) Use of a lookahead to check for at least one digit:

^(?=.?\d)\d*[.,]?\d*$
  • From ^ start to $ end.
  • (?=.?\d) succeeds if a digit is the first or second character, as in ,1 or 1,…
  • \d*[.,]?\d* Allowed sequence: \d* any amount of digits, followed by an optional [.,], then \d*
  • Note that the first . inside the lookahead is a metacharacter that stands for any character, whereas the other inside the character class [.,] matches a literal .

Instead of the positive lookahead, a negative one could also be used: ^(?!\D*$)\d*[.,]?\d*$
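
As a rough check, assuming Python 3, the sketch below compares the positive and the negative lookahead variants on a handful of inputs. The names positive, negative and samples are illustrative.

```python
import re

positive = re.compile(r"^(?=.?\d)\d*[.,]?\d*$")   # digit in the first two characters
negative = re.compile(r"^(?!\D*$)\d*[.,]?\d*$")   # string is not made of non-digits only

samples = ["1", "123.", ".456", ",456", "", ".", ",", "0..1", "1.1.2", "0a,1"]

# Record where the two variants disagree (expected: nowhere on these samples).
disagreements = [s for s in samples
                 if bool(positive.match(s)) != bool(negative.match(s))]
matched = [s for s in samples if positive.match(s)]

print(disagreements)
print(matched)
```

On these samples both variants accept only the four well-formed numbers and reject the empty string and the lone separators.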


2.) Use 2 different patterns:

^(?:\d+[.,]\d*|[.,]?\d+)$
  • (?: Starts a non-capture group for the alternation.
  • \d+[.,]\d* for matching 1., 1,1,… | OR
  • [.,]?\d+ for matching 1, ,1
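
A short sketch of the two-alternative pattern in use, assuming Python 3. The names number and results are illustrative.

```python
import re

# Either digits + separator + optional digits, or optional separator + digits.
number = re.compile(r"^(?:\d+[.,]\d*|[.,]?\d+)$")

inputs = ["1", "1.", "1,1", ",1", ".", "", "1.1.2"]
results = {s: bool(number.match(s)) for s in inputs}

for text, ok in results.items():
    print(repr(text), ok)
```

Since each alternative contains a mandatory \d+, no lookahead is needed to reject the empty string or a lone separator.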


Answered By: Jonny 5

Your regex would work fine if you made the separator optional and added ^ at the front and $ at the back, so that the regex engine knows where your string must begin and end.

Try this

^[0-9]*[.,]{0,1}[0-9]*$

import re

checklist = ['1', '123', '123.', '123.4', '123.456', '.456', '123,', '123,4', '123,456', ',456', '0.,1', '0a,1', '0..1', '1.1.2', '100,000.99', '100.000,99', '0.1.', '0.abc']

pat = re.compile(r'^[0-9]*[.,]{0,1}[0-9]*$')

for c in checklist:
    if pat.match(c):
        print('%s : it matches' % c)
    else:
        print('%s : it does not match' % c)

1 : it matches
123 : it matches
123. : it matches
123.4 : it matches
123.456 : it matches
.456 : it matches
123, : it matches
123,4 : it matches
123,456 : it matches
,456 : it matches
0.,1 : it does not match
0a,1 : it does not match
0..1 : it does not match
1.1.2 : it does not match
100,000.99 : it does not match
100.000,99 : it does not match
0.1. : it does not match
0.abc : it does not match
Answered By: thisisshantzz

If the two decimal places are mandatory, you could use the following:

^((\d){1,3},*){1,5}\.(\d){2}$

This will match the following inputs:

  • 1.00
  • 10.00
  • 100.00
  • 1,000.00
  • 10,000.00
  • 100,000.00
  • 1,000,000.00
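
A small sketch, assuming Python 3, that tries this pattern on the values listed above and on a few others. The names amount, listed and the extra inputs are illustrative.

```python
import re

amount = re.compile(r"^((\d){1,3},*){1,5}\.(\d){2}$")

listed = ["1.00", "10.00", "100.00", "1,000.00", "10,000.00",
          "100,000.00", "1,000,000.00"]
all_listed_match = all(amount.match(s) for s in listed)

# The commas are optional and their placement is not checked.
loose = [bool(amount.match(s)) for s in ["1234.00", "1,,000.00"]]
# Exactly two decimals after a dot are required.
strict = [bool(amount.match(s)) for s in ["100", "100.0"]]

print(all_listed_match, loose, strict)
```

Note that this pattern accepts thousands separators such as 100,000.00, which the question explicitly wanted to reject, so it suits a different requirement.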
Answered By: Irshu

A more generic approach is as follows:

import re
r = re.compile(r"^\d\d*[,]?\d*[,]?\d*[.,]?\d*\d$")
print(bool(r.match('100,000.00')))

This pattern behaves as follows:

  1. It will match the following inputs:
    • 100
    • 1,000
    • 100.00
    • 1,000.00
    • 1,00,000
    • 1,00,000.00
  2. It will not match the following inputs:

    • .100
    • ..100
    • 100.100.00
    • ,100
    • 100,
    • 100.
Answered By: Safvan CK

OK, the regex that I use to check for integers with thousands separators, which may or may not include a decimal part, and also for plain numbers without separators, goes like this:

(This is Python 3.10.8 I’m using; I’m not sure which version of regex, though.)

r"^(?:-)?(\d{1,3}(?:(?:\.(?=\d.+,?)|,(?=\d.+\.?))\d{3})*(\.\d+)?|\d+\.\d+|\d+)$"

I hope this helps.

Answered By: kaptblasto