Python Regex: Find integer with possible zeros after comma

Question:

I have the following case:

Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2

I am trying to use a regex to find only the integers:

  1. 2.000
  2. 2,000
  3. 2

but not the other float numbers.
I tried different things:

re.search('(?<![0-9.])2(?![.,]?[1-9])(?=[.,]*[0]*)(?![1-9])',...)

but this returns a match for:

  1. 2.00001
  2. 2.000
  3. 2,000
  4. 2,0001
  5. 2

What do I have to do?

UPDATE
I have updated the question: it should also find an integer without any comma or point (2).

Asked By: Code Pope

Answers:

I would use:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'

re.findall(r'(\d+[.,]0+)(?!\d)', text)

Output:

['2.000', '2,000']

Regex:

(        # start capturing
\d+      # match digit(s)
[.,]     # match . or ,
0+       # match one or more zeros
)        # stop capturing
(?!\d)   # ensure the last zero is not followed by a digit

If you also want to match integers on their own, surrounded by spaces or parentheses/brackets:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'

re.findall(r'(?:^|[(\s[])(\d+(?:[.,]0+(?!\d))?)(?=[]\s)]|$)', text)

Regex:

(?:^|[(\s[])       # match the start of string or [ or ( or space
(                  # start capturing
\d+                # match digit(s)
(?:[.,]0+(?!\d))?  # optionally match . or , with only zeros
)                  # stop capturing
(?=[]\s)]|$)       # assert the end of string or ] or ) or space follows
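
As a quick check, the annotated pattern above can be compiled as-is with re.VERBOSE, which lets the comments live inside the pattern. This is only a sketch: the name INTEGER_LIKE is made up for illustration, and the sample string is the one from the answer.

```python
import re

# Verbose form of the pattern; whitespace and "#" comments are ignored
# outside character classes. INTEGER_LIKE is a hypothetical name.
INTEGER_LIKE = re.compile(r"""
    (?:^|[(\s[])          # start of string, or one of ( [ or whitespace
    (                     # start capturing
      \d+                 # digit(s)
      (?:[.,]0+(?!\d))?   # optional . or , followed only by zeros
    )                     # stop capturing
    (?=[]\s)]|$)          # next is ] ) or whitespace, or end of string
""", re.VERBOSE)

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'
matches = INTEGER_LIKE.findall(text)
print(matches)  # ['2.000', '2,000', '2']
```

Because the pattern has exactly one capturing group, findall returns only the captured number, without the surrounding parenthesis or space.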

Answered By: mozway

Without regex, you can also use is_integer() after converting each value to a number. It is a little harder to read, but it avoids regex and should be robust for further use cases given the string structure you provide (this assumes import pandas as pd and that string holds the input text):

[x for x in string.split() if float((pd.to_numeric(x.replace(r'(','').replace(r')','').replace(r',','.'),errors='coerce'))).is_integer()]

This returns the matching tokens as they appear in the string:

['(2.000)', '2,000', '2']

Or if you’d like them cleaned:

[x for x in string.replace(r'(','').replace(r')','').replace(r',','.').split() if float((pd.to_numeric(x,errors='coerce'))).is_integer()]

Returning:

['2.000', '2.000', '2']
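
If pandas is not already a dependency, the same idea can probably be sketched with the built-in float alone. The helper name is_integer_token is hypothetical, and the sketch assumes tokens are separated by whitespace and wrapped in at most plain parentheses, as in the sample string.

```python
def is_integer_token(token: str) -> bool:
    """Return True if the token, without parentheses, is a whole number."""
    # Assumption: a comma is a decimal separator, as in the question.
    cleaned = token.strip('()').replace(',', '.')
    try:
        return float(cleaned).is_integer()
    except ValueError:
        # Not a number at all (for example the word "Test").
        return False


text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
kept = [token for token in text.split() if is_integer_token(token)]
print(kept)  # ['(2.000)', '2,000', '2']
```

Note that float() also accepts forms such as 1e3, so this is more permissive than the regex answers.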

Answered By: Celius Stingher

This should be easy: just get a number and check "is this an int value?".
Maybe something like this…

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
out_ints = []
for x in re.findall(r'([0-9.,]+)', text):
    possible_int = x.replace(',', '.')
    is_int = int(float(possible_int)) == float(possible_int)
    if is_int:
        out_ints.append(int(float(possible_int)))

print(out_ints)

Output:

[2, 2, 2]

Or am I missing something?

Answered By: RobertG

You can use

re.findall(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)', text)

Details:

  • \b – a word boundary
  • (?<!\d[.,]) – no digit followed by . or , immediately on the left
  • \d+ – one or more digits
  • (?:[.,]0+)? – an optional sequence of . or , and then one or more zeros
  • \b – a word boundary
  • (?![,.]\d) – no , or . followed by a digit allowed immediately on the right.
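
To see the pieces listed above working together, here is a minimal sketch run against the string from the question. The names PATTERN and values are made up for illustration. Converting each match to int by dropping everything after the separator is an extra step that is not part of the answer; it relies on the pattern only allowing zeros there.

```python
import re

# The pattern from the answer, compiled once (PATTERN is a hypothetical name).
PATTERN = re.compile(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)')

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
matches = PATTERN.findall(text)
print(matches)  # ['2.000', '2,000', '2']

# Assumption: the part after . or , is all zeros, so it can simply be dropped.
values = [int(re.split(r'[.,]', m)[0]) for m in matches]
print(values)  # [2, 2, 2]
```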

If you need to support thousand separators:

pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'
matches = [x.group() for x in re.finditer(pattern, text)]
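
A small sketch of how the thousand-separator pattern behaves. The sample string and the variable names are invented for illustration and are not from the question. finditer with group() is used because the pattern contains a capturing group, so findall would return only the captured separator.

```python
import re

pattern = (r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})'
           r'(?:(?!\1)[.,]0+)?\b(?![,.]\d)')

# Hypothetical sample: grouped integers, a grouped float, and a plain float.
sample = 'a 1,000,000.00 b 1.000.000,00 c 1,000.5 d 12,345 e 2,50'
found = [m.group() for m in re.finditer(pattern, sample)]
print(found)  # ['1,000,000.00', '1.000.000,00', '12,345']
```

The decimal part must use a different separator than the thousands groups, which is what the (?!\1) check enforces.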

Answered By: Wiktor Stribiżew