Python Regex: Find integer with possible zeros after comma
Question:
I have the following case:
Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2
I am trying to use a regex to find only the integers:
2.000
2,000
2
but not the other float numbers.
I tried different things:
re.search('(?<![0-9.])2(?![.,]?[1-9])(?=[.,]*[0]*)(?![1-9])',...)
but this returns true for:
2.00001
2.000
2,000
2,0001
2
What do I have to do?
UPDATE
I have updated the question: it should also find an integer without any comma or point (2).
Answers:
I would use:
import re
text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'
re.findall(r'(\d+[.,]0+)(?!\d)', text)
Output:
['2.000', '2,000']
Regex:
( # start capturing
\d+ # match digit(s)
[.,] # match . or ,
0+ # match one or more zeros
) # stop capturing
(?!\d) # ensure the last zero is not followed by a digit
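The annotated breakdown can be compiled as-is with re.VERBOSE, which lets the comments live inside the pattern. This is only a sketch of the same regex in verbose form; the name INT_WITH_ZEROS is made up for illustration.

```python
import re

# Same pattern as in the answer, written in verbose mode.
# INT_WITH_ZEROS is a hypothetical name chosen for this sketch.
INT_WITH_ZEROS = re.compile(r"""
    (          # start capturing
      \d+      # one or more digits
      [.,]     # a dot or a comma
      0+       # one or more zeros
    )          # stop capturing
    (?!\d)     # the last zero must not be followed by a digit
""", re.VERBOSE)

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'
result = INT_WITH_ZEROS.findall(text)
print(result)  # ['2.000', '2,000']
```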
If you also want to match "integers" alone, surrounded by spaces or parentheses/brackets:
import re
text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'
re.findall(r'(?:^|[(\s\[])(\d+(?:[.,]0+(?!\d))?)(?=[\]\s)]|$)', text)
Regex:
(?:^|[(\s\[]) # match the start of string or [ or ( or space
( # start capturing
\d+ # match digit(s)
(?:[.,]0+(?!\d))? # optionally match . or , with only zeros
) # stop capturing
(?=[\]\s)]|$) # match the end of string or ] or ) or space
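To see how the delimiter checks behave, here is a small sketch that wraps this pattern in a helper and runs it on the question's text plus a bracketed example. The function name find_integers and the second sample string are assumptions made for illustration.

```python
import re

# Leading delimiter, captured number, then a lookahead for the closing delimiter.
PATTERN = re.compile(r'(?:^|[(\s\[])(\d+(?:[.,]0+(?!\d))?)(?=[\]\s)]|$)')

def find_integers(text):
    """Return the captured integer-like substrings (hypothetical helper)."""
    return PATTERN.findall(text)

question_text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'
print(find_integers(question_text))          # ['2.000', '2,000', '2']
print(find_integers('x [3,00] (4.50) 7 8.0'))  # ['3,00', '7', '8.0']
```

Because the closing delimiter is only checked in a lookahead, it is not consumed, so a single space can end one match and start the next.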
You can also avoid regex entirely by converting each token to a number and calling is_integer() on the result. It is a little harder to read, but it removes the need for regex and should hold up for further cases with the string structure you provide:
import pandas as pd  # `string` holds the sample text from the question
[x for x in string.split() if float(pd.to_numeric(x.replace('(', '').replace(')', '').replace(',', '.'), errors='coerce')).is_integer()]
This returns the matching tokens unchanged:
['(2.000)', '2,000', '2']
Or if you’d like them cleaned:
[x for x in string.replace('(', '').replace(')', '').replace(',', '.').split() if float(pd.to_numeric(x, errors='coerce')).is_integer()]
Returning:
['2.000', '2.000', '2']
This should be easy: just extract each number and check whether it is an integer value.
Maybe something like this:
import re
text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
out_ints = []
for x in re.findall(r'([0-9.,]+)', text):
    possible_int = x.replace(',', '.')
    is_int = int(float(possible_int)) == float(possible_int)
    if is_int:
        out_ints.append(int(float(possible_int)))
print(out_ints)
Output:
[2, 2, 2]
Or am I missing something?
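One caveat with the loop above: the character class [0-9.,]+ also captures stray punctuation such as a lone comma, and float() raises ValueError on it. A minimal sketch that guards against that, assuming trailing sentence punctuation should simply be ignored; the function name extract_ints and the sample sentence are made up for illustration.

```python
import re

def extract_ints(text):
    """Return integer-valued numbers found in text as ints (hypothetical helper)."""
    out = []
    for token in re.findall(r'[0-9.,]+', text):
        # Drop trailing '.' or ',' that belongs to the sentence, not the number.
        candidate = token.rstrip('.,').replace(',', '.')
        try:
            value = float(candidate)
        except ValueError:
            # Empty string after stripping, or something like '1.2.3'.
            continue
        if value.is_integer():
            out.append(int(value))
    return out

sample = 'Test (2.000), then 2.1. Also 2,000. And 2'
print(extract_ints(sample))  # [2, 2, 2]
```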
You can use
re.findall(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)', text)
Details:
\b – a word boundary
(?<!\d[.,]) – no digit followed by . or , immediately on the left
\d+ – one or more digits
(?:[.,]0+)? – an optional sequence of . or , and then one or more zeros
\b – a word boundary
(?![,.]\d) – no , or . followed by a digit allowed immediately to the right.
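As a quick check, the pattern can be run against the sample string from the question. This is just a usage sketch: the lookbehind rejects the fractional tails (such as the 1000 in 2,1000), and the trailing lookahead rejects a 2 that is followed by a non-zero fraction.

```python
import re

# Backslashes written out in full; there are no capturing groups,
# so findall returns the whole match.
pattern = re.compile(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)')

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
found = pattern.findall(text)
print(found)  # ['2.000', '2,000', '2']
```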
If you need to support thousand separators:
pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'
matches = [x.group() for x in re.finditer(pattern, text)]
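A short sketch of the thousand-separator variant on a made-up sample string (the sample is an assumption, not from the question). The pattern contains a capturing group for the separator, which is why finditer with group() is used instead of findall.

```python
import re

pattern = (r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})'
           r'(?:(?!\1)[.,]0+)?\b(?![,.]\d)')

# Hypothetical sample: grouped thousands, grouped thousands with a zero
# fraction, a real decimal, and a plain long integer.
sample = 'Total 1,234,567 or 1.234,00 but not 1,234.5 and 12345'
matches = [m.group() for m in re.finditer(pattern, sample)]
print(matches)  # ['1,234,567', '1.234,00', '12345']
```

Note that with this variant a value like 2.000 is matched through the thousands branch, so it is still reported, just read as a grouped number.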