Validate number with multiple points
Question:
I want to extract a version number from a string that has multiple dots. For instance, app-9.6.0 should return 9.6.0 and app-960 should return None. I tried the code below, but it also returns numbers that contain no dots.
import re
re.findall(r'[.\d]+', 'app-960')
How can I implement a parser or regex for this case?
Answers:
Try this:
import re
str_for_search = 'app-9.6.0'
search = re.search(r'\w+-(\d+\.\d+\.\d+)', str_for_search)
if search:
    version = search.group(1)
else:
    version = None
print(version)
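As a rough sketch, the same three-part idea can be wrapped in a function so the None case is easy to check. The names `THREE_PART` and `extract_three_part` are made up for illustration; the pattern is written with its backslashes spelled out.

```python
import re

# Hypothetical helper; the pattern expects exactly three dot-separated numbers.
THREE_PART = re.compile(r'\w+-(\d+\.\d+\.\d+)')

def extract_three_part(text):
    """Return the first 'X.Y.Z' found after a 'name-' prefix, or None."""
    match = THREE_PART.search(text)
    return match.group(1) if match else None

print(extract_three_part('app-9.6.0'))  # 9.6.0
print(extract_three_part('app-960'))    # None
print(extract_three_part('app-1.0'))    # None, because only two parts are present
```

Note that this only accepts versions with at least three parts, so app-1.0 is rejected.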
Is it always 2 dots?
If so, you can do this:
import re
re.findall(r'\d+\.\d+\.\d+', 'app-960')
re.findall(r'\d+\.\d+\.\d+', 'app-9.6.0')
If no dots is OK, then what you already have works.
If you want at least 1 dot, you can do:
re.findall(r'\d+\.[.\d]+', 'app-9.6.0')
Edit:
You can do this to avoid the problem of multiple dots in a row:
import re
re.findall(r'\d+(?:\.\d+)+', 'app-9.6.0 app=9..9 app1.1.1.1.1.1 app1')
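To see the difference between a character class and a repeated group, here is a small comparison. The variable names and the extra app-1.10.0 sample are my own additions for illustration.

```python
import re

text = 'app-9.6.0 app=9..9 app1.1.1.1.1.1 app1 app-1.10.0'

# Character class: after the first digits and a dot, any mix of dots and digits.
loose = re.findall(r'\d+\.[.\d]+', text)

# Repeated group: every dot must be followed by at least one digit.
strict = re.findall(r'\d+(?:\.\d+)+', text)

print(loose)   # ['9.6.0', '9..9', '1.1.1.1.1.1', '1.10.0']
print(strict)  # ['9.6.0', '1.1.1.1.1.1', '1.10.0']
```

The character class lets 9..9 through, while the grouped form rejects it because the second dot has no digit in front of it.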
If app- is always going to be a consistent string, it’s probably going to be faster and easier to do this without a regex:
def version_number(full_version):
    version_num = full_version.replace("app-", "", 1)
    # If I'm interpreting your question right, you also
    # want to validate that the version contains at least one dot
    if "." in version_num:
        return version_num
    else:
        return None
If you need to do this with a regex for some reason, others have given examples that should work.
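If you also want the non-regex route to reject inputs such as app-9..9 or strings that lack the prefix, a slightly stricter sketch could look like the following. The function name `version_number_checked` and the `prefix` parameter are hypothetical, and the extra checks are my assumption about what counts as valid.

```python
def version_number_checked(full_version, prefix="app-"):
    """Return the text after `prefix` if it is digits separated by dots, else None."""
    if not full_version.startswith(prefix):
        return None
    candidate = full_version[len(prefix):]
    parts = candidate.split(".")
    # Require at least one dot, and only ASCII digits between the dots.
    if len(parts) >= 2 and all(p.isascii() and p.isdigit() for p in parts):
        return candidate
    return None

print(version_number_checked("app-9.6.0"))  # 9.6.0
print(version_number_checked("app-960"))    # None
print(version_number_checked("app-9..9"))   # None
```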
If you want to match app- and digits with 1 or more dots in between, you can use a capture group.
Start the capture by matching 1 or more digits, then repeat 1 or more times a dot followed by 1 or more digits.
Example
import re
pattern = r'app-(\d+(?:\.\d+)+)'
s = 'app-1 app-9.6.0, app-1.0, app-1.1.0, app-1.10.0, app-1.1.1.0'
print(re.findall(pattern, s))
Output
['9.6.0', '1.0', '1.1.0', '1.10.0', '1.1.1.0']
A broader variant matching 1+ non-whitespace chars with \S+ before the hyphen:
pattern = r'\S+-(\d+(?:\.\d+)+)'
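If the goal is to validate one whole string rather than search inside a longer text, `re.fullmatch` may be a better fit. This is a minimal sketch; `VERSION_RE` and `parse_version` are names I made up, and it assumes the app- prefix is fixed.

```python
import re

# Same capture group as above, applied to the entire string.
VERSION_RE = re.compile(r'app-(\d+(?:\.\d+)+)')

def parse_version(name):
    """Return the dotted version if `name` is exactly 'app-<version>', else None."""
    match = VERSION_RE.fullmatch(name)
    return match.group(1) if match else None

print(parse_version('app-9.6.0'))       # 9.6.0
print(parse_version('app-960'))         # None
print(parse_version('app-9.6.0-beta'))  # None, trailing text is not allowed
```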
To be more general and avoid being tied to "app-", I suggest this old-school approach with a simple algorithm:
As long as the end of the string is made of digits or dots, collect it.
At the end, check whether a dot was found. That’s all.
def old_geek(someString):
    version = ""
    pointFound = False
    # read in reverse order
    for i in range(len(someString)-1,-1,-1):
        c = someString[i]
        if c not in ".0123456789":
            break
        version = c + version
        pointFound = pointFound or (c == '.')
    if not pointFound:
        version = None
    return version
I tested it against the regexp from 555Russich, which is not tied to app- (but still has a flaw with thisIsMyApp):
print("------ old geek -----")
for test in tests:
    print(test,' :',old_geek(test))
print("------- 555 Rushich")
for test in tests:
    print(test,' :',reg1(test))
------ old geek -----
app-9.6.0 : 9.6.0
app-960 : None
thisIsMyApp-123456.0 : 123456.0
appli.bat-200.0.1 : 200.0.1
app-1.1.1.1.1.0 : 1.1.1.1.1.0
------- 555 Rushich
app-9.6.0 : 9.6.0
app-960 : None
thisIsMyApp-123456.0 : None
appli.bat-200.0.1 : 200.0.1
app-1.1.1.1.1.0 : 1.1.1
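The definitions of tests and reg1 are not shown in this answer. A sketch that reproduces the second block of results, assuming `tests` holds the five printed inputs and `reg1` wraps the three-part `re.search` pattern from the first answer, could be:

```python
import re

# Assumed inputs, copied from the printed results.
tests = [
    'app-9.6.0',
    'app-960',
    'thisIsMyApp-123456.0',
    'appli.bat-200.0.1',
    'app-1.1.1.1.1.0',
]

def reg1(some_string):
    # Assumption: reg1 is the three-part search pattern from the first answer.
    match = re.search(r'\w+-(\d+\.\d+\.\d+)', some_string)
    return match.group(1) if match else None

for test in tests:
    print(test, ' :', reg1(test))
```

Under that assumption, thisIsMyApp-123456.0 gives None because it has only two parts, and app-1.1.1.1.1.0 is cut to 1.1.1 because the pattern stops after three.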
A question I asked myself: how does the performance of the old-style loop compare with the regexp?
loop : 10000
check_oldGeek --- 0.03163409233093262 seconds
loop : 100000
check_oldGeek --- 0.31208014488220215 seconds
loop : 1000000
check_oldGeek --- 3.101634979248047 seconds
loop : 10000
reg1 --- 0.03866410255432129 seconds
loop : 100000
reg1 --- 0.3761019706726074 seconds
loop : 1000000
reg1 --- 3.765758991241455 seconds
old_geek wins: about 3.10 s against about 3.77 s for 1 million loops. Not by much.
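The timing harness itself is not shown here, so the following is only a sketch of how such a comparison might be run with the standard `timeit` module. The `time_it` helper, the sample input, and the precompiled pattern are my assumptions, and the numbers will differ by machine and input.

```python
import re
import timeit

PATTERN = re.compile(r'\w+-(\d+\.\d+\.\d+)')

def old_geek(some_string):
    # Walk backwards, collecting trailing digits and dots.
    version = ""
    point_found = False
    for c in reversed(some_string):
        if c not in ".0123456789":
            break
        version = c + version
        point_found = point_found or c == "."
    return version if point_found else None

def reg1(some_string):
    # Assumed regex variant: three-part search pattern, precompiled.
    match = PATTERN.search(some_string)
    return match.group(1) if match else None

def time_it(func, value="app-9.6.0", loops=10_000):
    # Total seconds for `loops` calls of func(value).
    return timeit.timeit(lambda: func(value), number=loops)

timings = {func.__name__: time_it(func) for func in (old_geek, reg1)}
for name, seconds in timings.items():
    print(f"{name} --- {seconds:.4f} seconds")
```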
Hope you enjoy it as much as I do 🙂