Python regex to extract version from a string
Question:
The string looks like this (\n is used to break the lines):
MySQL-vm
Version 1.0.1
WARNING:: NEVER EDIT/DELETE THIS SECTION
What I want is only 1.0.1.
I am trying re.search(r"Version+'([^']*)'", my_string, re.M).group(1)
but it is not working.
re.findall(r'\d+', version)
gives me a list of the individual numbers, which I then have to join back together.
How can I improve the regex?
Answers:
Use the regex below and take the version number from group 1.
Version\s*([\d.]+)
>>> import re
>>> s = """MySQL-vm
... Version 1.0.1
...
... WARNING:: NEVER EDIT/DELETE THIS SECTION"""
>>> re.search(r'Version\s*([\d.]+)', s).group(1)
'1.0.1'
Explanation:
Version       'Version'
\s*           whitespace (\n, \r, \t, \f, and " ") (0 or more times)
(             group and capture to \1:
  [\d.]+      any character of: digits (0-9), '.' (1 or more times)
)             end of \1
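Note that re.search returns None when the string has no version line, so calling .group(1) directly would raise AttributeError. A minimal sketch that guards against this; the helper name extract_version is made up for illustration and is not part of the original answer:

```python
import re

# Hypothetical helper built around the regex from this answer.
VERSION_RE = re.compile(r'Version\s*([\d.]+)')

def extract_version(text):
    """Return the version string after 'Version', or None if absent."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None

s = """MySQL-vm
Version 1.0.1

WARNING:: NEVER EDIT/DELETE THIS SECTION"""
print(extract_version(s))
print(extract_version("nothing to see"))
```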
You can also use a positive lookbehind, which does not consume characters in the string but only asserts whether a match is possible. With the regex below you don't need a capturing group; the whole match is the version number.
(?<=Version )[\d.]+
Explanation:
(?<=          look behind to see if there is:
  Version     'Version '
)             end of look-behind
[\d.]+        any character of: digits (0-9), '.' (1 or more times)
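As a rough illustration of the lookbehind approach, the sketch below takes the whole match as the version. It also shows one limitation worth knowing: Python's re module only accepts fixed-width lookbehinds, so a variable-width variant such as (?<=Version\s*) is rejected when the pattern is compiled. The variable names are placeholders.

```python
import re

s = "MySQL-vm\nVersion 1.0.1\n\nWARNING:: NEVER EDIT/DELETE THIS SECTION"

# The lookbehind is not part of the match, so group() is the version itself.
match = re.search(r'(?<=Version )[\d.]+', s)
version = match.group() if match else None
print(version)

# A variable-width lookbehind fails to compile in Python's re module.
try:
    re.compile(r'(?<=Version\s*)[\d.]+')
    fixed_width_only = False
except re.error:
    fixed_width_only = True
print(fixed_width_only)
```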
(?<=Version\s)\S+
Try this. Use it with re.findall.
x="""MySQL-vm
Version 1.0.1
WARNING:: NEVER EDIT/DELETE THIS SECTION"""
print(re.findall(r"(?<=Version\s)\S+", x))
Output: ['1.0.1']
See demo.
https://regex101.com/r/5Us6ow/1
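One caveat with \S+ is that it takes everything up to the next whitespace, so punctuation glued to the version comes along. A small comparison against [\d.]+, using sample text invented for this example:

```python
import re

# Sample text made up for illustration.
text = "Version 1.0.1, released\nVersion 2.3 (beta)"

# \S+ takes everything up to the next whitespace, punctuation included.
loose = re.findall(r"(?<=Version\s)\S+", text)
# [\d.]+ stops at the first character that is not a digit or a dot.
strict = re.findall(r"(?<=Version\s)[\d.]+", text)
print(loose)
print(strict)
```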
A somewhat recursive pattern that matches versions like 1, 1.0, 1.0.1:
import re

def version_parser(v):
    # Note: "=?" matches an optional literal "=" before each dot.
    versionPattern = r'\d+(=?\.(\d+(=?\.(\d+)*)*)*)*'
    regexMatcher = re.compile(versionPattern)
    return regexMatcher.search(v).group(0)
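The same three shapes can probably be matched without nesting. A sketch, assuming a version is a run of digits followed by any number of ".digits" groups; parse_version is a hypothetical name and not the answer's function:

```python
import re

# One or more digits, then any number of ".digits" groups.
FLAT_VERSION_RE = re.compile(r'\d+(?:\.\d+)*')

def parse_version(text):
    """Return the first dotted version number in text, or None."""
    match = FLAT_VERSION_RE.search(text)
    return match.group(0) if match else None

for sample in ("Version 1", "Version 1.0", "Version 1.0.1", "Version 1.2.3."):
    print(sample, "->", parse_version(sample))
```

Because every dot must be followed by digits, a trailing dot as in "Version 1.2.3." is left out of the match.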
Old question, but none of the answers cover corner cases such as "Version 1.2.3." (ending with a dot) or "Version 1.2.3.A" (ending with non-numeric values).
Here is my solution:
import re

ver = "Version 1.2.3.9\nWarning blah blah..."
print(bool(re.match(r"Version\s*[\d.]+\d", ver)))
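To see what this pattern actually captures in those corner cases, here is a sketch that adds a capture group to it. The function name is invented for the example. Since [\d.]+ needs one character and the trailing \d needs another, a single-number version such as "Version 7" does not match.

```python
import re

# The pattern from this answer with a capture group added; the trailing \d
# forces the captured text to end in a digit.
pattern = re.compile(r"Version\s*([\d.]+\d)")

def version_ending_in_digit(text):
    """Return the captured version, or None when the pattern does not match."""
    match = pattern.match(text)
    return match.group(1) if match else None

print(version_ending_in_digit("Version 1.2.3."))
print(version_ending_in_digit("Version 1.2.3.A"))
print(version_ending_in_digit("Version 1.2.3.9\nWarning blah blah..."))
print(version_ending_in_digit("Version 7"))
```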
We can use the Python re library.
The regex below is for versions containing numbers only.
import re
versions = re.findall(r'[0-9]+\.[0-9]+\.?[0-9]*', AVAILABLE_VERSIONS)
unique_versions = set(versions) # convert it to set to get unique versions
where AVAILABLE_VERSIONS is a string containing the versions.
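A set has no order, so if the unique versions need to be listed from oldest to newest they have to be sorted by their numeric parts. A sketch under stated assumptions: the sample string is invented, and the pattern is a slightly stricter variant than the one above so that a match never ends on a dot.

```python
import re

# Sample input made up for this example.
AVAILABLE_VERSIONS = "pkg 1.10.0, pkg 1.2.0, pkg 1.2.0, pkg 1.9"

# Stricter variant of the pattern above: every dot must be followed by digits.
versions = re.findall(r'[0-9]+(?:\.[0-9]+)+', AVAILABLE_VERSIONS)
unique_versions = set(versions)

def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so the parts compare as numbers."""
    return tuple(int(part) for part in version.split('.'))

ordered = sorted(unique_versions, key=version_key)
print(ordered)
```

Sorting the strings directly would put '1.10.0' before '1.2.0', which is why the key converts each part to an integer.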