python regex: get end digits from a string
Question:
I am quite new to Python and regex, and I have the following simple string:
s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
I would like to extract only the last digits in the above string, i.e. 767980716, and I was wondering how I could achieve this using Python regex.
I wanted to do something similar along the lines of:
re.compile(r"""-(.*?)""").search(str(s)).group(1)
indicating that I want to find the stuff matched by (.*?), which starts after a “-” and ends at the end of the string, but this returns nothing.
I was wondering if anyone could point me in the right direction.
Thanks.
Answers:
Use the regex below:
\d+$
$ matches the end of the string.
\d matches a digit.
+ matches the preceding element one or more times.
Your regex should be (\d+)$.
\d+ matches one or more digits.
$ matches at the end of the string.
So your code should be:
>>> s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"
>>> import re
>>> re.compile(r'(\d+)$').search(s).group(1)
'767980716'
And you don’t need to use the str function here, as s is already a string.
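As for why the attempt in the question seems to return nothing, here is a small sketch (the variable names attempt, anchored and digits are only for illustration). A lazy group with nothing after it is satisfied by zero characters, so group(1) is an empty string rather than an error. Anchoring it with $ alone is not enough either, because the match then starts at the first hyphen.

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# Lazy group with nothing after it: matches the first "-" and then zero characters.
attempt = re.search(r"-(.*?)", s).group(1)
print(repr(attempt))   # ''

# Adding $ forces the lazy group to expand, but it still starts at the FIRST hyphen.
anchored = re.search(r"-(.*?)$", s).group(1)
print(anchored)        # my-name-is-John-Smith-6376827-%^-1-2-767980716

# Asking for digits only, anchored at the end, gives the wanted result.
digits = re.search(r"(\d+)$", s).group(1)
print(digits)          # 767980716
```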
Try using \d+$ instead. That matches one or more numeric characters followed by the end of the string.
You can use re.match to find only the digits:
>>> import re
>>> s=r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
>>> re.match('.*?([0-9]+)$', s).group(1)
'767980716'
Alternatively, re.finditer works just as well:
>>> next(re.finditer(r'\d+$', s)).group(0)
'767980716'
Explanation of all regexp components:
- .*? is a non-greedy match and consumes as little as possible (a greedy .* would consume everything except for the last digit).
- [0-9] and \d are two different ways of matching digits. Note that the latter also matches digits in other writing systems, like ୪ or ൨.
- Parentheses ( () ) make the enclosed part of the expression a group, which can be retrieved with group(1) (or 2 for the second group, 0 for the whole match).
- + means one or more repetitions (at least one digit at the end).
- $ matches only at the end of the input (or just before a trailing newline).
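To see the greedy/non-greedy difference and the \d versus [0-9] difference in practice, here is a short sketch for Python 3 (the test string t is made up for illustration):

```python
import re

s = "99-my-name-is-John-Smith-6376827-%^-1-2-767980716"

# Non-greedy prefix: the digit group gets the whole trailing run of digits.
lazy = re.match(r'.*?([0-9]+)$', s).group(1)
# Greedy prefix: .* swallows everything it can and leaves only one digit.
greedy = re.match(r'.*([0-9]+)$', s).group(1)
print(lazy)    # 767980716
print(greedy)  # 6

t = "order-\u0b6a"  # hypothetical input ending in an Oriya digit
unicode_digit = re.search(r'\d+$', t)            # matches: \d is Unicode-aware for str patterns
ascii_class = re.search(r'[0-9]+$', t)           # None: only ASCII 0-9
ascii_flag = re.search(r'\d+$', t, re.ASCII)     # None: re.ASCII restricts \d to 0-9
print(unicode_digit is not None, ascii_class, ascii_flag)
```

If only ASCII digits should count, [0-9] or the re.ASCII flag is the safer choice.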
Nice and simple with findall:
import re
s = r"""99-my-name-is-John-Smith-6376827-%^-1-2-767980716"""
print(re.findall(r'^.*-([0-9]+)$', s))
# prints ['767980716']
Regex explanation:
^          # Match the start of the string
.*         # Followed by anything
-          # Up to the last hyphen
([0-9]+)   # Capture the digits after the hyphen
$          # Up to the end of the string
Or, more simply, just match the digits at the end of the string: '([0-9]+)$'
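The two patterns are not quite equivalent: the longer one requires a hyphen directly before the trailing digits, while the shorter one does not. A quick sketch with made-up inputs shows the difference (findall returns the contents of the group when the pattern has exactly one):

```python
import re

hyphen_pattern = r'^.*-([0-9]+)$'
simple_pattern = r'([0-9]+)$'

with_hyphen = re.findall(hyphen_pattern, "abc-123")
no_hyphen = re.findall(hyphen_pattern, "abc123")
simple = re.findall(simple_pattern, "abc123")

print(with_hyphen)  # ['123']
print(no_hyphen)    # []
print(simple)       # ['123']
```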
Save the regular expressions for something that requires more heavy lifting.
>>> def parse_last_digits(line): return line.split('-')[-1]
>>> s = parse_last_digits(r"99-my-name-is-John-Smith-6376827-%^-1-2-767980716")
>>> s
'767980716'
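One caveat with the split approach: it returns whatever follows the last hyphen, whether or not it is made of digits. A minimal sketch of a stricter variant, assuming you want None when the last segment is not purely ASCII digits (last_segment_digits is a hypothetical helper name, and str.isascii needs Python 3.7+):

```python
def last_segment_digits(line, sep='-'):
    # rpartition splits on the LAST separator; index 2 is the part after it
    # (or the whole string when the separator is absent).
    tail = line.rpartition(sep)[2]
    # isascii() guards against non-ASCII characters that isdigit() accepts.
    return tail if tail.isascii() and tail.isdigit() else None


print(last_segment_digits("99-my-name-is-John-Smith-6376827-%^-1-2-767980716"))  # 767980716
print(last_segment_digits("99-my-name-is-John"))  # None
```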
I have been playing around with several of these solutions, but many of them fail if there are no digits at the end of the string. The following code handles that case.
import re
W = input("Enter a string:")
# Match once and reuse the result instead of running the regex twice
m = re.match(r'.*?([0-9]+)$', W)
if m is None:
    last_digits = "None"
else:
    last_digits = m.group(1)
print("Last digits of " + W + " are " + last_digits)
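If you need this in more than one place, the same check can be wrapped in a small function. This is only a sketch (last_digits here is a hypothetical function name); it uses re.search so no leading .*? is needed, and returns None rather than the string "None" when there are no trailing digits:

```python
import re


def last_digits(text):
    # search() scans the whole string, so the pattern only has to describe the tail.
    m = re.search(r'[0-9]+$', text)
    return m.group(0) if m else None


print(last_digits("99-my-name-is-John-Smith-6376827-%^-1-2-767980716"))  # 767980716
print(last_digits("no digits here"))  # None
```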