Extract last sequence of digits from string along with everything that precedes it

Question:

Consider the following string:

AB01CD03

What I want to do is break it down into two tokens, namely AB01CD and 03.

In my string the number of digits following the last alpha character is unknown. There is always a sequence of digits at the end of the string.

Now, I can do this:

import re
S = 'AB01CD03'
v, = re.findall(r'(\d+)$', S)
assert v == '03'

…and because I now know the length of v I can deduce how to acquire the preamble using a slice – e.g.,

preamble = S[:-len(v)]
assert preamble == 'AB01CD'

Bearing in mind that the preamble may contain digits, what I’m looking for is a single RE that will reveal the two separate tokens – i.e.,

a, b = re.findall(MAGIC_EXPRESSION, S)

Is this possible?

Asked By: Vlad


Answers:

Yes, like this:

import re
s = 'AB01CD03'
m = re.match(r'^(.+?)(\d+)$', s)
print(m.group(1), m.group(2))

This works because the first group, (.+?), is non-greedy: it takes as few characters as possible, so the second group, (\d+), can match all the digits at the end. The ^ and $ anchors pin the match to the start and end of the string respectively.

Result:

AB01CD 03

Closer to the syntax you were asking for:

a, b = re.match(r'^(.+?)(\d+)$', s).groups()
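
To see why the non-greedy quantifier matters, here is a minimal sketch that runs the same pattern with a lazy and a greedy first group. The greedy variant is not part of the answer; it is included only for contrast.

```python
import re

s = 'AB01CD03'

# Lazy first group: grows one character at a time until the rest
# of the pattern (digits up to the end of the string) can match.
lazy = re.match(r'^(.+?)(\d+)$', s).groups()

# Greedy first group: grabs as much as it can and gives back only
# the single digit that (\d+) needs in order to match.
greedy = re.match(r'^(.+)(\d+)$', s).groups()

print(lazy)    # ('AB01CD', '03')
print(greedy)  # ('AB01CD0', '3')
```

With the greedy group, the trailing digits are split in the wrong place, which is why the ? after .+ is needed here.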
Answered By: Grismar

You can use this:

import re

ls = ['AB01CD03', 'AB34565701CD04564563']
for s in ls:
    a, b = re.findall(r'(.*(?:\D|^))(\d+)', s)[0]
    print(a, b)

Output:

AB01CD 03
AB34565701CD 04564563

(.*(?:\D|^))(\d+)

1st Capturing Group (.*(?:\D|^))

  • . matches any character (except for line terminators)

  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

Non-capturing group (?:\D|^)

1st Alternative \D

  • \D matches any character that’s not a digit (equivalent to [^0-9] for ASCII text)

2nd Alternative ^

  • ^ asserts position at the start of the string (or of a line in multiline mode)

2nd Capturing Group (\d+)

  • \d matches a digit (equivalent to [0-9] for ASCII text)

  • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
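
The breakdown above can also be written into the pattern itself. The following is a sketch using re.VERBOSE; the helper name split_trailing_digits is hypothetical and only used for illustration.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so that each part
# carries its own comment (whitespace in the pattern is ignored).
PATTERN = re.compile(r'''
    (.*(?:\D|^))   # group 1: greedy prefix ending in a non-digit, or empty at the start
    (\d+)          # group 2: the run of digits that follows
''', re.VERBOSE)

def split_trailing_digits(s):  # hypothetical helper name
    # findall returns a list of (group 1, group 2) tuples; take the first.
    return PATTERN.findall(s)[0]

print(split_trailing_digits('AB01CD03'))  # ('AB01CD', '03')
print(split_trailing_digits('2024'))      # ('', '2024')
```

The ^ alternative is what lets the pattern handle a string made up only of digits: the prefix is then empty and all digits land in the second group.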

Answered By: Shahab Rahnama