python/regex: match letter only or letter followed by number

Question:

I want to split the string 'AB4F2D' into ['A', 'B4', 'F2', 'D'].
Essentially, if a character is a letter, return the letter; if a character is a digit, return the previous character plus the current one (luckily there is no number greater than 9, so there is never anything like X12).
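For reference, the rule described above can be written as a plain loop without any regex, which is handy for cross-checking a pattern. This is only a sketch: the name split_elements is hypothetical, and it assumes the input holds uppercase letters and single digits, with every digit following a letter.

```python
def split_elements(s):
    """Hypothetical helper: split 'AB4F2D' into ['A', 'B4', 'F2', 'D'].

    Assumes only uppercase letters and single digits, each digit after a letter.
    """
    elements = []
    for ch in s:
        if ch.isdigit() and elements:
            # A digit is glued onto the element started by the previous letter.
            elements[-1] += ch
        else:
            # A letter (or a digit with nothing before it) starts a new element.
            elements.append(ch)
    return elements


print(split_elements('AB4F2D'))  # ['A', 'B4', 'F2', 'D']
```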

I have tried several combinations but I am not able to find the correct one:

import re

def get_elements(input_string):
    # Each attempted pattern is applied in turn and its matches printed.
    patterns = [
        r'[A-Z][A-Z0-9]',
        r'[A-Z][A-Z0-9]|[A-Z]',
        r'\D|\D\d',
        r'[A-Z]|[A-Z][0-9]',
        r'[A-Z]{1}|[A-Z0-9]{1,2}'
        ]

    for p in patterns:
        elements = re.findall(p, input_string)
        print(elements)

get_elements('AB4F2D')

Results:

['AB', 'F2']
['AB', 'F2', 'D']
['A', 'B', 'F', 'D']
['A', 'B', 'F', 'D']
['A', 'B', '4F', '2D']

Can anyone help? Thanks

Asked By: Marco Di Gennaro


Answers:

\D\d?

One problem with yours is that you put the shorter alternative first, so the longer one never gets a chance: alternatives are tried from left to right, and the first one that matches wins. For example, the correct version of your \D|\D\d is \D\d|\D. But just use \D\d?.
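The effect of alternative order can be checked directly with re.findall on the example string. This sketch compares the question's \D|\D\d, the reordered \D\d|\D, and the suggested \D\d?.

```python
import re

s = 'AB4F2D'

# Shorter alternative first: \D already matches the letter, so \D\d is never tried.
print(re.findall(r'\D|\D\d', s))   # ['A', 'B', 'F', 'D']

# Longer alternative first: \D\d is tried before falling back to \D.
print(re.findall(r'\D\d|\D', s))   # ['A', 'B4', 'F2', 'D']

# An optional digit expresses the same idea in one piece.
print(re.findall(r'\D\d?', s))     # ['A', 'B4', 'F2', 'D']
```

Note that \D matches any non-digit character (including lowercase letters and spaces), so [A-Z][0-9]? is the stricter variant if only uppercase letters should start an element.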

Answered By: Kelly Bundy

Use Extended Groups

Python regexes have special extension syntax of the form (?...) that lets you look ahead at the following characters without consuming them (and much more).
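A small sketch of what "without consuming" means: a lookahead checks the next character, but that character is not part of the match and remains available for the next search.

```python
import re

s = 'AB4F2D'

# Positive lookahead (?=[0-9]): the letter must be followed by a digit,
# but the digit itself is not included in the match.
print(re.findall(r'[A-Z](?=[0-9])', s))   # ['B', 'F']

# Negative lookahead (?![0-9]): the letter must NOT be followed by a digit.
# At the end of the string there is no digit, so the final 'D' matches.
print(re.findall(r'[A-Z](?![0-9])', s))   # ['A', 'D']
```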

Here is a pattern I would come up with using that:

[A-Z](?![0-9])|[A-Z][0-9]

This matches everything with a single pattern. There might be simpler ways to match it, but I find this to be the most flexible if you want to adjust it later. Read it like this: match a letter if the next character is not a digit; otherwise, match a letter followed by a digit.
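Plugged into a function shaped like the one in the question (the name get_elements is reused only for illustration), the pattern produces the desired split. The final line is a hypothetical cross-check against the simpler optional-digit pattern.

```python
import re


def get_elements(input_string):
    # Either a letter NOT followed by a digit, or a letter followed by a digit.
    return re.findall(r'[A-Z](?![0-9])|[A-Z][0-9]', input_string)


print(get_elements('AB4F2D'))  # ['A', 'B4', 'F2', 'D']

# Cross-check: the optional-digit pattern gives the same result on this input.
assert get_elements('AB4F2D') == re.findall(r'[A-Z][0-9]?', 'AB4F2D')
```

Because the lookahead makes the two alternatives mutually exclusive (the first requires "no digit next", the second requires "digit next"), their order no longer matters here, unlike in the question's attempts.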

More information is in the documentation for Python's re module. If you want to experiment, I recommend using an online regex tester; make sure to select the Python flavor.

Answered By: Pinko