Match a number of letters based on how many numbers are matched using Regex

Question:

I’m trying to create a regex pattern to match account IDs that follow certain rules. The matching will happen in a Python script using the re library, but I believe this is mostly a general regex question.

The account ids adhere to the following rules:

  1. Must be exactly 6 characters long
  2. The letters and numbers do not have to be unique

AND

  1. 3 uppercase letters followed by 3 numbers

OR

  1. 1 to 6 numbers followed by however many uppercase letters bring the length of the id to 6

So, the following would be ‘valid’ account ids:

ABC123
123456
12345A
1234AB
123ABC
12ABCD
1ABCDE
AAA111

And the following would be ‘invalid’ account ids:

ABCDEF
ABCDE1
ABCD12
AB1234
A12345
ABCDEFG
1234567
1
12
123
1234
12345

I can match the 3 letters followed by 3 numbers very simply, but I’m having trouble understanding how to write a regex that matches a varying number of letters, such that if x is the number of numbers in the string, then the number of letters is y = 6 – x.

I suspect that using lookaheads might help solve this problem, but I’m still new to regex and don’t have an amazing grasp on how to use them correctly.

I have the following regex right now, which uses positive lookaheads to check if the string starts with a number or letter, and applies different matching rules accordingly:

((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)

This works to match the ‘valid’ account ids listed above, however it also matches the following strings which should be invalid:

  • 1
  • 12
  • 123
  • 1234
  • 12345

How can I change the first capturing group ((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$) to know how many letters to match based on how many numbers begin the string, if that’s possible?

Asked By: Noah Petrasko


Answers:

I am unsure how to modify your regex to ensure that the overall username length is 6 characters. However, it is easy to check that in Python.

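import re

# Pattern from the question: digits then optional letters, or 3 letters then 3 digits
PATTERN = re.compile(r"((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)")

def check_username(name):
    # Enforce the overall length here; the regex only checks the character layout.
    # fullmatch (rather than search) also rejects a trailing newline that "$" would allow.
    return len(name) == 6 and PATTERN.fullmatch(name) is not None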

Hopefully this is helpful to you!
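
For a regex-only variant of the same idea, the relationship from the question (x numbers followed by 6 - x letters) can be spelled out as one alternative per split. This is only a sketch and not part of the answer above; the names ACCOUNT_ID_ENUMERATED and matches_enumerated are hypothetical.

```python
import re

# One alternative per split: n digits followed by 6 - n uppercase letters (n = 1..6).
splits = [rf"[0-9]{{{n}}}[A-Z]{{{6 - n}}}" for n in range(1, 7)]

# Add the separate "3 letters then 3 digits" rule as a final alternative.
ACCOUNT_ID_ENUMERATED = re.compile("|".join(splits + [r"[A-Z]{3}[0-9]{3}"]))

def matches_enumerated(value):
    # fullmatch requires the whole string to match one of the alternatives
    return ACCOUNT_ID_ENUMERATED.fullmatch(value) is not None
```

Each alternative has a fixed length of 6, so no separate length check is needed, at the cost of a longer pattern.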

Answered By: aidan-j-rhoden

You could write the pattern as:

^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$

Explanation

  • ^ Start of string
  • (?=[A-Z\d]{6}$) Positive lookahead, assert 6 chars A-Z or digits till the end of the string
  • (?: Non capture group for the alternatives
    • [A-Z]{3}\d{3} Match 3 chars A-Z and 3 digits
    • | Or
    • \d+[A-Z]* Match 1+ digits and optional chars A-Z
  • ) Close the non capture group
  • $ End of string
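
To see how the pattern above behaves on the examples from the question, here is a minimal Python sketch. The names ACCOUNT_ID and is_valid_account_id are hypothetical, and the use of re.ASCII and re.fullmatch is an added assumption rather than part of the answer: re.ASCII limits \d to 0-9, and fullmatch rejects a trailing newline that "$" on its own would tolerate.

```python
import re

# Lookahead pins the total length to 6; the alternation checks the layout.
ACCOUNT_ID = re.compile(r"^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$", re.ASCII)

def is_valid_account_id(value):
    # fullmatch requires the match to cover the entire string
    return ACCOUNT_ID.fullmatch(value) is not None

valid = ["ABC123", "123456", "12345A", "1234AB", "123ABC", "12ABCD", "1ABCDE", "AAA111"]
invalid = ["ABCDEF", "ABCDE1", "ABCD12", "AB1234", "A12345", "ABCDEFG",
           "1234567", "1", "12", "123", "1234", "12345"]

for account_id in valid + invalid:
    print(account_id, is_valid_account_id(account_id))
```

The lookahead is what removes the need to count letters against numbers: once the total is fixed at 6, \d+[A-Z]* can only match splits that add up to 6.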


Answered By: The fourth bird