Validate string format based on format

Question:

I have an issue with the following task.
I have a string:

ABCD[A] or A7D3[A,B,C]
  • First 4 Characters are 0-9 or A-Z.
  • 5th character is [.
  • 6th to nth characters are letters A-Z, separated by , when there is more than one letter,
    e.g. A or E,F or A,B,C,D,F. I don’t know if there is a limit on the length of the middle part, so I have to assume it is 26 (A-Z).
  • last character is ].

I need to verify that the structure of the string is as stated above, for example:

ABCD[A,B]
BD1F[E,G,A,R]
S4P5[C]

I tried with a regex (in Python):

r = re.match('^[0-9A-Z]{4}[[A-Z,]+$',text)

where text is an example of the string, but it is not working.
A True/False or 0/1 result would be fine.

Any ideas how this could be done? From what I’ve seen on Google so far, a regex would work, but I’m not proficient enough with it to solve this by myself.

Asked By: Murf


Answers:

You can use '[0-9A-Z]{4}\[[A-Z](?:,[A-Z]){,25}\]':

import re
for s in ['ABCD[A,B]', 'BD1F[E,G,A,R]', 'S4P5[C]']:
    print(re.fullmatch(r'[0-9A-Z]{4}\[[A-Z](?:,[A-Z]){,25}\]', s))

Note that (?:,[A-Z]){,25} caps the list at 26 letters (one mandatory letter plus up to 25 more) but does not ensure that the letters are distinct.

Output:

<re.Match object; span=(0, 9), match='ABCD[A,B]'>
<re.Match object; span=(0, 13), match='BD1F[E,G,A,R]'>
<re.Match object; span=(0, 7), match='S4P5[C]'>
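
If duplicates also need to be rejected, one option is to capture the letter list and compare it against a set. This is only a sketch: the helper name is_valid_unique and the capture group are assumptions added for illustration, and the pattern is otherwise the same shape as the one used above, with the brackets escaped.

```python
import re

# Same shape as the pattern above, plus a capture group around the letter list.
PATTERN = re.compile(r'[0-9A-Z]{4}\[([A-Z](?:,[A-Z]){,25})\]')

def is_valid_unique(s):
    """Hypothetical helper: True if s has the expected shape and no repeated letter."""
    m = PATTERN.fullmatch(s)
    if m is None:
        return False
    letters = m.group(1).split(',')
    # A set drops duplicates, so the lengths differ if any letter repeats.
    return len(letters) == len(set(letters))

print(is_valid_unique('ABCD[A,B]'))  # True
print(is_valid_unique('ABCD[A,A]'))  # False (duplicate letter)
print(is_valid_unique('ABCD[AB]'))   # False (missing comma)
```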


Answered By: mozway

You can try:

import re

lst = ["ABCD[A,B]", "BD1F[E,G,A,R]", "S4P5[C]", "S4P5[CD]"]
pattern = r"^[A-Z0-9]{4}\[[A-Z](?:,[A-Z])*]$"

for string in lst:
    m = re.match(pattern, string)
    print(bool(m), m)

output:

True <re.Match object; span=(0, 9), match='ABCD[A,B]'>
True <re.Match object; span=(0, 13), match='BD1F[E,G,A,R]'>
True <re.Match object; span=(0, 7), match='S4P5[C]'>
False None

Explanation:

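^: beginning of the string.
[A-Z0-9]{4}: exactly four characters, each A-Z or 0-9.
\[: the opening bracket, escaped so that it is matched literally.
[A-Z]: the first letter inside the brackets is mandatory.
(?:,[A-Z])*: any further letters are optional, each preceded by a comma.
]$: the closing bracket, then the end of the string.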
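
The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. A minimal sketch; the name VERBOSE_PATTERN is made up for illustration, and the closing bracket is escaped here purely for readability.

```python
import re

VERBOSE_PATTERN = re.compile(r"""
    ^               # beginning of the string
    [A-Z0-9]{4}     # exactly four characters, each A-Z or 0-9
    \[              # a literal opening bracket
    [A-Z]           # the first letter is mandatory
    (?:,[A-Z])*     # zero or more further letters, each preceded by a comma
    \]              # a literal closing bracket
    $               # end of the string
""", re.VERBOSE)

for string in ["ABCD[A,B]", "BD1F[E,G,A,R]", "S4P5[C]", "S4P5[CD]"]:
    print(string, bool(VERBOSE_PATTERN.match(string)))
```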

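Note-1: You could cap the list at 26 letters (the mandatory first one plus up to 25 more) by changing * to {,25}.

Note-2: I didn’t escape the closing bracket. Outside a character class a lone ] is matched literally, but escaping it as \] doesn’t hurt if you want (maybe better).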
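
Since the question asks for a plain true/false result, the check can be wrapped in a small function. The sketch below is an assumption-laden illustration (the name is_valid is hypothetical); it combines the {,25} cap with re.fullmatch. One difference worth knowing: with re.match and $, a string that ends in a newline still matches, because $ also matches just before a trailing newline, whereas fullmatch rejects it.

```python
import re

# Both brackets escaped; at most 26 letters (1 mandatory + up to 25 more).
PATTERN = re.compile(r"[A-Z0-9]{4}\[[A-Z](?:,[A-Z]){,25}\]")

def is_valid(text):
    """Hypothetical helper: True if the whole string has the expected structure."""
    return PATTERN.fullmatch(text) is not None

print(is_valid("S4P5[C]"))     # True
print(is_valid("S4P5[CD]"))    # False
print(is_valid("S4P5[C]\n"))   # False: fullmatch must consume the entire string

# For comparison, match() with ^...$ accepts the trailing newline.
anchored = re.match(r"^[A-Z0-9]{4}\[[A-Z](?:,[A-Z])*]$", "S4P5[C]\n")
print(anchored is not None)    # True
```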

Answered By: S.B