Validate string format based on format
Question:
I have an issue with the following task.
I have a string:
ABCD[A] or A7D3[A,B,C]
- The first 4 characters are 0-9 or A-Z.
- The 5th character is [.
- The 6th to nth characters are A-Z, each followed by , in case there is more than one letter, e.g. A or E,F or A,B,C,D,F. I don’t know if there is a character limit with the middle part, so I have to assume it is 26 (A-Z).
- The last character is ].
I need to verify that the structure of the string is as stated above, for example:
ABCD[A,B]
BD1F[E,G,A,R]
S4P5[C]
I tried with a regex (in Python):
r = re.match('^[0-9A-Z]{4}[[A-Z,]+$',text)
where text is an example of the string; however, it is not working.
A true/false or 0/1 result would be fine.
Any ideas how this could be done? From what I’ve seen on Google so far, a regex would work, but I’m not proficient enough with it to solve this by myself.
Answers:
You can use '[0-9A-Z]{4}\[[A-Z](?:,[A-Z]){,25}]':
import re
for s in ['ABCD[A,B]', 'BD1F[E,G,A,R]', 'S4P5[C]']:
    print(re.fullmatch(r'[0-9A-Z]{4}\[[A-Z](?:,[A-Z]){,25}]', s))
Note that (?:,[A-Z]){,25} limits the number of letters in the square brackets to 26 (one mandatory letter plus up to 25 more) but does not ensure that they contain no duplicates.
Output:
<re.Match object; span=(0, 9), match='ABCD[A,B]'>
<re.Match object; span=(0, 13), match='BD1F[E,G,A,R]'>
<re.Match object; span=(0, 7), match='S4P5[C]'>
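If duplicates matter, the regex alone is not enough. A minimal sketch of one way to add that check and to get a plain True/False result: the helper name is_valid, the capture group around the letter list, and the escaped closing bracket are assumptions added for illustration, not part of the pattern above.

```python
import re

# Same shape as the pattern above, compiled once. A capture group is added
# around the letter list so the letters can be inspected afterwards.
PATTERN = re.compile(r'[0-9A-Z]{4}\[([A-Z](?:,[A-Z]){,25})\]')

def is_valid(text):
    """Return True if text has the required shape and no repeated letters."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return False
    letters = m.group(1).split(',')
    # A set drops duplicates, so the lengths differ if any letter repeats.
    return len(letters) == len(set(letters))

for s in ['ABCD[A,B]', 'S4P5[C]', 'ABCD[A,A]', 'ABCD[A,B,]']:
    print(s, is_valid(s))
```

The first two strings print True; 'ABCD[A,A]' is rejected by the duplicate check and 'ABCD[A,B,]' by the regex itself.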
You can try:
import re
lst = ["ABCD[A,B]", "BD1F[E,G,A,R]", "S4P5[C]", "S4P5[CD]"]
pattern = r"^[A-Z0-9]{4}\[[A-Z](?:,[A-Z])*]$"
for string in lst:
    m = re.match(pattern, string)
    print(bool(m), m)
Output:
True <re.Match object; span=(0, 9), match='ABCD[A,B]'>
True <re.Match object; span=(0, 13), match='BD1F[E,G,A,R]'>
True <re.Match object; span=(0, 7), match='S4P5[C]'>
False None
Explanation:
^ : beginning of the string.
[A-Z0-9]{4} : matches the first 4 characters.
\[ : the opening bracket, escaped so it is treated literally.
[A-Z] : the first character inside the brackets is mandatory.
(?:,[A-Z])* : the rest is optional.
]$ : the closing bracket, then the end of the string.
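The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of that option; escaping the closing bracket here is a choice made for readability.

```python
import re

pattern = re.compile(r"""
    ^              # beginning of the string
    [A-Z0-9]{4}    # exactly four characters, each a digit or capital letter
    \[             # a literal opening bracket
    [A-Z]          # the first letter inside the brackets is mandatory
    (?:,[A-Z])*    # zero or more further letters, each preceded by a comma
    \]             # a literal closing bracket
    $              # end of the string
""", re.VERBOSE)

print(bool(pattern.match("A7D3[A,B,C]")))  # True
print(bool(pattern.match("S4P5[CD]")))     # False: no comma between C and D
```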
Note-1: You could limit the letters inside the brackets to 26 (the mandatory one plus up to 25 more) by changing * to {,25}.
Note-2: I didn’t escape the closing bracket, but escaping it does no harm and is arguably clearer.
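One more caveat, in case the input may carry a trailing newline: in Python, $ also matches just before a newline at the very end of the string, so re.match with ^...$ accepts such input, while re.fullmatch does not. A small sketch of the difference; the variable names are made up for the example.

```python
import re

candidate = "S4P5[C]\n"  # otherwise valid text followed by a stray newline

# With re.match, $ is satisfied just before the final newline.
lenient = bool(re.match(r"^[A-Z0-9]{4}\[[A-Z](?:,[A-Z])*\]$", candidate))

# re.fullmatch requires the pattern to consume the entire string.
strict = bool(re.fullmatch(r"[A-Z0-9]{4}\[[A-Z](?:,[A-Z])*\]", candidate))

print(lenient, strict)  # True False
```

If the strings come from reading a file line by line, either strip them first or prefer re.fullmatch.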