How to check an input string against a list of patterns sequentially in Python?

Question:

I have specific patterns composed of strings, numbers, and special characters in a specific order. I would like to check whether an input string matches the list of patterns I created and print an error when the input is incorrect. I tried using regex, but my code is not neat enough. I am wondering if someone could help me with this.

Use case

I have the input att2_epic_app_clm1_sub_valid, which I split by _. Below is the list of patterns I expect to check against, printing an error if there is no match.

Rule:

The input should start with att and some number, like [att][0-6]*, or [ptt][0-6]; after that it should continue with either epic or semi; then it should continue with [app][0-6] or [app][0-6_][clm][0-9_]+[sub|sup]; and it should end with [valid|Invalid].

So I composed this pattern with re, but when I pass invalid input it is not detected, whereas I expect an error.

import re

acceptable_pattern=re.compile(r'([att]+[0-6_])(epic|semi_)([app]+[0-6_]+[clm]+[0-6_])([sub|sup_])([valid|invalid]))'
    input='att1_epic_app2_clm1_sub_valid'   # this is valid string

wlist=input.split('_')
for each in wlist:
  if any(ext in each for ext in acceptable_pattern): 
     print("valid")
  else:
     print("invalid")

This is not quite working, because I have to check the string from beginning to end: after splitting the string by _, each new piece must match one of the predefined rules, such as:

The input string should start with att|ptt followed by a number between 1 and 6; the next word is either epic or semi; then it should be app, or app1~app6, or app{1_6}clm{1~6}{sub|sup_}; then the string ends with {valid|invalid}.

How should I specify those rules with re.compile to check the pattern in the input string and raise an error if the parts are not in the expected sequence? How should we do this in Python? Is there a quick way of making this happen?

Asked By: beyond_inifinity

||

Answers:

Instead of using split, you could consider writing a pattern that validates the whole string.

If I am reading the requirements correctly, you might use:

^[ap]tt[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$
  • ^ Start of string
  • [ap]tt[0-6] match att or ptt and a digit 0-6
  • _(?:epic|semi) Match _epic or _semi
  • _app Match literally
  • (?: Non capture group for the alternation
    • [1-6] Match a digit 1-6
    • | Or
    • [1-6_]clm[0-9]*_su[bp] Match a digit 1-6 or _, then clm followed by optional digits 0-9 (zero or more) and then _sub or _sup
  • )? Close the non capture group and make it optional
  • _valid Match literally
  • $ End of string
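
The breakdown above can also be kept next to the pattern itself. A minimal sketch, assuming Python's re.VERBOSE flag is acceptable in your code base; the name PATTERN is only illustrative:

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE so that
# each piece carries its own comment. Whitespace and "#" comments inside the
# pattern are ignored by the regex engine in this mode.
PATTERN = re.compile(r"""
    ^[ap]tt[0-6]          # att or ptt followed by one digit 0-6
    _(?:epic|semi)        # _epic or _semi
    _app                  # literal _app
    (?:                   # optional part after app:
        [1-6]             #   a single digit 1-6
      |                   #   or
        [1-6_]clm[0-9]*   #   a digit 1-6 or _, then clm and zero or more digits
        _su[bp]           #   then _sub or _sup
    )?
    _valid$               # literal _valid at the end of the string
""", re.VERBOSE)

print(bool(PATTERN.match("att2_epic_app_clm1_sub_valid")))   # True
print(bool(PATTERN.match("att12_epic_app_clm1_sub_valid")))  # False
```

The second string fails because only a single digit 0-6 is allowed between att and the first underscore.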

See a regex demo.

If the string can also start with dev then you can use an alternation:

^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$

See another regex demo.

Then you can check if there was a match:

import re

pattern = r"^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$"

strings = [
    "att2_epic_app_clm1_sub_valid",
    "att12_epic_app_clm1_sub_valid",
    "att2_epic_app_valid",
    "att2_epic_app_clm1_sub_valid"
]

for s in strings:
    m = re.match(pattern, s)
    if m:
        print("Valid: " + m.group())
    else:
        print("Invalid: " + s)

Output

Valid: att2_epic_app_clm1_sub_valid
Invalid: att12_epic_app_clm1_sub_valid
Valid: att2_epic_app_valid
Valid: att2_epic_app_clm1_sub_valid
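
Since the question asks for an error rather than a printed message, the same check can be wrapped in a function that raises. This is a minimal sketch, assuming ValueError is a suitable exception type; validate_name is a hypothetical helper name. It uses re.fullmatch, which requires the whole string to match, so the ^ and $ anchors are dropped and a trailing newline is rejected as well:

```python
import re

# The pattern from above without the ^ and $ anchors; re.fullmatch supplies them.
PATTERN = re.compile(
    r"(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid"
)


def validate_name(s):  # hypothetical helper, not part of the re module
    """Return s unchanged if it matches the pattern, otherwise raise ValueError."""
    if PATTERN.fullmatch(s) is None:
        raise ValueError(f"Invalid input: {s!r}")
    return s


for s in ["att2_epic_app_clm1_sub_valid", "att12_epic_app_clm1_sub_valid"]:
    try:
        print("Valid:", validate_name(s))
    except ValueError as err:
        print(err)
```

As written, only strings ending in _valid are accepted. If _invalid should be accepted too, as the rule in the question suggests, the last part could be changed to _(?:valid|invalid).
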
Answered By: The fourth bird