Match a specific pattern with a limited number of substitutions

Question:

Hi, I am working on antibodies, where I have to find a specific pattern associated with antigen specificity, using Python. I am struggling to match a pattern while allowing a predefined number of substitutions.

I tried regex (re.findall/re.search) with the possible permutations and combinations, but this did not solve my problem. Searching the internet did not help either.

I am not sure whether matching such a pattern needs an AI/ML algorithm.

Condition:

I want to match any given string against the pattern, allowing at most 4
substitutions from substitution_list at any position,
without changing its original frame.

substitution_list=['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']

pattern="AVTLDPQRSTSTRP"

e.g:-

  string_1="AV**A**LDPQRSTSTRP" --> matched
  string_2="AV**A**LDPQ**C**STSTRP" --> matched
  string_3="AV**V**L**P**PQ**L**ST**L**TRP" --> matched
  string_4="**L**V**V**L**P**PQ**L**STS**C**RP" --> NOT matched (5 substitutions)
  string_5="TRPAVQRSTLDPTS" --> NOT matched (original frame has changed)

Thanks.

Asked By: shivam

||

Answers:

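I found a way (a crude one, though) that works for this particular case: count the positions at which the string differs from the pattern, and accept the string if that count is at most 4.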

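  def match_pattern(string):
      pattern = 'AVTLDPQRSTSTRP'  # standard template
      max_subs = 4  # maximum allowed substitutions

      # A different length shifts the frame, so it cannot be a match.
      if len(string) != len(pattern):
          print('Not matched')
          return

      # Count the positions where the string differs from the template.
      score = sum(1 for s, p in zip(string, pattern) if s != p)
      if score <= max_subs:
          print('String matched')
      else:
          print('Not matched')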

Testing:

  test_strings = ["AVALDPQRSTSTRP", "AVALDPQCSTSTRP", "AVVLPPQLSTLTRP", "LVVLPPQLSTSCRP", "TRPAVQRSTLDPTS"]
  for string in test_strings:
      match_pattern(string)


  String matched
  String matched
  String matched
  Not matched
  Not matched
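
The same position-by-position comparison can be wrapped in a reusable function that returns a boolean instead of printing. The following is only a sketch of one possible variant: the names `AMINO_ACIDS`, `count_substitutions`, and `matches_pattern` are hypothetical, and it assumes that a string of a different length, or one containing a character outside substitution_list, should count as not matched.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 letters from substitution_list


def count_substitutions(candidate, pattern):
    """Return the number of mismatched positions, or None if the lengths differ."""
    if len(candidate) != len(pattern):
        return None  # assumption: a length change means the frame has shifted
    return sum(1 for c, p in zip(candidate, pattern) if c != p)


def matches_pattern(candidate, pattern="AVTLDPQRSTSTRP", max_subs=4):
    """Return True if candidate differs from pattern by at most max_subs substitutions."""
    # Assumption: characters outside substitution_list are not valid substitutions.
    if not set(candidate) <= AMINO_ACIDS:
        return False
    mismatches = count_substitutions(candidate, pattern)
    return mismatches is not None and mismatches <= max_subs


test_strings = ["AVALDPQRSTSTRP", "AVALDPQCSTSTRP", "AVVLPPQLSTLTRP",
                "LVVLPPQLSTSCRP", "TRPAVQRSTLDPTS"]
results = [matches_pattern(s) for s in test_strings]
print(results)  # [True, True, True, False, False]
```

Returning a value rather than printing makes it easier to filter a large list of sequences, and keeping the pattern and the substitution limit as parameters avoids editing the function for each new template.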
Answered By: shivam