Python: how to count occurrences of a specific pattern with overlap in a list or string?

Question:

My problem is logically easy, but hard for me to implement. I have a list of numbers (or, if you prefer, a string of numbers, since it is easy to convert between strings and lists), and I’d like to count the occurrences of some specific patterns with overlap. For example, given the list below:

A = [0, 1, 2, 4, 5, 8, 4, 4, 5, 8, 2, 4, 4, 5, 5, 8, 9, 10, 3, 2]

When “4,5,8” occurs, I count a1 = 1, a2 = 1, a3 = 1. When “4,4,5,8” occurs, I count a1 = 2, a2 = 1, a3 = 1. When “4,4,5,5,5,8,8,8” occurs, I count a1 = 2, a2 = 3, a3 = 3. That is to say, a pattern counts only if it contains at least “4,5,8” in this order. “4,5,9” doesn’t count. “4,4,4,5,5,2,8” doesn’t count at all. For “4,5,4,5,8”, a1 = 1, a2 = 1, a3 = 1.

Thank you all for your help.

Asked By: Richard Riverlands


Answers:

You can match patterns like this using regular expressions.

https://regexr.com/ is a super helpful tool for experimenting with/learning about regular expressions.

The built in module re does the job:

import re

def make_regex_object(list_of_chars):
    # build a compiled regex from [a, b, ..., n]: 'a+b+...n+' (note the trailing '+')
    # each '+' means "match one or more occurrences of the preceding character"
    return re.compile('+'.join([str(char) for char in list_of_chars]) + '+')

searcher = make_regex_object(['a', 'b', 'c', 'd'])
searcher.pattern  # 'a+b+c+d+'

x = searcher.search('abczzzzabbbcddefaaabbbccceeabc')
# caution - search() only returns the first match of the pattern

print(x)   # <re.Match object; span=(7, 14), match='abbbcdd'>  (repr on Python 3.7+)
x.end()    # 14
x.group()  # 'abbbcdd'

To count multiple instances of the pattern, you could repeat the search on the remainder of your string, starting from x.end(). You can count character occurrences within a match with x.group().count(char), or something better.
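As a rough sketch of that idea applied to the list from the question, the snippet below uses finditer to walk over every match and reads the run lengths from capture groups. The function name count_pattern_runs is made up for illustration, and the chr() encoding assumes the list holds small non-negative integers; it is there so that a value like 10 stays one symbol instead of becoming '1' followed by '0'.

```python
import re

def count_pattern_runs(values, pattern):
    """Return one tuple of run lengths per match of pattern[0]+ pattern[1]+ ..."""
    # Assumption: values are small non-negative ints, so chr() gives one character each.
    text = ''.join(chr(v) for v in values)
    # One capture group per pattern element, e.g. (4+)(5+)(8+) in encoded form.
    regex = re.compile(''.join('({}+)'.format(re.escape(chr(p))) for p in pattern))
    # finditer yields every non-overlapping match, left to right.
    return [tuple(len(g) for g in m.groups()) for m in regex.finditer(text)]

A = [0, 1, 2, 4, 5, 8, 4, 4, 5, 8, 2, 4, 4, 5, 5, 8, 9, 10, 3, 2]
print(count_pattern_runs(A, [4, 5, 8]))
# [(1, 1, 1), (2, 1, 1), (2, 2, 1)]
```

Each tuple is (a1, a2, a3) for one occurrence. A sequence such as 4,4,4,5,5,2,8 produces no tuple, because the 2 breaks the run before any 8 is reached.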

Answered By: SAMBECK

I tried to use

re.findall(r'26+8', test_string)

This outputs substrings like “266666668”, “268”, “26668” in a non-overlapping way. However, what if I want to search for a pattern like “2(6+8+)+7”? (This syntax doesn’t work for me in re.) What I essentially want is to match something like “266868688887”, in which you can see back-and-forth movement between 6 and 8; once you reach 7, the search is done. Does anyone have an idea how to express this pattern in re? Thanks!
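One possible explanation, offered as a guess about what went wrong: when a pattern contains a capturing group, re.findall returns the group's contents (here, only the last repetition) instead of the whole match. A non-capturing group, written (?:...), avoids that. The test string below is invented for illustration.

```python
import re

# Made-up sample containing two stretches of the form 2, (6s then 8s) repeated, 7.
test_string = '12668686888873268759'

# Capturing group: findall returns only the last repetition of the group.
with_group = re.findall(r'2(6+8+)+7', test_string)
print(with_group)      # ['68888', '68']

# Non-capturing group: findall returns each whole match.
whole_matches = re.findall(r'2(?:6+8+)+7', test_string)
print(whole_matches)   # ['266868688887', '2687']
```

Alternatively, re.finditer with m.group(0) gives the whole match regardless of which kind of group the pattern uses.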

Answered By: Richard Riverlands

You can do this using regular expressions in Python. The technique is called a lookahead assertion: it matches a pattern but doesn’t consume any of the string, so overlapping occurrences are all found.

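import re

a = 'BANANAAAS'
sub = 'AA'

# the lookahead (?=...) consumes nothing, so overlapping occurrences are all found
print(re.findall('(?=({}))'.format(re.escape(sub)), a))  # ['AA', 'AA']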

Reference: https://docs.python.org/3/library/re.html

(?=...) matches if … matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
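To make the difference concrete, here is a small comparison of a plain search and a lookahead search on the same string. It is only a sketch; the variable names are chosen for illustration.

```python
import re

a = 'BANANAAAS'

# A plain pattern consumes what it matches, so overlapping hits are skipped.
plain = re.findall(r'AA', a)
print(plain)           # ['AA']

# A lookahead matches an empty string at every position where 'AA' starts.
starts = [m.start() for m in re.finditer(r'(?=AA)', a)]
print(starts)          # [5, 6]
print(len(starts))     # 2 overlapping occurrences

# The same idea with a longer pattern: 'ANA' starts at index 1 and at index 3.
ana_starts = [m.start() for m in re.finditer(r'(?=ANA)', a)]
print(ana_starts)      # [1, 3]
```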

Answered By: Jyothis – Intel