Pattern finder in Python

Question:

Let's say I have a list with a bunch of numbers in it. I'm looking to make a function that finds and returns the digit pattern that is repeated in most of them.

Example code:

ListOfNumbers = [1234, 9912349, 578]
print(GetPatern(ListOfNumbers))
1234
Asked By: kevinjonson133


Answers:

If I understand you correctly, kevinjohnson, then the output of your example should be 1234, because 1234 occurs twice in the numbers of your list (once as 1234 itself, and once inside 9912349). So you are looking for subpatterns inside the numbers.
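To make the subpattern idea concrete, here is a minimal check, assuming the numbers are compared as decimal strings (the variable names are only illustrative):

```python
numbers = [1234, 9912349, 578]
pattern = "1234"

# A number "contains" the pattern if the pattern is a substring
# of the number's decimal string.
containing = [n for n in numbers if pattern in str(n)]
print(containing)  # [1234, 9912349]
```

So 1234 shows up in two of the three numbers, which is why it is the expected answer for the example.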

If subpatterns are what you are after, the solution of user7347835 will not work, because it iterates over whole numbers instead of over subpatterns. To search inside the numbers, convert them to strings first. The function below collects every substring of every length, counts the occurrences, and returns the most frequent pattern, preferring the longest one when several lengths tie on the count. (One could also make the pattern length a function input.)

from collections import Counter


def pattern_finder(lst_of_numbers):
    # Map each pattern length to the list of all subpatterns of that length
    pattern_dct = {}

    # Transform numbers to strings so they can be sliced
    lst_of_str = [str(num) for num in lst_of_numbers]

    # Iterate over list items
    for num_str in lst_of_str:

        # Iterate over every subpattern length, from 1 up to the full string
        for str_length in range(1, len(num_str) + 1):
            if str_length not in pattern_dct:
                pattern_dct[str_length] = []

            # Slide a window of that length over the string
            for idx in range(len(num_str) - str_length + 1):
                pat = num_str[idx:idx + str_length]
                pattern_dct[str_length].append(pat)

    # For each length, store the pattern(s) with the maximum number of occurrences
    result_dct = {}
    for length in pattern_dct:
        count_lst = Counter(pattern_dct[length])
        max_count = max(count_lst.values())
        lst_max_pattern = [{count: pattern} for pattern, count in count_lst.items() if count == max_count]
        result_dct[length] = lst_max_pattern

    # Go through the lengths in ascending order and keep the longest pattern with max occurrence
    max_occurrence = 0
    max_values = None  # stays None if the input list is empty
    for length in sorted(result_dct):
        count = list(result_dct[length][0].keys())[0]
        if count >= max_occurrence:
            max_occurrence = count
            max_values = (length, result_dct[length])
    # Returns a tuple: the length of the subpattern, then a list of dicts mapping
    # the number of occurrences (key) to the pattern (value)
    return max_values

Test 1:

lst_of_numbers = [1234, 123, 9999]
pattern_finder(lst_of_numbers)

Output: (Subpattern ‘9’ of length 1 is repeated 4 times)

(1, [{4: '9'}])

Test 2:

lst_of_numbers = [1234, 123]
pattern_finder(lst_of_numbers)

Output: (Subpattern ‘123’ of length 3 is repeated 2 times)

(3, [{2: '123'}])

This actually got a bit messy and does not scale well, since it enumerates every substring of every number. Maybe somebody has a shorter and cleaner solution 🙂
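One shorter variant, as a rough sketch rather than a drop-in replacement: it counts each substring at most once per number (by collecting the substrings in a set), which is an assumption about what "repeated in most of them" means. Under that assumption the question's example yields 1234. The name find_common_pattern and the (pattern, count) return shape are made up for illustration.

```python
from collections import Counter


def find_common_pattern(numbers):
    """Return (pattern, count) for the substring found in the most numbers.

    Ties on the count are broken in favour of the longest pattern.
    Returns None for an empty input list.
    """
    counts = Counter()
    for number in numbers:
        text = str(number)
        # A set ensures each substring is counted once per number,
        # no matter how often it occurs inside that number.
        substrings = {
            text[start:end]
            for start in range(len(text))
            for end in range(start + 1, len(text) + 1)
        }
        counts.update(substrings)
    if not counts:
        return None
    # Highest count first, then longest pattern.
    return max(counts.items(), key=lambda item: (item[1], len(item[0])))


print(find_common_pattern([1234, 9912349, 578]))  # ('1234', 2)
```

Note that this differs from the function above: for [1234, 123, 9999] it picks '123' (present in two numbers) rather than '9', because the four 9s all sit inside a single number. It still enumerates every substring, so it is shorter but not asymptotically faster.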

Answered By: Frank Gallagher