Pattern finder in Python
Question:
Let's say I have a list with a bunch of numbers in it. I'm looking to make a function that will find and return the numbers that are repeated in most of them.
Example code:
ListOfNumbers = [1234, 9912349, 578]
print(GetPatern(ListOfNumbers))
1234
Answers:
If I understand you correctly, kevinjohnson, then the output of your example should be 1234, because 1234 appears twice in the numbers of your list (once as 1234 itself and once inside 9912349). So you are looking for subpatterns inside the numbers.
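To make that reading concrete, here is a minimal sketch that treats each number as a string of digits and checks which list items contain 1234. The variable names (numbers, hits) are only illustrative.

```python
numbers = [1234, 9912349, 578]

# Convert each number to a string so that "contains" means
# "appears as a run of consecutive digits".
hits = [n for n in numbers if "1234" in str(n)]

print(hits)
```

Two of the three numbers contain the digits 1234, which matches the expected output in the question.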
If this is the case, the solution of user7347835 will not work, because it iterates over full numbers instead of over subpatterns. Therefore, one should change the data type and work with strings. The function below should work: it collects the subpatterns of every length, counts them, and, when several lengths share the highest number of occurrences, returns the longest of those patterns.
from collections import Counter

def pattern_finder(lst_of_numbers):
    # Set up a dict that maps each pattern length to the patterns found
    pattern_dct = {}
    # Transform numbers to strings
    lst_of_str = [str(num) for num in lst_of_numbers]
    # Iterate over list items
    for num_str in lst_of_str:
        # Iterate over string lengths
        for str_length in range(len(num_str) + 1):
            if str_length not in pattern_dct:
                pattern_dct[str_length] = []
            # Iterate over the start index of each subpattern
            for idx in range(len(num_str)):
                pat = num_str[idx:idx + str_length]
                if len(pat) == str_length and len(pat) != 0:
                    pattern_dct[str_length].append(pat)
    # Set up a dict to store the most frequent patterns for each length
    result_dct = {}
    for i in pattern_dct:
        count_lst = Counter(pattern_dct[i])
        max_count = max(count_lst.values(), default=0)
        lst_max_pattern = [{count: pattern} for pattern, count in count_lst.items() if count == max_count]
        result_dct[i] = lst_max_pattern
    # Iterate over the items of the result dict and choose the longest pattern with max occurrence
    max_occurrence = 0
    max_values = None  # stays None if the input list is empty
    for dct_lst in result_dct.items():
        if dct_lst[1]:
            occurrence = list(dct_lst[1][0].keys())[0]
            if occurrence >= max_occurrence:
                max_occurrence = occurrence
                max_values = dct_lst
    # Returns a tuple: the length of the subpattern first, then a list of dicts
    # mapping the number of occurrences (key) to the pattern (value)
    return max_values
Test 1:
lst_of_numbers = [1234, 123, 9999]
pattern_finder(lst_of_numbers)
Output: (Subpattern ‘9’ of length 1 is repeated 4 times)
(1, [{4: '9'}])
Test 2:
lst_of_numbers = [1234, 123]
pattern_finder(lst_of_numbers)
Output: (Subpattern ‘123’ of length 3 is repeated 2 times)
(3, [{2: '123'}])
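Note that Test 1 counts all four 9s inside the single number 9999. If the question instead means "shared by the most numbers in the list", one could count each subpattern only once per number. The following is a minimal sketch under that assumption; most_shared_pattern is a hypothetical name that does not come from the question or from the code above, and breaking ties by the larger string is an arbitrary choice.

```python
from collections import Counter

def most_shared_pattern(numbers):
    # For each substring, count how many list items contain it at least once.
    doc_counts = Counter()
    for s in map(str, numbers):
        # A set makes sure repeats inside one number are counted only once.
        substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        doc_counts.update(substrings)
    if not doc_counts:
        return None  # empty input list
    best = max(doc_counts.values())
    # Among the most widely shared substrings, prefer the longest one
    # (ties on length are broken by the larger string, an arbitrary choice).
    candidates = [p for p, c in doc_counts.items() if c == best]
    return max(candidates, key=lambda p: (len(p), p)), best

print(most_shared_pattern([1234, 9912349, 578]))
```

With the list from the question this returns ('1234', 2), because every substring of 1234 occurs in two of the three numbers and 1234 is the longest of them.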
pattern_finder got a bit messy and does not scale well; maybe somebody has a shorter and cleaner solution 🙂
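One possibly shorter version counts all substrings in a single Counter. This is only a sketch that assumes the same rules as pattern_finder (every occurrence counts, and among the most frequent patterns the longest wins); pattern_finder_short is a hypothetical name.

```python
from collections import Counter

def pattern_finder_short(numbers):
    # Count every non-empty substring of every number's digit string.
    counts = Counter(
        s[i:j]
        for s in map(str, numbers)
        for i in range(len(s))
        for j in range(i + 1, len(s) + 1)
    )
    if not counts:
        return None  # empty input list
    best = max(counts.values())
    # Longest length among the patterns that reach the highest count.
    longest = max(len(p) for p, c in counts.items() if c == best)
    # Same return shape as pattern_finder: (length, [{count: pattern}, ...])
    return longest, [{best: p} for p, c in counts.items() if c == best and len(p) == longest]

print(pattern_finder_short([1234, 123, 9999]))
print(pattern_finder_short([1234, 123]))
```

It still enumerates every substring of every number, so it is shorter but should not be expected to scale any better for long numbers.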