Common substring in list of strings

Question:

i encountered a problem while trying to solve a problem where given some strings and their lengths, you need to find their common substring. My code for the part where it loops through the list and then each through each word in it is this:

num_of_cases = int(input())
for i in range(1, num_of_cases+1):
    if __name__ == '__main__':
        len_of_str = list(map(int, input().split()))
        len_of_virus = int(input())

    strings = []
    def string(strings, len_of_str):
        len_of_list = len(len_of_str)
        for i in range(1, len_of_list+1):
            strings.append(input())
    
    lst_of_subs = []
    virus_index = []
    def substr(strings, len_of_virus):
        for word in strings:
             for i in range(len(len_of_str)):
                  leng = word[i:len_of_virus]
                  lst_of_subs.append(leng)
                  virus_index.append(i)

    print(string(strings, len_of_str))
    print(substr(strings, len_of_virus))

And it prints the following given the strings: ananasso, associazione, tassonomia, massone

['anan', 'nan', 'an', 'n', 'asso', 'sso', 'so', 'o', 'tass', 'ass', 'ss', 's', 'mass', 'ass', 'ss', 's']

It seems that the end index doesn’t increase, although i tried it by writing len_of_virus += 1 at the end of the loop.

sample input:

1
8 12 10 7
4
ananasso
associazione
tassonomia
massone

where the 1st letter is the number of cases, the second line is the name of the strings, 3rd is the length of the virus(the common substring), and then there are the given strings that i should loop through.

expected output:

Case #1: 4 0 1 1

where the four numbers are the starting indexes of the common substring.(i dont think that code for printing cares us for this particular problem)

What should i do? Please help!!

Asked By: geoka

||

Answers:

The problem, beside defining functions in odd places and using said function to get side effect in ways that aren’t really encourage, is here:

for i in range(len(len_of_str)):
    leng = word[i:len_of_virus]

i constantly increase in each iteration, but len_of_virus stay the same, so you are effectively doing

word[0:4] #when len_of_virus=4
word[1:4]
word[2:4]
word[3:4]
...

that is where the 'anan', 'nan', 'an', 'n', come from the first word "ananasso", and the same for the other

>>> word="ananasso"
>>> len_of_virus = 4
>>> for i in range(len(word)):
        word[i:len_of_virus]

    
'anan'
'nan'
'an'
'n'
''
''
''
''
>>> 

you can fix it moving the upper end by i, but that leave with the same problem in the other end

>>> for i in range(len(word)):
    word[i:len_of_virus+i]

    
'anan'
'nana'
'anas'
'nass'
'asso'
'sso'
'so'
'o'
>>>

so some simple adjustments in the range and problem solve:

>>> for i in range(len(word)-len_of_virus+1):
        word[i:len_of_virus+i]

    
'anan'
'nana'
'anas'
'nass'
'asso'
>>> 

Now that the substring part is done, the rest is also easy

>>> def substring(text,size):
        return [text[i:i+size] for i in range(len(text)-size+1)]

>>> def find_common(lst_text,size):
        subs = [set(substring(x,size)) for x in lst_text]
        return set.intersection(*subs)

>>> test="""ananasso
associazione
tassonomia
massone""".split()
>>> find_common(test,4)
{'asso'}
>>> 

To find the common part to all the strings in our list we can use a set, first we put all the substring of a given word into a set and finally we intersect them all.

the rest is just printing it to your liking

>>> virus = find_common(test,4).pop()
>>> print("case 1:",*[x.index(virus) for x in test])
case 1: 4 0 1 1
>>> 
Answered By: Copperfield

First extract all the substrings of the give size from the shortest string. Then select the first of these substrings that is present in all of the strings. Finally output the position of this common substring in each of the strings:

def commonSubs(strings,size):
    base = min(strings,key=len) # shortest string
    subs = [base[i:i+size] for i in range(len(base)-size+1)] # all substrings
    cs = next(ss for ss in subs if all(ss in s for s in strings)) # first common
    return [s.index(cs) for s in strings] # indexes of common substring

output:

S = ["ananasso", "associazione", "tassonomia", "massone"]
print(commonSubs(S,4))
[4, 0, 1, 1]

You could also use a recursive approach:

def commonSubs(strings,size,i=0):
    sub = strings[0][i:i+size]
    if all(sub in s for s in strings):
        return [s.index(sub) for s in strings]
    return commonSubs(strings,size,i+1)
Answered By: Alain T.
from suffix_trees import STree


STree.STree(["come have some apple pies",
       'apple pie available',
       'i love apple pie haha']).lcs()

the most simple way is use STree

Answered By: 吴思位