Finding the longest substrings in Python

Question

Hello, I am new to Python programming. I have a task that requires me to find the longest substrings. I have written some code that outputs the longest substring, but I need it to output both substrings when two are tied for longest:

def sequences(nums):
    pos1 = 0
    pos2 = 1
    seq1 = 0
    seq2 = 1

    while pos2 < len(nums):
        if nums[pos2] != nums[pos2 - 1]:
            if seq2 - seq1 < pos2 - pos1:
                seq1, seq2 = pos1, pos2
            pos1 = pos2
        pos2 += 1

    if seq2 - seq1 < pos2 - pos1:
        seq1, seq2 = pos1, pos2
    return nums[seq1:seq2]


seq3 = sequences
print(seq3('10111000'))

For example, with '10111000' all I get right now is 111, but I would like it to output 000 as well. I have been trying to find a way to do it but cannot seem to crack it. I would really appreciate your help.
Thank you.

Asked By: GGANGG

Answer 1

The easiest way to do this is probably to use a regular expression to find the runs of each repeated character:

import re

def longest_substrings(input: str):
    # get all unique characters in the input string
    chars = set(input)

    # initialise max_length to zero
    max_length = 0

    # empty list to store output
    output = []

    # loop over all characters in the string
    for c in chars:
        # find all substrings of any length
        # matching *only* that character
        substrings = re.findall(f"{c}+", input)

        # loop over all substrings
        for substring in substrings:
            # check the length of the substring
            length = len(substring)

            # if longer than the previous max_length,
            # we replace the output list with just this
            # substring
            if length > max_length:
                output = [substring, ]
                max_length = length
            # if length is equal, we add this substring
            # to the output
            elif length == max_length:
                output.append(substring)

            # implicitly: if the length is less, we do
            # nothing

    return output

print(longest_substrings("1100111100111101100100001101"))
# ['1111', '1111', '0000'] (order may vary, since chars is a set)

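f"{c}+" is a regex pattern that becomes 0+ for the character 0 and 1+ for the character 1. The + means "match one or more consecutive occurrences of the preceding character". This assumes the input contains no regex metacharacters (such as . or *); for arbitrary input, pass the character through re.escape before building the pattern.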
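To see what the pattern matches on its own, here is a minimal sketch that runs re.findall for one character at a time on the string from the question. The helper name runs_of is made up for illustration, and the re.escape call is an added assumption that the input could contain regex metacharacters; it is not part of the answer's code.

```python
import re

def runs_of(char: str, text: str) -> list:
    # Hypothetical helper: return every run of `char` found in `text`.
    # re.escape makes characters like "." or "*" match literally.
    return re.findall(f"{re.escape(char)}+", text)

print(runs_of("1", "10111000"))  # ['1', '111']
print(runs_of("0", "10111000"))  # ['0', '000']
print(runs_of(".", "a..b."))     # ['..', '.']
```

Each call returns the runs in the order they occur in the string, which is why the answer's function still has to compare lengths afterwards.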

Answered By: Simon Lundberg

Answer 2

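I'd like to propose another solution, using itertools.groupby:

from itertools import groupby

def longest_substrings(input: str):
    # split the input into runs of consecutive identical characters
    substring_generator = ("".join(g) for _, g in groupby(input))
    # sort the runs from longest to shortest
    sorted_substrings = sorted(substring_generator, key=lambda x: -len(x))
    # group the sorted runs by length and keep the first (longest) group
    result = [list(g) for k, g in groupby(sorted_substrings, key=lambda x: len(x))][0]
    return result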

print(longest_substrings("1100111100111101100100001101"))

Output: ['1111', '1111', '0000']

Basically, what I do here is the following:

First, I group the input string. This creates one group for each run of consecutive identical characters, for example 111 or 00. -> substring_generator

Then I sort these runs in descending order of length. -> sorted_substrings

Finally, I group the sorted list by string length and take the first group, which holds the longest runs. -> result

The result should be exactly what you asked for, and I think it is easy to read and understand what is happening.
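To make the three steps easier to follow, here is a small sketch that prints each intermediate value for the string from the question. The variable names runs, by_length and groups are chosen for illustration only, and the fallback to an empty list for empty input is an added assumption, not something the original function does.

```python
from itertools import groupby

text = "10111000"

# Step 1: each run of consecutive identical characters becomes one string.
runs = ["".join(g) for _, g in groupby(text)]
print(runs)       # ['1', '0', '111', '000']

# Step 2: sort longest first; sorted() is stable, so ties keep their order.
by_length = sorted(runs, key=len, reverse=True)
print(by_length)  # ['111', '000', '1', '0']

# Step 3: group neighbours of equal length; the first group is the longest.
groups = [list(g) for _, g in groupby(by_length, key=len)]
print(groups)     # [['111', '000'], ['1', '0']]

# Assumption: return an empty list instead of failing on empty input.
longest = groups[0] if groups else []
print(longest)    # ['111', '000']
```

The second groupby only works because the list is already sorted by length: groupby merges adjacent items with equal keys, not all items with equal keys.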

Answered By: Christian
