Finding the longest substrings in Python
Question:
Hello, I am new to Python programming. I have a task that requires me to find the longest substrings. I have written some code that outputs the longest substring, but I need it to output both of the longest substrings:
def sequences(nums):
    pos1=0
    pos2=1
    seq1=0
    seq2=1
    while pos2<len(nums):
        if nums[pos2] != nums[(pos2-1)]:
            if seq2 - seq1 < pos2 - pos1:
                seq1, seq2 = pos1, pos2
            pos1=pos2
        pos2+=1
    if seq2 - seq1 < pos2 - pos1:
        seq1, seq2 = pos1, pos2
    return nums[seq1:seq2]

seq3=sequences
print(seq3('10111000'))
For example, with '10111000' all I get right now is 111, but I would like it to output 000 as well. I have been trying to find a way to do it but cannot seem to crack it. I would really appreciate your help.
Thank you.
Answers:
The easiest way to do this is probably to use a regular expression that finds the runs of each distinct character, then keep every run that ties for the longest length:
import re

def longest_substrings(s: str):
    # get all unique characters in the string, in order of first appearance
    # (a plain set() also works, but its iteration order for strings can
    # change between runs of the program, which would reorder the output)
    chars = dict.fromkeys(s)
    # initialise max_length to zero
    max_length = 0
    # empty list to store output
    output = []
    # loop over all unique characters
    for c in chars:
        # find all runs of any length consisting *only* of that character;
        # re.escape stops characters such as '.' or '*' being read as regex syntax
        substrings = re.findall(f"{re.escape(c)}+", s)
        # loop over all runs
        for substring in substrings:
            # check the length of the run
            length = len(substring)
            # if longer than the previous max_length,
            # replace the output list with just this run
            if length > max_length:
                output = [substring]
                max_length = length
            # if the length is equal, add this run to the output
            elif length == max_length:
                output.append(substring)
            # implicitly: if the length is less, do nothing
    return output

print(longest_substrings("1100111100111101100100001101"))
# ['1111', '1111', '0000']
The pattern f"{c}+" becomes 0+ for the character 0 and 1+ for the character 1. The + means "match one or more consecutive occurrences of the preceding character".
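To see what these patterns match, here is a quick check on the example string from the question, '10111000'. re.findall returns every non-overlapping match as a list of strings, so each character's runs come back in the order they appear:

```python
import re

text = "10111000"

# every run of one or more 1s, left to right
print(re.findall("1+", text))  # ['1', '111']
# every run of one or more 0s, left to right
print(re.findall("0+", text))  # ['0', '000']
```

The longest run in each list has length 3, so both 111 and 000 end up in the output.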
I'd like to propose another solution, using itertools.groupby:
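from itertools import groupby

def longest_substrings(s: str):
    # one string per run of identical consecutive characters, e.g. '111' or '00'
    substring_generator = ("".join(g) for _, g in groupby(s))
    # longest runs first; sorted() is stable, so equal lengths keep their original order
    sorted_substrings = sorted(substring_generator, key=lambda x: -len(x))
    # group the sorted runs by length and take the first (longest) group;
    # next(..., []) returns [] for an empty string instead of raising IndexError
    result = next((list(g) for _, g in groupby(sorted_substrings, key=len)), [])
    return result

print(longest_substrings("1100111100111101100100001101"))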
Output: ['1111', '1111', '0000']
Basically, what I do here is the following:
First I group the input string, which creates one group for each run of identical consecutive characters, such as 111 or 00. -> substring_generator
Then I sort these runs by length, in descending order. -> sorted_substrings
Finally I group this sorted list again, this time by string length, and take the first group, which holds all runs of the maximum length. -> result
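Running the three steps separately on the example string from the question, '10111000', makes the intermediate values visible. In this sketch step 1 builds a list instead of a generator so that it can be printed; the variable names are just illustrative:

```python
from itertools import groupby

text = "10111000"

# step 1: one string per run of identical consecutive characters
runs = ["".join(g) for _, g in groupby(text)]
print(runs)  # ['1', '0', '111', '000']

# step 2: longest first; the sort is stable, so ties keep their original order
sorted_runs = sorted(runs, key=lambda x: -len(x))
print(sorted_runs)  # ['111', '000', '1', '0']

# step 3: group by length and keep only the first (longest) group
result = [list(g) for _, g in groupby(sorted_runs, key=len)][0]
print(result)  # ['111', '000']
```

Because groupby only merges adjacent items, step 3 relies on step 2 having put all runs of the same length next to each other.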
The result should be exactly what you asked for, and I think each step is easy to read and follow.
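If you would rather keep the shape of the original sequences function, here is a minimal sketch of one way to adapt it (the name longest_runs is just an illustrative choice). The original comparison seq2 - seq1 < pos2 - pos1 only replaces the stored run when a strictly longer one appears, so a later run of the same length, like 000, is never kept. The sketch instead keeps a list of every run that ties for the longest length seen so far:

```python
def longest_runs(nums):
    """Return every run of identical consecutive characters that has the maximum length."""
    best = []      # runs tied for the longest length seen so far
    best_len = 0
    pos1 = 0       # start of the current run
    # pos2 goes one past the end of the string so the final run is also checked
    for pos2 in range(1, len(nums) + 1):
        # the current run ends at the end of the string or where the character changes
        if pos2 == len(nums) or nums[pos2] != nums[pos2 - 1]:
            length = pos2 - pos1
            if length > best_len:
                # strictly longer: start a new list of winners
                best = [nums[pos1:pos2]]
                best_len = length
            elif length == best_len:
                # tie: keep this run as well
                best.append(nums[pos1:pos2])
            pos1 = pos2  # the next run starts here
    return best

print(longest_runs('10111000'))  # ['111', '000']
```

This makes a single pass over the string and needs no regex or sorting; an empty string simply returns [].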