Find Longest Alphabetically Ordered Substring – Efficiently

Question:

The goal of some a piece of code I wrote is to find the longest alphabetically ordered substring within a string.

"""
Find longest alphabetically ordered substring in string s.
"""

s = 'zabcabcd' # Test string.

alphabetical_str, temp_str = s[0], s[0]
for i in range(len(s) - 1):  # Loop through string.
    if s[i] <= s[i + 1]:  # Check if next character is alphabetically next.
        temp_str += s[i + 1]  # Add character to temporary string.
        if len(temp_str) > len(alphabetical_str):  # Check is temporary string is the longest string.
            alphabetical_str = temp_str  # Assign longest string.
    else:
        temp_str = s[i + 1]  # Assign last checked character to temporary string.
print(alphabetical_str)

I get an output of abcd.

But the instructor says there is PEP 8 compliant way of writing this code that is 7-8 lines of code and there is a more computational efficient way of writing this code that is ~16 lines. Also that there is a way of writing this code in only 1 line 75 character!

Can anyone provide some insight on what the code would look like if it was 7-8 lines or what the most work appropriate way of writing this code would be? Also any PEP 8 compliance critique would be appreciated.

Asked By: Ryan Rosenfeld

||

Answers:

Depending on how you choose to count, this is only 6-7 lines and PEP 8 compliant:

def longest_alphabetical_substring(s):
    sub = '', 0
    for i in range(len(s)):
        j = i + len(sub) + 1
        while list(s[i:j]) == sorted(s[i:j]) and j <= len(s):
            sub, j = s[i:j], j+1
    return sub


print(longest_alphabetical_substring('zabcabcd'))

Your own code was PEP 8 compliant as far as I can tell, although it would make sense to capture code like this in a function, for easy reuse and logical grouping for improved readability.

The solution I provided here is not very efficient, as it keeps extracting copies of the best result so far. A slightly longer solution that avoids this:

def longest_alphabetical_substring(s):
    n = m = 0
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            if j == len(s) or s[j] < s[j-1]:
                if j-i > m-n:
                    n, m = i, j
                break
    return s[n:m]


print(longest_alphabetical_substring('zabcabcd'))

There may be more efficient ways of doing this; for example you could detect that there’s no need to keep looking because there is not enough room left in the string to find longer strings, and exit the outer loop sooner.

User @kellybundy is correct, a truly efficient solution would be linear in time. Something like:

def las_efficient(s):
    t = s[0]
    return max([(t := c) if c < t[-1] else (t := t + c) for c in s[1:]], key=len)


print(las_efficient('zabcabcd'))

No points for readability here, but PEP 8 otherwise, and very brief.

And for an even more efficient solution:

def las_very_efficient(s):
    m, lm, t, ls = '', 0, s[0], len(s)
    for n, c in enumerate(s[1:]):
        if c < t[-1]:
            t = c
        else:
            t += c
            if len(t) > lm:
                m, lm = t, len(t)
            if n + lm > ls:
                break
    return m
Answered By: Grismar

Linear time:

s = 'zabcabcd'

longest = current = []
for c in s:
    if [c] < current[-1:]:
        current = []
    current += c
    longest = max(longest, current, key=len)

print(''.join(longest))

Your PEP 8 issues I see:

  • "Limit all lines to a maximum of 79 characters." (link) – You have two lines longer than that.
  • "do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b" […] the ''.join() form should be used instead" (link). You do that repeated string concatenation.

Also, yours crashes if the input string is empty.

1 line 72 characters:

s='zabcabcd';print(max([t:='']+[t:=t*(c>=t[-1:])+c for c in s],key=len))

Optimized linear time (I might add benchmarks tomorrow):

def Kelly_fast(s):
    maxstart = maxlength = start = length = 0
    prev = ''
    for c in s:
        if c >= prev:
            length += 1
        else:
            if length > maxlength:
                maxstart = start
                maxlength = length
            start += length
            length = 1
        prev = c
    if length > maxlength:
        maxstart = start
        maxlength = length
    return s[maxstart : maxstart+maxlength]
Answered By: Kelly Bundy

You can keep appending characters from the input string to a candidate list, but clear the list when the current character is lexicographically smaller than the last character in the list, and set the candidate list as the output list if it’s longer than the current output list. Join the list into a string for the final output:

s = 'zabcabcdabc'
candidate = longest = []
for c in s:
    if candidate and c < candidate[-1]:
        candidate = []
    candidate.append(c)
    if len(candidate) > len(longest):
        longest = candidate
print(''.join(longest))

This outputs:

abcd
Answered By: blhsing