How does one find the largest consecutive set of numbers in a list that are not necessarily adjacent?

Question:

For instance, if I have a list

``````[1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
``````

This algorithm should return [1,2,3,4,5,6,7,8,9,10,11].

To clarify, the longest list should run forwards. I was wondering what is an algorithmically efficient way to do this (preferably not O(n^2))?

Also, I’m open to a solution that is not in Python, since the algorithm is what matters.
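To pin down what is being asked, here is a minimal brute-force sketch of the requirement as I read it: the values must increase by exactly 1 and appear in list order, but need not be adjacent. It is written for Python 3, the function name is made up for illustration, and it is deliberately the O(n^2) baseline that the question wants to beat.

```python
def longest_consecutive_bruteforce(values):
    """Reference O(n^2) version: try every start position and greedily
    collect start, start+1, start+2, ... from the elements that follow."""
    best = []
    for i, start in enumerate(values):
        run = [start]
        for v in values[i + 1:]:
            # take the first later element that continues the run
            if v == run[-1] + 1:
                run.append(v)
        if len(run) > len(best):
            best = run
    return best


print(longest_consecutive_bruteforce([1, 4, 2, 3, 5, 4, 5, 6, 7, 8, 1, 3, 4, 5, 9, 10, 11]))
```

Taking the earliest occurrence of the next value is safe here, because an earlier match never leaves fewer elements to continue from than a later one.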

Thank you.

This should do the trick (and is O(n)):

``````target = 1
result = []
for x in mylist:
    # keep x only if it is the next value we are waiting for
    if x == target:
        result.append(x)
        target += 1
``````

For any starting number, this works:

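``````def longest_run(mylist):  # wrapper name added so that `return` is valid
    # each entry of result is [next expected value, run elements...]
    result = []
    for x in mylist:
        matched = False
        for y in result:
            if y[0] == x:
                matched = True
                y[0] += 1
                y.append(x)
        if not matched:
            result.append([x+1, x])
    # drop the leading "next expected value" slot
    return max(result, key=len)[1:]
``````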

You can use the patience sort implementation of the longest increasing subsequence algorithm:

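``````import bisect

def LargAscSub(seq):
    # deck is a list of piles; the top card of each pile is at index 0
    deck = []
    for x in seq:
        newDeck = [x]
        # leftmost pile whose top card is >= x
        i = bisect.bisect_left(deck, newDeck)
        deck[i].insert(0, x) if i != len(deck) else deck.append(newDeck)
    return [p[0] for p in deck]
``````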

And here are the test results:

``````>>> LargAscSub([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> LargAscSub([1, 2, 3, 11, 12, 13, 14])
[1, 2, 3, 11, 12, 13, 14]
>>> LargAscSub([11,12,13,14])
[11, 12, 13, 14]
``````

The order of complexity is O(n log n).

There is a note in the linked wiki article claiming that you can achieve O(n log log n) by relying on a van Emde Boas tree.
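One caveat with `LargAscSub` above: the pile tops it returns have the right length, but they are not always a subsequence of the input (for `[2, 3, 1]` it returns `[1, 3]`). Below is a minimal sketch, written for Python 3, of the usual fix of keeping back-pointers so that an actual increasing subsequence can be rebuilt. The function and variable names are my own, not from the answer, and like the answer it finds an increasing subsequence, not necessarily a consecutive one.

```python
import bisect


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tails = []      # tails[k]: smallest tail value of an increasing run of length k+1
    tails_idx = []  # index in seq of that tail value
    prev = [None] * len(seq)  # back-pointer to the previous element of the run
    for i, x in enumerate(seq):
        k = bisect.bisect_left(tails, x)
        if k > 0:
            prev[i] = tails_idx[k - 1]
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
    # walk the back-pointers from the tail of the longest run
    out = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        out.append(seq[i])
        i = prev[i]
    out.reverse()
    return out


print(longest_increasing_subsequence([2, 3, 1]))
```

The binary search keeps this at O(n log n); the extra lists only add O(n) memory.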

Not that clever, not O(n), could use a bit of optimization. But it works.

``````def longest(seq):
    result = []
    for v in seq:
        for l in result:
            if v == l[-1] + 1:
                l.append(v)
        else:
            # for/else: runs after every pass, since the loop never breaks
            result.append([v])
    return max(result, key=len)
``````

How about using a modified radix sort? As JanneKarila pointed out, the solution is not O(n). It uses radix sort, about which Wikipedia says: `Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits.`

This will only work if you know the range of numbers that we’re dealing with, so finding it will be the first step.

1. Look at each element in the starting list to find the lowest number, `l`, and the highest, `h`. In this case `l` is 1 and `h` is 11. Note, if you already know the range for some reason, you can skip this step.

2. Create a result list the size of our range and set each element to null.

3. Look at each element in the list and add it to the result list at the appropriate place if needed, i.e. if the element is a 4, add a 4 to the result list at position 4: `result[element] = element`. You can throw out duplicates if you want; they’ll just be overwritten.

4. Go through the result list to find the longest sequence without any null values. Keep an `element_counter` to know what element in the result list we’re looking at. Keep a `curr_start_element` set to the beginning element of the current sequence and keep a `curr_len` of how long the current sequence is. Also keep a `longest_start_element` and a `longest_len`, which will start out unset and be updated as we move through the list.

5. Return the result list starting at `longest_start_element` and taking `longest_len` elements.

EDIT: Code added. Tested and working

``````#note this doesn't work with negative numbers
#it's certainly possible to write this to work with negatives
# but the code is a bit hairier
import sys
def findLongestSequence(lst):
    #step 1
    high = -sys.maxint - 1

    for num in lst:
        if num > high:
            high = num

    #step 2
    result = [None]*(high+1)

    #step 3
    for num in lst:
        result[num] = num

    #step 4
    curr_start_element = -1
    curr_len = 0
    longest_start_element = -1
    longest_len = -1

    for element_counter in range(len(result)):
        if result[element_counter] is None:
            # a gap ends the current run; keep it if it is the longest so far
            if curr_len > longest_len:
                longest_start_element = curr_start_element
                longest_len = curr_len

            curr_len = 0
            curr_start_element = -1

        else:
            if curr_start_element == -1:
                curr_start_element = element_counter

            # count only real elements, so step 5 needs no off-by-one fix
            curr_len += 1

    #just in case the last element makes the longest
    if curr_len > longest_len:
        longest_start_element = curr_start_element
        longest_len = curr_len

    #step 5
    return result[longest_start_element:longest_start_element + longest_len]
``````

If the result really does have to be a sub-sequence of consecutive ascending integers, rather than merely ascending integers, then there’s no need to remember each entire consecutive sub-sequence until you determine which is the longest; you need only remember the starting and ending values of each sub-sequence. So you could do something like this:

``````import collections

def longestConsecutiveSequence(sequence):
    # map starting values to largest ending value so far
    map = collections.OrderedDict()

    for i in sequence:
        found = False
        for k, v in map.iteritems():
            if i == v:
                map[k] += 1
                found = True

        # start a new run only if i extended nothing and is not already a start
        if not found and i not in map:
            map[i] = i + 1

    return xrange(*max(map.iteritems(), key=lambda i: i[1] - i[0]))
``````

If I run this on the original sample data (i.e. `[1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]`) I get:

``````>>> print list(longestConsecutiveSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
``````

If I run it on one of Abhijit’s samples `[1,2,3,11,12,13,14]`, I get:

``````>>> print list(longestConsecutiveSequence([1,2,3,11,12,13,14]))
[11, 12, 13, 14]
``````

Regrettably, this algorithm is O(n*n) in the worst case.

Here is a simple one-pass O(n) solution:

``````s = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11,42]
maxrun = -1
rl = {}
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    print x-run+1, 'to', x
    if run > maxrun:
        maxend, maxrun = x, run
print range(maxend-maxrun+1, maxend+1)
``````

The logic may be a little more self-evident if you think in terms of ranges instead of individual variables for the endpoint and run length:

``````rl = {}
best_range = xrange(0)
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    r = xrange(x-run+1, x+1)
    if len(r) > len(best_range):
        best_range = r
print list(best_range)
``````
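For anyone on Python 3, here is a hedged sketch of the same one-pass idea wrapped in a function. The name `longest_consecutive_run` and the empty-list handling are my additions, not part of the answer above. The dictionary maps each value to the length of the longest run seen so far that ends at that value.

```python
def longest_consecutive_run(s):
    """One pass over s; returns the longest run of consecutive integers
    that appear in order (not necessarily adjacent) in s."""
    run_length = {}  # value -> length of the longest run so far ending at it
    best_end, best_len = None, 0
    for x in s:
        # a run ending at x extends the best run ending at x - 1, if any
        run = run_length[x] = run_length.get(x - 1, 0) + 1
        if run > best_len:
            best_end, best_len = x, run
    if best_len == 0:
        return []  # empty input
    return list(range(best_end - best_len + 1, best_end + 1))


print(longest_consecutive_run([1, 4, 2, 3, 5, 4, 5, 6, 7, 8, 1, 3, 4, 5, 9, 10, 11, 42]))
```

Since the run consists of consecutive integers, only its end value and length need to be stored; the list is rebuilt with `range` at the end.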

Warning: This is the cheaty way to do it (aka I use Python…)

``````import operator as op
import itertools as it

def longestSequence(data):

    longest = []

    # sort explicitly: set iteration order is not guaranteed to be ascending;
    # consecutive values share the same (index - value) key
    for k, g in it.groupby(enumerate(sorted(set(data))), lambda(i, y):i-y):
        thisGroup = map(op.itemgetter(1), g)

        if len(thisGroup) > len(longest):
            longest = thisGroup

    return longest

longestSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11, 15,15,16,17,25])
``````

You need the maximum contiguous sum (optimal substructure):

``````def msum2(a):
    # s: best sum so far, t: running sum, j: start of the current slice
    bounds, s, t, j = (0,0), -float('infinity'), 0, 0

    for i in range(len(a)):
        t = t + a[i]
        if t > s: bounds, s = (j, i+1), t
        if t < 0: t, j = 0, i+1
    return (s, bounds)
``````

This is an example of dynamic programming and is O(N)
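To make the return value concrete, here is a small Python 3 sketch of the same running-sum technique with a usage example. The function name `max_subarray` and the sample list are mine, chosen only for illustration. Note that it optimizes the sum of a contiguous slice, so how it maps onto the consecutive-run question is left open here.

```python
def max_subarray(a):
    """Return (best_sum, (start, stop)) such that a[start:stop] is a
    contiguous slice with the largest possible sum."""
    best_sum, best_bounds = float("-inf"), (0, 0)
    running, start = 0, 0
    for i, value in enumerate(a):
        running += value
        if running > best_sum:
            best_sum, best_bounds = running, (start, i + 1)
        if running < 0:
            # a negative prefix can never help, so restart after it
            running, start = 0, i + 1
    return best_sum, best_bounds


total, (lo, hi) = max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4])
print(total, (lo, hi))  # 6 (3, 7), i.e. the slice [4, -1, 2, 1]
```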

An O(n) solution that works even if the sequence does not start from the first element.

Warning: it does not work if len(A) == 0.

``````A = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
def pre_process(A):
    Last = {}
    Arrow = []
    Length = []
    ArgMax = 0
    Max = 0
    for i in xrange(len(A)):
        Arrow.append(i)
        Length.append(0)
        if A[i] - 1 in Last:
            Aux = Last[A[i] - 1]
            Arrow[i] = Aux
            Length[i] = Length[Aux] + 1
        Last[A[i]] = i
        if Length[i] > Max:
            ArgMax = i
            Max = Length[i]
    return (Arrow,ArgMax)

(Arr,Start) = pre_process(A)
Old = Arr[Start]
ToRev = []
while 1:
    ToRev.append(A[Start])
    if Old == Start:
        break
    Start = Old
    New = Arr[Start]
    Old = New
ToRev.reverse()
print ToRev
``````

Pythonizations are welcome!!
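Taking up the invitation, here is one possible Pythonization, written for Python 3. It is a sketch of the same back-pointer idea, not the answerer's code: the names are mine, `None` replaces the self-pointing arrow as the end-of-chain marker, and an empty list is handled by returning `[]`.

```python
def longest_consecutive_subsequence(a):
    """Back-pointer version: remember, for each element, the index of the
    element that precedes it in its run, then walk the chain backwards."""
    last_index = {}          # value -> index of its most recent occurrence
    prev = [None] * len(a)   # prev[i]: index of the predecessor of a[i], if any
    length = [1] * len(a)    # length[i]: length of the run ending at a[i]
    best = None              # index where the longest run ends
    for i, x in enumerate(a):
        j = last_index.get(x - 1)
        if j is not None:
            prev[i] = j
            length[i] = length[j] + 1
        last_index[x] = i
        if best is None or length[i] > length[best]:
            best = i
    out = []
    while best is not None:
        out.append(a[best])
        best = prev[best]
    out.reverse()
    return out


print(longest_consecutive_subsequence([1, 4, 2, 3, 5, 4, 5, 6, 7, 8, 1, 3, 4, 5, 9, 10, 11]))
```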

OK, here’s yet another attempt in Python:

``````def popper(l):
    listHolders = []
    pos = 0
    # note: this consumes l, popping items from the end
    while l:
        appended = False
        item = l.pop()
        for holder in listHolders:
            if item == holder[-1][0]-1:
                appended = True
                holder.append((item, pos))
        if not appended:
            pos += 1
            listHolders.append([(item, pos)])
    longest = []
    for holder in listHolders:
        try:
            if (holder[0][0] < longest[-1][0]) and (holder[0][1] > longest[-1][1]):
                longest.extend(holder)
        except:
            pass
        if len(holder) > len(longest):
            longest = holder
    longest.reverse()
    return [x[0] for x in longest]
``````

Sample inputs and outputs:

``````>>> demo = list(range(50))
>>> shuffle(demo)
>>> demo
[40, 19, 24, 5, 48, 36, 23, 43, 14, 35, 18, 21, 11, 7, 34, 16, 38, 25, 46, 27, 26, 29, 41, 8, 31, 1, 33, 2, 13, 6, 44, 22, 17,
12, 39, 9, 49, 3, 42, 37, 30, 10, 47, 20, 4, 0, 28, 32, 45, 15]
>>> popper(demo)
[1, 2, 3, 4]
>>> demo = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
>>> popper(demo)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>>
``````

``````s = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
l = list(set(s))
l.sort()

# split the sorted unique values into runs of consecutive integers
output = []
for i in l:
    if not output or i - output[-1][-1] != 1:
        output.append([])
    output[-1].append(i)
max_elem = max(output, key=len)
print(range(max_elem[0],max_elem[-1]+1))
``````