My method for eliminating the largest sequence of repeated elements in a singly linked list does not work

Question:

This code is supposed to search through a linked list for the largest sequence of repeated elements and then eliminate it (it is a singly linked list, so each node is connected only to the next element). The execution gets stuck on the last element of the list (which isn't printed, and I don't understand why) and the code never finishes (it never reaches the final print). My code does count the position of each element correctly, but it gets stuck on the last one. In previous tinkering, the code either did not eliminate the largest sequence or deleted the whole list.

from slistH import SList
from slistH import SNode
import sys


class SList2(SList):
    def delLargestSeq(self):
        # implement here your solution
        previous_to_seq = self._head # we start at the first position for all pointers
        first_after_sequence = self._head
        pointer2 = self._head
        pointer = self._head
        new_pointer = self._head # this pointer will store the node that is the most repeated

        a = 0  # this variable controls that it has been the first time that a condition has been fulfilled
        count = 0 

        first = 0 # this variable controls the position of the first element o a sequence
        last = 0 
        prev_first = 0 # stores the biggest sequence's first element position (until a bigger one is detected)
        prev_last = 0   


        while pointer is not None: 
            if  a == 0 : 
              if pointer.elem == pointer.next.elem:
                last = count + 1 

            else:
                if (last - first) >= (prev_last - prev_first): # if a bigger sequence is detected...
                    prev_first = first 
                    prev_last = last 
                    new_pointer = pointer # this will now be the most repeated element


                first = last + 1 # the next element will be the first of the next sequence that we will compare
                last = first

            print(f"{pointer.elem} || {count} ") # these are just to observe that everything is working

            pointer = pointer.next 
            count += 1 
            if pointer is None:
                a = 1

        for i in range(prev_last): # goes through the list until the biggest sequence's last element's position
            if pointer2.next.elem == new_pointer.elem: 
                if a == 1: # if it is the first element of the sequence...
                    previous_to_seq = pointer2 # it is the first element before the biggest sequence
                    a = 2 # this ensures that it is the first time that this has been executed

            elif a == 2 and pointer2.next.elem != new_pointer.elem: 
                first_after_sequence = pointer2.next # first element different from the ones in the sequence after it
                a = 3 

            pointer2 = pointer2.next

        if previous_to_seq.elem == new_pointer.elem: # in case there is no element before the sequence
            previous_to_seq = previous_to_seq.next
            
        previous_to_seq.next = first_after_sequence
        print(f"first after sequence: {first_after_sequence.elem}")
        print(f"last before sequence {previous_to_seq.elem}")

nodes_list = SList2()
for i in (1, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6):
    nodes_list.addLast(i)

print(nodes_list)

nodes_list.delLargestSeq()

print(f"{nodes_list} || final list")          

In earlier attempts I also had a problem where the loop reached None, and None has no .next.

Asked By: Raúl Armas


Answers:

This statement will eventually run when pointer is the last node in the list:

if pointer.elem == pointer.next.elem:

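At that moment pointer.next is None, so pointer.next.elem is an invalid reference and raises an AttributeError, because None has no elem attribute. The error occurs before the print call for the last node, which is why that element is never printed and the final print is never reached.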
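
As a minimal sketch of how such a comparison can be guarded: the Node class below is a hypothetical stand-in, since the real SNode from slistH was not shown, and count_adjacent_equal_pairs is an illustrative helper, not part of the original code. The point is only that pointer.next is tested before its elem is read.

```python
class Node:
    """Hypothetical stand-in for SNode: a value plus a link to the next node."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


def count_adjacent_equal_pairs(head):
    """Count the positions where a node and its successor hold the same value."""
    pairs = 0
    pointer = head
    while pointer is not None:
        # Guard: only read pointer.next.elem when a next node actually exists
        if pointer.next is not None and pointer.elem == pointer.next.elem:
            pairs += 1
        pointer = pointer.next
    return pairs


head = Node(1, Node(3, Node(3, Node(4))))
print(count_adjacent_equal_pairs(head))  # prints 1
```

Because "and" short-circuits, the right-hand comparison is skipped on the last node, so no AttributeError is raised there.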

Some other remarks:

  • You have too many comments that just state the obvious. A comment should not translate the statement into English, but should give a higher level of meaning. If there is no such meaning to give, then don’t add a comment.

  • Naming is important and could make the code more readable (without the need to comment every line). For instance, a is a bad name for something that is supposed to indicate "…that it has been the first time that a condition has been fulfilled".

  • In the first loop, the variable a is only updated when the loop is about to exit, so the first loop never executes the else block and never updates the value of first.

  • The algorithm depends too much on positions. In a linked list you should not need to keep track of positions (indices), but of nodes. Indices are what you would typically use when working with a native list, not so with linked lists. The advantage of keeping track of node references is that you will not need a second loop to perform the actual removal. In a linked list, a removal can be done with one assignment only.

  • There is no provision in your code for the case where the removal has to happen at the very start of the list, because in that case the _head reference should be updated, but there is no such statement in your implementation.
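
To illustrate the remark about node references, here is a small sketch of a removal done with a single assignment. Node and to_list are hypothetical helpers written for this example only; the real SNode may differ.

```python
class Node:
    """Hypothetical stand-in for SNode: a value plus a link to the next node."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


def to_list(head):
    """Collect the values of a chain of nodes into a Python list for inspection."""
    values = []
    while head is not None:
        values.append(head.elem)
        head = head.next
    return values


# Build 1 -> 3 -> 3 -> 3 -> 4
head = Node(1, Node(3, Node(3, Node(3, Node(4)))))

before_run = head                  # the node that precedes the run of 3s
last_of_run = head.next.next.next  # the last 3 in the run

# One assignment unlinks the whole run, whatever its length
before_run.next = last_of_run.next

print(to_list(head))  # prints [1, 4]
```

No positions are needed here: once the two node references are known, the run is removed without walking the list a second time.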

There are more issues, but I think this is enough to explain why the algorithm doesn’t work.

Assumptions

I add an implementation below, but I had to make some assumptions which were not clarified in your question:

  • If a (non-empty) list does not have two consecutive nodes that have an equal value, then the longest sequence of repeating values has length 1. As this is not really something that could be called "duplicate", I assume that in this case the list should stay as it is, without any removal.

  • If the list has two or more sections with duplicates that have the same length, and these happen to be the longest, then only the first section of nodes will be removed.

  • It is OK to create a helper node, which can be created with SNode(None). As you didn’t provide the SNode implementation, I can only guess that the constructor can be called like that.

Algorithm

To remove a section from a linked list, it is best to keep track of the node that precedes that section, because that is the node whose next attribute will need to be updated.

The first section of duplicates could occur at the very start of the list, and then there is no such predecessor node. In that case we have a different situation where the _head attribute of the list should be updated (if it happens to be the longest duplicate series).

To allow these different scenarios to be dealt with in the same way, it is common practice to create a dummy node that is placed before the head node of the list. Then the predecessor node for the first series is that dummy node, and if that section needs to be removed, we can update the next attribute of the predecessor without taking any special precautions. Finally we can assign dummy.next back to the _head attribute and this will update the head correctly when relevant.
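
The dummy-node idea can be sketched in isolation on a simpler task: removing the first run of a given value, wherever it sits. Node, from_iterable, to_list and remove_first_run are all hypothetical names introduced for this illustration; they are not part of slistH.

```python
class Node:
    """Hypothetical stand-in for SNode: a value plus a link to the next node."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


def from_iterable(values):
    """Build a chain of nodes from an iterable and return its head (or None)."""
    dummy = Node(None)
    tail = dummy
    for value in values:
        tail.next = Node(value)
        tail = tail.next
    return dummy.next


def to_list(head):
    """Collect the values of a chain of nodes into a Python list for inspection."""
    values = []
    while head is not None:
        values.append(head.elem)
        head = head.next
    return values


def remove_first_run(head, value):
    """Remove the first run of consecutive nodes equal to value; return the new head."""
    dummy = Node(None, head)  # gives even the head node a predecessor
    before = dummy
    # Advance until the node after `before` starts the run (or the list ends)
    while before.next is not None and before.next.elem != value:
        before = before.next
    # Find the first node after the run
    after = before.next
    while after is not None and after.elem == value:
        after = after.next
    before.next = after       # same assignment whether or not the run is at the head
    return dummy.next         # the head may have changed


print(to_list(remove_first_run(from_iterable([3, 3, 1]), 3)))     # prints [1]
print(to_list(remove_first_run(from_iterable([1, 3, 3, 4]), 3)))  # prints [1, 4]
```

In both calls the removal is the same statement; the only head-specific step is returning dummy.next at the end.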

Implementation

    def delLargestSeq(self):
        # Create a dummy node in front of the list, as this facilitates the rest
        #   of the code, and makes it easy to update the head at the end of the process
        dummy = SNode(None)
        dummy.next = self._head
        
        # For efficient removal, we should keep track of the node that *precedes*
        #    the first node to remove
        beforeDupes = dummy
        beforeFirstToRemove = lastToRemove = None
        numNodesToRemove = 1
        
        while beforeDupes.next:
            endDupes = beforeDupes.next
            numNodes = 1
            # Extend the selection of nodes for as long as they have the same value
            while endDupes.next and endDupes.next.elem == endDupes.elem:
                numNodes += 1
                endDupes = endDupes.next
            # Keep track of the longest sequence
            if numNodes > numNodesToRemove:
                numNodesToRemove = numNodes
                beforeFirstToRemove = beforeDupes
                lastToRemove = endDupes
            beforeDupes = endDupes

        # Assuming that a removal must concern at least 2 nodes, 
        #     as a single node cannot be called "duplicate"
        if numNodesToRemove > 1: 
            # Remove the longest sequence of duplicates
            beforeFirstToRemove.next = lastToRemove.next
            # If the sequence started at the first node, the head reference must be updated
            self._head = dummy.next
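
To try the method above without slistH, the following sketch wraps it in minimal stand-in classes. SNode, SList, toList and build are assumptions written for this demo (the real slistH classes were not shown and may differ); only delLargestSeq is taken from the implementation above.

```python
class SNode:
    """Hypothetical stand-in for slistH.SNode (the real class was not shown)."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


class SList:
    """Hypothetical stand-in for slistH.SList with only what this demo needs."""
    def __init__(self):
        self._head = None

    def addLast(self, elem):
        node = SNode(elem)
        if self._head is None:
            self._head = node
            return
        last = self._head
        while last.next is not None:
            last = last.next
        last.next = node

    def toList(self):
        # Inspection helper assumed for this demo; not part of the original API
        values = []
        node = self._head
        while node is not None:
            values.append(node.elem)
            node = node.next
        return values


class SList2(SList):
    def delLargestSeq(self):
        dummy = SNode(None)
        dummy.next = self._head
        beforeDupes = dummy
        beforeFirstToRemove = lastToRemove = None
        numNodesToRemove = 1
        while beforeDupes.next:
            endDupes = beforeDupes.next
            numNodes = 1
            # Extend the run for as long as the next node has the same value
            while endDupes.next and endDupes.next.elem == endDupes.elem:
                numNodes += 1
                endDupes = endDupes.next
            # Strict ">" keeps the first of several equally long runs
            if numNodes > numNodesToRemove:
                numNodesToRemove = numNodes
                beforeFirstToRemove = beforeDupes
                lastToRemove = endDupes
            beforeDupes = endDupes
        if numNodesToRemove > 1:
            beforeFirstToRemove.next = lastToRemove.next
            self._head = dummy.next


def build(values):
    """Create an SList2 holding the given values in order."""
    result = SList2()
    for value in values:
        result.addLast(value)
    return result


nodes_list = build((1, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6))
nodes_list.delLargestSeq()
print(nodes_list.toList())  # prints [1, 4, 4, 5, 5, 5, 6, 6]
```

With the input from the question, the run of four 3s is removed. Under the assumptions listed earlier, a list without adjacent equal values is left unchanged, and when two runs tie for longest only the first is removed.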

See it run on repl.it

Answered By: trincot