My method for eliminating the largest sequence of repeated elements in a singly linked list does not work
Question:
This code is supposed to search a singly linked list (each node links only to the next one) for the longest sequence of repeated elements and then remove it. Execution gets stuck on the last element of the list (which isn't printed, and I don't understand why), and the code never finishes: it never reaches the final print. My code counts the position of each element correctly, but it gets stuck on the last one. In earlier attempts it either failed to remove the longest sequence or deleted the whole list.

    from slistH import SList
    from slistH import SNode
    import sys

    class SList2(SList):
        def delLargestSeq(self):
            # implement here your solution
            previous_to_seq = self._head  # we start at the first position for all pointers
            first_after_sequence = self._head
            pointer2 = self._head
            pointer = self._head
            new_pointer = self._head  # this pointer will store the node that is the most repeated
            a = 0  # this variable controls that it has been the first time that a condition has been fulfilled
            count = 0
            first = 0  # this variable controls the position of the first element o a sequence
            last = 0
            prev_first = 0  # stores the biggest sequence's first element position (until a bigger one is detected)
            prev_last = 0
            while pointer is not None:
                if a == 0:
                    if pointer.elem == pointer.next.elem:
                        last = count + 1
                else:
                    if (last - first) >= (prev_last - prev_first):  # if a bigger sequence is detected...
                        prev_first = first
                        prev_last = last
                        new_pointer = pointer  # this will now be the most repeated element
                    first = last + 1  # the next element will be the first of the next sequence that we will compare
                    last = first
                print(f"{pointer.elem} || {count} ")  # these are just to observe that everything is working
                pointer = pointer.next
                count += 1
                if pointer is None:
                    a = 1
            for i in range(prev_last):  # goes through the list until the biggest sequence's last element's position
                if pointer2.next.elem == new_pointer.elem:
                    if a == 1:  # if it is the first element of the sequence...
                        previous_to_seq = pointer2  # it is the first element before the biggest sequence
                        a = 2  # this ensures that it is the first time that this has been executed
                elif a == 2 and pointer2.next.elem != new_pointer.elem:
                    first_after_sequence = pointer2.next  # first element different from the ones in the sequence after it
                    a = 3
                pointer2 = pointer2.next
            if previous_to_seq.elem == new_pointer.elem:  # in case there is no element before the sequence
                previous_to_seq = previous_to_seq.next
            previous_to_seq.next = first_after_sequence
            print(f"first after sequence: {first_after_sequence.elem}")
            print(f"last before sequence {previous_to_seq.elem}")

    nodes_list = SList2()
    for i in (1, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6):
        nodes_list.addLast(i)
    print(nodes_list)
    nodes_list.delLargestSeq()
    print(f"{nodes_list} || final list")

I also had a problem where the loop reached None, and None has no .next.
Answers:
This statement will eventually run when pointer is the last node in the list:

    if pointer.elem == pointer.next.elem:

At that moment pointer.next is None, so pointer.next.elem is an invalid reference and raises an AttributeError.
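A minimal sketch of how such a lookahead can be guarded. The Node class and the count_adjacent_equal_pairs function are hypothetical stand-ins, since the real SNode from slistH was not shown; only the elem and next attribute names are taken from the question.

```python
class Node:
    """Hypothetical stand-in for SNode: a value plus a link to the next node."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


def count_adjacent_equal_pairs(head):
    """Count the nodes whose value equals the value of their successor."""
    pairs = 0
    pointer = head
    # Checking pointer.next in the loop condition means pointer.next.elem
    # is never evaluated on the last node, where pointer.next is None.
    while pointer is not None and pointer.next is not None:
        if pointer.elem == pointer.next.elem:
            pairs += 1
        pointer = pointer.next
    return pairs


head = Node(1, Node(3, Node(3, Node(4))))
print(count_adjacent_equal_pairs(head))  # 1
```

The same idea works inside a loop body: test pointer.next is not None before reading pointer.next.elem.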
Some other remarks:
- You have too many comments that just state the obvious. A comment should not translate the statement into English, but should give a higher level of meaning. If there is no such meaning to give, then don’t add a comment.
- Naming is important and could make the code more readable (without the need to comment every line). For instance, a is a bad name for something that is supposed to indicate "…that it has been the first time that a condition has been fulfilled".
- In the first loop, the variable a is only updated when the loop is about to exit, so that means that the first loop will never execute the else block and will never update the value of first.
- The algorithm depends too much on positions. In a linked list you should not need to keep track of positions (indices), but of nodes. Indices are what you would typically use when working with a native list, not so with linked lists. The advantage of keeping track of node references is that you will not need a second loop to perform the actual removal. In a linked list, a removal can be done with one assignment only.
- There is no provision in your code for the case where the removal has to happen at the very start of the list, because in that case the _head reference should be updated, but there is no such statement in your implementation.
There are more issues, but I think this is enough to explain why the algorithm doesn’t work.
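To illustrate the point about node references, here is a small sketch, assuming a hypothetical Node class with elem and next attributes (not the real SNode). Once you hold the node before a run and the last node of the run, unlinking the whole run takes a single assignment, with no second pass over positions.

```python
class Node:
    """Hypothetical stand-in for SNode."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


def to_values(head):
    """Collect the values of a chain of nodes into a Python list."""
    values = []
    while head is not None:
        values.append(head.elem)
        head = head.next
    return values


# Build 1 -> 3 -> 3 -> 3 -> 4
head = Node(1, Node(3, Node(3, Node(3, Node(4)))))

before_run = head                   # the node holding 1, just before the run of 3s
last_in_run = head.next.next.next   # the last node holding 3

# One assignment unlinks the whole run
before_run.next = last_in_run.next
print(to_values(head))  # [1, 4]
```

If the run started at the first node there would be no before_run node, which is exactly the case where the list's head reference has to be reassigned instead.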
Assumptions
I add an implementation below, but I had to make some assumptions which were not clarified in your question:
- If a (non-empty) list does not have two consecutive nodes that have an equal value, then the longest sequence with repeating values would have length 1. As this is not really something that could be called "duplicate", I assume that in this case the list should stay as it is, without any removal.
- If the list has two or more sections with duplicates that have the same length, and these happen to be the longest, then only the first section of nodes will be removed.
- It is OK to create a helper node, which can be created with SNode(None). As you didn’t provide the SNode implementation, I can only guess that the constructor can be called like that.
Algorithm
To remove a section from a linked list, it is best to track the node that precedes that section, because it is that node that will need its next attribute to be updated.
The first section of duplicates could occur at the very start of the list, and then there is no such predecessor node. In that case we have a different situation where the _head attribute of the list should be updated (if it happens to be the longest duplicate series).
To allow these different scenarios to be dealt with in the same way, it is common practice to create a dummy node that is placed before the head node of the list. Then the predecessor node for the first series is that dummy node, and if that section needs to be removed, we can update the next attribute of the predecessor without taking any special precautions. Finally we can assign dummy.next back to the _head attribute and this will update the head correctly when relevant.
Implementation

    def delLargestSeq(self):
        # Create a dummy node in front of the list, as this facilitates the rest
        # of the code, and makes it easy to update the head at the end of the process
        dummy = SNode(None)
        dummy.next = self._head
        # For efficient removal, we should keep track of the node that *precedes*
        # the first node to remove
        beforeDupes = dummy
        beforeFirstToRemove = lastToRemove = None
        numNodesToRemove = 1
        while beforeDupes.next:
            endDupes = beforeDupes.next
            numNodes = 1
            # Extend the selection of nodes for as long as they have the same value
            while endDupes.next and endDupes.next.elem == endDupes.elem:
                numNodes += 1
                endDupes = endDupes.next
            # Keep track of the longest sequence
            if numNodes > numNodesToRemove:
                numNodesToRemove = numNodes
                beforeFirstToRemove = beforeDupes
                lastToRemove = endDupes
            beforeDupes = endDupes
        # Assuming that a removal must concern at least 2 nodes,
        # as a single node cannot be called "duplicate"
        if numNodesToRemove > 1:
            # Remove the longest sequence of duplicates
            beforeFirstToRemove.next = lastToRemove.next
            # If the sequence started at the first node, the head reference must be updated
            self._head = dummy.next

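To try the method without slistH, here is a self-contained sketch. The SNode and SList classes below are guessed stand-ins (a constructor taking elem, a _head attribute, an addLast method), and to_list and run are hypothetical helpers added only for checking results; the real classes may differ.

```python
class SNode:
    """Assumed shape of slistH.SNode: a value plus a next reference."""
    def __init__(self, elem, next=None):
        self.elem = elem
        self.next = next


class SList:
    """Assumed minimal shape of slistH.SList."""
    def __init__(self):
        self._head = None

    def addLast(self, elem):
        node = SNode(elem)
        if self._head is None:
            self._head = node
            return
        tail = self._head
        while tail.next:
            tail = tail.next
        tail.next = node

    def to_list(self):
        # Hypothetical helper: copy the values into a Python list
        values, node = [], self._head
        while node:
            values.append(node.elem)
            node = node.next
        return values


class SList2(SList):
    def delLargestSeq(self):
        dummy = SNode(None)            # sentinel placed before the head
        dummy.next = self._head
        beforeDupes = dummy
        beforeFirstToRemove = lastToRemove = None
        numNodesToRemove = 1
        while beforeDupes.next:
            endDupes = beforeDupes.next
            numNodes = 1
            while endDupes.next and endDupes.next.elem == endDupes.elem:
                numNodes += 1
                endDupes = endDupes.next
            if numNodes > numNodesToRemove:   # strict: the first longest run wins ties
                numNodesToRemove = numNodes
                beforeFirstToRemove = beforeDupes
                lastToRemove = endDupes
            beforeDupes = endDupes
        if numNodesToRemove > 1:
            beforeFirstToRemove.next = lastToRemove.next
            self._head = dummy.next


def run(values):
    lst = SList2()
    for v in values:
        lst.addLast(v)
    lst.delLargestSeq()
    return lst.to_list()


print(run((1, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6)))  # [1, 4, 4, 5, 5, 5, 6, 6]
print(run((7, 7, 1, 2, 2)))                        # [1, 2, 2]
print(run((1, 2, 3)))                              # [1, 2, 3]
```

The second call covers both the removal at the head and the tie between two runs of equal length; the third shows a list without duplicates left untouched. If the real SList also keeps a size counter, the method would need to decrease it by numNodesToRemove, which this sketch does not do.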