Fastest way to remove 'directly' repeated items in a Python list?
Question:
There are several questions on removing duplicate items from a list, but I am looking for a fast way to remove ‘directly’ repeated entries from a list:
myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]
should become:
myList = [1, 2, 3, 2, 4, 1, 4]
So the entries which are directly repeated should ‘collapse’ to a single entry.
I tried:
myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]
result = []
for i in range(len(myList)-1):
    if(myList[i] != myList[i+1]):
        result.append(myList[i])
if(myList[-1] != myList[-2]):
    result.append(myList[-1])
print(result)
This seems to work, but it’s a little ugly (especially the separate handling of the last element) and rather long.
I’m wondering if there is a better (shorter) way to do this and, more importantly, a faster one.
Answers:
Here’s a more concise version of your implementation:
def shorten_list(input_list):
    output_list = []
    output_list.append(input_list[0])
    for i in input_list[1:]:
        if output_list[-1] != i:
            output_list.append(i)
    return output_list
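A minimal sketch of the same loop that also accepts an empty list (where input_list[0] would raise an IndexError) and any iterable, not only lists. The name collapse_runs is made up for illustration; it does not come from the question or from any library.

```python
def collapse_runs(items):
    """Return a list with each run of equal adjacent items collapsed to one."""
    result = []
    for item in items:
        # Keep the item if it is the first one or differs from the last one kept.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(collapse_runs([1, 2, 3, 3, 2, 4, 4, 1, 4]))  # [1, 2, 3, 2, 4, 1, 4]
print(collapse_runs([]))                           # []
print(collapse_runs([1, 2, 2]))                    # [1, 2]
```

The last case is worth checking against the loop in the question, which returns [1] for [1, 2, 2] because it skips the final element whenever the list ends in a repeated run.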
Or as a list comprehension:
[my_list[i] for i in range(len(my_list)) if my_list[i] != ([None]+my_list)[i]]
This "right-shifts" the list by one and compares it to the original list. It is more robust than testing my_list[i] != my_list[i-1], which fails on the list [1, 1, 1]: at the first position it compares item 0 to item -1, both of which are 1, so the first element is wrongly dropped. Note that the shifted version assumes the first element is not None.
EDIT: As pointed out, this list comprehension gets very slow, because [None]+my_list builds a new list on every iteration. DecoderS’ answer gives a better list comprehension.
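One way to keep the shift-and-compare idea without rebuilding the shifted list for every index is to pair each element with its predecessor once, using zip. This is only a sketch: the function name collapse_with_zip is hypothetical, and it assumes the input is a list (or another sliceable sequence that supports +).

```python
def collapse_with_zip(my_list):
    """Collapse adjacent duplicates by comparing each item with its predecessor."""
    # my_list[:1] keeps the first element (or nothing for an empty list);
    # zip pairs every element with the one before it, and is built only once.
    return my_list[:1] + [
        cur for prev, cur in zip(my_list, my_list[1:]) if prev != cur
    ]


print(collapse_with_zip([1, 2, 3, 3, 2, 4, 4, 1, 4]))  # [1, 2, 3, 2, 4, 1, 4]
print(collapse_with_zip([1, 1, 1]))                    # [1]
print(collapse_with_zip([None, None, 1]))              # [None, 1]
```

Because no placeholder value is prepended, a list that starts with None is handled like any other.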
I think we can use a list comprehension:
myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]
newList = [myList[i] for i in range(len(myList)) if i == 0 or myList[i] != myList[i-1]]
print(newList) # Output: [1, 2, 3, 2, 4, 1, 4]
This is one of the uses of the standard itertools.groupby function:
import itertools
deduplicated_list = [item for (item, group) in itertools.groupby(myList)]
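Since the question asks which approach is faster, here is a rough sketch of how the loop and itertools.groupby could be compared with timeit. The function names and the benchmark data are assumptions made for illustration. Timings depend on the data and the Python version, so no numbers are quoted here.

```python
import itertools
import timeit


def with_loop(items):
    # Plain loop: keep an item only if it differs from the last one kept.
    result = []
    for item in items:
        if not result or result[-1] != item:
            result.append(item)
    return result


def with_groupby(items):
    # groupby yields one (key, group) pair per run of equal adjacent items.
    return [key for key, _group in itertools.groupby(items)]


sample = [1, 2, 3, 3, 2, 4, 4, 1, 4]
print(with_groupby(sample))  # [1, 2, 3, 2, 4, 1, 4]

# Hypothetical benchmark input: 100,000 values in runs of three equal items.
data = [(i // 3) % 7 for i in range(100_000)]
assert with_loop(data) == with_groupby(data)

for func in (with_loop, with_groupby):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

Running it on data that resembles the real input (run lengths, element types) gives a more meaningful comparison than the synthetic list used here.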