How to remove all duplicate items from a list

Question:

How would I use python to check a list and delete all duplicates? I don’t want to have to specify what the duplicate item is – I want the code to figure out if there are any and remove them if so, keeping only one instance of each. It also must work if there are multiple duplicates in a list.

For example, in my code below, the list lseparatedOrbList has 12 items – one is repeated six times, one is repeated five times, and there is only one instance of one. I want it to change the list so there are only three items – one of each, and in the same order they appeared before. I tried this:

for i in lseparatedOrbList:
   for j in lseparatedOrblist:
        if lseparatedOrbList[i] == lseparatedOrbList[j]:
            lseparatedOrbList.remove(lseparatedOrbList[j])

But I get the error:

Traceback (most recent call last):
  File "qchemOutputSearch.py", line 123, in <module>
    for j in lseparatedOrblist:
NameError: name 'lseparatedOrblist' is not defined

I’m guessing it’s because I’m trying to loop through lseparatedOrbList while I loop through it, but I can’t think of another way to do it.

Asked By: laplacian

||

Answers:

Use set():

woduplicates = set(lseparatedOrbList)

Returns a set without duplicates. If you, for some reason, need a list back:

woduplicates = list(set(lseparatedOrbList))

This will, however, not necessarily keep the order of your original list.
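A minimal sketch of the set approach, using hypothetical string values shaped like the question's list (12 items: one value six times, one five times, one once). Because set iteration order is arbitrary, the result is sorted before printing so the output is predictable.

```python
# Hypothetical data shaped like the question's list:
# "1s" six times, "2s" five times, "2p" once (12 items in total).
lseparatedOrbList = ["1s"] * 6 + ["2s"] * 5 + ["2p"]

woduplicates = set(lseparatedOrbList)    # duplicates collapse automatically
woduplicates_list = list(woduplicates)   # back to a list; order is arbitrary

print(len(lseparatedOrbList))     # 12
print(len(woduplicates_list))     # 3
print(sorted(woduplicates_list))  # ['1s', '2p', '2s']
```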

Answered By: Jacob

It’s because you are missing a capital letter, actually.

Purposely dedented:

for i in lseparatedOrbList:   # capital 'L'
for j in lseparatedOrblist:   # lowercase 'l'

Though the more efficient way to do it would be to insert the contents into a set.

If maintaining the list order matters (i.e., it must be “stable”), check out the answers on this question.

Answered By: Daniel DiPaolo

This should do it for you:

new_list = list(set(old_list))

set will automatically remove duplicates. list will cast it back to a list.

Answered By: Manny D

Just make a new list to populate: if an item from your list is not yet in the new list, append it; otherwise, move on to the next item in your original list.

newlist = []
for i in mylist:
    if i not in newlist:
        newlist.append(i)
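The same idea wrapped in a function (the name dedupe_keep_first is hypothetical). Since it only uses `in` on a list, it relies on `==` rather than hashing, so it also works for unhashable items such as nested lists; the trade-off is that every `in` check scans newlist, which makes the whole loop O(n^2).

```python
def dedupe_keep_first(mylist):
    """Return a new list holding the first occurrence of each item, in original order."""
    newlist = []
    for i in mylist:
        # 'in' on a list compares with ==, so the items need not be hashable
        if i not in newlist:
            newlist.append(i)
    return newlist

print(dedupe_keep_first(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
print(dedupe_keep_first([[1, 2], [3], [1, 2]]))      # [[1, 2], [3]]
```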

Answered By: Jonathon Vandezande

You can do it like this:

x = list(set(x))

For example, if you run:

x = [1,2,3,4,5,6,7,8,9,10,2,1,6,31,20]
x = list(set(x))
x

you will see the following result:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 31]

The one thing to keep in mind is that the resulting list is not guaranteed to keep the order of the original one.

Answered By: Tadeck

No, it’s simply a typo: the “l” in “list” at the end of the name must be capitalized. You can nest loops over the same variable just fine (although there’s rarely a good reason to).

However, there are other problems with the code. For starters, you’re iterating through lists, so i and j will be items, not indices. Furthermore, you can’t change a collection while iterating over it (well, you “can” in that it runs, but madness lies that way; for instance, you’ll probably skip over items). And then there’s the complexity problem: your code is O(n^2). Either convert the list into a set and back into a list (simple, but it shuffles the remaining list items) or do something like this:
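A small demonstration of the “skip over items” problem, with hypothetical data: removing from a list while a for loop walks it shifts the later items left, but the loop’s position still advances, so some items are never visited.

```python
items = [1, 1, 1, 2]
for x in items:
    if x == 1:
        # remove() shifts the remaining items left while the loop's
        # internal index keeps advancing, so one item gets skipped
        items.remove(x)
print(items)  # [1, 2]: one of the 1s survives
```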

seen = set()
new_xs = []
for x in xs:          # xs is your original list
    if x in seen:
        continue      # already kept once, skip it
    seen.add(x)
    new_xs.append(x)

Both solutions require the items to be hashable. If that’s not possible, you’ll probably have to stick with your current approach sans the mentioned problems.
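To see what “hashable” means in practice: lists are unhashable, so putting them in a set raises TypeError. One possible workaround, sketched here under the assumption that each inner list only contains hashable values, is to record a tuple version of each row in the seen set while keeping the original rows in the result (the names are illustrative).

```python
rows = [[1, 2], [3], [1, 2]]

try:
    set(rows)
except TypeError as exc:
    print("cannot hash:", exc)  # lists are unhashable

# Workaround sketch: track tuple(row) in 'seen', keep the original rows
seen = set()
unique_rows = []
for row in rows:
    key = tuple(row)  # assumes every element of row is hashable
    if key in seen:
        continue
    seen.add(key)
    unique_rows.append(row)
print(unique_rows)  # [[1, 2], [3]]
```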

Answered By: user395760

This should be faster than checking membership in a list, and it preserves the original order:

seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]
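How the one-liner works: for each x, the `if x not in seen` filter runs first, and only then `seen.setdefault(x, x)`, which both stores x as a key and returns x. So the first occurrence is kept and every later one is filtered out. A quick check with hypothetical data:

```python
my_list = [3, 1, 3, 2, 1]
seen = {}
# setdefault(x, x) inserts x into seen and returns x, which becomes the list element
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]
print(new_list)  # [3, 1, 2]
```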

If you don’t care about order, you can just:

new_list = list(set(my_list))

Answered By: Paolo Moretti

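For lists of unhashable items. It avoids re-checking entries that have already been visited, because each row is only compared with the rows after it. Note that this keeps the last occurrence of each item, not the first.

def purge_duplicates(X):
    unique_X = []
    for i, row in enumerate(X):
        # keep row only if no equal row appears later in X
        if row not in X[i + 1:]:
            unique_X.append(row)
    return unique_X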
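A quick check with hypothetical data shows which copy survives. The function from this answer is restated below (spelled purge_duplicates) so the snippet runs on its own. Because a row is kept only when no equal row appears later, the result follows the order of last appearance: [[2], [1], [3]] here, whereas first-appearance order would give [[1], [2], [3]].

```python
def purge_duplicates(X):
    unique_X = []
    for i, row in enumerate(X):
        # keep row only if no equal row appears later in X
        if row not in X[i + 1:]:
            unique_X.append(row)
    return unique_X

rows = [[1], [2], [1], [3]]
print(purge_duplicates(rows))  # [[2], [1], [3]]
```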

The modern way to do it that maintains the order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lseparatedOrbList))

as discussed by Raymond Hettinger in this answer. In Python 3.5 and above this is also the fastest way; see the linked answer for details. However, the keys must be hashable (as is the case in your list, I think).

As of Python 3.7, insertion-ordered dicts are a language feature, so the above call becomes

>>> list(dict.fromkeys(lseparatedOrbList))
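A quick check on Python 3.7+, using hypothetical values shaped like the question’s list (one value six times, one five times, one once). Both calls keep each item at its first position; dict.fromkeys sets every value to None, and list() only takes the keys.

```python
from collections import OrderedDict

# Hypothetical data: "1s" x6, "2s" x5, "2p" x1; first appearances are "2s", "1s", "2p"
lseparatedOrbList = ["2s", "1s", "2s", "2p", "1s", "1s",
                     "2s", "1s", "2s", "1s", "1s", "2s"]

via_ordered = list(OrderedDict.fromkeys(lseparatedOrbList))
via_dict = list(dict.fromkeys(lseparatedOrbList))  # insertion order guaranteed since 3.7

print(via_ordered)  # ['2s', '1s', '2p']
print(via_dict)     # ['2s', '1s', '2p']
```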

Performance:

"""Dedup list."""
import sys
import timeit

repeat = 3
numbers = 1000

setup = """"""
def timer(statement, msg='', _setup=None):
    print(msg, min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

print(sys.version)
s = """import random; n=%d; li = [random.randint(0, 100) for _ in range(n)]"""
for siz, m in ((150, "\nFew duplicates"), (15000, "\nMany duplicates")):
    print(m)
    setup = s % siz
    timer('s = set(); [i for i in li if i not in s if not s.add(i)]', "s.add(i):")
    timer('list(dict.fromkeys(li))', "dict:")
    timer('list(set(li))', 'Not order preserving: list(set(li)):')

gives:

3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]

Few duplicates
s.add(i): 0.008242200000040611
dict: 0.0037373999998635554
Not order preserving: list(set(li)): 0.0029409000001123786

Many duplicates
s.add(i): 0.2839437000000089
dict: 0.21970469999996567
Not order preserving: list(set(li)): 0.102068700000018

So dict.fromkeys seems consistently faster, although its lead over the list comprehension with set.add shrinks when there are many duplicates; I’m not sure whether varying the numbers further would change the results. list(set(li)) is of course faster still, but it does not preserve the original list order, which is a requirement here.
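The set.add variant in the benchmark relies on a side effect: s.add(i) returns None, so `not s.add(i)` is always true and only serves to record i as seen. The second `if` is evaluated only after `i not in s` has passed, so each value is added once and kept at its first occurrence. A more readable sketch (the function name is hypothetical):

```python
def dedupe_with_set_add(li):
    s = set()
    # 'i not in s' drops repeats; 'not s.add(i)' is always True (add returns None)
    # and marks i as seen for the following iterations
    return [i for i in li if i not in s if not s.add(i)]

print(dedupe_with_set_add([5, 3, 5, 1, 3]))  # [5, 3, 1]
```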

Answered By: Mr_and_Mrs_D

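Another way to fix this, which updates the original list in place:

values = [1, 1.0, 1.41, 1.73, 2, 2, 2.0, 2.24, 3, 3, 4, 4, 4, 5, 6, 6, 8, 8, 9, 10]
unique = []

for value in values:
    try:
        unique.index(value)    # raises ValueError if value is not in unique yet
    except ValueError:
        unique.append(value)
# copy the deduplicated items back into the original list object
values.clear()
for value in unique:
    values.append(value)
unique.clear()
print(values)
print(unique)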
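One detail about this sample data: 1 and 1.0 compare equal, so index() finds 1 when looking for 1.0, and the float is dropped as a duplicate (likewise 2.0). The hash-based approaches behave the same way, because numbers that compare equal also hash equally in Python. A short check with hypothetical values:

```python
values = [1, 1.0, 2, 2.0, 2.24]
print(1 == 1.0, hash(1) == hash(1.0))  # True True

unique = []
for value in values:
    if value not in unique:  # == comparison: 1.0 matches the existing 1
        unique.append(value)
print(unique)                       # [1, 2, 2.24]
print(list(dict.fromkeys(values)))  # [1, 2, 2.24]
```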

Answered By: aurel

This way you can delete one particular item that appears multiple times in a list. Try deleting all the 5s:

list1 = [1, 2, 3, 4, 5, 6, 5, 3, 5, 7, 11, 5, 9, 8, 121, 98, 67, 34, 5, 21]
print(list1)
n = int(input("item to be deleted: "))
# remove() deletes only the first match, so repeat while n is still present
while n in list1:
    list1.remove(n)
print(list1)
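An alternative sketch that removes every occurrence in a single pass: a list comprehension keeps everything not equal to the unwanted value, and slice assignment writes the result back into the same list object. The value 5 is hard-coded here instead of being read with input(), so the snippet runs non-interactively.

```python
list1 = [1, 2, 3, 4, 5, 6, 5, 3, 5, 7, 11, 5, 9, 8, 121, 98, 67, 34, 5, 21]
n = 5  # value to delete (hard-coded instead of input())
# build a list without n; list1[:] = ... replaces the contents in place
list1[:] = [x for x in list1 if x != n]
print(list1)  # [1, 2, 3, 4, 6, 3, 7, 11, 9, 8, 121, 98, 67, 34, 21]
```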

Answered By: ajeet214