# Removing duplicates in lists

## Question:

How can I check if a list has any duplicates and return a new list without duplicates?

The common approach to get a unique collection of items is to use a `set`. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in `set()` function. If you later need a real list again, you can similarly pass the set to the `list()` function.

The following example should cover whatever you are trying to do:

``````>>> t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]
``````

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

### Maintaining order

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on `OrderedDict` to keep the order of keys during insertion:

``````>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
``````

Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):

``````>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
``````

Note that this may have some overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re often better off using a set, especially because it gives you a lot more operations to work with. Check out this question for more details and alternative ways to preserve the order when removing duplicates.

Finally note that both the `set` as well as the `OrderedDict`/`dict` solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you will basically have to compare every item with every other item in a nested loop.
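The slow approach for unhashable items mentioned above can be sketched roughly as follows. This is only a minimal illustration; the function name `dedupe_by_equality` is made up for this example. It relies on `==` comparisons only, so it costs O(n^2) time in the worst case.

```python
def dedupe_by_equality(items):
    """Return a new list without duplicates, using only == comparisons.

    Works for unhashable items such as lists or dicts, at O(n^2) cost.
    """
    result = []
    for item in items:
        # `in` on a list compares element by element with ==, so no hashing is needed.
        if item not in result:
            result.append(item)
    return result


print(dedupe_by_equality([[1, 2], [3], [1, 2], {"a": 1}, {"a": 1}]))
# [[1, 2], [3], {'a': 1}]
```

As a side effect, this version also preserves the original order.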

If you don’t care about the order, just do this:

``````def remove_duplicates(l):
    return list(set(l))
``````

A `set` is guaranteed to not have duplicates.

It’s a one-liner: `list(set(source_list))` will do the trick.

A `set` is something that can’t possibly have duplicates.

Update: an order-preserving approach is two lines:

``````from collections import OrderedDict
OrderedDict((x, True) for x in source_list).keys()
``````

Here we use the fact that `OrderedDict` remembers the insertion order of keys, and does not change it when a value at a particular key is updated. We insert `True` as values, but we could insert anything, values are just not used. (`set` works a lot like a `dict` with ignored values, too.)
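To see that re-assigning an existing key really leaves its position untouched, here is a small sketch (the sample data is made up for illustration):

```python
from collections import OrderedDict

d = OrderedDict()
for x in ["b", "a", "b", "c", "a"]:
    d[x] = True  # assigning to an existing key keeps its original position

unique_in_order = list(d.keys())
print(unique_in_order)  # ['b', 'a', 'c']
```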

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

``````>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This feature holds for CPython and PyPy but may not be present in other implementations.) That gives us a new fastest way of deduping while retaining order:

``````>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

In Python 3.7, the regular dict is guaranteed to be ordered across all implementations. So, the shortest and fastest solution is:

``````>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
``````

Try using sets:

``````# The `sets` module was removed in Python 3; the built-in `set` replaces it.
t = set(['a', 'b', 'c', 'd'])
t1 = set(['a', 'b', 'c'])

print(t | t1)
print(t - t1)
``````
``````>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> s = []
>>> for i in t:
...     if i not in s:
...         s.append(i)
...
>>> s
[1, 2, 3, 5, 6, 7, 8]
``````

Another way of doing it:

``````>>> seq = [1,2,3,'a', 'a', 1,2]
>>> dict.fromkeys(seq).keys()
['a', 1, 2, 3]
``````

I had a dict in my list, so I could not use the above approach. I got the error:

``````TypeError: unhashable type:
``````

If you care about order and/or some items are unhashable, you might find this useful:

``````def make_unique(original_list):
    unique_list = []
    [unique_list.append(obj) for obj in original_list if obj not in unique_list]
    return unique_list
``````

Some may consider a list comprehension with a side effect not to be a good solution. Here’s an alternative:

``````def make_unique(original_list):
    unique_list = []
    # list() forces evaluation; on Python 3, map() is lazy and would otherwise do nothing
    list(map(lambda x: unique_list.append(x) if (x not in unique_list) else False, original_list))
    return unique_list
``````

Here is an example that returns a list without repetitions, preserving order. It does not need any external imports. Note that it only collapses adjacent repeats, so equal items must be next to each other (as in the example in the comments).

``````def GetListWithoutRepetitions(loInput):
    # return list, consisting of elements of list/tuple loInput, without repetitions.
    # Example: GetListWithoutRepetitions([None,None,1,1,2,2,3,3,3])
    # Returns: [None, 1, 2, 3]

    if loInput==[]:
        return []

    loOutput = []

    # pick a starting sentinel that differs from the first element
    if loInput[0] is None:
        oGroupElement=1
    else: # loInput[0] != None
        oGroupElement=None

    for oElement in loInput:
        if oElement != oGroupElement:
            loOutput.append(oElement)
            oGroupElement = oElement
    return loOutput
``````

There are also solutions using Pandas and Numpy. They both return numpy array so you have to use the function `.tolist()` if you want a list.

``````t=['a','a','b','b','b','c','c','c']
t2= ['c','c','b','b','b','a','a','a']
``````

## Pandas solution

Using Pandas function `unique()`:

``````import pandas as pd
pd.unique(t).tolist()
>>>['a','b','c']
pd.unique(t2).tolist()
>>>['c','b','a']
``````

## Numpy solution

Using numpy function `unique()`.

``````import numpy as np
np.unique(t).tolist()
>>>['a','b','c']
np.unique(t2).tolist()
>>>['a','b','c']
``````

Note that numpy.unique() also sorts the values, so the list `t2` is returned sorted. If you want to have the order preserved, use the approach from this answer:

``````_, idx = np.unique(t2, return_index=True)
np.array(t2)[np.sort(idx)].tolist()  # t2 is a plain list, so convert it before fancy indexing
>>>['c','b','a']
``````

The solution is not so elegant compared to the others, however, compared to pandas.unique(), numpy.unique() allows you also to check if nested arrays are unique along one selected axis.

To make a new list retaining the order of first elements of duplicates in `L`:

``````newlist = [ii for n,ii in enumerate(L) if ii not in L[:n]]
``````

For example: if `L = [1, 2, 2, 3, 4, 2, 4, 3, 5]`, then `newlist` will be `[1, 2, 3, 4, 5]`

This checks each new element has not appeared previously in the list before adding it.
Also it does not need imports.

This one cares about the order without too much hassle (OrderedDict & others). Probably not the most Pythonic way, nor the shortest way, but it does the trick:

``````def remove_duplicates(item_list):
    ''' Removes duplicate items from a list '''
    singles_list = []
    for element in item_list:
        if element not in singles_list:
            singles_list.append(element)
    return singles_list
``````

A colleague has sent the accepted answer as part of his code to me for a code review today.
While I certainly admire the elegance of the answer in question, I am not happy with the performance.
I have tried this solution (I use a set to reduce lookup time):

``````def ordered_set(in_list):
    out_list = []
    added = set()  # values already emitted
    for val in in_list:
        if val not in added:
            out_list.append(val)
            added.add(val)
    return out_list
``````

To compare efficiency, I used a random sample of 100 integers – 62 were unique

``````from random import randint
x = [randint(0,100) for _ in xrange(100)]

In : len(set(x))
Out: 62
``````

Here are the results of the measurements

``````In : %timeit list(OrderedDict.fromkeys(x))
10000 loops, best of 3: 86.4 us per loop

In : %timeit ordered_set(x)
100000 loops, best of 3: 15.1 us per loop
``````

Well, what happens if set is removed from the solution?

``````def ordered_set(inlist):
    out_list = []
    for val in inlist:
        if not val in out_list:
            out_list.append(val)
    return out_list
``````

The result is not as bad as with the OrderedDict, but it still takes more than three times as long as the original solution:

``````In : %timeit ordered_set(x)
10000 loops, best of 3: 52.6 us per loop
``````
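If you want to repeat this kind of measurement outside IPython, a rough sketch with the standard `timeit` module could look like this. The function names below are made up for the comparison, and the numbers will of course depend on your machine and Python version, so none are shown.

```python
import random
import timeit
from collections import OrderedDict


def with_ordered_dict(items):
    return list(OrderedDict.fromkeys(items))


def with_seen_set(items):
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def with_list_lookup(items):
    out = []
    for item in items:
        if item not in out:  # linear scan of the output list
            out.append(item)
    return out


sample = [random.randint(0, 100) for _ in range(100)]

for func in (with_ordered_dict, with_seen_set, with_list_lookup):
    seconds = timeit.timeit(lambda: func(sample), number=1_000)
    print(f"{func.__name__}: {seconds:.4f} s for 1,000 calls")
```

All three functions return the same list, so only the running time differs.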

Simple and easy:

``````myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanlist = []
[cleanlist.append(x) for x in myList if x not in cleanlist]
``````

Output:

``````>>> cleanlist
[1, 2, 3, 5, 6, 7, 8]
``````

Reduce variant with ordering preserve:

Assume that we have list:

``````l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
``````

Reduce variant (inefficient):

``````>>> reduce(lambda r, v: v in r and r or r + [v], l, [])
[5, 6, 1, 2, 3, 4]
``````

5 x faster but more sophisticated

``````>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]
``````

Explanation:

``````from functools import reduce  # needed on Python 3

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

reduce(reducer, l, default)[0]
``````

The code below is a simple way to remove duplicates from a list:

``````def remove_duplicates(x):
    a = []
    for i in x:
        if i not in a:
            a.append(i)
    return a

print(remove_duplicates([1,2,2,3,3,4]))
``````

It returns `[1, 2, 3, 4]`.

To remove the duplicates, make it a SET and then again make it a LIST and print/use it.
A set is guaranteed to have unique elements. For example :

``````a = [1,2,3,4,5,9,11,15]
b = [4,5,6,7,8]
c=a+b
print(c)
print(list(set(c))) #one line for getting unique elements of c
``````

The output will be as follows (checked in python 2.7)

``````[1, 2, 3, 4, 5, 9, 11, 15, 4, 5, 6, 7, 8]  #simple list addition with duplicates
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15] #duplicates removed!!
``````

There are many other answers suggesting different ways to do this, but they’re all batch operations, and some of them throw away the original order. That might be okay depending on what you need, but if you want to iterate over the values in the order of the first instance of each value, and you want to remove the duplicates on-the-fly versus all at once, you could use this generator:

``````def uniqify(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)  # remember the item so later repeats are skipped
            yield item
``````

This returns a generator/iterator, so you can use it anywhere that you can use an iterator.

``````for unique_item in uniqify([1, 2, 3, 4, 3, 2, 4, 5, 6, 7, 6, 8, 8]):
    print(unique_item, end=' ')

print()
``````

Output:

``````1 2 3 4 5 6 7 8
``````

If you do want a `list`, you can do this:

``````unique_list = list(uniqify([1, 2, 3, 4, 3, 2, 4, 5, 6, 7, 6, 8, 8]))

print(unique_list)
``````

Output:

``````[1, 2, 3, 4, 5, 6, 7, 8]
``````
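Because the duplicates are removed on the fly, such a generator can also be used on input that never ends, which a batch approach like `list(set(...))` cannot handle. A small sketch, assuming a generator like `uniqify` above (repeated here so the example is self-contained) and a made-up endless stream:

```python
from itertools import count, islice


def uniqify(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


# Endless stream 0, 0, 1, 1, 2, 2, ... (every value appears twice)
endless = (n // 2 for n in count())

# Only as many items as needed are pulled from the stream.
first_five = list(islice(uniqify(endless), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```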

Check this if you want to remove duplicates (in-place edit rather than returning new list) without using inbuilt set, dict.keys, uniqify, counter

``````>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> for i in t:
...     if i in t[t.index(i)+1:]:
...         t.remove(i)
...
>>> t
[3, 1, 2, 5, 6, 7, 8]
``````

All the order-preserving approaches I’ve seen here so far either use naive comparison (with O(n^2) time-complexity at best) or heavy-weight `OrderedDicts`/`set`+`list` combinations that are limited to hashable inputs. Here is a hash-independent O(nlogn) solution:

Update: added the `key` argument, documentation and Python 3 compatibility.

``````from functools import reduce  # required on Python 3

def uniq(iterable, key=lambda x: x):
    """
    Remove duplicates from an iterable. Preserves order.
    :type iterable: Iterable[Ord => A]
    :param iterable: an iterable of objects of any orderable type
    :type key: Callable[A] -> (Ord => B)
    :param key: optional argument; by default an item (A) is discarded
    if another item (B), such that A == B, has already been encountered and taken.
    If you provide a key, this condition changes to key(A) == key(B); the callable
    must return orderable objects.
    """
    # Enumerate the list to restore order later; reduce the sorted list; restore order
    def append_unique(acc, item):
        return acc if key(acc[-1][1]) == key(item[1]) else acc.append(item) or acc
    srt_enum = sorted(enumerate(iterable), key=lambda item: key(item[1]))
    return [item[1] for item in sorted(reduce(append_unique, srt_enum, [srt_enum[0]]))]
``````

It requires installing a 3rd-party module but the package `iteration_utilities` contains a `unique_everseen`1 function that can remove all duplicates while preserving the order:

``````>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(['a', 'b', 'c', 'd'] + ['a', 'c', 'd']))
['a', 'b', 'c', 'd']
``````

In case you want to avoid the overhead of the list addition operation you can use `itertools.chain` instead:

``````>>> from itertools import chain
>>> list(unique_everseen(chain(['a', 'b', 'c', 'd'], ['a', 'c', 'd'])))
['a', 'b', 'c', 'd']
``````

The `unique_everseen` also works if you have unhashable items (for example lists) in the lists:

``````>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen([['a'], ['b'], 'c', 'd'] + ['a', 'c', 'd']))
[['a'], ['b'], 'c', 'd', 'a']
``````

However that will be (much) slower than if the items are hashable.

1 Disclosure: I’m the author of the `iteration_utilities`-library.

For completeness, and since this is a very popular question, the toolz library offers a `unique` function:

``````>>> from toolz import unique
>>> tuple(unique((1, 2, 3)))
(1, 2, 3)
>>> tuple(unique((1, 2, 1, 3)))
(1, 2, 3)
``````

Here’s the fastest Pythonic solution compared to the others listed in the replies.

Using the short-circuit evaluation of `or` allows a list comprehension, which is fast enough. `visited.add(item)` always returns `None`, which is evaluated as `False`, so the right-hand side of `or` is always the result of such an expression.

Time it yourself:

``````def deduplicate(sequence):
    visited = set()
    adder = visited.add  # bind the method once to avoid repeated attribute lookups
    out = [adder(item) or item for item in sequence if item not in visited]
    return out
``````
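The `or` trick used above can be checked in isolation. This is just a small demonstration with made-up values; it also shows that falsy items such as `0` still come through, because `or` returns its last operand when the first one is falsy.

```python
visited = set()

# set.add mutates the set and returns None, which is falsy.
result = visited.add("x")
print(result)  # None

# `None or value` therefore always evaluates to `value`, even for falsy values.
kept_string = visited.add("y") or "y"
kept_zero = visited.add(0) or 0
print(kept_string, kept_zero)  # y 0

print(len(visited))  # 3
```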

You could also do this:

``````>>> t = [1, 2, 3, 3, 2, 4, 5, 6]
>>> s = [x for i, x in enumerate(t) if i == t.index(x)]
>>> s
[1, 2, 3, 4, 5, 6]
``````

The reason that above works is that `index` method returns only the first index of an element. Duplicate elements have higher indices. Refer to here:

list.index(x[, start[, end]])
Return zero-based index in the list of
the first item whose value is x. Raises a ValueError if there is no
such item.

I think converting to set is the easiest way to remove duplicate:

``````list1 = [1,2,1]
list1 = list(set(list1))
print(list1)
``````

Using set :

``````a = [0,1,2,3,4,3,3,4]
a = list(set(a))
print(a)
``````

Using unique :

``````import numpy as np
a = [0,1,2,3,4,3,3,4]
a = np.unique(a).tolist()
print(a)
``````

The best approach to removing duplicates from a list is to use the built-in set() function and then convert that set back into a list:

``````In : some_list = ['a','a','v','v','v','c','c','d']
In : list(set(some_list))
Out: ['a', 'c', 'd', 'v']
``````

You can do this simply by using sets.

Step1: Get Different elements of lists
Step2 Get Common elements of lists
Step3 Combine them

``````In : a = ["apples", "bananas", "cucumbers"]

In : b = ["pears", "apples", "watermelons"]

In : set(a).symmetric_difference(b).union(set(a).intersection(b))
Out: {'apples', 'bananas', 'cucumbers', 'pears', 'watermelons'}
``````
``````def remove_duplicates(A):
    # Caution: popping while iterating can skip elements, e.g. [1, 1, 1, 1] gives [1, 1]
    [A.pop(count) for count,elem in enumerate(A) if A.count(elem)!=1]
    return A
``````

A list comprehension to remove duplicates.

If you don’t care about order and want something different than the pythonic ways suggested above (that is, it can be used in interviews) then :

``````def remove_dup(arr):
    size = len(arr)
    j = 0    # To store index of next unique element
    for i in range(0, size-1):
        # If current element is not equal
        # to next element then store that
        # current element
        if(arr[i] != arr[i+1]):
            arr[j] = arr[i]
            j+=1

    arr[j] = arr[size-1] # Store the last element as whether it is unique or repeated, it hasn't stored previously

    return arr[0:j+1]

if __name__ == '__main__':
    arr = [10, 10, 1, 1, 1, 3, 3, 4, 5, 6, 7, 8, 8, 9]
    print(remove_dup(sorted(arr)))
``````

Time Complexity : O(n) for the scan itself (the initial `sorted()` call adds O(n log n))

Auxiliary Space : O(n)

Without using set

``````data=[1, 2, 3, 1, 2, 5, 6, 7, 8]
uni_data=[]
for dat in data:
    if dat not in uni_data:
        uni_data.append(dat)

print(uni_data)
``````

There are a lot of answers here that use a `set(..)` (which is fast given the elements are hashable), or a list (which has the downside that it results in an O(n^2) algorithm).

The function I propose is a hybrid one: we use a `set(..)` for items that are hashable, and a `list(..)` for the ones that are not. Furthermore it is implemented as a generator such that we can for instance limit the number of items, or do some additional filtering.

Finally we also can use a `key` argument to specify in what way the elements should be unique. For instance we can use this if we want to filter a list of strings such that every string in the output has a different length.

``````def uniq(iterable, key=lambda x: x):
    seens = set()
    seenl = []
    for item in iterable:
        k = key(item)
        try:
            seen = k in seens
        except TypeError:
            seen = k in seenl
        if not seen:
            yield item
            try:
                seens.add(k)  # hashable keys go into the set
            except TypeError:
                seenl.append(k)  # unhashable keys fall back to the list
``````

We can now for instance use this like:

``````>>> list(uniq(["apple", "pear", "banana", "lemon"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", "lemon", "banana"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"], len))
['apple', 'pear', {}, 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"]))
['apple', 'pear', {}, 'lemon', [], 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", {}, "banana"]))
['apple', 'pear', {}, 'lemon', 'banana']
``````

It is thus a uniqueness filter that can work on any iterable and filter out duplicates, regardless of whether the items are hashable or not.

It makes one assumption: that if one object is hashable, and another one is not, the two objects are never equal. This can strictly speaking happen, although it would be very uncommon.
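One case where this assumption does not hold, shown here only as an illustration with built-in types: a `frozenset` is hashable, a plain `set` is not, and yet the two can compare equal.

```python
hashable = frozenset({1, 2})
unhashable = {1, 2}  # a plain set is mutable and cannot be hashed

are_equal = hashable == unhashable
print(are_equal)  # True

try:
    hash(unhashable)
    set_is_hashable = True
except TypeError as exc:
    set_is_hashable = False
    print("unhashable:", exc)
```

A hybrid filter that keeps hashable and unhashable keys in separate containers would therefore treat these two values as distinct.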

Another solution might be the following. Create a dictionary out of the list with item as key and index as value, and then print the dictionary keys.

``````>>> lst = [1, 3, 4, 2, 1, 21, 1, 32, 21, 1, 6, 5, 7, 8, 2]
>>>
>>> dict_enum = {item:index for index, item in enumerate(lst)}
>>> print dict_enum.keys()
[32, 1, 2, 3, 4, 5, 6, 7, 8, 21]
``````
``````def remove_duplicates(input_list):
    if input_list == []:
        return []
    #sort list from smallest to largest
    input_list=sorted(input_list)
    #initialize output list with first element of the sorted input list
    output_list = [input_list[0]]
    for item in input_list:
        if item >output_list[-1]:
            output_list.append(item)
    return output_list
``````

Very simple way in Python 3:

``````>>> n = [1, 2, 3, 4, 1, 1]
>>> n
[1, 2, 3, 4, 1, 1]
>>> m = sorted(list(set(n)))
>>> m
[1, 2, 3, 4]
``````

Unfortunately, most answers here either do not preserve the order or are too long. Here is a simple, order-preserving answer.

``````s = [1,2,3,4,5,2,5,6,7,1,3,9,3,5]
x=[]

[x.append(i) for i in s if i not in x]
print(x)
``````

This will give you x with duplicates removed but preserving the order.

This is just a readable, easily understandable function. It uses the dict data structure and some built-in functions, and has a better complexity of O(n).

``````def undup(dup_list):
    b={}
    for i in dup_list:
        b.update({i:1})
    return b.keys()
a=["a",'b','a']
print(undup(a))
``````

Disclaimer: you may get an indentation error if you copy and paste; make sure the code is properly indented before running it.

You can use `set` to remove duplicates:

``````mylist = list(set(mylist))
``````

But note the results will be unordered. If that’s an issue:

``````mylist.sort()
``````

Another approach could be:

``````import pandas as pd

myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanList = pd.Series(myList).drop_duplicates().tolist()
print(cleanList)

#> [1, 2, 3, 5, 6, 7, 8]
``````

and the order remains preserved.

Python has many built-in functions. You can use set() to remove the duplicates inside a list.
As per your example, there are the two lists t and t2 below:

``````t = ['a', 'b', 'c', 'd']
t2 = ['a', 'c', 'd']
result = list(set(t) - set(t2))
result
``````

You can use the following function:

``````def rem_dupes(dup_list):
    yooneeks = []
    for elem in dup_list:
        if elem not in yooneeks:
            yooneeks.append(elem)
    return yooneeks
``````

Example:

``````my_list = ['this','is','a','list','with','dupicates','in', 'the', 'list']
``````

Usage:

``````rem_dupes(my_list)
``````

`['this', 'is', 'a', 'list', 'with', 'dupicates', 'in', 'the']`

Sometimes you need to remove the duplicate items in place, without creating a new list. For example, the list is big, or you want to keep it as a shadow copy.

``````from collections import Counter
cntDict = Counter(t)
for item,cnt in cntDict.items():
    for _ in range(cnt-1):
        t.remove(item)
``````

If you want to preserve the order, and not use any external modules here is an easy way to do this:

``````>>> t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]
>>> list(dict.fromkeys(t))
[1, 9, 2, 3, 4, 5, 6, 7, 8]
``````

Note: This method preserves the order of appearance, so, as seen above, nine will come after one because it was the first time it appeared. This however, is the same result as you would get with doing

``````from collections import OrderedDict
ulist=list(OrderedDict.fromkeys(l))
``````

but it is much shorter, and runs faster.

This works because each time the `fromkeys` function tries to create a new key, if the value already exists it will simply overwrite it. This won’t affect the dictionary at all, however, as `fromkeys` creates a dictionary where all keys have the value `None`, so effectively it eliminates all duplicates this way.
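To make that concrete, here is what the intermediate dictionary looks like on Python 3.7 or later, using the same list as above:

```python
t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]

d = dict.fromkeys(t)
print(d)
# {1: None, 9: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None}

# Iterating a dict yields its keys in insertion order.
print(list(d))  # [1, 9, 2, 3, 4, 5, 6, 7, 8]
```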

If your list is ordered, you can use the following approach to iterate over it skipping the repeated values. This is especially useful to handle big lists with low memory consumption evading the cost of building a `dict` or a `set`:

``````def uniq(iterator):
    prev = None
    for item in iterator:
        if item != prev:
            prev = item
            yield item
``````

Then:

``````for item in uniq([1, 1, 3, 5, 5, 6]):
    print(item, end=' ')
``````

The output is going to be: `1 3 5 6`

To return a list object, you could do:

``````>>> print(list(uniq([1, 1, 3, 5, 5, 6])))
[1, 3, 5, 6]
``````

# The Magic of Python Built-in type

In Python, it is very easy to handle complicated cases like this using only Python’s built-in types.

Let me show you how!

Method 1: General Case

A one-line way to remove duplicated elements from a list while keeping the original order:

``````line = [1, 2, 3, 1, 2, 5, 6, 7, 8]
new_line = sorted(set(line), key=line.index) # remove duplicated element
print(new_line)
``````

You will get the result

``````[1, 2, 3, 5, 6, 7, 8]
``````

Method 2: Special Case

``````TypeError: unhashable type: 'list'
``````

The special case of handling unhashable items (3 lines of code):

``````line=[['16.4966155686595', '-27.59776154691', '52.3786295521147']
,['16.4966155686595', '-27.59776154691', '52.3786295521147']
,['17.6508629295574', '-27.143305738671', '47.534955022564']
,['17.6508629295574', '-27.143305738671', '47.534955022564']
,['18.8051102904552', '-26.688849930432', '42.6912804930134']
,['18.8051102904552', '-26.688849930432', '42.6912804930134']
,['19.5504702331098', '-26.205884452727', '37.7709192714727']
,['19.5504702331098', '-26.205884452727', '37.7709192714727']
,['20.2929416861422', '-25.722717575124', '32.8500163147157']
,['20.2929416861422', '-25.722717575124', '32.8500163147157']]

tuple_line = [tuple(pt) for pt in line] # convert list of list into list of tuple
tuple_new_line = sorted(set(tuple_line),key=tuple_line.index) # remove duplicated element
new_line = [list(t) for t in tuple_new_line] # convert list of tuple into list of list

print (new_line)
``````

You will get the result :

``````[
['16.4966155686595', '-27.59776154691', '52.3786295521147'],
['17.6508629295574', '-27.143305738671', '47.534955022564'],
['18.8051102904552', '-26.688849930432', '42.6912804930134'],
['19.5504702331098', '-26.205884452727', '37.7709192714727'],
['20.2929416861422', '-25.722717575124', '32.8500163147157']
]
``````

This works because a tuple is hashable and you can easily convert data between list and tuple.

In this answer, there will be two sections: Two unique solutions, and a graph of speed for specific solutions.

## Removing Duplicate Items

Most of these answers only remove duplicate items that are hashable, but the question doesn’t say the items are always hashable, so I’ll offer some solutions which don’t require hashable items.

`collections.Counter` is a powerful tool in the standard library which could be perfect for this. There’s only one other solution which even has Counter in it. However, that solution is also limited to hashable keys.

To allow unhashable keys in Counter, I made a Container class, which will try to get the object’s default hash function, but if it fails, it will fall back to its identity function. It also defines an eq and a hash method. This should be enough to allow unhashable items in our solution. Unhashable objects will be treated as if they are hashable. However, this hash function uses identity for unhashable objects, meaning two equal objects that are both unhashable won’t be recognized as duplicates. I suggest you override this, changing it to use the hash of an equivalent immutable type (like using `hash(tuple(my_list))` if `my_list` is a list).

I also made two solutions. The second one keeps the order of the items, using a subclass of both OrderedDict and Counter which is named ‘OrderedCounter’. Now, here are the functions:

``````from collections import OrderedDict, Counter

class Container:
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, obj):
        return self.obj == obj
    def __hash__(self):
        try:
            return hash(self.obj)
        except TypeError:
            return id(self.obj)

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

def remd(sequence):
    cnt = Counter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

def oremd(sequence):
    cnt = OrderedCounter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]
``````

`remd` is non-ordered sorting, while `oremd` is ordered sorting. You can clearly tell which one is faster, but I’ll explain anyways. The non-ordered sorting is slightly faster, since it doesn’t store the order of the items.
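The suggestion above to use something like `hash(tuple(my_list))` for lists could be sketched as follows. This is not the Container class from this answer but a hypothetical variant (`ValueContainer` and `dedupe` are names invented for this example), and it assumes Python 3.7+ so that a plain `dict` keeps insertion order.

```python
class ValueContainer:
    """Hypothetical wrapper that hashes lists by value instead of by identity."""

    def __init__(self, obj):
        self.obj = obj

    def __eq__(self, other):
        return isinstance(other, ValueContainer) and self.obj == other.obj

    def __hash__(self):
        try:
            return hash(self.obj)
        except TypeError:
            if isinstance(self.obj, list):
                try:
                    return hash(tuple(self.obj))  # equal lists get equal hashes
                except TypeError:
                    pass  # the list itself contains unhashable elements
            # Constant fallback: still correct because __eq__ decides,
            # but lookups for such objects become slow.
            return 0


def dedupe(sequence):
    # dict keys keep first-seen order on Python 3.7+
    return [c.obj for c in dict.fromkeys(ValueContainer(x) for x in sequence)]


print(dedupe([[1, 2], [1, 2], "a", "a", [3]]))  # [[1, 2], 'a', [3]]
```

With this variant, two equal lists are recognized as duplicates even though they are different objects.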

Now, I also wanted to show the speed comparisons of each answer. So, I’ll do that now.

## Which Function is the Fastest?

For removing duplicates, I gathered 10 functions from a few answers. I calculated the speed of each function and put it into a graph using matplotlib.pyplot.

I divided this into three rounds of graphing. A hashable is any object which can be hashed, an unhashable is any object which cannot be hashed. An ordered sequence is a sequence which preserves order, an unordered sequence does not preserve order. Now, here are a few more terms:

Unordered Hashable was for any method which removed duplicates, which didn’t necessarily have to keep the order. It didn’t have to work for unhashables, but it could.

Ordered Hashable was for any method which kept the order of the items in the list, but it didn’t have to work for unhashables, but it could.

Ordered Unhashable was any method which kept the order of the items in the list, and worked for unhashables.

On the y-axis is the amount of seconds it took.

On the x-axis is the number the function was applied to.

I generated sequences for unordered hashables and ordered hashables with the following comprehension: `[list(range(x)) + list(range(x)) for x in range(0, 1000, 10)]`

For ordered unhashables: `[[list(range(y)) + list(range(y)) for y in range(x)] for x in range(0, 1000, 10)]`

Note there is a `step` in the range because without it, this would’ve taken 10x as long. Also because in my personal opinion, I thought it might’ve looked a little easier to read.

Also note the keys on the legend are what I tried to guess as the most vital parts of the implementation of the function. As for what function does the worst or best? The graph speaks for itself.

With that settled, here are the graphs.

## Ordered Unhashables

[Graph: seconds taken (y-axis) for each ordered unhashable method; image not included.]

If you don’t care about the list order, you can use `*arg` expansion with `set` uniqueness to remove dupes, i.e.:

``````l = [*{*l}]
``````


I did this with pure python function. This works when your `items` value is JSON.

``````[i for n, i in enumerate(items) if i not in items[n + 1 :]]
``````

I didn’t see answers for non-hashable values, one liner, n log n, standard-library only, so here’s my answer:

``````list(map(operator.itemgetter(0), itertools.groupby(sorted(items))))
``````

Or as a generator function:

``````import itertools, operator
from typing import Iterable, TypeVar
T = TypeVar("T")
def unique(items: Iterable[T]) -> Iterable[T]:
    """For unhashable items (can't use set to unique) with a partial order"""
    yield from map(operator.itemgetter(0), itertools.groupby(sorted(items)))
``````
``````Test = [1,8,2,7,3,4,5,1,2,3,6]
Test.sort()
i=1
while i< len(Test):
    if Test[i] == Test[i-1]:
        Test.remove(Test[i])
    else:
        i= i+1  # only advance when nothing was removed, so runs of 3+ equal items are handled
print(Test)
``````

``````# Check for the string 'a' and 'b'
clean_list = []
for ele in raw_list:
    if 'b' in ele or 'a' in ele:
        pass
    else:
        clean_list.append(ele)
``````

Write a Python program to create a list of numbers by taking input from the user and then remove the duplicates from the list. You can take input of non-zero numbers, with an appropriate prompt, from the user until the user enters a zero to create the list, assuming that the numbers are non-zero.

Sample Input: [10, 34, 18, 10, 12, 34, 18, 20, 25, 20]
Output: [10, 34, 18, 12, 20, 25]

``````lst = []
print("ENTER ZERO NUMBER FOR EXIT !!!!!!!!!!!!")
print("ENTER LIST ELEMENTS  :: ")
while True:
    n = int(input())
    if n == 0 :
        print("!!!!!!!!!!! EXIT !!!!!!!!!!!!")
        break
    else :
        lst.append(n)
print("LIST ELEMENTS ARE :: ",lst)
#dup = set()
uniq = []
for x in lst:
    if x not in uniq:
        uniq.append(x)
print("UNIQUE ELEMENTS IN LIST ARE :: ",uniq)
``````
• You can remove duplicates using a Python set or the dict.fromkeys() method.

• The dict.fromkeys() method converts a list into a dictionary. Dictionaries cannot contain duplicate keys, so dict.fromkeys() returns a dictionary with only unique keys.

• Sets, like dictionary keys, cannot contain duplicates. If we convert a list to a set, all the duplicates are removed.

##### Method 1: The naive approach
``````mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

uniques = []

for i in mylist:
    if i not in uniques:
        uniques.append(i)

print(uniques)
``````
##### Method 2: Using set()
``````mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

myset = set(mylist)

print(list(myset))
``````

I’ve compared the various suggestions with perfplot. It turns out that, if the input array doesn’t have duplicate elements, all methods are more or less equally fast, independently of whether the input data is a Python list or a NumPy array. If the input array is large, but contains just one unique element, then the `set`, `dict` and `np.unique` methods are constant-time if the input data is a list. If it’s a NumPy array, `np.unique` is about 10 times faster than the other alternatives. It’s somewhat surprising to me that those are not constant-time operations, too.

Code to reproduce the plots:

``````import perfplot
import numpy as np
import matplotlib.pyplot as plt

def setup_list(n):
    # return list(np.random.permutation(np.arange(n)))
    return [0] * n

def setup_np_array(n):
    # return np.random.permutation(np.arange(n))
    return np.zeros(n, dtype=int)

def list_set(data):
    return list(set(data))

def numpy_unique(data):
    return np.unique(data)

def list_dict(data):
    return list(dict.fromkeys(data))

b = perfplot.bench(
    setup=[
        setup_list,
        setup_list,
        setup_list,
        setup_np_array,
        setup_np_array,
        setup_np_array,
    ],
    kernels=[list_set, numpy_unique, list_dict, list_set, numpy_unique, list_dict],
    labels=[
        "list(set(lst))",
        "np.unique(lst)",
        "list(dict(lst))",
        "list(set(arr))",
        "np.unique(arr)",
        "list(dict(arr))",
    ],
    n_range=[2 ** k for k in range(23)],
    xlabel="len(array)",
    equality_check=None,
)
# plt.title("input array = [0, 1, 2,..., n]")
plt.title("input array = [0, 0,..., 0]")
b.save("out.png")
b.show()
``````

Using set, but preserving order

``````unique = set()
[unique.add(n) or n for n in l if n not in unique]
``````

You can compare the length of the set and the list and save the set items to list.

``````if len(t) != len(set(t)):
    t = [x for x in set(t)]

``````
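If you only need to check whether a list has duplicates, the length comparison above always builds the full set first. A possible alternative, sketched here with a made-up helper name `has_duplicates` and assuming hashable items, stops at the first repeat:

```python
def has_duplicates(items):
    """Return True as soon as a repeated (hashable) item is found."""
    seen = set()
    for item in items:
        if item in seen:
            return True  # early exit: no need to look at the rest
        seen.add(item)
    return False


print(has_duplicates([1, 2, 3, 1]))  # True
print(has_duplicates([1, 2, 3]))     # False
```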