How to make lists contain only distinct elements in Python?
Question:
I have a list in Python. How can I make its values unique?
Answers:
The simplest is to convert to a set then back to a list:
my_list = list(set(my_list))
One disadvantage with this is that it won’t preserve the order. You may also want to consider whether a set would be a better data structure to use in the first place, instead of a list.
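A minimal sketch of the set round-trip. Set iteration order is an implementation detail, so the result is only shown sorted here. If a sorted result is acceptable and the elements are mutually comparable (an assumption), `sorted(set(...))` gives a deterministic order. Elements must also be hashable: a list of lists raises `TypeError`.

```python
my_list = [3, 1, 3, 2, 1]

# Round-trip through a set: each distinct value appears once, order not guaranteed
deduped = list(set(my_list))
print(sorted(deduped))  # sorted() only to display a stable order

# When a sorted result is fine (elements must be comparable)
deduped_sorted = sorted(set(my_list))
print(deduped_sorted)  # [1, 2, 3]

# Unhashable elements (here: lists) cannot be put into a set
try:
    set([[1], [1]])
except TypeError as exc:
    print("TypeError:", exc)
```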
From http://www.peterbe.com/plog/uniqifiers-benchmark:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
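The idfun parameter decides which values count as duplicates, while the result still contains the original items. The sketch below repeats f5 in Python 3 syntax and uses `str.lower` as an illustrative idfun for case-insensitive de-duplication; the first spelling seen is the one kept.

```python
def f5(seq, idfun=None):
    # order preserving; idfun maps each item to the key used for comparison
    if idfun is None:
        def idfun(x):
            return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen:
            continue
        seen[marker] = 1
        result.append(item)
    return result

words = ["Apple", "banana", "apple", "Banana", "cherry"]
print(f5(words))                   # ['Apple', 'banana', 'apple', 'Banana', 'cherry']
print(f5(words, idfun=str.lower))  # ['Apple', 'banana', 'cherry']
```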
To preserve the order:
l = [1, 1, 2, 2, 3]
result = []
# In Python 3, map() is lazy, so wrap it in list() to make the appends actually happen
_ = list(map(lambda x: x not in result and result.append(x), l))
result
# [1, 2, 3]
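In Python 3 map() returns a lazy iterator, so the lambda only runs (and only appends) when that iterator is consumed. A minimal demonstration: an unconsumed map leaves its target list empty, while list(map(...)) fills it. Note that `x not in result` scans the list every time, so this approach is quadratic in the list length.

```python
l = [1, 1, 2, 2, 3]

# Unconsumed map: the lambda never runs, so nothing is appended
not_run = []
lazy = map(lambda x: x not in not_run and not_run.append(x), l)
print(not_run)  # []

# Consuming the iterator with list() runs the lambda for every element
result = []
_ = list(map(lambda x: x not in result and result.append(x), l))
print(result)  # [1, 2, 3]
```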
If all elements of the list can be used as dictionary keys (i.e. they are all hashable), this is often faster. From the Python Programming FAQ:
d = {}
for x in mylist:
    d[x] = 1
mylist = list(d.keys())
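Since Python 3.7, dictionaries are guaranteed to keep insertion order, so the FAQ loop above also preserves the order of first occurrence on those versions. dict.fromkeys() builds the same keys in one call. A minimal sketch, assuming Python 3.7 or later:

```python
mylist = [3, 2, 1, 3, 4, 4, 4, 5, 5, 3]

# FAQ-style loop: a repeated key is overwritten but keeps its original position
d = {}
for x in mylist:
    d[x] = 1
print(list(d.keys()))  # [3, 2, 1, 4, 5]

# Same keys in one call: dict.fromkeys() uses each element as a key (value None)
print(list(dict.fromkeys(mylist)))  # [3, 2, 1, 4, 5]
```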
Modified versions of http://www.peterbe.com/plog/uniqifiers-benchmark
To preserve the order:
def f(seq): # Order preserving
    ''' Modified version of Dave Kirby solution '''
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
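A usage sketch of the function above, repeated under the illustrative name `unique_in_order` so it runs on its own. It works on any iterable of hashable items, including a string.

```python
def unique_in_order(seq):
    # seen collects the items already emitted; set lookups are O(1) on average
    seen = set()
    # "x not in seen" filters duplicates; "not seen.add(x)" records x and is always True
    return [x for x in seq if x not in seen and not seen.add(x)]

print(unique_in_order([2, 1, 1, 3, 2]))  # [2, 1, 3]
print(unique_in_order("mississippi"))    # ['m', 'i', 's', 'p']
```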
OK, so how does it work? The condition if x not in seen and not seen.add(x) is a little tricky:
In [1]: 0 not in [1,2,3] and not print('add')
add
Out[1]: True
Why does it return True? Because print (like set.add) returns None:
In [3]: type(seen.add(10))
Out[3]: <class 'NoneType'>
and not None evaluates to True, but:
In [2]: 1 not in [1,2,3] and not print('add')
Out[2]: False
Why does it print ‘add’ in [1] but not in [2]? In False and print('add'), the and operator short-circuits: it does not evaluate the second operand, because the result can only be true if both operands are true, and the first one is already False.
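A small demonstration of short-circuiting. `record()` is a hypothetical helper that logs each call and returns None (like set.add and print), so you can see that the right-hand operand of and is evaluated only when the left-hand one is true.

```python
calls = []

def record(value):
    # Log that this operand was evaluated, then return None like set.add / print
    calls.append(value)
    return None

seen = {1, 2, 3}

r1 = 0 not in seen and not record(0)  # left side True, so record(0) runs
r2 = 1 not in seen and not record(1)  # left side False, so record(1) is skipped

print(r1, r2)  # True False
print(calls)   # [0]
```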
A more generic, more readable, generator-based version that can also transform values with a function:
def f(seq, idfun=None): # Order preserving
    return list(_f(seq, idfun))

def _f(seq, idfun=None):
    ''' Originally proposed by Andrew Dalke '''
    seen = set()
    if idfun is None:
        for x in seq:
            if x not in seen:
                seen.add(x)
                yield x
    else:
        for x in seq:
            x = idfun(x)
            if x not in seen:
                seen.add(x)
                yield x
Without order (it’s faster):
def f(seq): # Not order preserving
    return list(set(seq))
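The speed claim is easy to check on your own data with timeit. This is a rough sketch: the list size, the number of runs, and the choice of the set-based ordered version as the comparison are arbitrary, and timings vary by machine and Python version, so no results are shown.

```python
import timeit

def unordered(seq):  # Not order preserving
    return list(set(seq))

def ordered(seq):  # Order preserving, set-based
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

data = [i % 100 for i in range(10_000)]  # 10,000 items, 100 distinct values

# Both keep the same distinct values; only the order may differ
assert sorted(unordered(data)) == ordered(data) == list(range(100))

for func in (unordered, ordered):
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```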
The simplest way to remove duplicates whilst preserving order is to use collections.OrderedDict (Python 2.7+).
from collections import OrderedDict
d = OrderedDict()
for x in mylist:
    d[x] = True
print(list(d.keys()))
How about dictionary comprehensions?
>>> mylist = [3, 2, 1, 3, 4, 4, 4, 5, 5, 3]
>>> {x:1 for x in mylist}.keys()
[1, 2, 3, 4, 5]
EDIT
To @Danny’s comment: my original suggestion does not keep the keys ordered (in Python 2; since Python 3.7, plain dicts preserve insertion order). If you need the keys to stay in their original order, try:
>>> from collections import OrderedDict
>>> list(OrderedDict((x, 1) for x in mylist))
[3, 2, 1, 4, 5]
which keeps the elements in the order of their first occurrence (not extensively tested).
A one-liner that preserves order (it needs the import first):
from collections import OrderedDict
list(OrderedDict.fromkeys([2,1,1,3]))
# [2, 1, 3]
Let me explain with an example. Suppose you have this Python list:
>>> randomList = ["a","f", "b", "c", "d", "a", "c", "e", "d", "f", "e"]
and you want to remove the duplicates from it:
>>> uniqueList = []
>>> for letter in randomList:
...     if letter not in uniqueList:
...         uniqueList.append(letter)
...
>>> uniqueList
['a', 'f', 'b', 'c', 'd', 'e']
This is how you can remove duplicates from the list.
Sets in Python are unordered and do not allow duplicates, so the order in which a set prints is arbitrary. If you try to add an item to a set that already contains it, Python simply ignores it.
>>> l = ['a', 'a', 'bb', 'b', 'c', 'c', '10', '10', '8','8', 10, 10, 6, 10, 11.2, 11.2, 11, 11]
>>> distinct_l = set(l)
>>> print(distinct_l)
set(['a', '10', 'c', 'b', 6, 'bb', 10, 11, 11.2, '8'])
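Sets compare elements by equality (and hash), not by type, which explains the output above: the string '10' and the integer 10 are different, so both survive. The flip side is that values which compare equal, such as 1, 1.0 and True, collapse into a single element, and the set keeps the one added first. A minimal sketch:

```python
s = set()
for item in [10, "10", 1, 1.0, True]:
    s.add(item)  # adding an item equal to one already present is silently ignored

print(len(s))                       # 3
print(10 in s, "10" in s)           # True True
print(1 in s, 1.0 in s, True in s)  # True True True
```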