How to get a list of objects with a unique attribute
Question:
Background
I have a list containing many objects. Each object has an id, and the objects are of different types.
objects = [Aobject, Bobject, Cobject]
where
>>> Aobject != Bobject
True
>>> Aobject.id == Bobject.id
True
Problem
I want a list of unique objects based on object.id. Something like this:
set(objects, key=operator.attrgetter('id'))
(This does not work, but I want something like it.)
Answers:
seen = set()
# never use list as a variable name
[seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]
This works because set.add returns None, so the expression in the list comprehension always yields obj, but only if obj.id has not already been added to seen.
(The expression could only evaluate to None if obj is None; in that case, obj.id would raise an exception. In case mylist contains None values, change the test to if obj and (obj.id not in seen).)
Note that this will give you the first object in the list which has a given id. @Abhijit’s answer will give you the last such object.
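As a minimal, self-contained sketch of the first-wins behaviour described above (the Item class and the sample data are hypothetical, made up purely for illustration):

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for the objects in the question.
    name: str
    id: int


mylist = [Item("A", 1), Item("B", 1), Item("C", 2)]

seen = set()
# seen.add() returns None, so `None or obj` evaluates to obj.
# The `if` clause runs first and skips ids that were already recorded.
first_by_id = [seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

print([obj.name for obj in first_by_id])  # ['A', 'C']
```

"B" is dropped because its id was already recorded when "A" was processed.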
Update:
Alternatively, a collections.OrderedDict could be a good choice:
import collections
seen = collections.OrderedDict()
for obj in mylist:
    # eliminate this check if you want the last item
    if obj.id not in seen:
        seen[obj.id] = obj
list(seen.values())
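On Python 3.7 and later a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. The Item type and sample data below are hypothetical; dict.setdefault stands in for the explicit membership check:

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in the question.
Item = namedtuple("Item", ["name", "id"])
mylist = [Item("A", 1), Item("B", 1), Item("C", 2)]

# First object per id: setdefault stores obj only if the key is new.
first_seen = {}
for obj in mylist:
    first_seen.setdefault(obj.id, obj)

# Last object per id: plain assignment overwrites earlier entries
# while keeping the key at its original position.
last_seen = {}
for obj in mylist:
    last_seen[obj.id] = obj

print([o.name for o in first_seen.values()])  # ['A', 'C']
print([o.name for o in last_seen.values()])   # ['B', 'C']
```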
A fairly simple way to do this would be
s = set()
unique_objects = []
for obj in mylist:
    if obj.id not in s:
        s.add(obj.id)
        unique_objects.append(obj)
This adds any id not yet seen and keeps the first object that carries it. The time taken is linear in the size of the source list.
If you can change the class of the objects, you can add the appropriate methods which are used in set comparison:
# Assumption: this is the 'original' object
class OriginalExampleObject(object):
    def __init__(self, name, nid):
        self.name = name
        self.id = nid
    def __repr__(self):
        return "(OriginalExampleObject [%s] [%s])" % (self.name, self.id)

class SetExampleObj(OriginalExampleObject):
    def __init__(self, name, nid):
        super(SetExampleObj, self).__init__(name, nid)
    def __eq__(self, other):
        return self.id == other.id
    def __hash__(self):
        return self.id.__hash__()

AObject = SetExampleObj("A", 1)
BObject = SetExampleObj("B", 1)
CObject = SetExampleObj("C", 2)
s = set()
s.add(AObject)
s.add(CObject)
print(s)
s.add(BObject)
print(s)
Output:
set([(OriginalExampleObject [A] [1]), (OriginalExampleObject [C] [2])])
set([(OriginalExampleObject [A] [1]), (OriginalExampleObject [C] [2])])
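One consequence worth checking before taking this route: once __eq__ compares ids, two objects with the same id also compare equal everywhere else, so the question's Aobject != Bobject would no longer be True. A small Python 3 sketch, using a hypothetical ById class with a slightly more defensive __eq__:

```python
class ById:
    # Hypothetical class: equality and hashing are based on id only.
    def __init__(self, name, nid):
        self.name = name
        self.id = nid

    def __eq__(self, other):
        if not hasattr(other, "id"):
            # Let Python fall back to its default comparison.
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)


a, b, c = ById("A", 1), ById("B", 1), ById("C", 2)

# A set keeps the element that was added first and ignores later equal ones.
unique_objs = set([a, b, c])

print(sorted(o.name for o in unique_objs))  # ['A', 'C']
print(a != b)  # False: a and b are now considered equal
```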
Suppose your list of objects, somelist, is something like
[(Object [A] [1]), (Object [B] [1]), (Object [C] [2]), (Object [D] [2]), (Object [E] [3])]
You can do something like this
>>> {e.id:e for e in somelist}.values()
[(Object [B] [1]), (Object [D] [2]), (Object [E] [3])]
How about using a dict (since its keys are unique)?
Assuming we have
class Object:
    def __init__(self, id):
        self.id = id
Aobject = Object(1)
Bobject = Object(1)
Cobject = Object(2)
objects = [Aobject, Bobject, Cobject]
then a list of Objects unique by the id field can be generated using a dict comprehension in Python 3
unique_objects = list({object_.id: object_ for object_ in objects}.values())
in Python 2.7
unique_objects = {object_.id: object_ for object_ in objects}.values()
and in Python <2.7
unique_objects = dict([(object_.id, object_) for object_ in objects]).values()
Finally, we can write a function (Python 3 version)
def unique(elements, key):
    return list({key(element): element for element in elements}.values())
where elements may be any iterable and key is some callable which returns hashable objects from elements (key equals operator.attrgetter('id') in our particular case).
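A short usage sketch of that function with operator.attrgetter, assuming Python 3.7+ so that dict ordering is guaranteed. The sample objects are made up for illustration; note that the last object for each id is the one that survives:

```python
from operator import attrgetter


class Object:
    def __init__(self, id):
        self.id = id


def unique(elements, key):
    # Later elements overwrite earlier ones that share the same key.
    return list({key(element): element for element in elements}.values())


objects = [Object(1), Object(1), Object(2)]
result = unique(objects, key=attrgetter("id"))

print([o.id for o in result])    # [1, 2]
print(result[0] is objects[1])   # True: the second Object(1) was kept
```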
Marcin’s answer works fine but doesn’t look Pythonic to me, since the list comprehension mutates the seen object from the outer scope. There is also some magic behind calling the set.add method and combining its result (which is None) with obj.
And the final, but no less important, part:
Benchmark
import timeit

setup = '''
import random
class Object:
    def __init__(self, id):
        self.id = id
objects = [Object(random.randint(-100, 100))
           for i in range(1000)]
'''
solution = '''
seen = set()
result = [seen.add(object_.id) or object_
          for object_ in objects
          if object_.id not in seen]
'''
print('list comprehension + set: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
solution = '''
result = list({object_.id: object_
               for object_ in objects}.values())
'''
print('dict comprehension: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
on my machine gives
list comprehension + set: 0.20700953400228173
dict comprehension: 0.1477799109998159
You can use the unique_everseen recipe available in the itertools docs. This is also available in 3rd party libraries, e.g. toolz.unique. Note this method will keep the first instance of an object for a given attribute.
from toolz import unique
from operator import attrgetter
res = list(unique(objects, key=attrgetter('id')))
If a lazy iterator is sufficient, you can omit the list conversion.
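If you would rather not add a dependency, the idea behind the recipe can be sketched as a small generator. This is a simplified version written for illustration, not the exact code from the itertools docs; the name unique_by and the Item type are hypothetical:

```python
from collections import namedtuple
from operator import attrgetter


def unique_by(iterable, key):
    """Lazily yield elements whose key has not been seen before."""
    seen = set()
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Hypothetical sample data.
Item = namedtuple("Item", ["name", "id"])
objects = [Item("A", 1), Item("B", 1), Item("C", 2)]

lazy = unique_by(objects, key=attrgetter("id"))
first = next(lazy)   # only as much input is consumed as needed
rest = list(lazy)

print(first.name)               # A
print([o.name for o in rest])   # ['C']
```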
objects = [Aobject, Bobject, Cobject]
unique_objects = {o.id: o for o in objects}.values()