How to get list of objects with unique attribute

Question:

Background

I have a list containing many objects. Each object has an id, and the objects are of different types.

objects = [Aobject, Bobject, Cobject]

where

>>> Aobject != Bobject
True
>>> Aobject.id == Bobject.id
True

Problem

I want a list of unique objects based on the object.id.

Something like this:

set(objects, key=operator.attrgetter('id'))

(This does not work, but I want something like it.)

Asked By: Akamad007

Answers:

seen = set()

# never use list as a variable name
unique = [seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

This works because set.add returns None, so the expression in the list comprehension always yields obj, but only if obj.id has not already been added to seen.

(The expression could only evaluate to None if obj is None; in that case, obj.id would raise an exception. If mylist may contain None values, change the test to if obj is not None and obj.id not in seen.)

Note that this will give you the first object in the list which has a given id. @Abhijit’s answer will give you the last such object.
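
To make the first-versus-last difference concrete, here is a minimal sketch that runs both approaches on the same data. The Item class and its name attribute are hypothetical stand-ins for the objects in the question, and the ordering of the dict result assumes Python 3.7 or later.

```python
class Item:
    """Hypothetical stand-in for the question's objects."""
    def __init__(self, name, id):
        self.name = name
        self.id = id


mylist = [Item("A", 1), Item("B", 1), Item("C", 2)]

# seen-set approach: keeps the first object for each id
seen = set()
first_wins = [seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

# dict-comprehension approach: later objects overwrite earlier ones
last_wins = list({obj.id: obj for obj in mylist}.values())

print([obj.name for obj in first_wins])  # ['A', 'C']
print([obj.name for obj in last_wins])   # ['B', 'C']
```

Which one you want depends on whether the earliest or the most recent object for an id should survive.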

Update:

Alternatively, an OrderedDict could be a good choice:

import collections
seen = collections.OrderedDict()

for obj in mylist:
    # eliminate this check if you want the last item
    if obj.id not in seen:
        seen[obj.id] = obj

list(seen.values())
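
On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea can probably be written without OrderedDict. The sketch below uses dict.setdefault to keep the first object per id; the Item class is a hypothetical stand-in, not part of the original answer.

```python
class Item:
    """Hypothetical stand-in for the question's objects."""
    def __init__(self, name, id):
        self.name = name
        self.id = id


mylist = [Item("A", 1), Item("B", 1), Item("C", 2)]

first_by_id = {}
for obj in mylist:
    # setdefault stores obj only if obj.id is not already a key
    first_by_id.setdefault(obj.id, obj)

unique_first = list(first_by_id.values())
print([obj.name for obj in unique_first])  # ['A', 'C']
```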
Answered By: Marcin

A fairly simple way to do this would be

s = set()
unique = []
for obj in mylist:
    if obj.id not in s:
        s.add(obj.id)
        unique.append(obj)

This records every id not yet seen and keeps the first object with each id in unique. The time taken is linear in the size of the source list.

Answered By: Nathan S.

If you can change the class of the objects, you can add the appropriate methods which are used in set comparison:

# Assumption: this is the 'original' object
class OriginalExampleObject(object):
    def __init__(self, name, nid):
        self.name = name
        self.id = nid
    def __repr__(self):
        return "(OriginalExampleObject [%s] [%s])" % (self.name, self.id)

class SetExampleObj(OriginalExampleObject):
    def __init__(self, name, nid):
        super(SetExampleObj, self).__init__(name, nid)
    def __eq__(self, other):
        return self.id == other.id
    def __hash__(self):
        return hash(self.id)


AObject = SetExampleObj("A", 1)
BObject = SetExampleObj("B", 1)
CObject = SetExampleObj("C", 2)

s = set()
s.add(AObject)
s.add(CObject)
print(s)

s.add(BObject)
print(s)

Output:

set([(OriginalExampleObject [A] [1]), (OriginalExampleObject [C] [2])])
set([(OriginalExampleObject [A] [1]), (OriginalExampleObject [C] [2])])
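
Once equality and hashing depend only on the id, passing the whole list to set() should deduplicate it in one step. The following is a minimal sketch under that assumption; the ById class is hypothetical, and the isinstance guard is an addition so that comparing against unrelated types does not raise AttributeError.

```python
class ById:
    """Hypothetical class whose equality and hash depend only on id."""
    def __init__(self, name, id):
        self.name = name
        self.id = id

    def __eq__(self, other):
        # Defer to the other operand for unrelated types
        if not isinstance(other, ById):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def __repr__(self):
        return "ById(%r, %r)" % (self.name, self.id)


objects = [ById("A", 1), ById("B", 1), ById("C", 2)]
unique_objects = set(objects)

print(len(unique_objects))                         # 2
print(sorted(obj.name for obj in unique_objects))  # ['A', 'C']
print(ById("A", 1) == "not an object")             # False
```

Keep in mind that a set is unordered, so this loses the original list order, and that changing __eq__ affects every comparison of these objects, not just deduplication.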
Answered By: Andreas Florath

Suppose your list of objects, somelist, looks something like this:

[(Object [A] [1]), (Object [B] [1]), (Object [C] [2]), (Object [D] [2]), (Object [E] [3])]

You can do something like this:

>>> {e.id:e for e in somelist}.values()
[(Object [B] [1]), (Object [D] [2]), (Object [E] [3])]
Answered By: Abhijit

How about using dict (since its keys are unique)?

Assuming we have

class Object:
    def __init__(self, id):
        self.id = id


Aobject = Object(1)
Bobject = Object(1)
Cobject = Object(2)
objects = [Aobject, Bobject, Cobject]

then a list of objects unique by their id field can be generated using a dict comprehension in Python 3

unique_objects = list({object_.id: object_ for object_ in objects}.values())

in Python 2.7

unique_objects = {object_.id: object_ for object_ in objects}.values()

and in Python <2.7

unique_objects = dict([(object_.id, object_) for object_ in objects]).values()

Finally, we can write a function (Python 3 version)

def unique(elements, key):
    return list({key(element): element for element in elements}.values())

where elements may be any iterable and key is a callable that returns a hashable value for each element (in our particular case, key is operator.attrgetter('id')).
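
As a usage sketch, here is the unique function above called with operator.attrgetter('id') on the example objects, and with len on an iterator of strings to show that any iterable and any key callable should work. The string data is made up for illustration, and the result order assumes Python 3.7 or later.

```python
from operator import attrgetter


def unique(elements, key):
    return list({key(element): element for element in elements}.values())


class Object:
    def __init__(self, id):
        self.id = id


Aobject = Object(1)
Bobject = Object(1)
Cobject = Object(2)
objects = [Aobject, Bobject, Cobject]

by_attr = unique(objects, key=attrgetter('id'))
print([o.id for o in by_attr])  # [1, 2]
print(by_attr[0] is Bobject)    # True: the last object with id 1 wins

# Works on any iterable, here an iterator of strings keyed by length
by_len = unique(iter(["apple", "fig", "kiwi", "plum"]), key=len)
print(by_len)                   # ['apple', 'fig', 'plum']
```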

Marcin’s answer works fine but doesn’t look Pythonic to me: the list comprehension mutates the seen object from the outer scope, and it relies on some magic, namely that set.add returns None, so the or expression falls through to obj.

Last but not least:

Benchmark

import timeit

setup = '''
import random


class Object:
    def __init__(self, id):
        self.id = id


objects = [Object(random.randint(-100, 100))
           for i in range(1000)]
'''
solution = '''
seen = set()
result = [seen.add(object_.id) or object_
          for object_ in objects
          if object_.id not in seen]
'''
print('list comprehension + set: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
solution = '''
result = list({object_.id: object_
               for object_ in objects}.values())
'''
print('dict comprehension: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))

on my machine gives

list comprehension + set:  0.20700953400228173
dict comprehension:  0.1477799109998159
Answered By: Azat Ibrakov

You can use the unique_everseen recipe available in the itertools docs. This is also available in third-party libraries, e.g. toolz.unique. Note that this method keeps the first instance of an object for a given attribute.

from toolz import unique
from operator import attrgetter

res = list(unique(objects, key=attrgetter('id')))

If a lazy iterator is sufficient, you can omit list conversion.
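
If you would rather not add a dependency, the idea can be sketched as a small generator. This is a simplified version modeled on the unique_everseen recipe, not the recipe itself; unique_by and the Item class are hypothetical names used only for illustration.

```python
from operator import attrgetter


def unique_by(iterable, key):
    """Lazily yield items whose key has not been seen yet (first one wins)."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


class Item:
    """Hypothetical stand-in for the question's objects."""
    def __init__(self, name, id):
        self.name = name
        self.id = id


objects = [Item("A", 1), Item("B", 1), Item("C", 2)]

lazy = unique_by(objects, key=attrgetter("id"))
print(next(lazy).name)             # A
print([obj.name for obj in lazy])  # ['C'] (the remaining unique items)
```

Because it is a generator, nothing is computed until you iterate, which may help with large or streaming inputs.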

Answered By: jpp
objects = [Aobject, Bobject, Cobject]
# attribute access matches the question's obj.id; list() is needed on Python 3
unique_objects = list({o.id: o for o in objects}.values())
Answered By: ife