How to uniquify a list of dicts in Python

Question:

I have a list:

d = [{'x':1, 'y':2}, {'x':3, 'y':4}, {'x':1, 'y':2}]

{'x':1, 'y':2} appears more than once, and I want to remove the duplicate from the list. My result should be:

d = [{'x':1, 'y':2}, {'x':3, 'y':4}]

Note:
list(set(d)) doesn’t work here; it throws an error.
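The error comes from set() requiring hashable elements, and dicts are mutable and therefore unhashable. A quick check, assuming Python 3:

```python
d = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 1, 'y': 2}]

try:
    list(set(d))  # set() must hash every element; dicts can't be hashed
except TypeError as exc:
    message = str(exc)
    print(message)  # unhashable type: 'dict'
```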

Asked By: ramesh.c


Answers:

If all your values are hashable, this will work:

>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 4, 'x': 3}, {'y': 2, 'x': 1}]

EDIT:

I also tried it with no duplicates, and it seems to work fine:

>>> d = [{'x':1, 'y':2}, {'x':3, 'y':4}]
>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 4, 'x': 3}, {'y': 2, 'x': 1}]

and

>>> d = [{'x':1,'y':2}]
>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 2, 'x': 1}]
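One caveat: tuple(x.items()) depends on key order. In Python 3.7+, where dicts keep insertion order, {'x': 1, 'y': 2} and {'y': 2, 'x': 1} compare equal but produce different tuples, so both survive. A minimal sketch, still assuming all values are hashable, that uses frozenset(x.items()) to ignore key order and dict.fromkeys to keep first-seen order:

```python
d = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'y': 2, 'x': 1}]

# The tuple-based key sees the first and last dicts as different
print(len(set(tuple(x.items()) for x in d)))  # 3

# frozenset(x.items()) is hashable and ignores key order;
# dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+)
unique = [dict(pairs) for pairs in dict.fromkeys(frozenset(x.items()) for x in d)]
print(len(unique))  # 2
```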
Answered By: GWW

Dicts aren’t hashable, so you can’t put them in a set. A relatively efficient approach is to turn each dict’s (key, value) pairs into a hashable object (a frozenset here, which also ignores key order) and hash those (feel free to eliminate the intermediate variables):

def unique_dicts(dicts):
    # each frozenset of (key, value) pairs is hashable and ignores key order
    frozen = [frozenset(d.items()) for d in dicts]
    unique = set(frozen)
    return [dict(pairs) for pairs in unique]

If the values aren’t always hashable, this is not possible at all using sets, and you’ll probably have to use the O(n^2) approach with a membership (in) check per element.
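A middle ground, sketched under the assumption that most dicts have only hashable values: hash when possible and fall back to the linear in check only for dicts containing an unhashable value. The function name unique_dicts_mixed is hypothetical, and the result keeps the original order.

```python
def unique_dicts_mixed(dicts):
    """Order-preserving dedupe; hashes when possible, else falls back to a scan."""
    seen = set()   # frozensets of (key, value) pairs for hashable-valued dicts
    result = []
    for d in dicts:
        try:
            key = frozenset(d.items())
        except TypeError:
            # d has an unhashable value (e.g. a list): use the O(n) 'in' check
            if d not in result:
                result.append(d)
            continue
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


print(unique_dicts_mixed([{'a': 1}, {'a': 1}, {'a': [1]}, {'a': [1]}, {'b': 2}]))
# [{'a': 1}, {'a': [1]}, {'b': 2}]
```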

Answered By: user395760

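A simple loop, which keeps the original order and also works when values are unhashable:

tmp = []
for i in d:
    if i not in tmp:  # linear scan using dict equality, so O(n^2) overall
        tmp.append(i)
print(tmp)  # [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]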
Answered By: Fredrik Pihl

Avoid this whole problem and use namedtuples instead:

from collections import namedtuple

Point = namedtuple('Point','x y'.split())
better_d = [Point(1,2), Point(3,4), Point(1,2)]
print(set(better_d))
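If the data already arrives as dicts, one way to apply this is to convert to the namedtuple, deduplicate, and convert back. A minimal sketch, assuming every dict has exactly the keys x and y:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

d = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 1, 'y': 2}]

# Point(**item) raises TypeError if a dict has missing or extra keys
points = [Point(**item) for item in d]

# dict.fromkeys removes duplicates while keeping first-seen order
unique_points = list(dict.fromkeys(points))
print(unique_points)  # [Point(x=1, y=2), Point(x=3, y=4)]

# _asdict() converts each Point back to a dict
result = [p._asdict() for p in unique_points]
```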
Answered By: Jochen Ritzel

Another bit of dark magic (please don’t beat me):

list(map(dict, set(map(lambda x: tuple(x.items()), d))))
Answered By: Artsiom Rudzenka

Converting each dict to a tuple won’t work if any value is unhashable, such as a list.

For example, with this data:

data = [
  {'a': 1, 'b': 2},
  {'a': 1, 'b': 2},
  {'a': 2, 'b': 3}
]

[dict(y) for y in set(tuple(x.items()) for x in data)] returns the unique dicts.

However, the same expression fails on the following data with TypeError: unhashable type: 'list', because the 'c' values are lists:

data = [
  {'a': 1, 'b': 2, 'c': [1,2]},
  {'a': 1, 'b': 2, 'c': [1,2]},
  {'a': 2, 'b': 3, 'c': [3]}
]

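If performance isn’t a concern, a json.dumps/json.loads round trip is a nice option:

import json
# sort_keys=True makes each string independent of key insertion order
data = {json.dumps(d, sort_keys=True) for d in data}
data = [json.loads(s) for s in data]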
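A JSON round trip also turns tuples into lists and int keys into strings, and it returns new dicts rather than the originals. An alternative sketch, assuming values are built only from dicts, lists, tuples, sets and hashable scalars: recursively build a hashable key for each dict and keep the original dicts. The helper names freeze and unique_nested are hypothetical, and the key is lossy in corner cases (for example, [1, 2] and (1, 2) count as duplicates).

```python
def freeze(obj):
    """Recursively build a hashable stand-in for obj (hypothetical helper)."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        # note: [1, 2] and (1, 2) both become (1, 2) and count as duplicates
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(v) for v in obj)
    return obj  # assumed to be hashable already (int, str, None, ...)


def unique_nested(items):
    """Keep the first occurrence of each item, compared via freeze()."""
    seen = set()
    result = []
    for item in items:
        key = freeze(item)
        if key not in seen:
            seen.add(key)
            result.append(item)  # the original dict, not a copy
    return result


data = [
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 2, 'b': 3, 'c': [3]},
]
print(unique_nested(data))
# [{'a': 1, 'b': 2, 'c': [1, 2]}, {'a': 2, 'b': 3, 'c': [3]}]
```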
Answered By: Eric