How to uniquify a list of dicts in Python
Question:
I have a list:
d = [{'x':1, 'y':2}, {'x':3, 'y':4}, {'x':1, 'y':2}]
{'x':1, 'y':2}
appears more than once, and I want to remove the duplicate from the list. My result should be:
d = [{'x':1, 'y':2}, {'x':3, 'y':4} ]
Note:
list(set(d))
does not work here; it throws an error.
Answers:
If your values are hashable, this will work:
>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 4, 'x': 3}, {'y': 2, 'x': 1}]
EDIT:
I tried it with no duplicates and it seemed to work fine:
>>> d = [{'x':1, 'y':2}, {'x':3, 'y':4}]
>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 4, 'x': 3}, {'y': 2, 'x': 1}]
and
>>> d = [{'x':1,'y':2}]
>>> [dict(y) for y in set(tuple(x.items()) for x in d)]
[{'y': 2, 'x': 1}]
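Note that the set-based one-liner above does not keep the original order, and tuple(x.items()) treats two dicts with the same pairs in a different key order as distinct. A minimal sketch that addresses both, assuming all values are hashable (the name unique_in_order is made up for illustration):

```python
def unique_in_order(dicts):
    """Return the dicts without duplicates, keeping first occurrences in order.

    Assumes every value is hashable; unique_in_order is an illustrative name.
    """
    seen = set()
    result = []
    for d in dicts:
        # A frozenset of (key, value) pairs is hashable and ignores key order.
        key = frozenset(d.items())
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


d = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'y': 2, 'x': 1}]
print(unique_in_order(d))  # [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
```

The seen set keeps each membership check at roughly constant time, so the whole pass stays linear in the number of dicts.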
Dicts aren’t hashable, so you can’t put them in a set. A relatively efficient approach would be turning the (key, value)
pairs of each dict into a frozenset and hashing those (feel free to eliminate the intermediate variables):
def unique_dicts(dicts):
    # frozenset is hashable (a plain set is not) and ignores key order
    pair_sets = [frozenset(d.items()) for d in dicts]
    unique = set(pair_sets)
    return [dict(pairs) for pairs in unique]
If the values aren’t always hashable, this is not possible at all using sets and you’ll probably have to use the O(n^2) approach using an "in" check per element.
A simple loop:
>>> tmp = []
>>> for i in d:
...     if i not in tmp:
...         tmp.append(i)
...
>>> tmp
[{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
Avoid this whole problem and use namedtuples instead:
from collections import namedtuple
Point = namedtuple('Point', 'x y'.split())
better_d = [Point(1, 2), Point(3, 4), Point(1, 2)]
print(set(better_d))
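If the data already exists as dicts, it can be converted to namedtuples and back. A rough sketch, assuming every dict has exactly the keys x and y and that all values are hashable:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

d = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 1, 'y': 2}]

# Point(**item) raises TypeError if a dict has missing or extra keys.
points = set(Point(**item) for item in d)

# _asdict() turns each namedtuple back into a mapping; set order is arbitrary.
unique = [dict(p._asdict()) for p in points]
print(unique)
```

This only fits when all dicts share one fixed set of keys; for dicts with varying keys, one of the other approaches is a better match.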
Another bit of dark magic (please don’t beat me):
list(map(dict, set(map(lambda x: tuple(x.items()), d))))
Converting the dict to a tuple won’t work if any value in the dict is unhashable, such as a list.
For example, with
data = [
    {'a': 1, 'b': 2},
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3}
]
using [dict(y) for y in set(tuple(x.items()) for x in data)] yields the unique dicts.
However, the same expression raises TypeError (unhashable type: 'list') on data like this:
data = [
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 2, 'b': 3, 'c': [3]}
]
If performance is not a concern, json.dumps/json.loads could be a nice choice (sort_keys=True makes the result independent of key order):
import json
data = set(json.dumps(d, sort_keys=True) for d in data)
data = [json.loads(s) for s in data]
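To make the JSON idea concrete, here is a hedged sketch that uses the serialized string only as a lookup key and returns the original dicts, so nothing is altered by a round trip through JSON and the original order is kept. It assumes every dict is JSON-serializable; sort_keys=True is used so key order does not matter, and unique_by_json is a made-up name:

```python
import json


def unique_by_json(dicts):
    """Drop duplicate dicts, keeping the first occurrence of each.

    Assumes each dict is JSON-serializable; unique_by_json is an
    illustrative name, not a library function.
    """
    seen = set()
    result = []
    for d in dicts:
        # sort_keys=True gives equal dicts the same string regardless of key order.
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


data = [
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 1, 'b': 2, 'c': [1, 2]},
    {'a': 2, 'b': 3, 'c': [3]},
]
print(unique_by_json(data))
# [{'a': 1, 'b': 2, 'c': [1, 2]}, {'a': 2, 'b': 3, 'c': [3]}]
```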