Python3 dictionary: remove duplicate values in alphabetical order

Question:

Let’s say I have the following dictionary:

full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1,
  'ba': 2,
  ...
}

I normally use standard dictionary comprehension to remove dupes like:

t = {val : key for (key, val) in full_dic.items()}
cleaned_dic = {val : key for (key, val) in t.items()}

Calling print(cleaned_dic) outputs {'ab': 1,'ba': 2, ...}

With this code, the key that remains seems to always be the final one in the list, but I’m not sure that’s even guaranteed as dictionaries are unordered. Instead, I’d like to find a way to ensure that the key I keep is the first alphabetically.

So, regardless of the ‘order’ the dictionary is in, I want the output to be:

>> {'aa': 1,'ba': 2, ...}

Where ‘aa’ comes first alphabetically.


I ran some timer tests on 3 answers below and got the following (dictionary was created with random key/value pairs):

dict length:  10
# of loops:  100000
HoliSimo (OrderedDict): 0.0000098405 seconds
Ricardo: 0.0000115448 seconds
Mark (itertools.groupby): 0.0000111745 seconds

dict length:  1000000
# of loops:  10
HoliSimo (OrderedDict): 6.1724137300 seconds
Ricardo: 3.3102091300 seconds
Mark (itertools.groupby): 6.1338266200 seconds

We can see that for smaller dictionary sizes using OrderedDict is fastest but for large dictionary sizes it’s slightly better to use Ricardo’s answer below.

Asked By: John

||

Answers:

You should use the OrderectDict class.

import collections
full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1
}
od = collections.OrderedDict(sorted(full_dic.items()))

In this way you will be sure to have sorted dictionary (Original code: StackOverflow).

And then:

result = {}
for k, vin od.items():
    if value not in result.values():
        result[key] = value

I’m not sure if it will speed up the computation but you can try:

inverted_dict = {}
for k, v in od.items():
    if inverted_dict.get(v) is None:
        inverted_dict[v] = k

res = {v: k for k, v in inverted_dict.items()}
Answered By: HoliSimo
t = {val : key for (key, val) in dict(sorted(full_dic.items(), key=lambda x: x[0].lower(), reverse=True)).items()}
cleaned_dic = {val : key for (key, val) in t.items()}
dict(sorted(cleaned_dic.items(), key=lambda x: x[0].lower()))
>>> {'aa': 1, 'ba': 2}
Answered By: Ricardo

Seems like you can do this with a single sort and itertools.groupby. First sort the items by value, then key. Pass this to groupby and take the first item of each group to pass to the dict constructor:

from itertools import groupby

full_dic = {
  'aa': 1,
  'ac': 1,
  'xx': 2,
  'ab': 1,
  'ba': 2,
}
groups = groupby(sorted(full_dic.items(), key=lambda p: (p[1], p[0])), key=lambda x: x[1])
dict(next(g) for k, g in groups)
# {'aa': 1, 'ba': 2}
Answered By: Mark
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.