Merge dictionary of tuple items based only on first term of tuple key

Question:

I am trying to convert a dictionary with tuple keys into a list of tuples, merging entries based on the first value of the tuple key. I have worked out most of the steps to accomplish this, but I cannot figure out the first step in the process. If there is a cleaner method that avoids these steps, they can be ignored, but here is the process I have been trying so far:

input_dict = {("1", "a"): 1.0, ("1", "b"): 2.0, ("2", "a"): 4.0}
desired_output = [(4, "2", "a"), (3, "1", "a")]



# step 1) merge items, summing values based on first term in tuple key, keeping only one occurrence of second term in tuple key

## ? can't figure out how to do this step. Does not matter if option 1 or option 2 is produced
desired_step_1_output_option_1 = {("1", "a"): 3.0, ("2", "a"): 4.0}
desired_step_1_output_option_2 = {("1", "b"): 3.0, ("2", "a"): 4.0}

# step 2) order dictionary by value and convert to list of tuples

output_step_2 = sorted(desired_step_1_output_option_1.items(), key=lambda item: item[1], reverse=True)
## Output: [(('2', 'a'), 4.0), (('1', 'a'), 3.0)]

# step 3) Re-order results

output_step_3 = [(value, keys) for keys, value in output_step_2]
## Output: [(4.0, ('2', 'a')), (3.0, ('1', 'a'))]

# step 4) convert values to int, and un-nest tuples

output_step_4 = [(int(value), *keys) for value, keys in output_step_3]
## Output: [(4, '2', 'a'), (3, '1', 'a')]
Asked By: talker90

||

Answers:

You can use itertools.groupby to group by the first term.

Use input_dict.items() as the input to maintain access to the full tuple key and the value.

from itertools import groupby

input_dict = {("1", "a"): 1.0, ("2", "a"): 4.0, ("1", "b"): 2.0}

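def aggregate(tupleDict):
    # Sort and group the items by the first term of each tuple key.
    first_term = lambda item: item[0][0]
    for g in groupby(sorted(tupleDict.items(), key=first_term), key=first_term):
        # g[0] is the first term; g[1] is a one-shot iterator over its items.
        group = [(elem[1], elem[0][1]) for elem in g[1]]  # (value, second term)
        total = int(sum(elem[0] for elem in group))
        yield (total, g[0], group[0][1])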

result = sorted(aggregate(input_dict), key=lambda k: k[0], reverse=True)
print(result)

Note that g[1] is an iterator, so it can be consumed only once. I create a temporary group list so that I can both sum the numeric values and read the first group element to get at one of the letters. I considered using itertools.tee to get two independent iterators, but when one of them is advanced ahead of the other, tee has to keep the items the lagging iterator has not yet seen in memory. Using a list instead is simpler, and the memory consumption is probably the same. You could probably optimize this further in other ways, however.
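
To illustrate why the temporary list is needed, here is a minimal sketch of the one-shot behaviour of a groupby group. The pairs data and the variable names are made up for this example and are not part of the code above.

```python
from itertools import groupby

# Hypothetical sample data, already sorted by the grouping key (first element).
pairs = [("1", 1.0), ("1", 2.0), ("2", 4.0)]

seen = []
for key, group in groupby(pairs, key=lambda p: p[0]):
    total = sum(value for _, value in group)  # this pass consumes the iterator
    leftover = list(group)                    # a second pass finds nothing
    seen.append((key, total, leftover))

print(seen)  # [('1', 3.0, []), ('2', 4.0, [])]
```

Once the sum has run, the group is exhausted, so anything needed later (such as one of the letters) has to be captured first or read from a materialized list.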

You could also group by the first term using a dict, storing the second term and the running total together as the value. Because of this composite value, it's a little cumbersome to update:

temp_dict = {}
for key, value in input_dict.items():
    temp_dict[key[0]] = (
        key[1],
        temp_dict.setdefault(key[0], (None, 0.0))[1] + value
    )

result = [(int(v[1]), k, v[0])
    for k, v in sorted(temp_dict.items(), key=lambda kv: kv[1][1], reverse=True)]
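
One possible way to avoid the composite value, sketched here as an assumption rather than as part of the approach above, is to keep two plain dicts: one for the running totals and one for a representative second term. The names totals and letters are hypothetical.

```python
input_dict = {("1", "a"): 1.0, ("1", "b"): 2.0, ("2", "a"): 4.0}

totals = {}   # first term -> running sum of values
letters = {}  # first term -> first second term seen for that group

for (first, second), value in input_dict.items():
    totals[first] = totals.get(first, 0.0) + value
    letters.setdefault(first, second)  # only stored the first time

result = [(int(total), first, letters[first])
          for first, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)]
print(result)  # [(4, '2', 'a'), (3, '1', 'a')]
```

This trades one dict with tuple values for two dicts with simple values, which keeps each update to a single short line.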

Similarly, using defaultdict requires a custom class because of the composite value, but it arguably makes the code more readable:

from collections import defaultdict

class LetterValue:
    def __init__(self, letter=None, value=0.0):
        self.letter = letter
        self.value = value
    def add(self, letter, value):
        self.letter = letter
        self.value += value

temp_dict = defaultdict(LetterValue)

for key, value in input_dict.items():
    temp_dict[key[0]].add(key[1], value)

result = [(int(v.value), k, v.letter)
    for k, v in sorted(temp_dict.items(), key=lambda kv: kv[1].value, reverse=True)]
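
As a hedged variation on the same idea, a dataclass can supply the defaults without a hand-written __init__. The class name LetterTotal is hypothetical, and unlike the class above (which overwrites the letter on every add), this sketch keeps the first second term seen; either choice satisfies the question.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class LetterTotal:  # hypothetical name, for illustration only
    letter: Optional[str] = None
    value: float = 0.0

    def add(self, letter, value):
        # Keep the first letter seen instead of overwriting it each time.
        if self.letter is None:
            self.letter = letter
        self.value += value

input_dict = {("1", "a"): 1.0, ("2", "a"): 4.0, ("1", "b"): 2.0}

groups = defaultdict(LetterTotal)  # missing keys start as LetterTotal()
for (first, second), value in input_dict.items():
    groups[first].add(second, value)

result = [(int(g.value), first, g.letter)
          for first, g in sorted(groups.items(), key=lambda kv: kv[1].value, reverse=True)]
print(result)  # [(4, '2', 'a'), (3, '1', 'a')]
```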
Answered By: CodeManX