Group all values with the same key in a list of tuples

Question:

I have a list of tuples containing strings and dictionaries that looks like the following:

# data type: list[tuple[str, dict]]
input_data_structure = [
    ('key1', {'a': 'b'}),
    ('key2', {'w': 'x'}),
    ('key1', {'c': 'd'}),
    ('key2', {'y': 'z'}),
]

I want to group all values that share the same key. The result could look like this, or something similar:

# data type: list[tuple[str, list[dict]]]
result_data_structure = [
    ('key1', [{'a': 'b'}, {'c': 'd'}]),
    ('key2', [{'w': 'x'}, {'y': 'z'}]),
]

It is important to me to have a good data structure in which I can loop over the list belonging to each key to get its values, like this:

for t in result_data_structure:
    for val in t[1]:
        print(val)

Does anyone have an idea how to process or transform the data? Thanks in advance!

Asked By: H. Senkaya


Answers:

You can use a defaultdict to achieve this easily.

from collections import defaultdict
d = defaultdict(list)

for key, value in input_data_structure:
    d[key].append(value)

d  # defaultdict(<class 'list'>, {'key1': [{'a': 'b'}, {'c': 'd'}], 'key2': [{'w': 'x'}, {'y': 'z'}]})

If you need the output as a list of (key, value) tuples, just convert it with this line:

list(d.items())  # [('key1', [{'a': 'b'}, {'c': 'd'}]), ('key2', [{'w': 'x'}, {'y': 'z'}])]
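Putting the pieces together, here is a minimal end-to-end sketch (the names grouped and result are only illustrative) that builds the grouped list and then runs the nested loop from the question on it. Because defaultdict is a dict subclass, and dicts keep insertion order in Python 3.7+, the keys come out in the order they were first seen.

```python
from collections import defaultdict

input_data_structure = [
    ('key1', {'a': 'b'}),
    ('key2', {'w': 'x'}),
    ('key1', {'c': 'd'}),
    ('key2', {'y': 'z'}),
]

# Collect every dict under its key; a missing key starts as an empty list.
grouped = defaultdict(list)
for key, value in input_data_structure:
    grouped[key].append(value)

# Convert to the list-of-tuples shape asked for in the question.
result = list(grouped.items())

# The nested loop from the question works unchanged on this structure;
# tuple unpacking replaces t[1] for readability.
for key, values in result:
    for val in values:
        print(key, val)
```

If you only need to iterate, you can skip the list() conversion and loop over grouped.items() directly.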
Answered By: crissal

A solution without any imports:

result = {}
for key, data in input_data_structure:
    try:
        result[key].append(data)
    except KeyError:
        result[key] = [data]
result = list(result.items())
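The same import-free idea can also be written with dict.setdefault, which removes the need for try/except. This is an alternative formulation, not part of the original answer; the behaviour is the same.

```python
input_data_structure = [
    ('key1', {'a': 'b'}),
    ('key2', {'w': 'x'}),
    ('key1', {'c': 'd'}),
    ('key2', {'y': 'z'}),
]

result = {}
for key, data in input_data_structure:
    # setdefault returns the existing list for key, or inserts [] and returns it.
    result.setdefault(key, []).append(data)

result = list(result.items())
```

One small trade-off: setdefault evaluates its default argument ([]) on every call, even when the key already exists, whereas the try/except version only builds a new list on a miss.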
Answered By: Robert Haas

This is a great place to use itertools.groupby, but remember that it only groups consecutive items with equal keys, so the data must first be sorted by the same key.

from itertools import groupby
from operator import itemgetter

input_data_structure = [
    ('key1', {'a': 'b'}),
    ('key2', {'w': 'x'}),
    ('key1', {'c': 'd'}),
    ('key2', {'y': 'z'}),
]

sorted_data = sorted(input_data_structure, key=itemgetter(0))
# [('key1', {'a': 'b'}), ('key1', {'c': 'd'}), ('key2', {'w': 'x'}), ('key2', {'y': 'z'})]
grouped_data = [
    (k, list(map(itemgetter(1), g)))
    for k, g in groupby(sorted_data, itemgetter(0))
]
# [('key1', [{'a': 'b'}, {'c': 'd'}]), ('key2', [{'w': 'x'}, {'y': 'z'}])]
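To see why the sort matters: groupby starts a new group every time the key changes from one element to the next, so on the unsorted input it produces four single-item groups instead of two. A small sketch comparing both cases (the variable names are illustrative):

```python
from itertools import groupby
from operator import itemgetter

input_data_structure = [
    ('key1', {'a': 'b'}),
    ('key2', {'w': 'x'}),
    ('key1', {'c': 'd'}),
    ('key2', {'y': 'z'}),
]

# Unsorted: keys alternate, so every element starts a new group.
unsorted_groups = [
    (k, [value for _, value in g])
    for k, g in groupby(input_data_structure, key=itemgetter(0))
]

# Sorted by key first: equal keys are adjacent and merge into one group.
# sorted() is stable, so values keep their original relative order per key.
sorted_groups = [
    (k, [value for _, value in g])
    for k, g in groupby(sorted(input_data_structure, key=itemgetter(0)), key=itemgetter(0))
]

print(unsorted_groups)
print(sorted_groups)
```

Note that sorting costs O(n log n) and orders the keys by value, while the dictionary-based answers run in a single O(n) pass and keep keys in first-seen order.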
Answered By: Chris