Group list items in a dict

Question:

I want to generate a dictionary from a list of dictionaries, grouping list items by the value of some key, such as:

input_list = [
        {'a':'tata', 'b': 'foo'},
        {'a':'pipo', 'b': 'titi'},
        {'a':'pipo', 'b': 'toto'},
        {'a':'tata', 'b': 'bar'}
]
output_dict = {
        'pipo': [
             {'a': 'pipo', 'b': 'titi'}, 
             {'a': 'pipo', 'b': 'toto'}
         ],
         'tata': [
             {'a': 'tata', 'b': 'foo'},
             {'a': 'tata', 'b': 'bar'}
         ]
}

So far I’ve found two ways of doing this. The first simply iterates over the list, create sublists in the dict for each key value and append elements matching these keys to the sublist:

l = [ 
    {'a':'tata', 'b': 'foo'},
    {'a':'pipo', 'b': 'titi'},
    {'a':'pipo', 'b': 'toto'},
    {'a':'tata', 'b': 'bar'}
    ]

res = {}

for e in l:
    res[e['a']] = res.get(e['a'], []) 
    res[e['a']].append(e)

And another using itertools.groupby:

import itertools
from operator import itemgetter

l = [ 
        {'a':'tata', 'b': 'foo'},
        {'a':'pipo', 'b': 'titi'},
        {'a':'pipo', 'b': 'toto'},
        {'a':'tata', 'b': 'bar'}
]

l = sorted(l, key=itemgetter('a'))
res = dict((k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a')))

I wonder which alternative is the most efficient?

Is there any more pythonic/concise or better performing way of achieving this?

Asked By: Erwan Queffélec

||

Answers:

Is it correct that you want to group your input list by the value of the ‘a’ key of the list elements? If so, your first approach is the best, one minor improvement, use dict.setdefault:

res = {}
for item in l:
    res.setdefault(item['a'], []).append(item)
Answered By: Bernhard

A one liner –

>>> import itertools
>>> input_list = [
...         {'a':'tata', 'b': 'foo'},
...         {'a':'pipo', 'b': 'titi'},
...         {'a':'pipo', 'b': 'toto'},
...         {'a':'tata', 'b': 'bar'}
... ]
>>> {k:[v for v in input_list if v['a'] == k] for k, val in itertools.groupby(input_list,lambda x: x['a'])}
{'tata': [{'a': 'tata', 'b': 'foo'}, {'a': 'tata', 'b': 'bar'}], 'pipo': [{'a': 'pipo', 'b': 'titi'}, {'a': 'pipo', 'b': 'toto'}]}
Answered By: Anand S Kumar

The best approach is the first one you mentioned, and you can even make it more elegant by using setdefault as mentioned by bernhard above. The complexity of this approach is O(n) since we simply iterate over the input once and for each item we perform a lookup into the output dict we are building to find the appropriate list to append it to, which takes constant time (lookup+append) for each item. So overlal complexity is O(n) which is optimal.

When using itertools.groupby, you must sort the input beforehand (which is O(n log n)).

Answered By: gen-y-s

If by efficient you mean “time efficient”, it is possible to measure it using the timeit built in module.

For example:

import timeit
import itertools
from operator import itemgetter

input = [{'a': 'tata', 'b': 'foo'},
         {'a': 'pipo', 'b': 'titi'},
         {'a': 'pipo', 'b': 'toto'},
         {'a': 'tata', 'b': 'bar'}]

def solution1():
    res = {}
    for e in input:
        res[e['a']] = res.get(e['a'], [])
        res[e['a']].append(e)
    return res

def solution2():
    l = sorted(input, key=itemgetter('a'))
    res = dict(
        (k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a'))
    )
    return res

t = timeit.Timer(solution1)
print(t.timeit(10000))
# 0.0122511386871

t = timeit.Timer(solution2)
print(t.timeit(10000))
# 0.0366218090057

Please refer to the timeit official docs for further information.

Answered By: Gabriel Lima