Python dictionary comprehension to group together equal keys

Question:

I have a code snippet that groups a list of dicts by key: every dict with the same ObjectID is added to a list stored under that ID.
The code below works, but I am trying to convert it to a dictionary comprehension.

Group together subblocks if they have equal ObjectID:

output = {}
subblkDBF : list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)
Asked By: Espen Enes

||

Answers:

Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check whether a key is already in the dictionary at every iteration, and (b) append to the value rather than set it. You can, however, eliminate some of the boilerplate using collections.defaultdict:

from collections import defaultdict

output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
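
As a minimal, self-contained sketch of the same idea, the snippet below runs the defaultdict loop on a small made-up list. The sample rows and the NAME field are invented for illustration; only the OBJECTID key comes from the question.

```python
from collections import defaultdict

# Hypothetical sample data; only the "OBJECTID" key comes from the question.
subblkDBF = [
    {"OBJECTID": 1, "NAME": "a"},
    {"OBJECTID": 2, "NAME": "b"},
    {"OBJECTID": 1, "NAME": "c"},
]

output = defaultdict(list)
for row in subblkDBF:
    # Accessing a missing key creates an empty list for it automatically.
    output[row["OBJECTID"]].append(row)

# Optional: convert back to a plain dict so later lookups of missing keys
# raise KeyError instead of silently creating empty lists.
grouped = dict(output)
```

Converting to a plain dict at the end is a design choice, not a requirement; it only matters if code further down relies on missing keys raising an error.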

The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):

{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}

Iterating over subblkDBF in both the inner and outer loop costs O(n·k) for k distinct keys, which becomes O(n^2) when most IDs are unique. That is pointless, especially given how illegible the result is.
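
To make the cost concrete, here is a small sketch that checks the nested comprehension against a single-pass loop and counts how many row comparisons the comprehension performs. The sample data and the IDX field are invented for illustration.

```python
# Hypothetical sample data: 9 rows spread over 3 distinct OBJECTID values.
subblkDBF = [{"OBJECTID": i % 3, "IDX": i} for i in range(9)]

# Nested comprehension: one full scan of the list per distinct key.
keys = {d["OBJECTID"] for d in subblkDBF}
nested = {k: [d for d in subblkDBF if d["OBJECTID"] == k] for k in keys}

# Single pass for comparison.
single_pass = {}
for row in subblkDBF:
    single_pass.setdefault(row["OBJECTID"], []).append(row)

# Both approaches build the same grouping.
same_result = nested == single_pass

# The comprehension compares every row against every distinct key.
comparisons = len(keys) * len(subblkDBF)  # 3 * 9 = 27
```

With every ID unique, len(keys) equals len(subblkDBF) and the comparison count grows quadratically, while the single pass still touches each row once.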

As the other answer shows, these problems go away if you’re willing to sort the list first, or better yet, if it is already sorted.

Answered By: Mad Physicist

If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:

from itertools import groupby
from operator import itemgetter

output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}

However, if this is not the case, you'd have to sort by the same key first, which makes this a lot less neat and less efficient than the approaches above or the plain loop (O(n log n) instead of O(n)):

key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
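
The sketch below illustrates why the sort matters: groupby only merges adjacent rows, so on unsorted input a key can appear in several runs and the dict comprehension keeps only the last one. The sample rows and the NAME field are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in which the two rows with OBJECTID 1 are not adjacent.
rows = [
    {"OBJECTID": 1, "NAME": "a"},
    {"OBJECTID": 2, "NAME": "b"},
    {"OBJECTID": 1, "NAME": "c"},
]
key = itemgetter("OBJECTID")

# Without sorting, key 1 is yielded twice and the second run overwrites the first.
unsorted_result = {k: list(g) for k, g in groupby(rows, key=key)}

# sorted() is stable, so rows keep their original relative order within each group.
sorted_result = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}
```

Here unsorted_result silently loses the row named "a", while sorted_result contains both rows for OBJECTID 1.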
Answered By: tobias_k

You can add an else block to save a second dictionary lookup for new keys and slightly improve performance:

output = {}
subblkDBF : list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)
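
Whether the else block is measurably faster depends on the data and the Python version, so it is worth checking. A rough sketch using timeit follows; the function names, sample data, and repetition count are arbitrary choices for illustration, and no particular timing result is implied.

```python
import timeit

# Hypothetical sample data: 10,000 rows over 100 distinct OBJECTID values.
rows = [{"OBJECTID": i % 100, "IDX": i} for i in range(10_000)]

def group_without_else(rows):
    # Original loop from the question: create the list, then always append.
    output = {}
    for row in rows:
        if row["OBJECTID"] not in output:
            output[row["OBJECTID"]] = []
        output[row["OBJECTID"]].append(row)
    return output

def group_with_else(rows):
    # Variant from this answer: new keys start with the row already in the list.
    output = {}
    for row in rows:
        if row["OBJECTID"] not in output:
            output[row["OBJECTID"]] = [row]
        else:
            output[row["OBJECTID"]].append(row)
    return output

t_without = timeit.timeit(lambda: group_without_else(rows), number=20)
t_with = timeit.timeit(lambda: group_with_else(rows), number=20)
print(f"without else: {t_without:.4f}s, with else: {t_with:.4f}s")
```

The else branch only saves work the first time a key is seen, so any gain should shrink as the number of rows per key grows.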
Answered By: rv.kvetch