Compare a defaultdict's key-value pairs with another defaultdict

Question:

I have two defaultdicts:

defaultdict(<type 'list'>, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})

defaultdict(<type 'list'>, {'a': ['OS', 'sys'], 'b': ['OS']})

How do I compare these two to get the count of values missing from each one?
For example, I should get that two values are missing from the second defaultdict for key 'a' and one is missing for key 'b'.

Asked By: Rob


Answers:

You should be able to use set differences to find (and count) the missing elements efficiently. If you're careful, you can even do this without adding items to the defaultdict (and without assuming that the function's inputs are defaultdicts).

From there, it becomes just a matter of accumulating those results in a dictionary.

def compare_dict_of_list(d1, d2):
    d = {}
    for key, value in d1.items():
        diff_count = len(set(value).difference(d2.get(key, [])))
        d[key] = diff_count
    return d
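
A minimal usage sketch, assuming the two dictionaries from the question. The function is repeated so the snippet runs on its own, and the names `first`, `second`, `missing_from_second` and `missing_from_first` are illustrative, not from the original post. Calling the function once per direction gives the per-key counts for each dictionary:

```python
from collections import defaultdict


def compare_dict_of_list(d1, d2):
    """Count, per key of d1, the distinct values that d2 lacks under that key."""
    d = {}
    for key, value in d1.items():
        # .get() reads without inserting, so a defaultdict d2 is left unchanged
        d[key] = len(set(value).difference(d2.get(key, [])))
    return d


first = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
second = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

missing_from_second = compare_dict_of_list(first, second)
missing_from_first = compare_dict_of_list(second, first)
print(missing_from_second)  # {'a': 1, 'b': 1}
print(missing_from_first)   # {'a': 0, 'b': 0}
```

Because the values are converted to sets, repeated entries in a list are counted once.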
Answered By: mgilson

If you just want the total number missing from the second defaultdict, you can iterate through the first dict and take the set difference for each key to count how many values are in A but not in B.

If you define the dicts like this:

a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

This will tell you how many are missing from dict B:

total_missing_inB = 0
for i in a:
    # .get() avoids inserting a missing key into the defaultdict b
    diff = set(a[i]) - set(b.get(i, []))
    total_missing_inB += len(diff)

And this will tell you how many are missing from dict A:

total_missing_inA = 0
for i in b:
    # .get() avoids inserting a missing key into the defaultdict a
    diff = set(b[i]) - set(a.get(i, []))
    total_missing_inA += len(diff)
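
Because both inputs are defaultdicts, indexing a key that one of them lacks would insert that key with an empty list, while reading with `.get` does not. A minimal sketch that wraps the two loops in one hypothetical helper, `total_missing`; the extra key `'c'` is not in the question and is added only to show that `b` is left unchanged:

```python
from collections import defaultdict


def total_missing(source, other):
    """Total distinct values in source that other lacks, summed over source's keys."""
    return sum(
        # other.get() never inserts a key, unlike other[key] on a defaultdict
        len(set(values) - set(other.get(key, [])))
        for key, values in source.items()
    )


# 'c' is an illustrative extra key that exists only in a
a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys'], 'c': ['net']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

total_missing_inB = total_missing(a, b)
total_missing_inA = total_missing(b, a)
print(total_missing_inB, total_missing_inA)  # 3 0
print('c' in b)  # False
```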
Answered By: bmcmenamin

Here we present an alternate solution using collections.Counter to track values, and we consider some edge cases concerning uncommon keys and values.

Code

import collections as ct


def compare_missing(d1, d2, verbose=False):
    """Return the count of missing values from dict 2 compared to dict 1."""
    record = {}
    for k in d1.keys() & d2.keys():
        a, b = ct.Counter(d1[k]), ct.Counter(d2[k])
        record[k] = a - b
    if verbose: print(record)
    return sum(v for c in record.values() for v in c.values())

Demo

dd0 = ct.defaultdict(list, {"a": ["OS", "sys", "procs"], "b": ["OS", "sys"]})
dd1 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"]})

compare_missing(dd0, dd1, True)
# {'a': Counter({'procs': 1}), 'b': Counter({'sys': 1})}
# 2

compare_missing(dd1, dd0, True)
# {'a': Counter(), 'b': Counter()}
# 0
    

Details

compare_missing() iterates only over keys common to both dicts. In the next example, dd2 is dd1 with a new key ("c") added, yet we get the same results as above:

dd2 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"], "c": ["OS"]})
compare_missing(dd0, dd2)
# 2

compare_missing(dd2, dd0)
# 0
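
If values under keys that exist in only one dict should count as missing too, one possible variant treats an absent key as an empty list. This is a sketch under that assumption; `compare_missing_all_keys` is a hypothetical name, not part of the answer's code:

```python
import collections as ct


def compare_missing_all_keys(d1, d2):
    """Count values in d1 that d2 lacks, treating keys absent from d2 as empty."""
    total = 0
    for k, values in d1.items():
        # Counter subtraction keeps only positive counts;
        # d2.get() avoids inserting k into a defaultdict
        diff = ct.Counter(values) - ct.Counter(d2.get(k, []))
        total += sum(diff.values())
    return total


dd0 = ct.defaultdict(list, {"a": ["OS", "sys", "procs"], "b": ["OS", "sys"]})
dd2 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"], "c": ["OS"]})

result_forward = compare_missing_all_keys(dd0, dd2)
result_reverse = compare_missing_all_keys(dd2, dd0)
print(result_forward)  # 2
print(result_reverse)  # 1  ("OS" under "c" has no counterpart in dd0)
```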

If values absent from the other dict or duplicates are found (i.e. "admin" and the second "OS" in dd3["b"], respectively), these occurrences are counted as well:

dd3 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS", "admin", "OS"]})
compare_missing(dd3, dd0, True)
# {'a': Counter(), 'b': Counter({'OS': 1, 'admin': 1})}
# 2
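
The duplicate handling is where Counter subtraction and set difference diverge. A small sketch comparing the two on the "b" lists from the example above; the variable names are illustrative:

```python
import collections as ct

have = ["OS", "admin", "OS"]   # dd3["b"]
other = ["OS", "sys"]          # dd0["b"]

# Counter subtraction works on multiplicities: two "OS" minus one "OS" leaves one
by_counter = ct.Counter(have) - ct.Counter(other)
# Set difference ignores multiplicities: "OS" is present in both, so it drops out
by_set = set(have) - set(other)

print(by_counter)  # Counter({'OS': 1, 'admin': 1})
print(by_set)      # {'admin'}
print(sum(by_counter.values()), len(by_set))  # 2 1
```

Which count is appropriate depends on whether repeated values in a list are meaningful for the comparison.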
Answered By: pylang