How to check if one dictionary is a subset of another larger dictionary?
Question:
I’m trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs.
For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing. d1 == d2 results in True. But suppose d2 = the same thing plus a bunch of other things. My method needs to be able to tell if d1 in d2, but Python can’t do that with dictionaries.
Context:
I have a Word class, and each object has properties like word, definition, part_of_speech, and so on. I want to be able to call a filter method on the main list of these words, like Word.objects.filter(word='jump', part_of_speech='verb-intransitive'). I can’t figure out how to manage these keys and values at the same time. But this could have larger functionality outside this context for other people.
Answers:
Convert to item pairs and check for containment.
all(item in superset.items() for item in subset.items())
Optimization is left as an exercise for the reader.
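As a rough sketch of how this check could drive the filter method from the question: the Word class and the filter_words helper below are hypothetical stand-ins (not part of any real ORM), and the sketch assumes each record keeps its fields as plain instance attributes.

```python
class Word:
    """Hypothetical record type; the field names follow the question."""

    def __init__(self, word, definition, part_of_speech):
        self.word = word
        self.definition = definition
        self.part_of_speech = part_of_speech


def filter_words(words, **criteria):
    """Return the words whose attributes contain every key/value in criteria."""
    return [
        w for w in words
        # vars(w) is the instance's attribute dict; it plays the superset role
        if all(item in vars(w).items() for item in criteria.items())
    ]


words = [
    Word("jump", "to leap", "verb-intransitive"),
    Word("jump", "a leap", "noun"),
    Word("run", "to move fast", "verb-intransitive"),
]
matches = filter_words(words, word="jump", part_of_speech="verb-intransitive")
print([w.definition for w in matches])  # ['to leap']
```

Calling filter_words with no keyword arguments returns every word, because all() over an empty sequence is True.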
>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> all((k in d2 and d2[k]==v) for k,v in d1.items())
True
Context:
>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> list(d1.items())
[('a', '2'), ('b', '3')]
>>> [(k,v) for k,v in d1.items()]
[('a', '2'), ('b', '3')]
>>> k,v = ('a','2')
>>> k
'a'
>>> v
'2'
>>> k in d2
True
>>> d2[k]
'2'
>>> k in d2 and d2[k]==v
True
>>> [(k in d2 and d2[k]==v) for k,v in d1.items()]
[True, True]
>>> ((k in d2 and d2[k]==v) for k,v in d1.items())
<generator object <genexpr> at 0x02A9D2B0>
>>> next((k in d2 and d2[k]==v) for k,v in d1.items())
True
>>> all((k in d2 and d2[k]==v) for k,v in d1.items())
True
To check both keys and values, use:
set(d1.items()).issubset(set(d2.items()))
If you need to check only keys:
set(d1).issubset(set(d2))
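One caveat with the set-based check is that every value must be hashable, since the (key, value) tuples are placed in a set. The sketch below, using made-up sample dicts, shows the failure and a pairwise fallback that needs only equality:

```python
d1 = {'a': [1, 2]}
d2 = {'a': [1, 2], 'b': '3'}

set_check_failed = False
try:
    set(d1.items()).issubset(set(d2.items()))
except TypeError as exc:
    # A tuple that contains a list cannot be hashed, so set() rejects it.
    set_check_failed = True
    print("set-based check failed:", exc)

# Comparing pair by pair relies on == only, so unhashable values are fine.
result = all(k in d2 and d2[k] == v for k, v in d1.items())
print(result)  # True

# The keys-only check is unaffected, because dict keys are always hashable.
keys_only = set(d1).issubset(set(d2))
print(keys_only)  # True
```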
My function for the same purpose, doing this recursively:
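def dictMatch(patn, real):
    """does real dict match pattern?"""
    result = True  # an empty pattern matches anything
    try:
        for pkey, pvalue in patn.items():
            if type(pvalue) is dict:
                result = dictMatch(pvalue, real[pkey])
                assert result
            else:
                # note: assert statements are skipped when Python runs with -O
                assert real[pkey] == pvalue
                result = True
    # TypeError covers the case where real is not subscriptable (e.g. an int)
    except (AssertionError, KeyError, TypeError):
        result = False
    return result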
In your example, dictMatch(d1, d2) should return True even if d2 has other stuff in it, plus it applies also to lower levels:
d1 = {'a':'2', 'b':{3: 'iii'}}
d2 = {'a':'2', 'b':{3: 'iii', 4: 'iv'},'c':'4'}
dictMatch(d1, d2) # True
Notes: There could be an even better solution which avoids the if type(pvalue) is dict clause and applies to an even wider range of cases (like lists of hashes etc). Also, recursion is not limited here, so use at your own risk. 😉
Note for people that need this for unit testing: there’s also an assertDictContainsSubset() method in Python’s TestCase class.
However, it has been deprecated since Python 3.2. I’m not sure why; maybe there’s a replacement for it.
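One possible way to express the same assertion with non-deprecated TestCase methods is sketched below. The test class and sample dicts are made up for illustration, and this is not an official replacement, just two equivalent checks built from standard assertions:

```python
import unittest


class SubsetTest(unittest.TestCase):
    def test_contains_subset(self):
        actual = {'a': '2', 'b': '3', 'c': '4'}
        expected_subset = {'a': '2', 'b': '3'}
        # Items views compare like sets, so <= means "is a subset of".
        self.assertLessEqual(expected_subset.items(), actual.items())
        # Alternative: merging the subset into actual must change nothing.
        self.assertEqual(actual, {**actual, **expected_subset})
```

Run it with python -m unittest as usual. The second form tends to give a more readable failure message, since assertEqual prints a dict diff.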
This function works for non-hashable values. I also think that it is clear and easy to read.
def isSubDict(subDict, dictionary):
    for key in subDict.keys():
        if (not key in dictionary) or (not subDict[key] == dictionary[key]):
            return False
    return True
In [126]: isSubDict({1:2},{3:4})
Out[126]: False
In [127]: isSubDict({1:2},{1:2,3:4})
Out[127]: True
In [128]: isSubDict({1:{2:3}},{1:{2:3},3:4})
Out[128]: True
In [129]: isSubDict({1:{2:3}},{1:{2:4},3:4})
Out[129]: False
For completeness, you can also do this:
def is_subdict(small, big):
    return dict(big, **small) == big
However, I make no claims whatsoever concerning speed (or lack thereof) or readability (or lack thereof).
Update: As pointed out by Boris’ comment, this trick does not work if your small dict has non-string keys and you’re using Python >= 3 (or in other words: in the face of arbitrarily typed keys, it only works in legacy Python 2.x).
If you are using Python 3.9 or newer, though, you can make it work both with non-string typed keys as well as get an even neater syntax.
Provided your code already has both dictionaries as variables, it’s very concise to check for this inline:
if big | small == big:
    # do something
Otherwise, or if you prefer a reusable function as above, you can use this:
def is_subdict(small, big):
    return big | small == big
The working principle is the same as the first function, only this time around making use of the union operator that was extended to support dicts.
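To make the non-string-key limitation concrete, here is a small sketch with made-up sample dicts. It also shows dict display unpacking, which is available from Python 3.5 and therefore an option when 3.9's | operator is not; this variant is an addition for illustration, not part of the answer above.

```python
small = {1: 'one'}
big = {1: 'one', 2: 'two'}

kwargs_trick_failed = False
try:
    dict(big, **small)
except TypeError as exc:
    # Python 3 requires keyword-argument names to be strings.
    kwargs_trick_failed = True
    print("dict(big, **small) failed:", exc)


def is_subdict(small, big):
    # Unpacking inside a dict display accepts keys of any hashable type.
    return {**big, **small} == big


print(is_subdict(small, big))         # True
print(is_subdict({1: 'uno'}, big))    # False: same key, different value
print(is_subdict({3: 'three'}, big))  # False: key missing from big
```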
In Python 3, you can use dict.items() to get a set-like view of the dict items. You can then use the <= operator to test if one view is a “subset” of the other:
d1.items() <= d2.items()
In Python 2.7, use the dict.viewitems() to do the same:
d1.viewitems() <= d2.viewitems()
In Python 2.6 and below you will need a different solution, such as using all():
all(key in d2 and d2[key] == d1[key] for key in d1)
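A minimal sketch of the Python 3 form wrapped in a helper (the function name is arbitrary), with a few edge cases. Note that the comparison is shallow: nested dicts are compared with ==, so a nested partial match does not count.

```python
def is_subdict(small, big):
    """True if every key/value pair of small also appears in big."""
    return small.items() <= big.items()


d1 = {'a': '2', 'b': '3'}
d2 = {'a': '2', 'b': '3', 'c': '4'}

print(is_subdict(d1, d2))          # True
print(is_subdict(d2, d1))          # False
print(is_subdict({}, d2))          # True: the empty dict is a subset of any dict
print(is_subdict({'a': '1'}, d2))  # False: key present, value differs

# Values are compared with ==, so a list value works here...
print(is_subdict({'k': [1]}, {'k': [1], 'j': 0}))  # True
# ...but a nested dict must match exactly, not partially.
print(is_subdict({'b': {3: 'iii'}}, {'b': {3: 'iii', 4: 'iv'}}))  # False
```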
This seemingly straightforward issue cost me a couple of hours of research to find a 100% reliable solution, so I documented what I’ve found in this answer.
- “Pythonic-ally” speaking, small_dict <= big_dict would be the most intuitive way, but too bad that it won’t work. {'a': 1} < {'a': 1, 'b': 2} seemingly works in Python 2, but it is not reliable because the official documentation explicitly calls it out. Go search “Outcomes other than equality are resolved consistently, but are not otherwise defined.” in this section. Not to mention, comparing 2 dicts with < or <= in Python 3 results in a TypeError exception.
- The second most-intuitive thing is small.viewitems() <= big.viewitems() for Python 2.7 only, and small.items() <= big.items() for Python 3. But there is one caveat: it is potentially buggy. If your program could potentially be used on Python <=2.6, its d1.items() <= d2.items() actually compares 2 lists of tuples, in no particular order, so the final result will be unreliable and it becomes a nasty bug in your program. I am not keen to write yet another implementation for Python<=2.6, but I still don’t feel comfortable that my code comes with a known bug (even if it is on an unsupported platform). So I abandoned this approach.
- I settled on @blubberdiblub’s answer (credit goes to him):
def is_subdict(small, big):
    return dict(big, **small) == big
It is worth pointing out that this answer relies on the == behavior between dicts, which is clearly defined in the official documentation, hence should work in every Python version (although, as the update to that answer notes, on Python 3 it only works when the keys of small are strings). Go search:
Here’s a general recursive solution for the problem given:
import unittest


def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False
        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False
        elif isinstance(value, str):
            if value not in superset[key]:
                return False
        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False
        else:
            if not value == superset[key]:
                return False
    return True


class Foo(unittest.TestCase):
    def setUp(self):
        self.dct = {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
            'f': {
                'a': 'hello world',
                'b': 12345,
                'c': 1.2345,
                'd': [1, 2, 3, 4, 5],
                'e': {1, 2, 3, 4, 5},
                'g': False,
                'h': None
            },
            'g': False,
            'h': None,
            'question': 'mcve',
            'metadata': {}
        }

    def tearDown(self):
        pass

    def check_true(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), True)

    def check_false(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), False)

    def test_simple_cases(self):
        self.check_true(self.dct, {'a': 'hello world'})
        self.check_true(self.dct, {'b': 12345})
        self.check_true(self.dct, {'c': 1.2345})
        self.check_true(self.dct, {'d': [1, 2, 3, 4, 5]})
        self.check_true(self.dct, {'e': {1, 2, 3, 4, 5}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
        }})
        self.check_true(self.dct, {'g': False})
        self.check_true(self.dct, {'h': None})

    def test_tricky_cases(self):
        self.check_true(self.dct, {'a': 'hello'})
        self.check_true(self.dct, {'d': [1, 2, 3]})
        self.check_true(self.dct, {'e': {3, 4}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'h': None
        }})
        self.check_false(
            self.dct, {'question': 'mcve', 'metadata': {'author': 'BPL'}})
        self.check_true(
            self.dct, {'question': 'mcve', 'metadata': {}})
        self.check_false(
            self.dct, {'question1': 'mcve', 'metadata': {}})


if __name__ == "__main__":
    unittest.main()
NOTE: The original code would fail in certain cases; credit for the fix goes to @olivier-melançon
I know this question is old, but here is my solution for checking if one nested dictionary is a part of another nested dictionary. The solution is recursive.
def compare_dicts(a, b):
    for key, value in a.items():
        if key in b:
            if isinstance(a[key], dict):
                if not compare_dicts(a[key], b[key]):
                    return False
            elif value != b[key]:
                return False
        else:
            return False
    return True
If you don’t mind using pydash, there is is_match there which does exactly that:
import pydash
a = {1:2, 3:4, 5:{6:7}}
b = {3:4.0, 5:{6:8}}
c = {3:4.0, 5:{6:7}}
pydash.predicates.is_match(a, b) # False
pydash.predicates.is_match(a, c) # True
A short recursive implementation that works for nested dictionaries:
def compare_dicts(a, b):
    if not a: return True
    if isinstance(a, dict):
        key, val = a.popitem()
        return isinstance(b, dict) and key in b and compare_dicts(val, b.pop(key)) and compare_dicts(a, b)
    return a == b
This will consume the a and b dicts. If anyone knows of a good way to avoid that without resorting to partially iterative solutions as in other answers, please tell me. I would need a way to split a dict into head and tail based on a key.
This code is more useful as a programming exercise, and is probably a lot slower than other solutions in here that mix recursion and iteration. @Nutcracker’s solution is pretty good for nested dictionaries.
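One way to keep the head/tail recursion without consuming the inputs is to split copies rather than the originals. The following is only a sketch under that assumption: compare_dicts_safe is a made-up name, the copying makes it slower still, and unlike the version above it treats a falsy non-dict pattern such as 0 as a plain value to compare.

```python
def compare_dicts_safe(a, b):
    """Head/tail recursion like the version above, but leaves a and b intact."""
    if not isinstance(a, dict):
        return a == b            # plain values must be equal
    if not isinstance(b, dict):
        return False             # a dict pattern cannot match a non-dict
    if not a:
        return True              # nothing left to match
    tail_a = dict(a)             # shallow copy, so popitem() spares the caller's dict
    key, val = tail_a.popitem()  # "head" of the pattern
    if key not in b:
        return False
    tail_b = {k: v for k, v in b.items() if k != key}
    return compare_dicts_safe(val, b[key]) and compare_dicts_safe(tail_a, tail_b)


pattern = {'x': {'y': 1}}
data = {'x': {'y': 1, 'z': 2}, 'w': 3}
print(compare_dicts_safe(pattern, data))               # True
print(compare_dicts_safe({'x': {'y': 2}}, data))       # False
print(pattern == {'x': {'y': 1}})                      # True: pattern was not consumed
```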
Here is a solution that also properly recurses into lists and sets contained within the dictionary. You can also use this for lists containing dicts etc…
def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
    if isinstance(subset, list) or isinstance(subset, set):
        return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)
    # assume that subset is a plain value if none of the above match
    return subset == superset
When using python 3.10, you can use python’s new match statements to do the typechecking:
def is_subset(subset, superset):
    match subset:
        case dict(_): return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
        case list(_) | set(_): return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)
        # assume that subset is a plain value if none of the above match
        case _: return subset == superset
Use this wrapper object that provides partial comparison and nice diffs:
class DictMatch(dict):
    """ Partial match of a dictionary to another one """
    def __eq__(self, other: dict):
        assert isinstance(other, dict)
        # a key missing from other counts as a mismatch instead of raising KeyError
        return all(name in other and other[name] == value for name, value in self.items())

actual_name = {'praenomen': 'Gaius', 'nomen': 'Julius', 'cognomen': 'Caesar'}
expected_name = DictMatch({'praenomen': 'Gaius'})  # partial match
assert expected_name == actual_name  # True
Most of the answers will not work if within dict there are some arrays of other dicts, here is a solution for this:
def d_eq(d, d1):
    if not isinstance(d, (dict, list)):
        return d == d1
    if isinstance(d, list):
        return all(d_eq(a, b) for a, b in zip(d, d1))
    return all(d.get(i) == d1[i] or d_eq(d.get(i), d1[i]) for i in d1)

def is_sub(d, d1):
    if isinstance(d, list):
        return any(is_sub(i, d1) for i in d)
    return d_eq(d, d1) or (isinstance(d, dict) and any(is_sub(b, d1) for b in d.values()))

print(is_sub(dct_1, dict_2))
Taken from How to check if dict is subset of another complex dict
Another way of doing this:
>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> d3 = {'a':'1'}
>>> set(d1.items()).issubset(d2.items())
True
>>> set(d3.items()).issubset(d2.items())
False
With Python 3.9, this is what I use:
def dict_contains_dict(small: dict, big: dict):
    return (big | small) == big