Test if python Counter is contained in another Counter
Question:
How can I test whether a Python Counter is contained in another one, using the following definition?
A Counter a is contained in a Counter b if, and only if, for every key k in a, the value a[k] is less than or equal to the value b[k]. For example, Counter({'a': 1, 'b': 1}) is contained in Counter({'a': 2, 'b': 2}), but it is not contained in Counter({'a': 2, 'c': 2}).
I think it is a poor design choice, but in Python 2.x the comparison operators (<, <=, >=, >) do not use the previous definition, so the third Counter is considered greater than the first. In Python 3.x, instead, Counter is an unorderable type.
Answers:
Update 2023: Counter supports rich comparison operators as of Python 3.10, so this works:
contained <= container
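To double-check which side of <= the contained Counter goes on, here is a minimal sketch using the Counters from the question. The helper name is_contained is hypothetical, and the version guard is an assumption added so the snippet also runs on interpreters older than 3.10, where it falls back to a per-key check (equivalent for non-negative counts).

```python
import sys
from collections import Counter

def is_contained(a, b):
    """Return True if a is contained in b (assuming non-negative counts)."""
    if sys.version_info >= (3, 10):
        # Counter defines <= as multiset inclusion from Python 3.10 on.
        return a <= b
    # Fallback for older interpreters: a missing key in b reads as 0.
    return all(count <= b[key] for key, count in a.items())

first = Counter({'a': 1, 'b': 1})
second = Counter({'a': 2, 'b': 2})
third = Counter({'a': 2, 'c': 2})

print(is_contained(first, second))  # True
print(is_contained(first, third))   # False
```

The contained Counter sits on the left of <=, mirroring the subset operator for sets.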
Historical answer for python < 3.10:
The best I came up with is to translate the definition I gave into code:
def contains(container, contained):
    return all(container[x] >= contained[x] for x in contained)
But it feels strange that Python doesn’t have an out-of-the-box solution and that I have to write a function for every operator (or write a generic one and pass in the comparison function).
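The generic variant mentioned above could look roughly like the following sketch. The name compare_counts and its signature are hypothetical; the comparison function is taken from the standard operator module, and keys from both Counters are checked so that operators such as >= behave symmetrically.

```python
import operator
from collections import Counter

def compare_counts(a, b, op):
    """Apply op to the counts of a and b for every key seen in either."""
    # Counter returns 0 for missing keys, so indexing by the union is safe.
    return all(op(a[key], b[key]) for key in a.keys() | b.keys())

small = Counter({'a': 1, 'b': 1})
big = Counter({'a': 2, 'b': 2})
other = Counter({'a': 2, 'c': 2})

print(compare_counts(small, big, operator.le))    # True
print(compare_counts(big, small, operator.ge))    # True
print(compare_counts(small, other, operator.le))  # False
```

Note that passing operator.lt here means "strictly smaller for every key", which is not the same as the proper-subset meaning that < has on Counter in Python 3.10+.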
While Counter instances are not comparable with the < and > operators (before Python 3.10), you can find their difference with the - operator. The difference keeps only positive counts, so if A - B is empty, you know that B contains all the items in A.
def contains(larger, smaller):
    return not smaller - larger
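To see why the emptiness test works, here is a small sketch of what the subtraction returns for the Counters from the question (the variable names a, b and c are just for illustration):

```python
from collections import Counter

a = Counter({'a': 1, 'b': 1})
b = Counter({'a': 2, 'b': 2})
c = Counter({'a': 2, 'c': 2})

# Counts that would be zero or negative are dropped from the result.
print(a - b)      # Counter()
print(a - c)      # Counter({'b': 1})

# An empty Counter is falsy, so "not a - b" reads as "a is contained in b".
print(not a - b)  # True
print(not a - c)  # False
```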
For every key in the smaller Counter, make sure that its value is not greater than its counterpart in the bigger Counter:
from collections import Counter

def containment(big, small):
    # Counter returns 0 for keys missing from big
    return not any(v > big[k] for k, v in small.items())

>>> containment(Counter({'a': 2, 'b': 2}), Counter({'a': 1, 'b': 1}))
True
>>> containment(Counter({'a': 2, 'c': 2, 'b': 3}), Counter({'a': 2, 'b': 2}))
True
>>> containment(Counter({'a': 2, 'b': 2}), Counter({'a': 2, 'b': 2, 'c': 1}))
False
>>> containment(Counter({'a': 2, 'c': 2}), Counter({'a': 1, 'b': 1}))
False
Another, fairly succinct, way to express this:
"Counter A is a subset of Counter B" is equivalent to (A & B) == A.
That’s because the intersection (&) of two Counters keeps, for each element, the minimum of its two counts. The result is the same as A if every element of A (counting multiplicity) is also in B; otherwise it is smaller.
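A short sketch of the intersection test on the Counters from the question (variable names are for illustration only):

```python
from collections import Counter

a = Counter({'a': 1, 'b': 1})
b = Counter({'a': 2, 'b': 2})
c = Counter({'a': 2, 'c': 2})

# Every count of a survives the intersection with b, so a is contained in b.
print(a & b)         # Counter({'a': 1, 'b': 1})
print((a & b) == a)  # True

# 'b' is missing from c, so it drops out and the result is smaller than a.
print(a & c)         # Counter({'a': 1})
print((a & c) == a)  # False
```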
Performance-wise, this seems to be about the same as the not A - B method proposed by Blckknght. Checking each key, as in the answer of enrico.bacis, is considerably faster.
As a variation, you can also check that the union is equal to the larger Counter (so nothing was added): (A | B) == B. This was noticeably slower for some largish multisets I tested (1,000,000 elements).
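To reproduce this kind of comparison yourself, a rough timing harness might look like the sketch below. The function names, the data sizes and the repeat count are all arbitrary assumptions, not the setup used for the measurements above, and timings will vary by machine and Python version.

```python
import random
import timeit
from collections import Counter

def by_keys(big, small):
    return all(big[k] >= v for k, v in small.items())

def by_difference(big, small):
    return not small - big

def by_intersection(big, small):
    return (small & big) == small

def by_union(big, small):
    return (small | big) == big

# Hypothetical test data: sizes and value range are arbitrary choices.
random.seed(0)
big = Counter(random.randrange(1000) for _ in range(20000))
# Sampling without replacement from big's elements yields a sub-multiset.
small = Counter(random.sample(list(big.elements()), 5000))

for check in (by_keys, by_difference, by_intersection, by_union):
    seconds = timeit.timeit(lambda: check(big, small), number=20)
    print(f"{check.__name__}: {check(big, small)} in {seconds:.4f}s")
```

All four checks should agree on the result; only the elapsed times differ.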