Conditional counting in Python
Question:
Not sure whether this was asked before, but I couldn’t find an obvious answer. I’m trying to count the number of elements in a list that are equal to a certain value. The problem is that these elements are not of a built-in type. So if I have
class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

stuff = []
for i in range(1,10):
    stuff.append(A(i/2, i%2))
Now I would like a count of the list elements whose field b = 1. I came up with two solutions:
print [e.b for e in stuff].count(1)
and
print len([e for e in stuff if e.b == 1])
Which is the best method? Is there a better alternative? It seems that the count() method does not accept keys (at least in Python version 2.5.1).
Many thanks!
Answers:
print sum(1 for e in L if e.b == 1)
I would prefer the second one as it’s only looping over the list once.
If you use count(), you’re looping over the list once to get the b values, and then looping over it again to see how many of them equal 1.
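To make the comparison concrete, here is a minimal sketch that runs the two approaches from the question next to a generator-based count. It is written for Python 3 (an assumption, since the question targets Python 2.5), so `//` stands in for the integer division that `i/2` performed there; the names `via_count`, `via_len` and `via_sum` are only illustrative.

```python
class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Same data as in the question; // keeps integer division under Python 3.
stuff = [A(i // 2, i % 2) for i in range(1, 10)]

# Builds a temporary list of b values, then count() scans that list.
via_count = [e.b for e in stuff].count(1)

# Builds a temporary list of the matching objects, then takes its length.
via_len = len([e for e in stuff if e.b == 1])

# Generator expression: items are counted as they are produced,
# so no temporary list is built.
via_sum = sum(1 for e in stuff if e.b == 1)

print(via_count, via_len, via_sum)  # 5 5 5
```

All three agree on the result; they differ only in how many passes they make and whether they allocate an intermediate list.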
A neat way may be to use reduce():
reduce(lambda x, y: x + (1 if y.b == 1 else 0), stuff, 0)
The documentation tells us that reduce() will:
Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value.
So we define a lambda that adds one to the accumulated value only if the list item’s b attribute is 1.
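As a sketch of how that accumulation unfolds, the version below spells the lambda out as a named function. It assumes Python 3, where reduce has to be imported from functools; `Item` and `step` are hypothetical names used only for this illustration.

```python
from functools import reduce  # reduce is not a builtin in Python 3


class Item:  # hypothetical stand-in for the question's class A
    def __init__(self, b):
        self.b = b


items = [Item(i % 2) for i in range(1, 10)]


def step(acc, item):
    # acc is the running count; add 1 only when the item's b is 1.
    return acc + (1 if item.b == 1 else 0)


# The third argument, 0, is the initial value of the accumulator.
total = reduce(step, items, 0)
print(total)  # 5
```

The initial value matters: without it, reduce would use the first Item itself as the accumulator and the addition would fail.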
sum(x.b == 1 for x in L)
A boolean (as resulting from comparisons such as x.b == 1) is also an int, with a value of 0 for False, 1 for True, so arithmetic such as summation works just fine.
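A short sketch of that bool-as-int behaviour, using a plain list of integers instead of objects for brevity (the names `flags` and `matches` are illustrative):

```python
# bool is a subclass of int, so True and False take part in arithmetic.
print(issubclass(bool, int))  # True
print(True + True)            # 2

# Each comparison yields a bool; summing them counts the True values.
flags = [x == 1 for x in [1, 2, 1, 3, 1]]
matches = sum(flags)
print(matches)                # 3
```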
This is the simplest code, but perhaps not the speediest (only timeit can tell you for sure;-). Consider (simplified case to fit well on command lines, but equivalent):
$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'len([x for x in L if x==1])'
10000 loops, best of 3: 56.6 usec per loop
$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'sum(x==1 for x in L)'
10000 loops, best of 3: 87.7 usec per loop
So, for this case, the “memory wasteful” approach of generating an extra temporary list and checking its length is actually solidly faster than the simpler, shorter, memory-thrifty one I tend to prefer. Other mixes of list values, Python implementations, availability of memory to “invest” in this speedup, etc, can affect the exact performance, of course.
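To repeat the measurement on your own interpreter, something along these lines should work. It is a sketch that uses the timeit module from inside a script rather than the command line; the third candidate and the labels are additions for comparison, not part of the timings quoted above, and no particular result is implied.

```python
import timeit

setup = "L = [1, 2, 1, 3, 1] * 100"
candidates = {
    "len of list comprehension": "len([x for x in L if x == 1])",
    "sum of booleans": "sum(x == 1 for x in L)",
    "sum of ones": "sum(1 for x in L if x == 1)",
}

results = {}
for label, stmt in candidates.items():
    # Best of 3 repeats of 1000 runs each, converted to microseconds per run.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=1000))
    results[label] = best / 1000 * 1e6
    print("%-28s %.1f usec per loop" % (label, results[label]))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes on the machine.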
To hide reduce details, you may define a count function:
def count(condition, stuff):
    return reduce(lambda s, x:
                  s + (1 if condition(x) else 0), stuff, 0)
Then you may use it by providing the condition for counting:
n = count(lambda i: i.b == 1, stuff)
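The same idea of hiding the counting details behind a helper can also be sketched without reduce. The function below is a hypothetical variant named `count_if` (not from the answer above) that swaps reduce for sum over a generator expression; it assumes Python 3.

```python
def count_if(condition, iterable):
    """Return how many items of iterable satisfy condition."""
    return sum(1 for item in iterable if condition(item))


class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b


stuff = [A(i // 2, i % 2) for i in range(1, 10)]

# The caller supplies only the predicate.
n = count_if(lambda e: e.b == 1, stuff)
print(n)  # 5
```

Because the helper accepts any iterable, it also works on generators and files, not only on lists.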
Given the input
name = ['ball', 'jeans', 'ball', 'ball', 'ball', 'jeans']
price = [1, 4, 1, 1, 1, 4]
weight = [2, 2, 2, 3, 2, 2]
First create a defaultdict to record the occurrences
from collections import defaultdict
occurrences = defaultdict(int)
Increment the count
for n, p, w in zip(name, price, weight):
    occurrences[(n, p, w)] += 1
Finally count the ones that appear more than once (True will yield 1)
print(sum(cnt > 1 for cnt in occurrences.values()))
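For comparison, the steps above can be condensed with collections.Counter, which does the tallying in one call. This is an alternative sketch rather than the approach described in the answer; the variable name `repeated` is illustrative.

```python
from collections import Counter

name = ['ball', 'jeans', 'ball', 'ball', 'ball', 'jeans']
price = [1, 4, 1, 1, 1, 4]
weight = [2, 2, 2, 3, 2, 2]

# Counter tallies each (name, price, weight) tuple as zip yields it.
occurrences = Counter(zip(name, price, weight))

# Each comparison is a bool, so summing counts the tuples seen more than once.
repeated = sum(cnt > 1 for cnt in occurrences.values())
print(repeated)  # 2
```

Here ('ball', 1, 2) occurs three times and ('jeans', 4, 2) twice, while ('ball', 1, 3) occurs once, so two distinct combinations are repeated.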