Conditional counting in Python

Question:

I’m not sure whether this was asked before, but I couldn’t find an obvious answer. I’m trying to count the number of elements in a list whose attribute equals a certain value. The problem is that these elements are not of a built-in type. So if I have

class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

stuff = []
for i in range(1,10):
    stuff.append(A(i/2, i%2))

Now I would like a count of the list elements whose field b equals 1. I came up with two solutions:

print [e.b for e in stuff].count(1)

and

print len([e for e in stuff if e.b == 1])

Which is the best method? Is there a better alternative? It seems that the count() method does not accept keys (at least in Python version 2.5.1).

Many thanks!

Asked By: nicolaum

Answers:

print sum(1 for e in L if e.b == 1)
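
For reference, here is a self-contained sketch of this generator-expression count run against the question's data. It assumes Python 3, so it uses `print()` and `//` where the original uses the Python 2 `print` statement and `/`. The name `count_b1` is only illustrative.

```python
class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Same data as in the question; // reproduces Python 2's integer division i/2.
stuff = [A(i // 2, i % 2) for i in range(1, 10)]

# The generator yields a 1 for every matching element, so no temporary list is built.
count_b1 = sum(1 for e in stuff if e.b == 1)
print(count_b1)  # 5 (the odd values 1, 3, 5, 7, 9)
```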
Answered By: Roger Pate

I would prefer the second one as it’s only looping over the list once.

If you use count() you’re looping over the list once to get the b values, and then looping over it again to see how many of them equal 1.

A neat way may be to use reduce():

reduce(lambda x, y: x + (1 if y.b == 1 else 0), stuff, 0)

The documentation tells us that reduce() will:

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value.

So we define a lambda that adds one to the accumulated value only if the list item’s b attribute is 1.
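
As a rough illustration of that accumulation, the sketch below spells the lambda out as a named function. It assumes Python 3, where `reduce()` has to be imported from `functools`. `Item` and `step` are hypothetical names standing in for the question's class `A` and the lambda.

```python
from collections import namedtuple
from functools import reduce  # reduce() is a builtin only on Python 2

# Hypothetical stand-in for the question's class A.
Item = namedtuple("Item", ["a", "b"])
stuff = [Item(i // 2, i % 2) for i in range(1, 10)]

def step(acc, item):
    # acc is the running count; add 1 only when item.b == 1.
    return acc + (1 if item.b == 1 else 0)

total = reduce(step, stuff, 0)  # 0 is the initial accumulated value
print(total)  # 5
```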

Answered By: Dave Webb

sum(x.b == 1 for x in L)

A boolean (as resulting from comparisons such as x.b == 1) is also an int, with a value of 0 for False, 1 for True, so arithmetic such as summation works just fine.
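
A quick way to convince yourself of this, sketched with a throwaway list (`flags` is just an illustrative name):

```python
# bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic.
print(isinstance(True, int))  # True
print(True + True)            # 2

flags = [x == 1 for x in [1, 2, 1, 3, 1]]
print(flags)       # [True, False, True, False, True]
print(sum(flags))  # 3
```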

This is the simplest code, but perhaps not the speediest (only timeit can tell you for sure;-). Consider (simplified case to fit well on command lines, but equivalent):

$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'len([x for x in L if x==1])'
10000 loops, best of 3: 56.6 usec per loop
$ py26 -mtimeit -s'L=[1,2,1,3,1]*100' 'sum(x==1 for x in L)'
10000 loops, best of 3: 87.7 usec per loop

So, for this case, the “memory wasteful” approach of generating an extra temporary list and checking its length is actually solidly faster than the simpler, shorter, memory-thrifty one I tend to prefer. Other mixes of list values, Python implementations, availability of memory to “invest” in this speedup, etc, can affect the exact performance, of course.
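
If you want to repeat the comparison on your own interpreter, something along these lines should do. It is a sketch using the `timeit` module from within a script rather than the command line, and the function names `with_len` and `with_sum` are made up for the example. No timings are shown because they depend on the machine and Python version.

```python
import timeit

L = [1, 2, 1, 3, 1] * 100

def with_len():
    # Builds a temporary list, then takes its length.
    return len([x for x in L if x == 1])

def with_sum():
    # Sums booleans lazily, without a temporary list.
    return sum(x == 1 for x in L)

# Both approaches must agree before the timings mean anything.
assert with_len() == with_sum() == 300

for fn in (with_len, with_sum):
    # Run 1000 calls per measurement, repeat 3 times, keep the best.
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{fn.__name__}: {best / 1000 * 1e6:.1f} usec per loop")
```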

Answered By: Alex Martelli

To hide reduce details, you may define a count function:

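from functools import reduce  # required on Python 3; on Python 2 reduce() is already a builtin

def count(condition, stuff):
    # Add 1 to the running total s for every element x that satisfies condition.
    return reduce(lambda s, x: s + (1 if condition(x) else 0), stuff, 0)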

Then you may use it by providing the condition for counting:

n = count(lambda i: i.b, stuff)
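
Note that `lambda i: i.b` counts every element whose `b` is truthy, which coincides with `b == 1` only because `b` is always 0 or 1 in the question. The sketch below shows the difference on made-up data. `count_if` and `Item` are hypothetical names, and the helper uses `sum()` instead of `reduce()` purely for brevity.

```python
from collections import namedtuple

Item = namedtuple("Item", ["a", "b"])  # hypothetical stand-in for class A
items = [Item(0, 0), Item(1, 1), Item(2, 2)]

def count_if(condition, iterable):
    # bool() normalizes any truthy/falsy result to True/False, i.e. 1/0.
    return sum(bool(condition(x)) for x in iterable)

print(count_if(lambda i: i.b, items))       # 2: b=1 and b=2 are both truthy
print(count_if(lambda i: i.b == 1, items))  # 1: only the exact match
```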
Answered By: Calvin

Given the input

name = ['ball', 'jeans', 'ball', 'ball', 'ball', 'jeans']
price = [1, 4, 1, 1, 1, 4]
weight = [2, 2, 2, 3, 2, 2]

First, create a defaultdict to record the occurrences of each (name, price, weight) combination

from collections import defaultdict
occurrences = defaultdict(int)

Increment the count

for n, p, w in zip(name, price, weight):
    occurrences[(n, p, w)] += 1

Finally count the ones that appear more than once (True will yield 1)

print(sum(cnt > 1 for cnt in occurrences.values()))
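
The same tally can probably be written more compactly with `collections.Counter`, which is a swapped-in alternative to the `defaultdict` loop above rather than what this answer uses. `repeated` is an illustrative name.

```python
from collections import Counter

name = ['ball', 'jeans', 'ball', 'ball', 'ball', 'jeans']
price = [1, 4, 1, 1, 1, 4]
weight = [2, 2, 2, 3, 2, 2]

# Counter tallies each (name, price, weight) tuple in one pass.
occurrences = Counter(zip(name, price, weight))
# ('ball', 1, 2) appears 3 times, ('jeans', 4, 2) twice, ('ball', 1, 3) once.

repeated = sum(cnt > 1 for cnt in occurrences.values())
print(repeated)  # 2
```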
Answered By: Alan Wu