Numpy find number of occurrences in a 2D array

Question:

Is there a numpy function to count the number of occurrences of a certain value in a 2D numpy array? E.g.

np.random.random((3,3))

array([[ 0.68878371,  0.2511641 ,  0.05677177],
       [ 0.97784099,  0.96051717,  0.83723156],
       [ 0.49460617,  0.24623311,  0.86396798]])

How do I find the number of times 0.83723156 occurs in this array?

Asked By: user308827


Answers:

arr = np.random.random((3, 3))
# boolean mask of elements exactly equal to the target value
condition = arr == 0.83723156
# count the True entries
np.count_nonzero(condition)

The value of condition is a boolean array of the same shape, where each element indicates whether the corresponding element of the original array satisfied the condition. np.count_nonzero counts how many nonzero elements are in an array; for booleans, that is the number of True values.

To be able to deal with floating point accuracy, you could do something like this instead:

condition = np.fabs(arr - 0.83723156) < 0.001
Answered By: Ritwik Bose
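Putting the two pieces of this answer together, a minimal sketch (reusing the array from the question as a fixed literal so the counts are deterministic):

```python
import numpy as np

# the array from the question, written as a literal so results are reproducible
arr = np.array([[0.68878371, 0.2511641 , 0.05677177],
                [0.97784099, 0.96051717, 0.83723156],
                [0.49460617, 0.24623311, 0.86396798]])

# exact bit-for-bit comparison: works here because the literal matches exactly
exact = np.count_nonzero(arr == 0.83723156)

# tolerant comparison: count elements within 0.001 of the target value
close = np.count_nonzero(np.fabs(arr - 0.83723156) < 0.001)

print(exact, close)  # both 1 for this array
```

The exact comparison only succeeds because the target literal is identical to the one used to build the array; for values produced by arithmetic, the tolerant version is the safer choice.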

To count the number of times x appears in any array, you can simply sum the boolean array that results from a == x:

>>> col = numpy.arange(3)
>>> cols = numpy.tile(col, 3)
>>> (cols == 1).sum()
3

It should go without saying, but I’ll say it anyway: this is not very useful with floating point numbers unless you specify a range, like so:

>>> a = numpy.random.random((3, 3))
>>> ((a > 0.5) & (a < 0.75)).sum()
2

This general principle works for all sorts of tests. For example, if you want to count the number of floating point values that are integral:

>>> a = numpy.random.random((3, 3)) * 10
>>> a
array([[ 7.33955747,  0.89195947,  4.70725211],
       [ 6.63686955,  5.98693505,  4.47567936],
       [ 1.36965745,  5.01869306,  5.89245242]])
>>> a.astype(int)
array([[7, 0, 4],
       [6, 5, 4],
       [1, 5, 5]])
>>> (a == a.astype(int)).sum()
0
>>> a[1, 1] = 8
>>> (a == a.astype(int)).sum()
1

You can also use np.isclose() as described by Imanol Luengo, depending on what your goal is. But often, it’s more useful to know whether values are in a range than to know whether they are arbitrarily close to some arbitrary value.

The problem with isclose is that its default tolerance values (rtol and atol) are arbitrary, and the results it generates are not always obvious or easy to predict. To deal with complex floating point arithmetic, it does even more floating point arithmetic! A simple range is much easier to reason about precisely. (This is an expression of a more general principle: first, do the simplest thing that could possibly work.)

Still, isclose and its cousin allclose have their uses. I usually use them to see if a whole array is very similar to another whole array, which doesn’t seem to be your question.

Answered By: senderle
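The range-based counting idea above can be wrapped in a tiny helper. `count_in_range` is a hypothetical name for illustration, not a numpy function:

```python
import numpy as np

def count_in_range(a, lo, hi):
    """Count elements of a falling in the half-open interval [lo, hi)."""
    return int(((a >= lo) & (a < hi)).sum())

a = np.array([0.1, 0.4, 0.6, 0.7, 0.9])
print(count_in_range(a, 0.5, 0.75))  # 2  (0.6 and 0.7)
```

The explicit bounds make the test easy to reason about: every element either falls in the interval or it doesn't, with no hidden tolerance parameters.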

For floating point arrays, np.isclose is a much better option than comparing for exact equality or defining a custom range.

>>> a = np.array([[ 0.68878371,  0.2511641 ,  0.05677177],
...               [ 0.97784099,  0.96051717,  0.83723156],
...               [ 0.49460617,  0.24623311,  0.86396798]])

>>> np.isclose(a, 0.83723156).sum()
1

Note that real numbers are not represented exactly in a computer, which is why np.isclose works where == doesn’t:

>>> (0.1 + 0.2) == 0.3
False

Instead:

>>> np.isclose(0.1 + 0.2, 0.3)
True
Answered By: Imanol Luengo
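If the default tolerances (rtol=1e-05, atol=1e-08) worry you, as the previous answer notes they might, isclose also accepts explicit rtol and atol keywords. A small sketch on the question's array:

```python
import numpy as np

a = np.array([[0.68878371, 0.2511641 , 0.05677177],
              [0.97784099, 0.96051717, 0.83723156],
              [0.49460617, 0.24623311, 0.86396798]])

# default tolerances: rtol=1e-05, atol=1e-08
n_default = int(np.isclose(a, 0.83723156).sum())

# purely absolute tolerance: disable the relative term with rtol=0
n_tight = int(np.isclose(a, 0.83723156, rtol=0, atol=1e-12).sum())

print(n_default, n_tight)  # both 1: only one element is that close
```

With rtol=0 the test reduces to |a - b| <= atol, which behaves like the explicit range check from the earlier answers.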

If it may be of use to anyone: for very large 2D arrays, if you want to count how many times every element appears within the entire array, you can flatten the array into a list and count occurrences with collections.Counter:

from itertools import chain
from collections import Counter

# the large array is called arr
flatten_arr = list(chain.from_iterable(arr))
dico_nodeid_appearence = Counter(flatten_arr)
# how many times x appeared in arr
dico_nodeid_appearence[x]
Answered By: miki
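A numpy-native alternative to the Counter approach above is np.unique with return_counts=True, which returns every distinct value and its count in one call (note that, like Counter on floats, this relies on exact equality):

```python
import numpy as np

arr = np.array([[1, 2, 2],
                [3, 2, 1],
                [3, 3, 3]])

# values: sorted distinct elements; counts: occurrences of each
values, counts = np.unique(arr, return_counts=True)

# build a plain dict for easy lookup of any element's count
count_of = dict(zip(values.tolist(), counts.tolist()))
print(count_of[3])  # 4
```

For large arrays this stays in compiled numpy code instead of iterating over a Python list, which is usually considerably faster.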