How do I get a list of indices of non-zero elements in a list?

Question:

I have a list that will always contain only ones and zeroes.
I need to get a list of the non-zero indices of the list:

a = [0, 1, 0, 1, 0, 0, 0, 0]
b = []
for i in range(len(a)):
    if a[i] == 1:
        b.append(i)
print(b)

What would be the ‘pythonic’ way of achieving this?

Asked By: George Profenza


Answers:

[i for i, e in enumerate(a) if e != 0]

Since THC4k mentioned compress (available in Python 2.7+), here is that approach:

>>> from itertools import compress, count
>>> x = [0, 1, 0, 1, 0, 0, 0, 0]
>>> compress(count(), x)
<itertools.compress object at 0x8c3666c>   
>>> list(_)
[1, 3]
Answered By: John La Rooy
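
The compress approach above can be wrapped in a small helper. This is only a minimal sketch, assuming Python 3; the function name nonzero_indices is hypothetical and does not come from the answer.

```python
from itertools import compress, count


def nonzero_indices(values):
    """Return the indices of the truthy (non-zero) items in values."""
    # count() yields 0, 1, 2, ...; compress keeps each count whose
    # matching selector in values is truthy and stops when values ends.
    return list(compress(count(), values))


a = [0, 1, 0, 1, 0, 0, 0, 0]
print(nonzero_indices(a))  # [1, 3]
```

Because compress tests truthiness, this should give the same result as the list comprehension with e != 0 for a list of ones and zeroes.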

Not really a "new" answer but numpy has this built in as well.

import numpy as np
a = [0, 1, 0, 1, 0, 0, 0, 0]
nonzeroind = np.nonzero(a)[0] # the return is a little funny so I use the [0]
print(nonzeroind)
[1 3]
Answered By: Brian Larsen

I just want to explain the ‘funny’ output from the previous answer. The result is a tuple containing one array of indices for each dimension of the input. Here the input is one-dimensional (a vector in NumPy terms), so the output is a tuple with a single element.

import numpy as np
a = [0, 1, 0, 1, 0, 0, 0, 0]
nonzeroind = np.nonzero(a)
print(nonzeroind)
(array([1, 3]),)
Answered By: Lexa
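
To illustrate the tuple-per-dimension explanation above, here is a small sketch with a two-dimensional input. The array m and the variable names are made up for illustration; np.flatnonzero is shown as an alternative that returns the index array directly for one-dimensional input.

```python
import numpy as np

m = np.array([[0, 1, 0],
              [1, 0, 1]])

# For a 2-D input, np.nonzero returns one index array per dimension.
rows, cols = np.nonzero(m)
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# Pair the two arrays up to get (row, column) coordinates.
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 1), (1, 0), (1, 2)]

# For 1-D input, np.flatnonzero gives the index array without the tuple.
flat = np.flatnonzero([0, 1, 0, 1, 0, 0, 0, 0])
print(flat)  # [1 3]
```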

Time comparison of the list comprehension and np.nonzero() with respect to the length of the list:

import random
import numpy as np

a = [int(random.random()>0.5) for i in range(10)]
%timeit [i for i, e in enumerate(a) if e != 0]
683 ns ± 14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.nonzero(a)[0]
4.43 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

a = [int(random.random()>0.5) for i in range(1000)]
%timeit [i for i, e in enumerate(a) if e != 0]
53.1 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.nonzero(a)[0]
73.8 µs ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

a = [int(random.random()>0.5) for i in range(100000)]
%timeit [i for i, e in enumerate(a) if e != 0]
5.86 ms ± 79.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.nonzero(a)[0]
6.61 ms ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With a list length of 100000, varying the share of ones in the list:

a = [int(random.random()>0.1) for i in range(100000)]
%timeit [i for i, e in enumerate(a) if e != 0]
6.45 ms ± 28.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.nonzero(a)[0]
5.74 ms ± 9.25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

a = [int(random.random()>0.9) for i in range(100000)]
%timeit [i for i, e in enumerate(a) if e != 0]
4.69 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.nonzero(a)[0]
5.74 ms ± 6.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

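In these runs, the share of ones mostly affects the first option (the list comprehension); the np.nonzero() timings change much less. np.nonzero() was faster only when about 90% of the elements were non-zero. With half or fewer of the elements non-zero, the list comprehension was faster at every length tested, by the widest margin on short lists.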

Answered By: asrvnon
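
The comparison above can be reproduced outside IPython with the standard timeit module. This is a rough sketch rather than the original benchmark: the function names, the fixed seed, and the repeat count are assumptions. It also times np.nonzero() on an array built in advance, since passing a plain list means NumPy has to convert it to an array on every call.

```python
import random
import timeit

import numpy as np

random.seed(0)  # fixed seed (an assumption) so every run uses the same list
a = [int(random.random() > 0.5) for _ in range(100000)]
arr = np.array(a)  # converted once, outside the timed code


def with_comprehension():
    return [i for i, e in enumerate(a) if e != 0]


def with_nonzero_on_list():
    # Timed work includes converting the list to an array.
    return np.nonzero(a)[0]


def with_nonzero_on_array():
    # No list-to-array conversion inside the timed call.
    return np.nonzero(arr)[0]


runs = 20
for func in (with_comprehension, with_nonzero_on_list, with_nonzero_on_array):
    seconds = timeit.timeit(func, number=runs) / runs
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```

Timings depend on the machine and on the Python and NumPy versions, so the numbers will not match the ones quoted above.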