Length of generator output

Question:

Python provides a nice method for getting length of an eager iterable, len(x) that is. But I couldn’t find anything similar for lazy iterables represented by generator comprehensions and functions. Of course, it is not hard to write something like:

def iterlen(x):
  n = 0
  try:
    while True:
      next(x)
      n += 1
  except StopIteration: pass
  return n

But I can’t get rid of a feeling that I’m reimplementing a bicycle.

(While I was typing the function, a thought struck my mind: maybe there really is no such function, because it “destroys” its argument. Not an issue for my case, though).

P.S.: concerning the first answers – yes, something like len(list(x)) would work too, but that drastically increases the usage of memory.

P.P.S.: re-checked… Disregard the P.S., seems I made a mistake while trying that, it works fine. Sorry for the trouble.

Asked By: Maxim

||

Answers:

There isn’t one because you can’t do it in the general case – what if you have a lazy infinite generator? For example:

def fib():
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

This never terminates but will generate the Fibonacci numbers. You can get as many Fibonacci numbers as you want by calling next().

If you really need to know the number of items there are, then you can’t iterate through them linearly one time anyway, so just use a different data structure such as a regular list.

Answered By: Adam Rosenfield

You can use enumerate() to loop through the generated data stream, then return the last number — the number of items.

I tried to use itertools.count() with itertools.izip() but no luck. This is the best/shortest answer I’ve come up with:

#!/usr/bin/python

import itertools

def func():
    for i in 'yummy beer':
        yield i

def icount(ifunc):
    size = -1 # for the case of an empty iterator
    for size, _ in enumerate(ifunc()):
        pass
    return size + 1

print list(func())
print 'icount', icount(func)

# ['y', 'u', 'm', 'm', 'y', ' ', 'b', 'e', 'e', 'r']
# icount 10

Kamil Kisiel’s solution is way better:

def count_iterable(i):
    return sum(1 for e in i)
Answered By:

The easiest way is probably just sum(1 for _ in gen) where gen is your generator.

Answered By: Matt Dunham
def count(iter):
    return sum(1 for _ in iter)

Or better yet:

def count(iter):
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)

If it’s not iterable, it will throw a TypeError.

Or, if you want to count something specific in the generator:

def count(iter, key=None):
    if key:
        if callable(key):
            return sum(bool(key(x)) for x in iter)
        return sum(x == key for x in iter)
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)
Answered By: mpen

By definition, only a subset of generators will return after a certain number of arguments (have a pre-defined length), and even then, only a subset of these finite generators have a predictable end (accessing the generator can have side-effects which could stop the generator earlier).

If you wish to implement length methods for your generator, you have to first define what you consider the “length” (is it the total number of elements? the number of remaining elements?), then wrap your generator in a class. Here’s an example:

class MyFib(object):
    """
    A class iterator that iterates through values of the
    Fibonacci sequence, until, optionally, a maximum length is reached.
    """

    def __init__(self, length):
        self._length = length
        self._i = 0

     def __iter__(self):
        a, b = 0, 1
        while not self._length or self._i < self._length:
            a, b = b, a + b
            self._i += 1
            yield a

    def __len__(self):
        "This method returns the total number of elements"
        if self._length:
            return self._length
        else:
            raise NotImplementedError("Infinite sequence has no length")
            # or simply return None / 0 depending
            # on implementation

Here is how to use it:

In [151]: mf = MyFib(20)

In [152]: len(mf)
Out[152]: 20

In [153]: l = [n for n in mf]

In [154]: len(l)
Out[154]: 20

In [155]: l
Out[155]: 
[1,
 1,
 2,
...
6765]


In [156]: mf0 = MyFib(0)

In [157]: len(mf0)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-157-2e89b32ad3e4> in <module>()
----> 1 len(mf0)

/tmp/ipython_edit_TWcV1I.py in __len__(self)
     22             return self._length
     23         else:
---> 24             raise NotImplementedError
     25             # or simply return None / 0 depending
     26             # on implementation

NotImplementedError: 

In [158]: g = iter(mf0)

In [159]: l0 = [g.next(), g.next(), g.next()]

In [160]: l0
Out[160]: [1, 1, 2]
Answered By: sleblanc

Use reduce(function, iterable[, initializer]) for a memory efficient purely functional solution:

>>> iter = "This string has 30 characters."
>>> reduce(lambda acc, e: acc + 1, iter, 0)
30
Answered By: OlivierBlanvillain

This is a hack, but if you really want to have len work on a general iterable (consuming it in the way), you can create your own version of len.

The len function is essentially equivalent to the following (though implementations usually provide some optimizations to avoid the extra lookup):

def len(iterable):
    return iterable.__len__()

Therefore we can define our new_len to try that, and if __len__ does not exist, count the number of elements ourselves by consuming the iterable:

def new_len(iterable):
    try:
      return iterable.__len__()
    except AttributeError:
      return sum(1 for _ in iterable)

The above works in Python 2/3, and (as far as I know) should cover every conceivable kind of iterable.

Answered By: yoniLavi

Try the more_itertools package for a simple solution. Example:

>>> import more_itertools

>>> it = iter("abcde")                                         # sample generator
>>> it
<str_iterator at 0x4ab3630>

>>> more_itertools.ilen(it)
5

See this post for another applied example.

Answered By: pylang

For those who would like to know the summary of that discussion. The final top scores for counting a 50 million-lengthed generator expression using:

  • len(list(gen)),
  • len([_ for _ in gen]),
  • sum(1 for _ in gen),
  • ilen(gen) (from more_itertools),
  • reduce(lambda c, i: c + 1, gen, 0),

sorted by performance of execution (including memory consumption), will make you surprised:

#1: test_list.py:8: 0.492 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
('list, sec', 1.9684218849870376)

#2: test_list_compr.py:8: 0.867 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len([i for i in gen])
('list_compr, sec', 2.5885991149989422)

#3: test_sum.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); sum(1 for i in gen); t1 = monotonic()
('sum, sec', 3.441088170016883)

#4: more_itertools/more.py:413: 1.266 KiB
    d = deque(enumerate(iterable, 1), maxlen=1)
   
    test_ilen.py:10: 0.875 KiB
    gen = (i for i in data*1000); t0 = monotonic(); ilen(gen)
('ilen, sec', 9.812256851990242)

#5: test_reduce.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); reduce(lambda counter, i: counter + 1, gen, 0)
('reduce, sec', 13.436614598002052)

So, len(list(gen)) is the most frequent and less memory consumable

Answered By: Alex-Bogdanov
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.