Iterate in chunks

A common idiom is to consume an iterable in chunks. There are many ways to do it, but most are either a bit clumsy or return the chunks eagerly, which breaks the generator idiom. The best I could find is a recipe from the standard documentation of the itertools module:

from itertools import zip_longest  # izip_longest in Python 2

def chunks(iterable, size, fillvalue=None):
    # Transpose trick: zip size references to one and the same iterator.
    return zip_longest(*([iter(iterable)] * size), fillvalue=fillvalue)

This one is very clever and as concise as it gets. It has some drawbacks, though:

  1. It breaks the “explicit is better than implicit” rule by composing the already quite advanced transposition trick with a round-robin scheme on one and the same iterator.
  2. The chunks are not “lazy” generators but tuples.
  3. It requires fill values, which for my use cases do more harm than good.
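Drawbacks 2 and 3 are easy to demonstrate. Here is the recipe again (as a self-contained sketch, under the name grouper that the itertools documentation uses), applied to a range that does not divide evenly into chunks:

```python
from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, size, fillvalue=None):
    # The recipe from the itertools documentation.
    return zip_longest(*([iter(iterable)] * size), fillvalue=fillvalue)

# Each chunk is an eagerly built tuple, and the last one is padded:
print(list(grouper(range(9), 4)))
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, None, None, None)]
```

The trailing None padding has to be stripped by the caller if the fill values are unwanted.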

Thus, I’d like to add my own two cents here. In order to be lazy and avoid fill values, you need to peek at the iterator to see whether it is already exhausted:

from itertools import chain, islice

def chunks(iterable, size):
    iterable = iter(iterable)
    while True:
        try:
            head = next(iterable)  # peek: StopIteration means exhausted
        except StopIteration:
            return  # needed on Python 3.7+, where PEP 479 applies
        yield chain([head], islice(iterable, size - 1))
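Because each chunk is itself a lazy iterator, this scheme even works on infinite streams. A self-contained sketch (the StopIteration guard is required on Python 3.7+, where PEP 479 applies):

```python
from itertools import chain, count, islice

def chunks(iterable, size):
    iterable = iter(iterable)
    while True:
        try:
            head = next(iterable)  # peek: StopIteration means exhausted
        except StopIteration:
            return
        yield chain([head], islice(iterable, size - 1))

# Take the first two chunks of an endless stream of integers:
stream = chunks(count(), size=3)
print(list(next(stream)))  # [0, 1, 2]
print(list(next(stream)))  # [3, 4, 5]
```

Neither the grouper recipe nor any eager approach can terminate here, since they would try to build the full result up front.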

This is chunks() in action:

>>> for chunk in chunks(range(9), size=5):
...     print(list(chunk))
[0, 1, 2, 3, 4]
[5, 6, 7, 8]
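One caveat worth knowing: the generator only advances past a chunk via the islice that the consumer drains, so each chunk must be consumed in full before the next one is requested. Otherwise the leftovers of the current chunk leak into the following ones. A sketch, with the definition repeated to be self-contained:

```python
from itertools import chain, islice

def chunks(iterable, size):
    iterable = iter(iterable)
    while True:
        try:
            head = next(iterable)  # peek: StopIteration means exhausted
        except StopIteration:
            return
        yield chain([head], islice(iterable, size - 1))

# Taking only the first element of each chunk does NOT skip the rest:
firsts = [next(chunk) for chunk in chunks(range(6), size=3)]
print(firsts)  # [0, 1, 2, 3, 4, 5] -- not [0, 3]
```

If you need chunks that survive out-of-order access, materialize them with list(chunk) as you go.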
