Fast way of counting non-zero bits in positive integer

Question:

I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

Asked By: zidarsk8

||

Answers:

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.


The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti’s answer for a good implementation of this. Of course, you don’t have to keep this all in memory – you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).

Answered By: Marcin

Here’s a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.

Answered By: Óscar López

According to this post, this seems to be one the fastest implementation of the Hamming weight (if you don’t mind using about 64KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is A C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024
Answered By: Paolo Moretti

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it’s easily extendable and the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket’s value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets – n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.

Answered By: Adam Zalcman

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar’s and Adam’s solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I’d use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'x00x01x01x02x01x02x02x03x01x02x02x03x02x03x03x04'
          b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
          b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
          b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
          b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
          b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
          b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
          b'x04x05x05x06x05x06x06x07x05x06x06x07x06x07x07x08')

Then it’s counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.

Answered By: kindall

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and them sum up the bitcount of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I’m surprised no one suggested you write a C module.

Answered By: leewz

You can use the algorithm to get the binary string [1] of an integer but instead of concatenating the string, counting the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation

Answered By: Manuel

I really like this method. Its simple and pretty fast but also not limited in the bit length since python has infinite integers.

It’s actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n
Answered By: Robotbugs
#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)

Answered By: Praveen Narala

It’s possible to combine a lookup table with int.to_bytes (Python 3 only):

popcount8bit = bytes([popcount(x) for x in range(1<<8)])  # use any method to initialize this lookup table
popcount = lambda x: sum(map(popcount8bit.__getitem__,
                             x.to_bytes((x.bit_length()+7)//8, "little")))

This solution unfortunately is about 20% slower than bin(x).count('1') on Python 3, but twice faster on PyPy3.


This is a benchmark script, compares several different solutions presented here, for different number of bits:

from __future__ import print_function  #for Python 2

import sys
from timeit import timeit
import random

def popcount(x): return bin(x).count("1")

version3=sys.version.startswith("3")

for numBit in (2, 4, 8, 16, 31, 32, 63, 64, 1000, 10000):
    maximum=int((1<<numBit)-1)  #int cast just in case it overflows to long in Python 2

    functions=[
            (popcount, "bin count"),
            (lambda x: "{:b}".format(x).count("1"), "format string count"),
            ]

    try:
        import gmpy
        functions.append((gmpy.popcount, "gmpy"))
    except ImportError:
        pass

    if sys.version.startswith("3"):
        exec('''functions.append((lambda x: f"{x:b}".count("1"), "f-string count"))''')

    if numBit<=16:
        table1=[popcount(x) for x in range(maximum+1)]
        functions.append((lambda x: table1[x], "lookup list"))
        functions.append((table1.__getitem__, "lookup list without lambda"))

        table2="".join(map(chr, table1))
        functions.append((lambda x: ord(table2[x]), "lookup str"))

        if version3:
            table3=bytes(table1)
            functions.append((lambda x: table3[x], "lookup bytes"))

            if numBit==8:
                functions.append((
                        b'x00x01x01x02x01x02x02x03x01x02x02x03x02x03x03x04'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x04x05x05x06x05x06x06x07x05x06x06x07x06x07x07x08'
                        .__getitem__, "lookup bytes hard coded 8 bit"))
                table_hardcoded=(
                        b'x00x01x01x02x01x02x02x03x01x02x02x03x02x03x03x04'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x01x02x02x03x02x03x03x04x02x03x03x04x03x04x04x05'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x02x03x03x04x03x04x04x05x03x04x04x05x04x05x05x06'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x03x04x04x05x04x05x05x06x04x05x05x06x05x06x06x07'
                        b'x04x05x05x06x05x06x06x07x05x06x06x07x06x07x07x08')
                functions.append((
                        table_hardcoded.__getitem__, "lookup bytes hard coded 8 bit local variable"))
            functions.append((table3.__getitem__, "lookup bytes without lambda"))

    if version3:
        popcount8bit=bytes([popcount(x) for x in range(1<<8)]) #bytes because benchmark says that it's fastest
        functions.append((
            lambda x: sum(popcount8bit[x] for x in x.to_bytes((x.bit_length()+7)//8, "big")),
            "to_bytes"
            ))
        functions.append((
            lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "big"))),
            "to_bytes without list comprehension"
            ))
        functions.append((
            lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "little"))),
            "to_bytes little endian, without list comprehension"
            ))

    #for x in (2, 4, 8, 16, 32, 64):
    #   table1=[popcount(x) for x in range(1<<8)]


    print("====== numBit=", numBit)
    data=[]
    numRepeat=10**7//(numBit+100)
    for popcountFunction, description in functions:
        random.seed(10) #make randint returns the same value
        data.append((
            timeit(lambda: popcountFunction(random.randint(0, maximum)), number=numRepeat),
            description
            ))

    time1, name1=data[0]
    assert name1=="bin count"
    data.sort()
    maxLength=0
    for time, description in data:
        maxLength=max(maxLength, len(description))
    for time, description in data:
        print("{:{}} -> {:2f} = {} * {:2f}".format(description, maxLength+2, time, name1, time/time1))

It works with both Python 2 and 3; however, if a solution is unavailable for Python 2, it’s not measured.

Some solutions are not listed here.

Result:

  • Python 2: "lookup list without lambda" is the fastest (25% faster than "bin count", 6% faster than "lookup list" (with lambda)) for <= 16 bits, larger than that "bin count" is the fastest. (I didn’t install gmpy for Python 2)
  • Python 3: Roughly the same result.
    • "Lookup bytes without lambda" is comparable (+/-2% compared to "lookup list without lambda").
    • gmpy is faster than "bin count` in all cases, but slower than "lookup list without lambda" for about 5% with numBit <= 16.
    • "to_bytes" is comparable.
    • Using f-string is about 10% slower than "bin count".
  • PyPy3: The lambda no longer incurs much cost, and the to_bytes version becomes much faster (twice faster than "bin count"); however, I could not get gmpy to install.
Answered By: user202729

Python 3.10 introduces int.bit_count():

>>> n = 19
>>> bin(n)
'0b10011'
>>> n.bit_count()
3
>>> (-n).bit_count()
3

This is functionally equivalent to bin(n).count("1") but should be ~6 times faster. See also Issue29882.

Answered By: Chris_Rands

@Robotbugs’ answer, but wrapped in numba’s njit decorator was faster than the gmpy in my case.

@njit(int64(uint64))
def get_bit_count(bitboard):
    n = 0
    bitboard = int64(bitboard)
    while bitboard:
        n += 1
        bitboard &= bitboard - 1
    return n

I had to set uint64 as argument type to avoid OverlowError.

Answered By: ArekKulczycki
class popcount_lk:
    """ Creates an instance for calculating the population count of
        bitstring, based on a lookup table of 8 bits. """

    def __init__(self):
        """ Creates a large lookup table of the Hamming weight of every 8 bit integer. """
        self.lookup_table = bytes.maketrans(bytes(range(1<<8)),bytes((bin(i).count('1') for i in range(1<<8))))
        self.byteorder = sys.byteorder
    
    def __call__(self,x):
        """ Breaks x, which is a python integer type, into chuncks of 8 bits.
        Calls the lookup table to get the population count of each chunck and returns
        the aggregated population count. """

        return sum(x.to_bytes((x.bit_length()>>3)+1,self.byteorder).translate(self.lookup_table))

popcount = popcount_lk
print(popcount(56437865483765))

This should be 3 times faster than bin(56437865483765).count('1') in CPython and PyPy3.

Answered By: Ariad
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.