Can this binary integer be parsed faster?

Question:

I have a sparse array of length 64, encoded with a 64-bit integer binaryPattern.

For example:

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

represents this array:

[0, value1, 0, value2, 0, 0, 0,,,,]

I need to recover the indexes of valueX, from binaryPattern as fast as possible.

I wrote various function attempts named getIndexesXXX(), but those functions, apart from being slow themselves, have performances that change depending on conditions like the percentage of zeros, or if the ones are at the start or at the end of the binaryPattern

I’m using Python 3.10, so I have access to the functions like (int).bit_length().

Note that performance changes between debugging and running without debugging. What matters is when non debugging.

This is the code I have:

binaryPattern written in binary, is 1 when the array is non-zero, and is 0 when the array is zero.

getIndexes() is the naivest attempt: it checks the bits one by one. It is faster when a few bits are clumped together at the start.

getIndexes2() and getIndexes2_5() check which is the first nonzero bit, and avoid checking zeros unnecessarily.

getIndexes3() uses numpy to calculate all the bit shifts. Is brute force, but is faster on some cases, like when there is a high percentage of ones.

getIndexes4() is similar to getIndexes3(), but uses list comprehension to find the non zero bits.

from numpy.random import default_rng
import time
import numpy as np


class sparseArray():
    _indexes = np.arange(64, dtype=np.uint64)
    _weights = 2**_indexes
    _dtypesInt = [np.uint8, np.uint16, np.uint32, np.uint64]

    def __init__(self, values: np.array, indexes: np.array) -> None:
        self.array = values
        self.binaryPattern = int(np.sum(self._weights[indexes]))
        # print(bin(self.binaryPattern)[2:])

    def getIndexes(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = self.binaryPattern
        n = 0
        index = 0
        while binaryMask > 0:
            if binaryMask & 1:
                answer[n] = index
                n += 1

            binaryMask = binaryMask >> 1
            index += 1
        return answer

    def getIndexes2(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            index += first1.bit_length()
            answer[n] = index
            n += 1
            binaryMask //= 2*first1
        return answer

    not1 = ~np.array([1], dtype=np.uint64)[0]

    def getIndexes2_5(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        indexes = self._indexes
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first zero bit
            first0 = ~(binaryMask | (~binaryMask - 1))
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            # if the sequence of 0 is larger than the sequence of 1...
            if first1*4 > first0:
                index += first1.bit_length()
                answer[n] = index
                n += 1
                binaryMask //= 2*first1
            else:
                length1 = first0.bit_length()-1
                index += 1
                answer[n:n+length1] = indexes[index:index+length1]
                index += length1-1
                n += length1
                binaryMask //= first0
        return answer

    def getIndexes3(self):
        bitshift = ((self.binaryPattern//self._weights) & 1) == 1
        return self._indexes[bitshift]

    def getIndexes3_5(self):
        lengthBP = self.binaryPattern.bit_length()
        # creates a boolean array=True on bits=1
        bitshift = ((self.binaryPattern//self._weights[:lengthBP]) & 1) == 1
        return self._indexes[:lengthBP][bitshift]

    def getIndexes4(self):
        i = [x == "1" for x in bin(x.binaryPattern)[-1:1:-1]]
        return self._indexes[:len(i)][i]


rng = default_rng()
length = 32 #how many ones will have the binaryPattern
indexes = np.sort(rng.choice(64, size=length, replace=False))

#getIndexes is faster for this type of data, which may be frequent:
#x = sparseArray(np.array([1, 4, 8, 10, 10]), indexes=np.array([0, 2, 4, 5, 7]))

x = sparseArray(np.arange(length), indexes=indexes)
print(bin(x.binaryPattern)[2:])
print(x.getIndexes())
print(x.getIndexes2())
print(x.getIndexes2_5())
print(x.getIndexes3())
print(x.getIndexes3_5())
print(x.getIndexes4())

start = time.time()
for _ in range(100000):
    x.getIndexes()
print(f"time for getIndexes: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes2()
print(f"time for getIndexes2: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes2_5()
print(f"time for getIndexes2_5: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes3()
print(f"time for getIndexes3: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes3_5()
print(f"time for getIndexes3_5: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes4()
print(f"time for getIndexes4: {time.time()-start:.2f}")
Asked By: Colim

||

Answers:

Not sure if it has to use numpy or how fast it ends up to be but you could try to split the pattern into eight 8-bits blocks, use table look-up to decode (in parallel?) each block and concatenate all the results. Perhaps like:

t0 = [[j+0*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t1 = [[j+1*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t2 = [[j+2*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t3 = [[j+3*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t4 = [[j+4*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t5 = [[j+5*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t6 = [[j+6*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t7 = [[j+7*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]

def indices(pattern):
    p0 = pattern >> 0*8 & 255
    p1 = pattern >> 1*8 & 255
    p2 = pattern >> 2*8 & 255
    p3 = pattern >> 3*8 & 255
    p4 = pattern >> 4*8 & 255
    p5 = pattern >> 5*8 & 255
    p6 = pattern >> 6*8 & 255
    p7 = pattern >> 7*8 & 255
    return t0[p0] + t1[p1] + t2[p2] + t3[p3] + t4[p4] + t5[p5] + t6[p6] + t7[p7]

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

print(indices(binaryPattern))
Answered By: Ecir Hana

This variation of getIndexes3 seems to be almost twice as fast:

    def getIndexesK(self):
        bitshift = self.binaryPattern & self._weights == self._weights
        return self._indexes[bitshift]
Answered By: Kelly Bundy

This is a benchmark of all solutions, including Kelly Bundy, and Ecir Hana.

Kelly Bundi gets second best for more than 15 ones in the pattern, but Ecir Hana beats everything by a ridiculous margin, on all quantities of ones.

Thanks to both, for great code.

benchmark

This is the code used for benchmarking

from numpy.random import default_rng
import time
import numpy as np
import matplotlib.pyplot as plt

# arrays for sparsing the binaryPattern
range8 = range(8)
range256 = range(256)
t0 = [[j+0*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t1 = [[j+1*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t2 = [[j+2*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t3 = [[j+3*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t4 = [[j+4*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t5 = [[j+5*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t6 = [[j+6*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t7 = [[j+7*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]


class sparseArray():
    _indexes = np.arange(64, dtype=np.uint64)
    _weights = 2**_indexes
    _dtypesInt = [np.uint8, np.uint16, np.uint32, np.uint64]

    def __init__(self, values: np.array, indexes: np.array) -> None:
        self.array = values
        self.binaryPattern = int(np.sum(self._weights[indexes]))
        # print(bin(self.binaryPattern)[2:])

    def getIndexes(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = self.binaryPattern
        n = 0
        index = 0
        while binaryMask > 0:
            if binaryMask & 1:
                answer[n] = index
                n += 1

            binaryMask = binaryMask >> 1
            index += 1
        return answer

    def getIndexes2(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            index += first1.bit_length()
            answer[n] = index
            n += 1
            binaryMask //= 2*first1
        return answer

    def getIndexes2_5(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        indexes = self._indexes
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first zero bit
            first0 = ~(binaryMask | (~binaryMask - 1))
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            # if the sequence of 0 is larger than the sequence of 1...
            if first1*4 > first0:
                index += first1.bit_length()
                answer[n] = index
                n += 1
                binaryMask //= 2*first1
            else:
                length1 = first0.bit_length()-1
                index += 1
                answer[n:n+length1] = indexes[index:index+length1]
                index += length1-1
                n += length1
                binaryMask //= first0
        return answer

    def getIndexes3(self):
        bitshift = ((self.binaryPattern//self._weights) & 1) == 1
        return self._indexes[bitshift]

    def getIndexes3_5(self):
        lengthBP = self.binaryPattern.bit_length()
        # creates a boolean array=True on bits=1
        bitshift = ((self.binaryPattern//self._weights[:lengthBP]) & 1) == 1
        return self._indexes[:lengthBP][bitshift]

    def getIndexes4(self):
        i = [x == "1" for x in bin(x.binaryPattern)[-1:1:-1]]
        return self._indexes[:len(i)][i]

    def getIndexesKBundy(self):
        bitshift = self.binaryPattern & self._weights == self._weights
        return self._indexes[bitshift]

    def getIndexesEHana(self):
        pattern = self.binaryPattern
        p0 = pattern >> 0*8 & 255
        p1 = pattern >> 1*8 & 255
        p2 = pattern >> 2*8 & 255
        p3 = pattern >> 3*8 & 255
        p4 = pattern >> 4*8 & 255
        p5 = pattern >> 5*8 & 255
        p6 = pattern >> 6*8 & 255
        p7 = pattern >> 7*8 & 255
        return np.frombuffer(bytes((t0[p0] + t1[p1] + t2[p2] + t3[p3] + t4[p4] + t5[p5] + t6[p6] + t7[p7])), np.int8)

rng = default_rng()
oneBits = list(range(1, 64))

# dummy sparseArray to extract functions names
x = sparseArray(np.zeros(1), np.zeros(1).astype(np.uint8))
functions = [x.getIndexes, x.getIndexes2, x.getIndexes2_5,
             x.getIndexes3, x.getIndexes3_5, x.getIndexes4,
             x.getIndexesKBundy, x.getIndexesEHana]
duration = {func.__name__: np.zeros(64) for func in functions}

for length in oneBits:
    indexes = np.sort(rng.choice(64, size=length, replace=False))

    #x = sparseArray(np.array([1, 4, 8, 10, 10]), indexes=np.array([0, 2, 4, 5, 7]))
    x = sparseArray(np.arange(length), indexes=indexes)

    functions = [x.getIndexes, x.getIndexes2, x.getIndexes2_5,
                 x.getIndexes3, x.getIndexes3_5, x.getIndexes4,
                 x.getIndexesKBundy, x.getIndexesEHana]

    print(bin(x.binaryPattern)[2:])
    for func in functions:
        print(func())

    print(f"{length=}")
    for func in functions:
        start = time.time()
        for _ in range(100000):
            func()
        dT = time.time()-start
        duration[func.__name__][length] = dT
        print(f"time for {func.__name__}: {dT:.2f}")

for fun, timme in duration.items():
    plt.plot(x._indexes[timme != 0], timme[timme != 0],
             label=fun, linewidth=2, marker="o")
    plt.legend()

plt.xlabel("number of 1 in pattern")
plt.ylabel("time")
plt.show()

I added Kely BUndi’s version of Ecir hana solution.

This is the result with debugging:

enter image description here

this is without debugging

enter image description here

Answered By: Colim

Faster one based on Ecir’s, using strings and an f-string instead of lists and adding them.

s0, s1, s2, s3, s4, s5, s6, s7 = [
    [bytes([j+k*8 for j in  range8 if i >> j & 1 == 1]).decode() for i in  range256]
    for k in range(8)
]

...

    def getIndexesEHanaKBundy(self):
        pattern = self.binaryPattern
        p0 = pattern >> 0*8 & 255
        p1 = pattern >> 1*8 & 255
        p2 = pattern >> 2*8 & 255
        p3 = pattern >> 3*8 & 255
        p4 = pattern >> 4*8 & 255
        p5 = pattern >> 5*8 & 255
        p6 = pattern >> 6*8 & 255
        p7 = pattern >> 7*8 & 255
        return np.frombuffer(f'{s0[p0]}{s1[p1]}{s2[p2]}{s3[p3]}{s4[p4]}{s5[p5]}{s6[p6]}{s7[p7]}'.encode(), np.int8)

Using Colim’s answer’s benchmark:

benchmark plot

Answered By: Kelly Bundy

You’ve already accepted the other answer, so I’ll just leave it as is but the problem with it is that t0[p0] + ... allocates new lists even if they are not necessary. You could also try this, hopefully it ends up a bit faster:

t0 = [[j+0*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t1 = [[j+1*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t2 = [[j+2*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t3 = [[j+3*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]

def indices(pattern):
    p0 = pattern >> 0*16 & 65535
    p1 = pattern >> 1*16 & 65535
    p2 = pattern >> 2*16 & 65535
    p3 = pattern >> 3*16 & 65535
    result = []
    result.extend(t0[p0])
    result.extend(t1[p1])
    result.extend(t2[p2])
    result.extend(t3[p3])
    return result

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

print(indices(binaryPattern))

Edit: Now I see Kelly’s comment, "(t0[p0] + t1[p1] +…) with [*t0[p0], *t1[p1], …]" might be worthwhile as well. Thanks Kelly!

Answered By: Ecir Hana
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.