Numpy matrix multiplication, but instead of multiplying it XORs elements

Question:

As stated in the title, I want to make a calculation where, instead of multiplying corresponding elements, I binary-XOR them and then add them.
Example for illustration:

[illustration: the left-hand matrix, the top matrix, and the resulting matrix arranged like a multiplication table]

EDIT: The picture above IS the calculation, but here it is step by step: take the first row of the left matrix [1 0 1] and the first column of the top matrix [1 0 0]. 1 XOR 1 = 0, 0 XOR 0 = 0, 1 XOR 0 = 1; adding them gives 0 + 0 + 1 = 1. For the first row of the left matrix [1 0 1] and the second column [0 0 0]: 1 XOR 0 = 1, 0 XOR 0 = 0, 1 XOR 0 = 1; adding them gives 1 + 0 + 1 = 2. And so on.
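
In plain Python, the calculation described above would look roughly like this (illustrative sketch only):

left = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]   # the matrix on the left (rows are taken from it)
top  = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # the matrix on top (columns are taken from it)

result = [[sum(a ^ b for a, b in zip(row, [r[j] for r in top]))
           for j in range(len(top[0]))]
          for row in left]
print(result)  # [[1, 2, 2], [2, 1, 1], [2, 3, 3]]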

Is it possible to do that in numpy?

Asked By: Kyatt


Answers:

You can just combine two loops with a NumPy 1D XOR-sum, like below:

Try it online!

import numpy as np
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
    for j in range(mr.shape[1]):
        mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
print(mr)

Output:

[[1 2 2]
 [2 1 1]
 [2 3 3]]

As @MadPhysicist suggested, you can use the Numba JIT compiler (pip install numba) to speed up the code above; you get very fast code for your operations with small memory consumption:

Try it online!

import numpy as np, numba
@numba.njit(cache = True)
def matxor(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
    return mr
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
print(matxor(m1, m2))

The Numba code above can also be sped up by about 44x more, thanks to the following improvements suggested and coded by @max9111:

import numpy as np, numba

m1 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)  # high is exclusive, so high=2 gives 0/1 values
m2 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)

#@Arty
@numba.njit(cache = True)
def matxor_1(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
    return mr

%timeit matxor_1(m1, m2)
#1.06 s ± 9.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Aligned memory access (a real transpose via ascontiguousarray is important)
@numba.njit(cache = True)
def matxor_2(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1_T[j, :] ^ m2[i, :])
    return mr

%timeit matxor_2(m1, m2)
#312 ms ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Writing out the inner loop
@numba.njit(fastmath=True,cache = True)
def matxor_3(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc=0
            for k in range(m2.shape[1]):
                acc+=m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr

%timeit matxor_3(m1, m2)
#125 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Parallelization
@numba.njit(fastmath=True,cache = True,parallel=True)
def matxor_4(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in numba.prange(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc=0
            for k in range(m2.shape[1]):
                acc+=m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr

%timeit matxor_4(m1, m2)
#23.8 ms ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(np.allclose(matxor_1(m1, m2),matxor_2(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_3(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_4(m1, m2)))
#True
Answered By: Arty

Try this, using broadcasting:

M1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
M2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
(M1 ^ M2[:,None]).sum(-1)

Output:

array([[1, 2, 2],
       [2, 1, 1],
       [2, 3, 3]])
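
For reference (this note is not part of the original answer): M2[:, None] has shape (3, 1, 3), so the XOR broadcasts against M1's shape (3, 3) into a (3, 3, 3) array whose last axis is then summed. Note that this pairs rows of M1 with rows of M2; it reproduces the loop answer here only because the example M1 is symmetric. To follow the question's column-of-M1 convention for arbitrary matrices, XOR against M1.T instead:

import numpy as np

M1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
M2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])

print((M1 ^ M2[:, None]).shape)      # the broadcast intermediate is (3, 3, 3)
print((M1.T ^ M2[:, None]).sum(-1))  # uses columns of M1: mr[i, j] = sum(m1[:, j] ^ m2[i, :])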

EDIT

If you want to preallocate memory:

intermediary = np.empty((3, 3, 3), dtype=np.int32)  # reusable buffer for the broadcast XOR result
np.bitwise_xor(M1, M2[:, None], out=intermediary).sum(-1)
Answered By: Kevin

This is just a longer comment on Arty's answer. There are a few things to speed up the Numba function.

Steps to improve performance

import numpy as np, numba

m1 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)  # high is exclusive, so high=2 gives 0/1 values
m2 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)

#@Arty
@numba.njit(cache = True)
def matxor_1(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
    return mr

%timeit matxor_1(m1, m2)
#1.06 s ± 9.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Aligned memory access (a real transpose via ascontiguousarray is important)
@numba.njit(cache = True)
def matxor_2(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1_T[j, :] ^ m2[i, :])
    return mr

%timeit matxor_2(m1, m2)
#312 ms ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Writing out the inner loop
@numba.njit(fastmath=True,cache = True)
def matxor_3(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc=0
            for k in range(m2.shape[1]):
                acc+=m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr

%timeit matxor_3(m1, m2)
#125 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Parallelization
@numba.njit(fastmath=True,cache = True,parallel=True)
def matxor_4(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
    m1_T=np.ascontiguousarray(m1.T)

    for i in numba.prange(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc=0
            for k in range(m2.shape[1]):
                acc+=m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr

%timeit matxor_4(m1, m2)
#23.8 ms ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(np.allclose(matxor_1(m1, m2),matxor_2(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_3(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_4(m1, m2)))
#True
Answered By: max9111

I was trying to find something that is simpler for me to understand and runs faster. So, I came up with this.

To try and avoid loops, we can convert this problem into a matrix multiplication problem; matrix multiplication is highly optimized in libraries like numpy.

An XOR operation followed by summation counts the number of dissimilar bits between two bit vectors. We can obtain this count indirectly by using bipolar vectors, i.e. by converting the 1,0 bit vectors to 1,-1 vectors (a quick numeric check of this identity follows the steps below).

Steps:

  1. Each matrix can be seen as a stack of bit vectors of length m: the bit vectors are the rows of M1 and the columns of M2.

  2. Obtain the product matrix M1 x M2.

  3. Recover the number of dissimilar bits from the values in the product matrix using the following formula.

    Each bit vector has m bits (m = length of the bit vector).
    When comparing two bit vectors,
    let k = number of similar bits
    => m - k = number of dissimilar bits
    let r = intermediate result = the dot product of the two 1,-1 vectors
    Clearly,
    r = 1*k + (-1)*(m - k)
      = k - m + k
      = 2k - m                       --(1)
    or
    k = (m + r)/2 = sum(XNOR)        --(2)

    Given the bit vectors, we know m, and we can compute r as described above,
    so we can compute k using (2).

    But we need the number of dissimilar bits (m - k):
    m - k = m - (m + r)/2 = (2m - m - r)/2 = (m - r)/2
    number of dissimilar bits = m - k = (m - r)/2 = sum(XOR)
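
A quick numeric check of the identity above on two small bit vectors (a sketch, not part of the original answer):

import numpy as np

a = np.array([1, 0, 1])
b = np.array([1, 0, 0])
m = len(a)                       # bit vector length

r = np.dot(2*a - 1, 2*b - 1)     # dot product of the bipolar (+1/-1) vectors
print((a ^ b).sum(), (m - r)//2)          # dissimilar bits: 1 1
print((~(a ^ b) & 1).sum(), (m + r)//2)   # similar bits:    2 2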
    

Doing the same at the matrix level, with N1 = 2*M1 - 1 and N2 = 2*M2 - 1 (so that intr = N1·N2 holds the value r for every row/column pair), we get xorMat = (m - intr)/2 and xnorMat = (m + intr)/2.

Actual code:

import numpy as np

def matXOR_matXNOR(M1, M2):

    N1 = 2*M1 - 1; N2 = 2*M2 - 1    # convert 0/1 bit vectors to bipolar (+1/-1) vectors
    m1, m = N1.shape                # m = bit vector length; N2 is an m x m2 matrix
    intr = np.matmul(N1, N2)        # m1 x m2 matrix -> intermediate result r for every pair

    xorMat = (m - intr)//2          # number of dissimilar bits = (m - r)/2
    xnorMat = (m + intr)//2         # number of similar bits = (m + r)/2

    return xorMat, xnorMat
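
A quick check against the matrices from the question (not part of the original answer); note the argument order (m2, m1), so that the rows of m2 are compared with the columns of m1, matching the convention used in the other answers:

m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
xorMat, xnorMat = matXOR_matXNOR(m2, m1)
print(xorMat)
# [[1 2 2]
#  [2 1 1]
#  [2 3 3]]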
Answered By: Manu Ramesh