Numba @guvectorize returns garbage values

Question:

The code below is a test for calculating the distance between points in a periodic system.

import itertools
import time
import numpy as np
import numba
from numba import njit

@njit(cache=True)
def get_dr(i=np.array([]),j=np.array([]),cellsize=np.array([])):
    k=np.zeros(3,dtype=np.float64)

    for idx, _ in enumerate(cellsize):
        k[idx] = (j[idx]-i[idx])-cellsize[idx]*np.round((j[idx]-i[idx])/cellsize[idx])
    return np.linalg.norm(k)


@numba.guvectorize(["void(float64[:],float64[:],float64[:],float64)"],
             "(m),(m),(m)->()",nopython=True,cache=True)
def get_dr_vec(i,j,cellsize,dr):
    dr=0.0
    k=np.zeros(3,dtype=np.float64)
    for idx, _ in enumerate(cellsize):
        k[idx] = (j[idx]-i[idx])-cellsize[idx]*np.round((j[idx]-i[idx])/cellsize[idx])

    dr=np.sqrt(np.square(k[0])+np.square(k[1])+np.square(k[2]))

N, dim = 50, 3  # 50 particles in 3D
vec = np.random.random((N, dim))

cellsize=np.array([26.4,26.4,70.0])
rList=[];rList2=[]

start = time.perf_counter()
for (pI, pJ) in itertools.product(vec, vec):
    rList.append(get_dr(pI,pJ,cellsize))
end =time.perf_counter()
print("Time {:.3g}s".format(end-start))

newvec1=[];newvec2=[]
start = time.perf_counter()
for (pI, pJ) in itertools.product(vec, vec):
    newvec1.append(pI)
    newvec2.append(pJ)
cellsizeVec=np.full(shape=np.shape(newvec1),fill_value=cellsize,dtype=float)
rList2=get_dr_vec(np.array(newvec1),np.array(newvec2),cellsizeVec)
end =time.perf_counter()
print("Time {:.3g}s".format(end-start))
print(rList2)
exit()

Compared to get_dr(), which gives the correct result, get_dr_vec() returns garbage and nan values. Inside get_dr_vec() the value of dr is calculated correctly, but the array that comes back has the correct dimensions and garbage contents. Can someone suggest how to resolve this issue?

Asked By: mykd

||

Answers:

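You made a small mistake in the guvectorize function. guvectorize does not want you to reassign the output variable; the output array must be filled in instead. Note that the output is therefore declared as float64[:] in the type signature, even though the layout string lists it as a scalar (), so that it can be written to as dr[0]. The code below should work: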

@numba.guvectorize(["void(float64[:],float64[:],float64[:],float64[:])"],
              "(m),(m),(m)->()",nopython=True, cache=True)
def get_dr_vec(i,j,cellsize,dr):
    k=np.zeros(3,dtype=np.float64)
    for idx, _ in enumerate(cellsize):
        k[idx] = (j[idx]-i[idx])-cellsize[idx]*np.round((j[idx]-i[idx])/cellsize[idx])
    
    # The mistake was on this line. You had "dr =", but it should be "dr[0] ="
    dr[0] = np.sqrt(np.square(k[0])+np.square(k[1])+np.square(k[2]))

The reason that dr = does not work is that guvectorize allocates the output array before it calls your function and passes it in as the last argument. Writing dr = only rebinds the local name dr to a new object inside the function; the preallocated output is never touched, so when numba returns it, it still holds whatever happened to be in that memory. dr[0] = does work, because it writes the value into the preallocated array, which is where numba expects to find the result.
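The same difference between rebinding a name and filling a buffer can be seen in plain Python, without numba. The sketch below is only an illustration: the function names are hypothetical and a one-element list stands in for the output array that guvectorize preallocates.

```python
def rebind_output(values, out):
    # Rebinding: "out" now names a new local object.
    # The caller's buffer is left exactly as it was.
    out = sum(values)


def fill_output(values, out):
    # In-place write: the caller's buffer receives the result.
    out[0] = sum(values)


buffer_a = [None]  # stand-in for the preallocated output
rebind_output([1.0, 2.0], buffer_a)

buffer_b = [None]
fill_output([1.0, 2.0], buffer_b)

print(buffer_a)  # [None]  (never filled)
print(buffer_b)  # [3.0]
```

In the guvectorize case the unfilled buffer comes from uninitialized memory rather than holding None, which is why the result looks like random numbers.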

If it is still not 100% clear, I recommend that you look through the numba documentation on this topic.

Those "garbage values" you were seeing were the contents of the output array that was never filled, similar to what you would see if you called print(np.empty(10)).

Answered By: Rafnus