Apply slicing and conditionals to sparse arrays with parallelization in Python

Question:

Apply slicing and conditionals to sparse arrays with parallelization

I want to do something like dynamic programming on a sparse array.
Could you check the following example function, which I would like to implement for a sparse array?
(The first example operates on a numpy.array.)

First, import the modules:
from numba import jit
import numpy as np
from scipy import sparse as sp
from numba import prange

Then the first example:

@jit(parallel=True, nopython=True)
def mytest_csc(inptmat):
    something = np.zeros(inptmat.shape[1])
    for i in prange(inptmat.shape[1]):  # outer loop over columns, parallelized by prange
        target = 0
        partmat = inptmat[:, i]  # slice out column i
        for j in range(len(partmat)):  # inner loop over the rows of that column
            counter = 0
            if partmat[j] > 0:
                new_val = partmat[j] / (partmat[j] + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter += 1
        something[i] = target
    return something

In the above function, the following were done:

  1. slicing/indexing the array
  2. addition and multiplication
  3. a nested for-loop
  4. parallelization with Numba's prange

Here is my question: how can I implement this for a sparse array like scipy.sparse.csc_matrix?

The following is what I have tried.
This function can accept np.array or scipy.sparse.csc_matrix as the input, but it cannot be parallelized…


def mytest_csc2(inptmat):
    something = np.zeros(inptmat.shape[1])
    for i in prange(inptmat.shape[1]):
        target = 0
        partmat = inptmat[:, i]
        for j in range(len(partmat)):
            counter = 0
            if partmat[j] > 0:
                new_val = partmat[j] / (partmat[j] + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter += 1
        something[i] = target
    return something

The parallelization is a must.
Here are the speeds of the above functions.
In the example I made a 100×100 matrix, but in fact I need to process a significantly bigger matrix, around 100000×100000, so I can't avoid sparse arrays…

inptmat = np.zeros((100, 100))  # test input matrix, a normal numpy array
%%timeit
mytest_csc(inptmat)

16.1 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

inptmat = sp.csc_matrix(inptmat)  # test input matrix, a scipy.sparse.csc_matrix
%%timeit
mytest_csc2(inptmat)

1.39 s ± 70.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I need to optimize the second test function so that it works as fast as possible, like the Numba example above.
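For reference, one common workaround (not necessarily what the accepted answer below did) is to unpack the csc_matrix into its raw buffers data, indices and indptr, which are plain numpy arrays, and pass those into a jitted kernel, since Numba's nopython mode cannot compile scipy.sparse objects. A minimal sketch that mirrors the loop above; the function names here are only illustrative:

from numba import jit, prange
import numpy as np
from scipy import sparse as sp

@jit(parallel=True, nopython=True)
def _mytest_csc_kernel(data, indices, indptr, n_cols):
    # hypothetical kernel: walks only the stored entries of each column via the
    # raw CSC buffers; absent entries are zero and would fail the "> 0" test anyway
    something = np.zeros(n_cols)
    for i in prange(n_cols):
        target = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]  # row index of this stored entry
            v = data[k]     # value of inptmat[j, i]
            counter = 0
            if v > 0:
                new_val = v / (v + something[j])
                target = (something[j] + new_val) / (counter + 1)
                counter += 1
        something[i] = target
    return something

def mytest_csc2_numba(inptmat):
    # hypothetical wrapper: hands nopython-compatible numpy arrays to the kernel
    csc = sp.csc_matrix(inptmat)
    return _mytest_csc_kernel(csc.data, csc.indices, csc.indptr, csc.shape[1])

Because the inner loop only visits the stored entries, the work scales with the number of nonzeros per column instead of with the full number of rows.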

Asked By: Xminer


Answers:

This problem was solved with two vectorized scipy.sparse operations and by partitioning the data, without a self-implemented loop.
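As a rough illustration of that idea (the function name and the particular per-column aggregate below are only examples, not the exact operations used in the fix), per-column quantities can be computed with whole-matrix scipy.sparse operations instead of a Python loop over columns:

import numpy as np
from scipy import sparse as sp

def columnwise_positive_mean(inptmat):
    # illustrative only: mean of the positive entries in each column, computed
    # entirely with vectorized sparse operations
    csc = sp.csc_matrix(inptmat)
    pos_mask = csc > 0                                                 # sparse boolean mask of positive entries
    pos_count = np.asarray(pos_mask.sum(axis=0)).ravel()               # number of positive entries per column
    pos_sum = np.asarray(csc.multiply(pos_mask).sum(axis=0)).ravel()   # sum of positive entries per column
    # divide where a column has at least one positive entry, otherwise leave 0
    return np.divide(pos_sum, pos_count,
                     out=np.zeros(csc.shape[1]), where=pos_count > 0)

Since only the stored entries are ever touched, this kind of formulation stays feasible for a 100000×100000 matrix as long as it remains sparse.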

Answered By: Xminer