Numeric calculation of nullspace with Sympy is slow

Question:

I want to obtain the bases for kernel (nullspace) of a matrix using the Row Reduced Echelon Form (RREF).

Although scipy has the function to calculate nullspace, it does not give me sparse vectors because it uses singular value decomposition (SVD).

#make an example
import random
import numpy as np
random.seed(1)
m=100
n=120
matrix = [1 if random.uniform(0, 1)>0.9 else 0 for i in range (m*n)]
matrix =np.array (matrix).reshape((m, n))

#calculate nullspace using scipy
from scipy import linalg
ns2=linalg.null_space(matrix)
print (ns2)#This is not sparse, which is not what I want to obtain.

Instead, I used the function in sympy.

import sympy
ns2=sympy.Matrix(matrix).nullspace()
if ns2:
    ns2=np.concatenate([np.array(vec, dtype=float) for vec in ns2], axis=1)
print (ns2)

However, this takes a lot of time, probably because Sympy are struggling to find the algebraic solution.

So, I tried to use lambdify, which converts SymPy expressions into NumPy functions. This canbe wrapped with numba to make them faster, as suggested here. https://github.com/numba/numba/issues/4795

I tried to convert SymPy expressions into NumPy functions

f = sympy.lambdify(x, sympy.Matrix(np.array(x)).nullspace() , "numpy")

but this did not work.

---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last) in ----> 1 f = sympy.lambdify(x, sympy.Matrix(np.array(x)).nullspace() , "numpy")

/opt/anaconda3/lib/python3.8/site-packages/sympy/matrices/dense.py in new(cls, *args, **kwargs) 297 298 def new(cls, *args, **kwargs): --> 299 return cls._new(*args, **kwargs) 300 301 @classmethod

/opt/anaconda3/lib/python3.8/site-packages/sympy/matrices/dense.py in _new(cls, copy, *args, **kwargs) 308 rows, cols, flat_list = args 309 else: --> 310 rows, cols, flat_list = cls._handle_creation_inputs(*args, **kwargs) 311 flat_list = list(flat_list) # create a shallow copy 312 self = object.new(cls)

/opt/anaconda3/lib/python3.8/site-packages/sympy/matrices/matrices.py in _handle_creation_inputs(cls, *args, **kwargs) 982 # Matrix(numpy.ones((2, 2))) 983 elif hasattr(args[0], "array"): --> 984 return cls._handle_ndarray(args[0]) 985 986 # Matrix([1, 2, 3]) or Matrix([[1, 2], [3, 4]])

/opt/anaconda3/lib/python3.8/site-packages/sympy/matrices/matrices.py in _handle_ndarray(cls, arg) 906 return arr.shape[0], 1, flat_list 907 else: --> 908 raise NotImplementedError( 909 "SymPy supports just 1D and 2D matrices") 910

NotImplementedError: SymPy supports just 1D and 2D matrices

Is there any ways to speed up the calculation?

Asked By: Apppii092

||

Answers:

SymPy has a faster representation of matrices (DomainMatrix) that can do this much more quickly. In future it will be used automatically for an operation like this but for now you would need to use it explicitly. It gives the same output that you would get from the Matrix.nullspace method except a lot faster.

Here is a demonstration of using DomainMatrix with a smaller matrix:

In [72]: import sympy

In [73]: M = sympy.randMatrix(10, 12, min=0, max=1, percent=90)

In [74]: M
Out[74]: 
⎡1  0  1  1  0  1  1  0  0  0  0  1⎤
⎢                                  ⎥
⎢0  1  1  0  1  1  0  1  0  1  0  1⎥
⎢                                  ⎥
⎢0  0  0  0  0  0  0  1  0  0  1  0⎥
⎢                                  ⎥
⎢1  0  0  0  0  1  1  1  0  0  0  0⎥
⎢                                  ⎥
⎢1  0  1  0  1  0  1  0  1  0  1  0⎥
⎢                                  ⎥
⎢0  0  0  0  0  0  1  1  1  0  1  0⎥
⎢                                  ⎥
⎢0  0  1  0  1  0  1  1  0  0  0  0⎥
⎢                                  ⎥
⎢1  1  0  0  0  0  0  0  0  1  0  0⎥
⎢                                  ⎥
⎢1  1  0  0  1  1  0  0  1  0  1  0⎥
⎢                                  ⎥
⎣0  1  0  0  1  0  1  0  0  1  0  1⎦

In [75]: M.nullspace()
Out[75]: 
⎡⎡-3/4 ⎤  ⎡1/4 ⎤⎤
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢ 5/2 ⎥  ⎢3/2 ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢ 7/4 ⎥  ⎢3/4 ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢-11/4⎥  ⎢-7/4⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢ -2  ⎥  ⎢ -1 ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢ 1/2 ⎥  ⎢-1/2⎥⎥
⎢⎢     ⎥, ⎢    ⎥⎥
⎢⎢ 5/4 ⎥  ⎢1/4 ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢ -1  ⎥  ⎢ 0  ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢-5/4 ⎥  ⎢-1/4⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢-7/4 ⎥  ⎢-7/4⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎢⎢  1  ⎥  ⎢ 0  ⎥⎥
⎢⎢     ⎥  ⎢    ⎥⎥
⎣⎣  0  ⎦  ⎣ 1  ⎦⎦

In [76]: from sympy.polys.matrices import DomainMatrix

In [77]: dM = DomainMatrix.from_Matrix(M)

In [78]: dM.to_field().nullspace().transpose().to_Matrix()
Out[78]: 
⎡-3/4   1/4 ⎤
⎢           ⎥
⎢ 5/2   3/2 ⎥
⎢           ⎥
⎢ 7/4   3/4 ⎥
⎢           ⎥
⎢-11/4  -7/4⎥
⎢           ⎥
⎢ -2     -1 ⎥
⎢           ⎥
⎢ 1/2   -1/2⎥
⎢           ⎥
⎢ 5/4   1/4 ⎥
⎢           ⎥
⎢ -1     0  ⎥
⎢           ⎥
⎢-5/4   -1/4⎥
⎢           ⎥
⎢-7/4   -7/4⎥
⎢           ⎥
⎢  1     0  ⎥
⎢           ⎥
⎣  0     1  ⎦

Timing comparisons:

In [89]: for m in [10, 20, 30, 40]:
    ...:     M = sympy.randMatrix(m, int(m*1.2), min=0, max=1, percent=90)
    ...:     dM = DomainMatrix.from_Matrix(M).to_field()
    ...:     print()
    ...:     print()
    ...:     print('Size:', M.shape)
    ...:     print()
    ...:     print('DomainMatrix.nullspace():')
    ...:     %time ok = dM.nullspace()
    ...:     print()
    ...:     print('Matrix.nullspace():')
    ...:     %time ok = M.nullspace()
    ...: 


Size: (10, 12)

DomainMatrix.nullspace():
CPU times: user 1.28 ms, sys: 37 µs, total: 1.32 ms
Wall time: 1.36 ms

Matrix.nullspace():
CPU times: user 30.7 ms, sys: 0 ns, total: 30.7 ms
Wall time: 30.3 ms


Size: (20, 24)

DomainMatrix.nullspace():
CPU times: user 1.88 ms, sys: 0 ns, total: 1.88 ms
Wall time: 1.89 ms

Matrix.nullspace():
CPU times: user 242 ms, sys: 3.94 ms, total: 246 ms
Wall time: 246 ms


Size: (30, 36)

DomainMatrix.nullspace():
CPU times: user 9.27 ms, sys: 0 ns, total: 9.27 ms
Wall time: 9.28 ms

Matrix.nullspace():
CPU times: user 1min 19s, sys: 133 ms, total: 1min 19s
Wall time: 1min 19s


Size: (40, 48)

DomainMatrix.nullspace():
CPU times: user 24.3 ms, sys: 4 µs, total: 24.3 ms
Wall time: 24.3 ms

Matrix.nullspace():
^C---------------------------------------------------------------------------
KeyboardInterrupt  

I didn’t wait for the last result here but you can see that for 30×36 the time difference is 9 milliseconds compared to 1 minute so it is much faster for larger matrices.

Answered By: Oscar Benjamin