Can I rewrite this code to make it run faster?

Question:

Is it actually possible to make this run faster? I need to get half of all possible grids (every element is either -1 or 1) of size 4*Lx (to compute energies in the Ising model).

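from itertools import product

import numpy as np

def get_grid(Lx):
    a = list()
    count = 0
    # All 2**Lx possible rows of length Lx with entries 1 or -1
    t = list(product([1,-1], repeat=Lx))
    for i in range(len(t)):
        for j in range(len(t)):
            for k in range(len(t)):
                for l in range(len(t)):
                    count += 1
                    a.append([t[i], t[j], t[k], t[l]])
                    # Stop once half of the 2**(Lx*4) grids have been generated
                    if count == 2**(Lx*4)//2:
                        return np.array(a)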

Tried using numba, but that didn’t work out.

Asked By: DisplayName1234


Answers:

First of all, Numba does not like lists. If you want efficient code, you need to operate on arrays (except when the size really is not known ahead of time and estimating it is hard or slow). Here the size of the output array is known in advance, so it is better to preallocate it and then fill it. Numba also does not cope well with high-level features like generators; prefer basic loops, which are fast as long as they run in a JIT-compiled function. The Cartesian product can be replaced by an efficient computation of the array from the bits of an increasing integer. The whole computation is mainly memory-bound, so it is better to use a small integer datatype like int8, which takes 4 times less space in RAM than a 32-bit integer (and is thus about 4 times faster to fill). Here is the resulting code:

import numpy as np
import numba as nb

@nb.njit('int8[:,:,:](int64,)')
def get_grid_numba(Lx):
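    # Row i of t holds the bits of i (most significant first), mapped 0 -> 1 and 1 -> -1
    t = np.empty((2**Lx, Lx), dtype=np.int8)
    for i in range(2**Lx):
        for j in range(Lx):
            t[i, Lx-1-j] = 1 - 2 * ((i >> j) & 1)
    # Only half of the 2**(Lx*4) grids are needed, so the output can be preallocated
    outSize = 2**(Lx*4 - 1)
    out = np.empty((outSize, 4, Lx), dtype=np.int8)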
    cur = 0
    for i in range(len(t)):
        for j in range(len(t)):
            for k in range(len(t)):
                for l in range(len(t)):
                    out[cur, 0, :] = t[i, :]
                    out[cur, 1, :] = t[j, :]
                    out[cur, 2, :] = t[k, :]
                    out[cur, 3, :] = t[l, :]
                    cur += 1
                    if cur == outSize:
                        return out
    return out

For Lx=4, the initial code takes 66.8 ms while this Numba code takes 0.36 ms on my i5-9600KF processor. It is thus 185 times faster.
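
To see why the bit-based loop can stand in for the Cartesian product, here is a small pure-Python check. It is only a sketch: the helper name rows_from_bits is made up for this illustration and is not part of the code above. It builds each row from the bits of an increasing integer in the same way and compares the result with itertools.product.

```python
from itertools import product

def rows_from_bits(Lx):
    """Hypothetical helper: row i lists the bits of i, most significant
    bit first, with bit 0 mapped to 1 and bit 1 mapped to -1."""
    return [
        tuple(1 - 2 * ((i >> (Lx - 1 - col)) & 1) for col in range(Lx))
        for i in range(2 ** Lx)
    ]

# The rows come out in the same order as product([1, -1], repeat=Lx)
for Lx in range(1, 6):
    assert rows_from_bits(Lx) == list(product([1, -1], repeat=Lx))

print(rows_from_bits(2))
```

Since the four loops enumerate the grids in lexicographic order, the first half of them is the set whose first row has index below 2**(Lx-1), that is, the grids whose top-left element is 1.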


Note that the size of the output array grows exponentially with Lx. For Lx=7, the output shape is (134217728, 4, 7) and it takes 3.5 GiB of RAM. The Numba code takes 2.47 s to generate it, that is, 1.4 GiB/s. If this is not fast enough for you, you can write a specialized implementation for each Lx from 1 to 8, use explicit loops for the out slice assignments, and even use multiple threads for Lx>=5. For small arrays, you can pre-compute them once. This should be an order of magnitude faster.
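
The memory figures above follow directly from the output shape. As a rough sketch (the function name grid_array_bytes is invented for this example), the footprint of an array of shape (2**(4*Lx - 1), 4, Lx) can be computed like this:

```python
def grid_array_bytes(Lx, itemsize=1):
    """Hypothetical helper: bytes needed for an array of shape
    (2**(4*Lx - 1), 4, Lx) whose elements take `itemsize` bytes each
    (itemsize=1 corresponds to int8)."""
    n_grids = 2 ** (4 * Lx - 1)  # half of all 2**(4*Lx) grids
    return n_grids * 4 * Lx * itemsize

# Print the int8 footprint in GiB for Lx = 1..8
for Lx in range(1, 9):
    print(Lx, grid_array_bytes(Lx) / 2**30, "GiB")
```

For Lx=7 this gives 3.5 GiB with int8, matching the number above; for Lx=8 it is already 64 GiB.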

Answered By: Jérôme Richard