Optimize this linear transformation for images with NumPy

Question:

Good evening,

I’m trying to learn NumPy and have written a simple linear transformation that is applied to an image using for loops:

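import numpy as np
import matplotlib.pyplot as plt

# `image`, `width` and `height` are assumed to be defined beforehand
# (a 440x440 image, judging by the loop bounds below).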

M = np.array([
    [width, 0],
    [0, height]
])

T = np.array([
    [1, 3],
    [0, 1]
])

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    
    for i in range(0, 440):
        for j in range(0, 440):
            x = np.array([j, i])
            coords = (T @ x)
            x = coords[0]
            y = coords[1]
            new_img[y, -x] = image[i, -j]
    
    return new_img

plt.imshow(transform_image(M, T))

It does what I want and spits out a transformation that is correct, except that I think there is a way to do this without the loops.

I tried doing some stuff with meshgrid but I couldn’t figure out how to get the pixels from the image in the same way I do it in the loop (using i and j). I think I figured out how to apply the transformation but then getting the pixels from the image in the correct spots wouldn’t work.

Any ideas?

EDIT:
Thanks for the solutions below. lezaf’s solution was very similar to what I had tried before; the only step I couldn’t figure out was assigning the pixels from the old image to the new one. I made some changes to the code to avoid the transposing, and also added an astype("int") so it works with float values in the T matrix:

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    x_combs = np.array(np.meshgrid(np.arange(width), np.arange(height))).reshape(2,-1)
    coords = (T @ x_combs).astype("int")
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[1, :], -x_combs[0, :]]
    
    return new_img

Asked By: Caj M

||

Answers:

A more efficient solution is the following:

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T

    # This one replaces the double for-loop
    x_combs = np.array(np.meshgrid(np.arange(440), np.arange(440))).T.reshape(-1,2)
    # Calculate the new coordinates
    coords = (T@x_combs.T)
    # Apply changes to new_img
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[:, 1], -x_combs[:, 0]]

    return new_img

I updated my solution to remove the for-loop, so it is now a lot more straightforward.

After this change, the optimized code runs in 50 ms, compared to the initial 3.06 s for the code in the question.
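
To see what the meshgrid line produces, here is a minimal sketch on a tiny grid. The 3×2 size is made up for illustration (the code above uses 440×440), and T is the matrix from the question:

```python
import numpy as np

width, height = 3, 2  # hypothetical tiny image size, for illustration only

T = np.array([[1, 3],
              [0, 1]])

# meshgrid returns two (height, width) arrays: x varies along columns, y along rows
xs, ys = np.meshgrid(np.arange(width), np.arange(height))

# Stack to (2, height, width), transpose to (width, height, 2), flatten to N x 2.
# Each row of x_combs is one (x, y) pair.
x_combs = np.array([xs, ys]).T.reshape(-1, 2)

# Transform every pair at once: the result is 2 x N (row 0 = new x, row 1 = new y)
coords = T @ x_combs.T

print(x_combs)
print(coords)
```

Each column of coords is the transformed position of the matching row of x_combs, which is why the two can be used side by side as index arrays.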

Answered By: lezaf

There seems to be some confusion between width/height, x/y, … so I’m not 100% sure my code won’t need adaptation. But I think the main idea is the one you are looking for:

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    j,i=np.meshgrid(range(width), range(height))
    ji=np.array((j.flatten(), i.flatten()))
    coords = (T@ji).astype(int)
    new_img=np.zeros((coords[1].max()+1, coords[0].max()+1), dtype=np.uint8)
    new_img[coords[1], coords[0]] = image.flatten()
    return new_img

The main idea here is to build the set of coordinates of the input image with meshgrid. I don’t want a 2D array of coordinates, just a list of coordinates (a list of (i, j) pairs), hence the flatten. So ji is a huge 2×N array, N being the number of pixels (width×height).
coords is the transformation of all those coordinates.
Since your original code seemed to have some inconsistency in sizes (the transformed image did not fit in new_img), I chose the easy way to compute the size of new_img and just take the max of those coordinates (a bit overkill: the max over the four corners would be enough).

Then I use this set of coordinates as indexes into new_img, and assign the matching pixels to them, that is, the flattened image.

So, no for loop at all.
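
As a sanity check, the same steps can be run on a tiny image where the result is easy to read. This is only a sketch: the 2×3 image and its pixel values are made up, and T is the matrix from the question.

```python
import numpy as np

# Hypothetical 2x3 test image (height=2, width=3) with distinct pixel values
image = np.arange(1, 7, dtype=np.uint8).reshape(2, 3)
height, width = image.shape

T = np.array([[1, 3],
              [0, 1]])

# Coordinates of every input pixel, flattened to a 2 x N array (row 0 = x, row 1 = y)
j, i = np.meshgrid(range(width), range(height))
ji = np.array((j.flatten(), i.flatten()))

# Transformed (x, y) of every pixel
coords = (T @ ji).astype(int)

# Output just large enough to hold the largest transformed coordinates
new_img = np.zeros((coords[1].max() + 1, coords[0].max() + 1), dtype=np.uint8)

# One fancy-indexed assignment replaces the double loop
new_img[coords[1], coords[0]] = image.flatten()

print(new_img)
# [[1 2 3 0 0 0]
#  [0 0 0 4 5 6]]
```

The second row ends up shifted 3 pixels to the right, which is what T does to points with y = 1.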

(Note that I’ve also dropped the -x thing, just because I struggled to understand it. I could have put it back now that I have a working solution, but I am not 100% sure it wasn’t there only because you had tried some strange adjustment by trial and error. Anyway, I think what you were looking for is how to use meshgrid to create a set of coordinates and process them without a loop. Even if you may need to adapt my solution, you have it: flatten the coordinates from meshgrid, transform them with a matrix multiplication, and use them as indexes for placing all the pixels of the original image.)
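
For reference, here is a small sketch of what indexing with a negated index array does in NumPy, which may help decide whether the -x in the question is wanted. The array names are made up for illustration, and this only shows the indexing behaviour, not what the asker intended.

```python
import numpy as np

row = np.arange(5)   # stand-in for one row of pixels: [0 1 2 3 4]
x = np.arange(5)     # column indices 0..4

# -x is [0, -1, -2, -3, -4]: index 0 stays at the start,
# the negative ones count from the end of the row
picked = row[-x]

# A plain left-right mirror needs an extra offset of 1
mirrored = row[-x - 1]

print(picked)    # [0 4 3 2 1]
print(mirrored)  # [4 3 2 1 0]
```

So indexing with -x is a mirror with the first column left in place, rather than a clean flip.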

Edit: variant

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    ji=np.array(np.meshgrid(range(width), range(height)))
    coords = np.einsum('ik,kjl', T, ji).astype(int)
    # Size of the output, computed from the transformed corners only
    new_h = max(coords[1, 0, -1], coords[1, -1, 0], coords[1, -1, -1]) + 1
    new_w = max(coords[0, 0, -1], coords[0, -1, 0], coords[0, -1, -1]) + 1
    new_img = np.zeros((new_h, new_w), dtype=np.uint8)
    new_img[coords[1].flatten(), coords[0].flatten()] = image.flatten()
    return new_img

The idea is the same. But instead of flattening the original ji coordinates directly, I keep them as is, then use einsum to perform the matrix multiplication on a 3D array. This also returns a 3D 2×height×width array, in which each [:, i, j] value is just the transformation of (j, i). So it is the same as the previous @, except that it works even if, instead of a 2×N set of coordinates, we have a 2×height×width one.
This has 2 advantages:

  • Apparently it is noticeably faster to create ji this way
  • It allows using just the corners to find the size of the new image, as I mentioned before (that was more difficult when coords was flattened from its creation).
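
A quick way to convince yourself that the einsum call matches the earlier @ version is to compare both on a small grid. This is a sketch with a made-up 4×3 size; the einsum string is written with an explicit output ('->ijl'), which should be equivalent to the implicit 'ik,kjl' form used above.

```python
import numpy as np

width, height = 4, 3  # hypothetical image size, for illustration only

T = np.array([[1, 3],
              [0, 1]])

# Coordinates kept as a grid: shape (2, height, width)
ji = np.array(np.meshgrid(range(width), range(height)))

# einsum keeps the grid layout: coords_grid[:, r, c] == T @ ji[:, r, c]
coords_grid = np.einsum('ik,kjl->ijl', T, ji)

# Same computation on a flattened 2 x N array with the @ operator
coords_flat = T @ ji.reshape(2, -1)
same = np.array_equal(coords_grid.reshape(2, -1), coords_flat)

# With non-negative entries in T (as here), the largest coordinates sit on grid corners
corner_max_x = max(coords_grid[0, 0, -1], coords_grid[0, -1, 0], coords_grid[0, -1, -1])
corner_max_y = max(coords_grid[1, 0, -1], coords_grid[1, -1, 0], coords_grid[1, -1, -1])

print(same)
print(corner_max_x == coords_grid[0].max(), corner_max_y == coords_grid[1].max())
```

Note the corner shortcut relies on T having no negative entries; with negative entries the minimum coordinates would also matter, and the full max/min over coords is the safer choice.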

Timing

Solution       Timing
Yours          4.5 s
lezaf’s        3.2 s
This one       49 ms
The variant    41 ms

Answered By: chrslg