Optimize this linear transformation for images with Numpy
Question:
Good evening,
I’m trying to learn NumPy and have written a simple linear transformation that is applied to an image using for loops:
import numpy as np
import matplotlib.pyplot as plt

# width, height and image are defined elsewhere (not shown in the question)
M = np.array([
    [width, 0],
    [0, height]
])
T = np.array([
    [1, 3],
    [0, 1]
])

def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    for i in range(0, 440):
        for j in range(0, 440):
            x = np.array([j, i])
            coords = (T @ x)
            x = coords[0]
            y = coords[1]
            new_img[y, -x] = image[i, -j]
    return new_img

plt.imshow(transform_image(M, T))
It does what I want and spits out a transformation that is correct, except that I think there is a way to do this without the loops.
I tried doing some stuff with meshgrid but I couldn’t figure out how to get the pixels from the image in the same way I do it in the loop (using i and j). I think I figured out how to apply the transformation but then getting the pixels from the image in the correct spots wouldn’t work.
Any ideas?
EDIT:
Great help with the solutions below. lezaf’s solution was very similar to what I tried before; the only step I couldn’t figure out was assigning the pixels from the old image to the new one. I made some changes to the code to avoid transposing, and also added an astype("int") so it works with float values in the T matrix:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    x_combs = np.array(np.meshgrid(np.arange(width), np.arange(height))).reshape(2,-1)
    coords = (T @ x_combs).astype("int")
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[1, :], -x_combs[0, :]]
    return new_img
Answers:
A more efficient solution is the following:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    new_img = np.zeros(T_rel_M.sum(axis=1).astype("int")).T
    # This one replaces the double for-loop
    x_combs = np.array(np.meshgrid(np.arange(440), np.arange(440))).T.reshape(-1,2)
    # Calculate the new coordinates
    coords = (T@x_combs.T)
    # Apply changes to new_img
    new_img[coords[1, :], -coords[0, :]] = image[x_combs[:, 1], -x_combs[:,0]]
    return new_img
I updated my solution to remove the for-loop, so it is now a lot more straightforward.
After this change, the optimized code runs in 50 ms, compared to the initial 3.06 s of the code in the question.
There seems to be some confusion between width/height, x/y, …, so I am not 100% sure my code won’t need adaptation. But I think the main idea is the one you are looking for:
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    j, i = np.meshgrid(range(width), range(height))
    ji = np.array((j.flatten(), i.flatten()))
    coords = (T @ ji).astype(int)
    new_img = np.zeros((coords[1].max()+1, coords[0].max()+1), dtype=np.uint8)
    new_img[coords[1], coords[0]] = image.flatten()
    return new_img
The main idea here is to build a set of coordinates of the input image with meshgrid. I don’t want a 2D array of coordinates, just a list of coordinates (a list of pairs i, j), hence the flatten. So ji is a huge 2×N array, N being the number of pixels (so width×height). coords is the transformation of all those coordinates.
Since your original code seemed to have some inconsistency with size (the transformed image did not fit in new_img), I chose the easy way to compute the size of new_img and just took the max of those coordinates (a bit overkill: the max of the four corners would be enough).
And then I use this set of coordinates as indexes into new_img, to which I assign the matching pixels, that is, the flattened image.
So, no for loop at all.
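To make the steps above concrete, here is a minimal sketch on a tiny image. The 2×3 array and its values are made up for illustration; only the shear matrix T comes from the question. It follows the same flatten, transform, index sequence, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

# Hypothetical tiny input: 2 rows (height) by 3 columns (width).
image = np.array([[1, 2, 3],
                  [4, 5, 6]], dtype=np.uint8)
height, width = image.shape

# Shear from the question: x' = x + 3*y, y' = y.
T = np.array([[1, 3],
              [0, 1]])

# Column index j and row index i of every pixel, flattened in row-major
# order, which is the same order image.flatten() uses.
j, i = np.meshgrid(np.arange(width), np.arange(height))
ji = np.array((j.flatten(), i.flatten()))   # shape (2, width*height)

coords = (T @ ji).astype(int)               # transformed (x, y) pairs

# Size the output from the largest transformed coordinate.
new_img = np.zeros((coords[1].max() + 1, coords[0].max() + 1), dtype=np.uint8)
new_img[coords[1], coords[0]] = image.flatten()

print(new_img)
# [[1 2 3 0 0 0]
#  [0 0 0 4 5 6]]
```

The second row is shifted right by three columns, which is what the shear should do to a pixel with y = 1.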
(Note that I’ve also dropped the -x thing, just because I struggled to understand it. I could have put it back now that I have a working solution, but I am not 100% sure it wasn’t there because of some trial-and-error adjustment on your side. Anyway, I think what you were looking for is how to use meshgrid to create a set of coordinates and process them without a loop. Even if you may need to adapt my solution, you have it: flatten the coordinates from meshgrid, transform them with a matrix multiplication, and use them as indexes for placing all the pixels of the original image.)
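For what it is worth, here is a small sketch of how NumPy treats an index like -x, in case it helps decide whether to put it back. The array row and the names wrapped and mirrored are made up for illustration. As far as I can tell, indexing with -x is not a plain mirror: index 0 stays at 0, and only the other indexes count from the end.

```python
import numpy as np

row = np.array([10, 20, 30, 40])   # hypothetical row of pixels
x = np.arange(row.size)            # 0, 1, 2, 3

wrapped = row[-x]                  # indexes 0, -1, -2, -3
mirrored = row[::-1]               # a true left-right flip

print(wrapped)    # [10 40 30 20]
print(mirrored)   # [40 30 20 10]

# The -x indexing equals the mirrored row rolled by one position.
print(np.array_equal(wrapped, np.roll(mirrored, 1)))   # True
```

So if a real mirror is wanted, slicing with [::-1] (or np.fliplr on the 2D image) may be closer to the intent than negating the indexes.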
Edit: variant
def transform_image(M, T):
    T_rel_M = abs(M @ T)
    ji = np.array(np.meshgrid(range(width), range(height)))
    coords = np.einsum('ik,kjl', T, ji).astype(int)
    new_img = np.zeros((max(coords[1,0,-1], coords[1,-1,0], coords[1,-1,-1]) + 1,
                        max(coords[0,0,-1], coords[0,-1,0], coords[0,-1,-1]) + 1), dtype=np.uint8)
    new_img[coords[1].flatten(), coords[0].flatten()] = image.flatten()
    return new_img
The idea is the same. But instead of flattening the original ji coordinates directly, I keep them as they are. Then I use einsum to perform a matrix multiplication on a 3D array (which also returns a 3D array, of shape 2×height×width, in which each [:,i,j] value is just the transformation of [j,i]. So it is just the same as the previous @, except that it works even if, instead of a 2×N set of coordinates, we have a 2×height×width one).
Which has 2 advantages:
- Apparently it is noticeably faster to create ji that way
- It allows using just the corners to find the size of the new image, as I mentioned before (that was more difficult when coords was flattened from its creation).
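A quick way to convince yourself that the einsum call is the same matrix product, and that the corners are enough to size the output, is a sketch like the following. The sizes width = 5 and height = 4 are arbitrary, and the corner shortcut is only checked here for a T with non-negative entries, as in the question. With meshgrid's default indexing the coordinate array has shape (2, height, width).

```python
import numpy as np

width, height = 5, 4               # hypothetical small sizes
T = np.array([[1, 3],
              [0, 1]])

# Coordinates kept as a 3D array of shape (2, height, width).
ji = np.array(np.meshgrid(np.arange(width), np.arange(height)))

# Explicit output subscripts; 'ik,kjl' alone implies the same 'ijl' order.
coords_3d = np.einsum('ik,kjl->ijl', T, ji)

# Same transformation via @ on the flattened 2×N coordinates.
coords_flat = T @ ji.reshape(2, -1)
same = np.array_equal(coords_3d.reshape(2, -1), coords_flat)

# Extent from three corners versus the maximum over all pixels.
corner_max_x = max(coords_3d[0, 0, -1], coords_3d[0, -1, 0], coords_3d[0, -1, -1])
full_max_x = coords_3d[0].max()

print(same, corner_max_x, full_max_x)   # True 13 13
```

The fourth corner is the origin, which maps to (0, 0), so it can be left out as long as the other corners are not all negative.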
Timing
Solution | Timing |
---|---|
Yours | 4.5 s |
lezaf’s | 3.2 s |
This one | 49 ms |
The variant | 41 ms |
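If you want to reproduce this kind of comparison yourself, a rough harness could look like the sketch below. Everything in it is an assumption for illustration: the random 440×440 stand-in image, the function names transform_loop and transform_vectorized, and the loop version without the -x indexing. The numbers will depend on the machine, so none are quoted here.

```python
import time
import timeit
import numpy as np

rng = np.random.default_rng(0)
width = height = 440               # size used in the question
image = rng.integers(0, 256, size=(height, width), dtype=np.uint8)  # stand-in image
T = np.array([[1, 3],
              [0, 1]])

def transform_loop(image, T):
    """Pixel-by-pixel version (without the -x indexing of the question)."""
    h, w = image.shape
    # Transform three corners to find the output size.
    corners = T @ np.array([[w - 1, 0, w - 1],
                            [0, h - 1, h - 1]])
    out = np.zeros((corners[1].max() + 1, corners[0].max() + 1), dtype=image.dtype)
    for i in range(h):
        for j in range(w):
            x, y = T @ np.array([j, i])
            out[y, x] = image[i, j]
    return out

def transform_vectorized(image, T):
    """meshgrid + flatten version, no Python loop."""
    h, w = image.shape
    j, i = np.meshgrid(np.arange(w), np.arange(h))
    coords = (T @ np.array((j.ravel(), i.ravel()))).astype(int)
    out = np.zeros((coords[1].max() + 1, coords[0].max() + 1), dtype=image.dtype)
    out[coords[1], coords[0]] = image.ravel()
    return out

# Time the slow version once and keep its result for the comparison.
start = time.perf_counter()
ref = transform_loop(image, T)
t_loop = time.perf_counter() - start

fast = transform_vectorized(image, T)
t_vec = timeit.timeit(lambda: transform_vectorized(image, T), number=10) / 10

print("same result:", np.array_equal(ref, fast))
print(f"loop: {t_loop:.3f} s, vectorized: {t_vec * 1e3:.1f} ms")
```

Checking that both versions return the same array before comparing times guards against measuring a fast but wrong implementation.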