How to compare against and modify values of NumPy array

Question:

I am trying to convert a numpy array to a .vox file. .vox files have a limit where they can only store 255 unique colors. My numpy array is being somewhat randomly generated, so its length and values are not always the same. However, its shape is always (N, 3) and the color values are usually similar. For instance, if there is a "red" part of the array, most of the reds are close enough to be visually the same. I’ve created another numpy array with a set of 19 sample colors equally spaced between 13 points in the RGB color space, which produces a shape of (247, 3).

eg. ([13, 0, 0], [26, 0, 0], [39, 0, 0], [52, 0, 0], [65, 0, 0], [78, 0, 0], [91, 0, 0],
[104, 0, 0], [117, 0, 0], [130, 0, 0], [143, 0, 0], [156, 0, 0], [169, 0, 0], [182, 0, 0],
[195, 0, 0], [208, 0, 0], [221, 0, 0], [234, 0, 0], [247, 0, 0]) x 13 other sets

How can I compare every color in my original numpy array to my array of sample colors and change its value to the closest match? It is ok if the length of the array is greater than 255 so long as there are only 255 or fewer unique colors.

Answers:

The usual way to compare everything to everything (and, more generally, to do in numpy the equivalent of nested for loops) is to use broadcasting.

Let’s consider a smaller example:

import numpy as np

colorTable = np.array([[0,0,0], [120,0,0], [0,120,0], [0,0,120], [255,255,255]])
randomColors = np.array([[10,10,10], [255,0,0], [140,140,140], [0,0,130], [20,200,80]])

So, the idea is to compare all colors from randomColors to all from colorTable.

Numpy broadcasting consists in assigning a different axis to each dimension you want to iterate over, as in implicit nested for loops.

For example, before applying it to our case:

a=np.array([1,2,3])
b=np.array([4,5,6,7])
a[:,None]*b[None, :]
# array([[ 4,  5,  6,  7],
#        [ 8, 10, 12, 14],
#        [12, 15, 18, 21]])

See that we placed ourselves in 2D, making a a column of 3 numbers and b a row of 4 numbers, and let numpy broadcasting perform the 12 matching multiplications.
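To make the "implicit nested for loops" idea concrete, here is a small sketch that compares the broadcast product with explicit loops. The names broadcasted and looped are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6, 7])

# Broadcasting: a becomes a (3, 1) column, b a (1, 4) row.
broadcasted = a[:, None] * b[None, :]

# The same computation written as explicit nested loops, for comparison only.
looped = np.empty((len(a), len(b)), dtype=broadcasted.dtype)
for i in range(len(a)):
    for j in range(len(b)):
        looped[i, j] = a[i] * b[j]

print(broadcasted.shape)                    # (3, 4)
print(np.array_equal(broadcasted, looped))  # True
```

The loop version is only there to show what broadcasting computes; in practice the broadcast expression is the one to use.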

So, in our case,

colorTable[:,None,:]-randomColors[None,:,:]

computes the difference between each color of colorTable (along axis 0) and each color of randomColors (along axis 1). Note that axis 2 holds the 3 components r, g, b. Since this axis is present in both operands, no broadcasting happens along it.

array([[[ -10,  -10,  -10],
        [-255,    0,    0],
        [-140, -140, -140],
        [   0,    0, -130],
        [ -20, -200,  -80]],

       [[ 110,  -10,  -10],
        [-135,    0,    0],
        [ -20, -140, -140],
        [ 120,    0, -130],
        [ 100, -200,  -80]],

       [[ -10,  110,  -10],
        [-255,  120,    0],
        [-140,  -20, -140],
        [   0,  120, -130],
        [ -20,  -80,  -80]],

       [[ -10,  -10,  110],
        [-255,    0,  120],
        [-140, -140,  -20],
        [   0,    0,  -10],
        [ -20, -200,   40]],

       [[ 245,  245,  245],
        [   0,  255,  255],
        [ 115,  115,  115],
        [ 255,  255,  125],
        [ 235,   55,  175]]])

As you can see, this is a 3D array, which you can view as a 2D array of rgb triplets (one color of colorTable per row, one color of randomColors per column).
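If the axes are hard to keep track of, printing the shapes can help. The sketch below uses the same two arrays; diff is just an illustrative name. With M table colors and N random colors the result would have shape (M, N, 3); here both M and N happen to be 5.

```python
import numpy as np

colorTable = np.array([[0,0,0], [120,0,0], [0,120,0], [0,0,120], [255,255,255]])
randomColors = np.array([[10,10,10], [255,0,0], [140,140,140], [0,0,130], [20,200,80]])

print(colorTable[:, None, :].shape)    # (5, 1, 3)
print(randomColors[None, :, :].shape)  # (1, 5, 3)

diff = colorTable[:, None, :] - randomColors[None, :, :]
print(diff.shape)                      # (5, 5, 3)

# diff[i, j] is colorTable[i] - randomColors[j]
print(diff[1, 3].tolist())             # [120, 0, -130]
```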

((colorTable[:,None,:]-randomColors[None,:,:])**2).sum(axis=2)

sums the squares of these differences along axis 2. So what we have here, for each pair (r,g,b), (r’,g’,b’) of colors from the two arrays, is (r-r’)²+(g-g’)²+(b-b’)².

array([[   300,  65025,  58800,  16900,  46800],
       [ 12300,  18225,  39600,  31300,  56400],
       [ 12300,  79425,  39600,  31300,  13200],
       [ 12300,  79425,  39600,    100,  42000],
       [180075, 130050,  39675, 145675,  88875]])

This is a 2D array of squared euclidean distances between each color of colorTable (one per row) and each color of randomColors (one per column).
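One caveat worth checking before relying on these distances: the arrays above hold default (signed) integers, but color data often arrives as uint8, and unsigned subtraction wraps around instead of going negative. The sketch below assumes hypothetical uint8 inputs (table_u8 and colors_u8 are made-up names) and casts them to a wider signed type first.

```python
import numpy as np

# Hypothetical uint8 inputs, as an image loader might return them.
table_u8 = np.array([[120, 0, 0]], dtype=np.uint8)
colors_u8 = np.array([[140, 140, 140]], dtype=np.uint8)

# Unsigned subtraction wraps around modulo 256 instead of going negative.
wrapped = table_u8[:, None, :] - colors_u8[None, :, :]
print(wrapped.tolist())  # [[[236, 116, 116]]]

# Casting to a wider signed type first gives the intended differences.
diff = table_u8[:, None, :].astype(np.int64) - colors_u8[None, :, :].astype(np.int64)
print(diff.tolist())                     # [[[-20, -140, -140]]]
print((diff ** 2).sum(axis=2).tolist())  # [[39600]]
```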

If we want to find the index in colorTable of the closest color to randomColors[3], all we have to do is to compute argmin of column 3 of this table.

((colorTable[:,None,:]-randomColors[None,:,:])**2).sum(axis=2)[:,3].argmin()

Result is, correctly, 3.

Or, even better, we can do that for all columns at once, by telling argmin to compute the minimum along axis 0 only, that is down each column, across all colors of colorTable.

((colorTable[:,None,:]-randomColors[None,:,:])**2).sum(axis=2).argmin(axis=0)
# array([0, 1, 1, 3, 2])

You can see that the result is, correctly, for each column (that is, each color of randomColors), the index of the color of colorTable that is closest to it (for euclidean distance). That is, the index of the smallest number in each column of the previous table.

So, all that remains is to extract the colors of colorTable matching these indexes:

colorTable[((colorTable[:,None,:]-randomColors[None,:,:])**2).sum(axis=2).argmin(axis=0)]

This gives a table of the same shape as randomColors (that is, with as many rows as the previous result has indexes), made of colors from colorTable (for each row, the closest one).

array([[  0,   0,   0],
       [120,   0,   0],
       [120,   0,   0],
       [  0,   0, 120],
       [  0, 120,   0]])
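For reuse, the one-liner can be wrapped in a small helper. This is only a sketch: the name snap_to_palette and the chunk_size parameter are made up for illustration. Processing the colors in chunks keeps the intermediate (palette, chunk, 3) array small when N is large, and the int64 cast guards against unsigned wrap-around.

```python
import numpy as np

def snap_to_palette(colors, palette, chunk_size=65536):
    """Return a copy of `colors` with each row replaced by its nearest `palette` row."""
    colors = np.asarray(colors, dtype=np.int64)
    palette = np.asarray(palette, dtype=np.int64)
    out = np.empty_like(colors)
    for start in range(0, len(colors), chunk_size):
        chunk = colors[start:start + chunk_size]
        # (P, 1, 3) - (1, C, 3) -> (P, C, 3), then sum over r, g, b -> (P, C)
        dist2 = ((palette[:, None, :] - chunk[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk_size] = palette[dist2.argmin(axis=0)]
    return out

colorTable = np.array([[0,0,0], [120,0,0], [0,120,0], [0,0,120], [255,255,255]])
randomColors = np.array([[10,10,10], [255,0,0], [140,140,140], [0,0,130], [20,200,80]])

snapped = snap_to_palette(randomColors, colorTable)
print(snapped.tolist())
# [[0, 0, 0], [120, 0, 0], [120, 0, 0], [0, 0, 120], [0, 120, 0]]

# Number of distinct colors left, to compare against the 255-color limit.
print(len(np.unique(snapped, axis=0)))  # 4
```

Since every output row comes from the palette, the number of unique colors can never exceed the number of palette rows (247 in the question).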

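Note that the result is not always intuitive: (140,140,140) is closer to (120,0,0) than it is to (255,255,255), with squared distances of 39600 and 39675. It is in fact exactly as close to (0,120,0) and (0,0,120); argmin simply returns the first minimum it finds.

But that is a matter of defining the distance.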
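As an illustration of how the choice of distance changes the match, here is a sketch that weights each channel before summing. The weights are an assumption borrowed from common luma coefficients, not something from the question, and they are not claimed to be a perceptually accurate color difference.

```python
import numpy as np

colorTable = np.array([[0,0,0], [120,0,0], [0,120,0], [0,0,120], [255,255,255]])
randomColors = np.array([[10,10,10], [255,0,0], [140,140,140], [0,0,130], [20,200,80]])

# Hypothetical per-channel weights (luma-style), purely illustrative.
weights = np.array([0.299, 0.587, 0.114])

diff = colorTable[:, None, :] - randomColors[None, :, :]
# weights has shape (3,), so it broadcasts along the r, g, b axis.
weighted = ((diff ** 2) * weights).sum(axis=2)
closest = weighted.argmin(axis=0)

# (140,140,140) is now matched to a different table color.
print(colorTable[closest[2]].tolist())  # [0, 120, 0]
```

With all three weights set to 1, this reduces to the plain squared euclidean distance used above.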

Answered By: chrslg