Performing an operation on matched columns of NumPy arrays

Question:

I am new to Python and programming in general, and I ran into a question.
I have two NumPy arrays of the same shape: they are 2D arrays with dimensions 1000 x 2000.

I wish to compare the values of each column in array A with the values in array B. The important part is that not every column of A should be compared to every column of B. Instead, only matching columns of A and B should be compared: A[:, 0] with B[:, 0], A[:, 1] with B[:, 1], and so on.

This was easier to do with one-dimensional arrays: I used zip(A, B), so I could run the following for loop:

import numpy as np

A = np.array([2, 5, 6, 3, 7])
B = np.array([1, 3, 9, 4, 8])

res_list = []
# Compare the arrays element by element (equal values are skipped)
for number1, number2 in zip(A, B):
    if number1 > number2:
        comment1 = "bigger"
        res_list.append(comment1)
    if number1 < number2:
        comment2 = "smaller"
        res_list.append(comment2)

res_list

In [702]: res_list
Out[702]: ['bigger', 'bigger', 'smaller', 'smaller', 'smaller']

However, I am not sure how best to do this on a 2D array. As output, I am aiming for a list with 2000 sublists (one per column), so that I can later count the number of "bigger" and "smaller" instances for each column.

I am very thankful for any input.

So far I have tried to use np.nditer in a double for loop, but it returned all possible column combinations. I specifically want to combine only the "matching" columns.

A small approximation of the input (my real arrays have 1000 rows and 2000 columns):

In [709]: A
Out[709]: 
array([[2, 5, 6, 3, 7],
       [6, 2, 9, 2, 3],
       [2, 1, 4, 5, 7]])

In [710]: B
Out[710]: 
array([[1, 3, 9, 4, 8],
       [4, 8, 2, 3, 1],
       [3, 7, 1, 8, 9]])

As the desired output, I want to compare the values of arrays A and B column-wise (only the "matching" columns, not every column with every other column, as explained above) and store the results in a nested list (the number of sublists should equal the number of columns):

res_list = [["bigger", "bigger", "smaller"], ["bigger", "smaller", "smaller"], ["smaller", "bigger", "bigger"], ["smaller", "smaller", "smaller"], ...]
Asked By: tom


Answers:

From the example input and output, I see that you want to do an element-wise comparison and store the results per column. Your code shows that you already understand the 1D variant of this problem, so the question is how to do it in 2D.

Solution 1

To achieve this, we have to turn the 2D problem into a 1D problem, so that you can do what you already did. If, for example, the columns became rows, you could reuse your zip strategy for every row.

In other words, if we can turn:

a = np.array(
    [[2, 5, 6, 3, 7],
     [6, 2, 9, 2, 3],
     [2, 1, 4, 5, 7]]
)

into:

array([[2, 6, 2],
       [5, 2, 1],
       [6, 9, 4],
       [3, 2, 5],
       [7, 3, 7]])

we can iterate over a and b at the same time and get our 1D version of the problem. Swapping the axes of a matrix like this is called transposing, and it is very common. In NumPy the operation is written a.T (see the docs for ndarray.T).
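A minimal sketch of the transpose step on the question's sample arrays, showing that each row of a.T is a column of a, so zip over a.T and b.T pairs up exactly the matching columns:

```python
import numpy as np

# Sample arrays from the question
a = np.array([[2, 5, 6, 3, 7],
              [6, 2, 9, 2, 3],
              [2, 1, 4, 5, 7]])
b = np.array([[1, 3, 9, 4, 8],
              [4, 8, 2, 3, 1],
              [3, 7, 1, 8, 9]])

# a.T swaps the axes: shape (3, 5) becomes (5, 3)
print(a.shape, a.T.shape)  # (3, 5) (5, 3)

# Row 0 of a.T holds the same values as column 0 of a
print(a.T[0], a[:, 0])     # [2 6 2] [2 6 2]

# zip over the transposed arrays yields one pair of matching columns at a time
for col_a, col_b in zip(a.T, b.T):
    print(col_a, col_b)
```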

Now we use your zip approach twice: once in the outer loop, which iterates over all the rows (after transposing, the rows hold the column values), and once in the inner loop on those values, because every row is a 1D NumPy array.

result = []

# Outer loop: go over the columns of `a` and `b` at the same time
# (`b` is transposed the same way as `a`; each row of a.T is a column of a).
for row_a, row_b in zip(a.T, b.T):

    result_col = []
    # Inner loop: compare one pair of matching columns element-wise.
    for value_a, value_b in zip(row_a, row_b):
        result_col.append('bigger' if value_a > value_b else 'smaller')
    result.append(result_col)

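Note: I use a conditional expression (Python's ternary operator) to choose between 'bigger' and 'smaller'. Unlike your original loop, equal values are labelled 'smaller' here instead of being skipped.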
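The same double loop can be written more compactly as a nested list comprehension. A sketch on the question's sample arrays (ties are still labelled 'smaller'), followed by the per-column counting the question mentions, using list.count on each sublist:

```python
import numpy as np

# Sample arrays from the question
a = np.array([[2, 5, 6, 3, 7],
              [6, 2, 9, 2, 3],
              [2, 1, 4, 5, 7]])
b = np.array([[1, 3, 9, 4, 8],
              [4, 8, 2, 3, 1],
              [3, 7, 1, 8, 9]])

# Outer comprehension: pairs of matching columns; inner: element-wise labels
result = [['bigger' if value_a > value_b else 'smaller'
           for value_a, value_b in zip(col_a, col_b)]
          for col_a, col_b in zip(a.T, b.T)]

print(result[0])  # ['bigger', 'bigger', 'smaller']

# Count the labels in each column's sublist: (bigger, smaller)
counts = [(col.count('bigger'), col.count('smaller')) for col in result]
print(counts)  # [(2, 1), (1, 2), (2, 1), (0, 3), (1, 2)]
```

The first four sublists match the desired output given in the question.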

Solution 2

As indicated before, you are only looking at two values that sit in the same position in both arrays; this is called an element-wise comparison. Since we are only interested in values at the exact same position, and we know the shape of the result array (input 1000×2000, output 2000×1000), we can also iterate over all the elements using their indices.

First, some handy shortcuts:

  • a.shape holds the dimensions of the array, so here a.shape is (1000, 2000).
  • Slicing with [::-1] reverses a sequence, similar to reversed().
  • Combined, a.shape[::-1] is (2000, 1000), our expected output shape.
  • np.ndindex yields every index tuple for an array of the given shape.
  • The * operator unpacks a tuple, so np.ndindex(*a.shape) is equivalent to np.ndindex(1000, 2000).
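A quick sketch of these shortcuts on a small (3, 5) array instead of (1000, 2000), so that the printed output stays readable:

```python
import numpy as np

a = np.zeros((3, 5))

print(a.shape)        # (3, 5)
print(a.shape[::-1])  # (5, 3): the reversed tuple, i.e. the transposed shape

# *a.shape unpacks the tuple, so these two calls produce the same indices
print(list(np.ndindex(*a.shape)) == list(np.ndindex(3, 5)))  # True

# np.ndindex yields index tuples in row-major order
print(list(np.ndindex(2, 3)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```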

Therefore, we can use each element's index (from np.ndindex) and swap the row and column indices to write the result to the correct location in the output array:

import numpy as np

a = np.random.randint(0, 255, (1000, 2000))
b = np.random.randint(0, 255, (1000, 2000))
result = np.zeros(a.shape[::-1], dtype=object)

# Swap the indices when writing, so column `columns` of the input becomes row `columns` of the output
for rows, columns in np.ndindex(*a.shape):
    result[columns, rows] = 'bigger' if a[rows, columns] > b[rows, columns] else 'smaller'

print(result)

This produces the same comparisons as Solution 1, stored in a NumPy array of dtype object instead of a list of lists. Alternatively, we could first transpose the a and b arrays, drop the [::-1] from the result's shape, and swap the assignment result[columns, rows] back to result[rows, columns].
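A sketch of that transpose-first variant, using small random arrays (shape (4, 6), chosen only to keep it quick) and checking it against the index-swapping version above:

```python
import numpy as np

a = np.random.randint(0, 255, (4, 6))
b = np.random.randint(0, 255, (4, 6))

# Transpose first, so the rows of a_t / b_t are the columns of a / b
a_t, b_t = a.T, b.T
result = np.zeros(a_t.shape, dtype=object)  # no [::-1] needed any more

for rows, columns in np.ndindex(*a_t.shape):
    result[rows, columns] = 'bigger' if a_t[rows, columns] > b_t[rows, columns] else 'smaller'

# Index-swapping version from Solution 2, for comparison
expected = np.zeros(a.shape[::-1], dtype=object)
for rows, columns in np.ndindex(*a.shape):
    expected[columns, rows] = 'bigger' if a[rows, columns] > b[rows, columns] else 'smaller'

print(np.array_equal(result, expected))  # True
```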

Edit


Thinking about it a bit longer: you are only interested in comparing two arrays of the same shape element by element. NumPy already has a good solution for this: np.where(cond, <true>, <false>).
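A minimal sketch of how np.where behaves, applied to the 1D arrays from the question: a > b produces a boolean array of the same shape, and np.where picks one of two values for each element:

```python
import numpy as np

A = np.array([2, 5, 6, 3, 7])
B = np.array([1, 3, 9, 4, 8])

mask = A > B  # element-wise comparison -> boolean array
print(mask)   # [ True  True False False False]

# 'bigger' where mask is True, 'smaller' everywhere else (including ties)
labels = np.where(mask, 'bigger', 'smaller')
print(labels.tolist())  # ['bigger', 'bigger', 'smaller', 'smaller', 'smaller']
```

This reproduces the 1D res_list from the question without any explicit loop.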

So the entire problem can be reduced to:

answer = np.where(a > b, 'bigger', 'smaller').T

Note the .T, which transposes the result so that each row of answer holds the comparisons for one column of the input.
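A sketch applying this to the question's sample arrays. .tolist() turns the result into the nested list of sublists the question asks for. For the later counting step, summing the boolean masks along axis 0 gives per-column counts directly. Note that (a < b) counts strictly smaller values, like the question's original loop, whereas np.where labels ties 'smaller'; the sample has no ties, so both agree here:

```python
import numpy as np

# Sample arrays from the question
a = np.array([[2, 5, 6, 3, 7],
              [6, 2, 9, 2, 3],
              [2, 1, 4, 5, 7]])
b = np.array([[1, 3, 9, 4, 8],
              [4, 8, 2, 3, 1],
              [3, 7, 1, 8, 9]])

answer = np.where(a > b, 'bigger', 'smaller').T
res_list = answer.tolist()  # nested Python lists, one sublist per column
print(res_list[0])          # ['bigger', 'bigger', 'smaller']

# Per-column counts straight from the boolean masks (True counts as 1)
bigger_per_col = (a > b).sum(axis=0)
smaller_per_col = (a < b).sum(axis=0)
print(bigger_per_col)   # [2 1 2 0 1]
print(smaller_per_col)  # [1 2 1 3 2]
```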

Answered By: Thymen