How to make a for loop in Python, which is called multiple times consecutively, execute faster?

Question:

I’d like to state up front that I don’t have a lot of experience with NumPy, so deeper explanations would be appreciated (even of obvious things).
Here’s my issue:

converted_X = X

for col in X:
    curr_data = X[col]
    i = 0
    for pix in curr_data:
        inv_pix = 255.0 - pix
        curr_data[i] = inv_pix
        i+=1
    converted_X[col] = curr_data.values

Context: X is a DataFrame with images of handwritten digits (70k images, 784 pixels/image).
The entire point of doing this is to change the black background to white and white numbers to black.
The only problem I’m facing is that it’s taking a ridiculously long time. I tried using rich.Progress() to track its execution, and it shows an astonishing 4-hour ETA.
Also, I’m executing this code block in the Jupyter notebook extension of VS Code (might help).

I know it probably has to do with a ton of inefficiencies and underuse of NumPy functionality, but I need guidance.

Thanks in advance.

Asked By: Hrishav Saha

||

Answers:

Never write a for loop in Python over NumPy data; that is how you make it faster.
Most of the time, there is a way to have NumPy do the loop for you (meaning it processes the data in batch. Obviously, there is still a for loop, but not one you wrote in Python).

Here, it seems you are trying to compute an inverted image, in which each pixel is 255 minus the original pixel value.

Just write inverted_image = 255-image

Addition: note that, when used like plain Python arrays, NumPy arrays are quite inefficient. If you use them just as 2D arrays that you read and write with low-level instructions (setting values individually), then, most of the time, even good ol’ Python lists are faster. For example, in your case (I’ve just tried), on my machine, your code is 9 times slower with ndarrays than the exact same code using a plain Python list of lists of values.
The whole point of ndarrays is that they are faster because you can use them with NumPy functions that process the whole data in batch for you. That would not be as easy to do with Python lists.
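
To make the difference concrete, here is a minimal sketch that inverts the same data twice, once with an element-by-element Python loop and once with a single batch operation, then checks that both give the same result. The array `images` is hypothetical stand-in data (100 random images of 784 pixels), not the asker's dataset, and the function names are made up for illustration. The timings it prints will vary by machine.

```python
import timeit

import numpy as np

# Hypothetical stand-in data: 100 random "images" of 784 pixels each
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 784)).astype(np.float64)


def invert_with_loop(arr):
    """Invert pixel by pixel with Python-level loops (the slow way)."""
    out = arr.copy()
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = 255.0 - out[r, c]
    return out


def invert_vectorized(arr):
    """Invert the whole array in one NumPy operation."""
    return 255.0 - arr


loop_result = invert_with_loop(images)
vectorized_result = invert_vectorized(images)

# Both approaches compute exactly the same values
same = np.array_equal(loop_result, vectorized_result)

# Time three runs of each; absolute numbers depend on the machine
loop_time = timeit.timeit(lambda: invert_with_loop(images), number=3)
vec_time = timeit.timeit(lambda: invert_vectorized(images), number=3)
print(f"same result: {same}")
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s")
```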

Answered By: chrslg

If X is a NumPy array, you can do the following, without any loops:

converted_X = 255.0 - X
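
Since the question says X is a DataFrame, here is a small sketch showing that the same expression also works directly on a pandas DataFrame whose columns are all numeric. The data and the column names (`pixel1` to `pixel784`) are assumptions made up for illustration, not the asker's actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X: 5 images, 784 pixel columns
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.integers(0, 256, size=(5, 784)).astype(np.float64),
    columns=[f"pixel{i}" for i in range(1, 785)],
)

# One element-wise operation over the whole frame; returns a new DataFrame
converted_X = 255.0 - X

# If a plain ndarray is needed afterwards
converted_array = converted_X.to_numpy()
```

Unlike `converted_X = X` in the question, which only binds a second name to the same object, the subtraction produces a new DataFrame and leaves X unchanged.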
Answered By: AndrzejO