Calculate entropy of image using NumPy

Question:

I’m trying to calculate the entropy of an image using the Shannon entropy formula, H = -sum(p_i * log2(p_i)).
In the case of a picture, p_i is the number of occurrences of the pixel value i divided by the total number of pixels. (This code is for gray-level images only.)

I wrote this code, which runs without errors, but it is far too slow. How can I optimize it?

def Picalc(img,pixel):
    counter = 0
    for row in img:
        for col in row:
            if img[col].any()==pixel:
                counter+=1
    return counter / (np.shape(img)[0]*np.shape(img)[1])

def compute_entropy(img):
    im=imageio.imread(img)
    sum=0
    for i in range (0,256):
        if Picalc(im,i) != 0:
            sum += -Picalc(im, i) * np.log2(Picalc(im, i))
    return sum

Then I called the function:

print(compute_entropy('image.png'))
Asked By: amit


Answers:

The main problem (the reason why "it returns nothing" is simply that it is too slow) is that you should never, ever iterate over pixels yourself.

Python is one of the slowest languages around (it is very efficient for an interpreted language, but it is still an interpreted language). The reason many fast applications are written in Python is that we always make sure the most time-consuming loops do not actually run in Python, but inside the internal (C) code of some library.

That is the whole point of numpy, for example.

So, if you need to compute something over all the pixels of an image (a numpy array), make sure that it is numpy that iterates over the pixels and does whatever is needed on each of them.

I surmise that this code

def Picalc(img,pixel):
    counter = 0
    for row in img:
        for col in row:
            if img[col].any()==pixel:
                counter+=1
    return counter / (np.shape(img)[0]*np.shape(img)[1])

should really be

def Picalc(img,pixel):
    counter = 0
    for row in img:
        for col in row:
            if col==pixel:
                counter+=1
    return counter / (np.shape(img)[0]*np.shape(img)[1])

(That alone would not make it faster, but at least it would be correct. Otherwise you are counting the number of times a pixel’s value happens to also be the index of a row that contains at least one non-black pixel. Since all pixel values lie between 0 and 255, that condition is true for every pixel unless there is a completely black row among the first 256 lines of the image, so on most images you would just count the total number of pixels. Worse, if the image height is less than 256, it is even an indexing error. And, more importantly, the test means nothing relevant.)

And that code contains two for loops that should be avoided. Numpy’s role is to compute things on a whole batch of values at once, instead of one by one.
For example, if you want to test whether the pixel value is 12, for every pixel of the image, instead of

for row in img:
    for col in row:
        if col==12:...

You can directly compute img==12, which is a 2-D array of booleans, with True wherever the pixel is 12.

To count the number of pixels that are 12, you can therefore just write

(img==12).sum()

(Numpy compares all pixels to 12, then sums all the values of the result. Since the result is an array of True/False values, with the convention True=1/False=0, that sum is the number of pixels equal to 12.)
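To see this on a concrete array (a tiny hand-made example, not from the original post):

```python
import numpy as np

# Tiny 2x3 "image" to illustrate boolean-mask counting
img = np.array([[12,   0, 12],
                [255, 12,  7]], dtype=np.uint8)

mask = (img == 12)   # 2-D array of booleans
count = mask.sum()   # True counts as 1, False as 0
print(count)         # -> 3
```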

Applying that to your problem

def Picalc(img,pixel):
    return (img==pixel).sum() / (img.shape[0]*img.shape[1])

While we are at it, let’s also fix the other function:

def compute_entropy(img):
    im=imageio.imread(img)
    mysum=0 # Never name a variable `sum`: it shadows the Python built-in of the same name
    for i in range (0,256):
        pc=Picalc(im,i) # Store the result; otherwise you do the same heavy computation three times
        if pc != 0:
            mysum += -pc * np.log2(pc)
    return mysum
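If you want to try the optimized functions without an image file on disk, here is a self-contained sketch that feeds them a synthetic gray-level array instead of the imageio-loaded image (the test image is my own assumption, chosen so the expected entropy is exactly 1 bit):

```python
import numpy as np

def Picalc(img, pixel):
    # Vectorized: numpy compares every pixel to `pixel` in one pass
    return (img == pixel).sum() / (img.shape[0] * img.shape[1])

def compute_entropy(im):
    mysum = 0
    for i in range(256):
        pc = Picalc(im, i)
        if pc != 0:
            mysum += -pc * np.log2(pc)
    return mysum

# Synthetic image: half 0s, half 255s -> two values with p=0.5 each
im = np.zeros((64, 64), dtype=np.uint8)
im[:, 32:] = 255
print(compute_entropy(im))   # -> 1.0
```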

Second version

Note that this version follows your own logic (iterating over all possible pixel values). But numpy already has a histogram function that computes the probability of occurrence of each pixel value in a single pass.

def compute_entropy(img):
    im=imageio.imread(img)
    # p[k] is probability of pixel value being k:
    p=np.histogram(im, bins=256, range=(0,256), density=True)[0]
    # Filter out null values
    p=p[p>0]
    # Result
    return -(p*np.log2(p)).sum()

A variant of that, using scipy:

import scipy.stats

def compute_entropy(img):
    im=imageio.imread(img)
    p=np.histogram(im, bins=256, range=(0,256), density=True)[0]
    return scipy.stats.entropy(p, base=2)
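A quick, numpy-only way to sanity-check the histogram approach (the two-valued test image below is my own assumption; half 0s and half 255s should give exactly 1 bit of entropy):

```python
import numpy as np

def entropy_hist(im):
    # p[k] is the probability that a pixel has value k
    # (bin width is exactly 1, so density equals probability)
    p = np.histogram(im, bins=256, range=(0, 256), density=True)[0]
    p = p[p > 0]   # drop empty bins: 0*log2(0) is taken as 0
    return -(p * np.log2(p)).sum()

im = np.zeros((64, 64), dtype=np.uint8)
im[:, 32:] = 255
print(entropy_hist(im))   # -> 1.0
```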

Timings

Method                           Time
Your code                        1/2 hour
Optimized version of your code   405 ms
Using histogram                  15 ms

So, the computation-time gain factor is about 120000.

(Note: scipy.stats.entropy is not faster, since the computation time is dominated by the histogram part. It just avoids reinventing the entropy formula.)
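If you want to reproduce this kind of comparison yourself, here is a hedged benchmark sketch (array size, seed, and repeat count are my own choices; absolute timings will differ per machine):

```python
import timeit
import numpy as np

def entropy_by_value(im):
    # Optimized version of the original logic: one vectorized pass per pixel value
    h = 0.0
    for i in range(256):
        pc = (im == i).sum() / im.size
        if pc != 0:
            h -= pc * np.log2(pc)
    return h

def entropy_by_histogram(im):
    # Single histogram pass over the whole image
    p = np.histogram(im, bins=256, range=(0, 256), density=True)[0]
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
im = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

# Both methods agree up to floating-point rounding
print(abs(entropy_by_value(im) - entropy_by_histogram(im)) < 1e-9)   # -> True

# The histogram version makes 1 pass over the image instead of 256
print(timeit.timeit(lambda: entropy_by_value(im), number=3))
print(timeit.timeit(lambda: entropy_by_histogram(im), number=3))
```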

Answered By: chrslg