Improve accuracy of image processing to count fungus spores

Question:

I’m trying to count the number of spores of a disease in a microscope sample with Python, but so far without much success.

The color of the spores is similar to the background, and many of them are close together.

Below is the microscope photograph of the sample.

Microscopic photograph of spores

Image processing code:

import numpy as np
import argparse
import imutils
import cv2

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to the input image")
ap.add_argument("-o", "--output", required=True,
                help="path to the output image")
args = vars(ap.parse_args())

counter = {}

image_orig = cv2.imread(args["image"])
height_orig, width_orig = image_orig.shape[:2]

image_contours = image_orig.copy()

colors = ['Yellow']
for color in colors:

    image_to_process = image_orig.copy()

    counter[color] = 0

    if color == 'Yellow':
        # bounds are in BGR order, since the image was never converted
        # from OpenCV's default BGR layout
        lower = np.array([70, 150, 140])   # ~rgb(140, 150, 70)
        upper = np.array([110, 240, 210])  # ~rgb(210, 240, 110)

    image_mask = cv2.inRange(image_to_process, lower, upper)

    image_res = cv2.bitwise_and(
        image_to_process, image_to_process, mask=image_mask)

    image_gray = cv2.cvtColor(image_res, cv2.COLOR_BGR2GRAY)
    image_gray = cv2.GaussianBlur(image_gray, (5, 5), 50)

    image_edged = cv2.Canny(image_gray, 100, 200)
    image_edged = cv2.dilate(image_edged, None, iterations=1)
    image_edged = cv2.erode(image_edged, None, iterations=1)

    cnts = cv2.findContours(
        image_edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # imutils.grab_contours handles the different return signatures of
    # cv2.findContours across OpenCV 2/3/4
    cnts = imutils.grab_contours(cnts)

    for c in cnts:

        if cv2.contourArea(c) < 1100:
            continue

        hull = cv2.convexHull(c)
        if color == 'Yellow':

            cv2.drawContours(image_contours, [hull], 0, (0, 0, 255), 1)

        counter[color] += 1      

print("{} esporos {}".format(counter[color], color))

cv2.imwrite(args["output"], image_contours)

The algorithm counted 11 spores

But the image actually contains 27 spores

Result from image processing shows the spores are grouped

How do I make this more accurate?

Asked By: Georg Augusto


Answers:

These fungal spores are roughly equal in size. If you don’t need exact accuracy, then instead of jumping down the rabbit hole of expanding boundaries and watershed, you could make a very simple change to your current algorithm and get much more accuracy.

The spores in this scene appear to have a similar size and a roughly homogeneous shape. Given that, you can use the area of your contours to estimate the number of spores occupying that area, based on the average area of a single spore. Spores cannot completely fill these arbitrary contour shapes, so you’ll have to take that into account: find the background color and subtract the area it occupies from the contour area. In scenes like this you should get really close to the real answer for spore area.

So to recap:

  • Find average area of spore

  • Find background color

  • Find contour area

  • Subtract background color pixels/area from contour

  • approximate_spore_count = ceil(contour_area / (average_area_of_spore))

You use ceil here to account for spores smaller than the average that are found individually. You could also add a specific condition to handle those, but then you have to decide whether to count fractions of a spore or to round up to an integer whenever the contour area exceeds the average spore area.
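As a rough sketch of the per-contour version, building on the question's code (image_mask, cnts, and counter come from it; AVG_SPORE_AREA is a hypothetical value you would measure from a few clearly isolated spores):

from math import ceil

AVG_SPORE_AREA = 1300.0  # hypothetical; measure from isolated spores

for c in cnts:
    if cv2.contourArea(c) < 1100:
        continue
    # Count only spore-coloured pixels inside this contour, so background
    # pixels trapped inside the shape are not counted as spore area
    blob_mask = np.zeros(image_mask.shape, dtype=np.uint8)
    cv2.drawContours(blob_mask, [c], -1, 255, -1)
    spore_pixels = cv2.countNonZero(cv2.bitwise_and(image_mask, blob_mask))
    counter[color] += ceil(spore_pixels / AVG_SPORE_AREA)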

However, you might note that if you can figure out the background color, and your spores are roughly equal in shape and homogeneous in color, you can do much better performance-wise by simply subtracting the background-colored area from the entire image and dividing what remains by the average spore area. This would be much faster than using dilation.
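A minimal sketch of that whole-image estimate, reusing the BGR bounds from the question with the image path hard-coded for brevity (AVG_SPORE_AREA is again a hypothetical, measured value):

import cv2
import numpy as np
from math import ceil

AVG_SPORE_AREA = 1300.0  # hypothetical; measure from isolated spores

img  = cv2.imread('cells.jpg')
# Everything within the colour bounds counts as spore area; everything
# else is background, so no dilation or contour pass is needed
mask = cv2.inRange(img, np.array([70, 150, 140]), np.array([110, 240, 210]))
print(ceil(cv2.countNonZero(mask) / AVG_SPORE_AREA))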

Another thing you should consider, though I don’t think it will necessarily fix your clumping issue, is OpenCV’s built-in blob detection. If you go for the area approach, it might help with the edge cases that the gradient in your background presents: you can simply detect blobs and divide the total blob area by the average spore area. You can follow this tutorial to understand how to use it in Python. You may also find that a simple contour approach using OpenCV’s contours works well for your use case.
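A minimal SimpleBlobDetector sketch along those lines; the filter settings are illustrative guesses, and depending on whether your spores are darker or lighter than the background you may also need to set params.blobColor or invert the image:

import cv2
import numpy as np
from math import ceil

AVG_SPORE_AREA = 1300.0  # hypothetical; measure from isolated spores

img  = cv2.imread('cells.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Keep blobs above speckle size and disable the shape filters so that
# merged clumps of spores still register as single blobs
params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.minArea = 500
params.filterByCircularity = False
params.filterByConvexity = False
params.filterByInertia = False

detector  = cv2.SimpleBlobDetector_create(params)
keypoints = detector.detect(gray)

# kp.size is the blob diameter; sum the blob areas, then divide by the
# average spore area to approximate the count
total_area = sum(np.pi * (kp.size / 2) ** 2 for kp in keypoints)
print(ceil(total_area / AVG_SPORE_AREA))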

TLDR: Your spores are about the same size and shade of color and your background is roughly homogeneous; find the average spore area and divide the area taken up by the spore colors by it to get a much more accurate count.

Addendum:

If you are having trouble finding the average spore area, and you have some idea of the average "loneliness" of a spore (how often one is clearly separated), you could sort the contours/blobs by area, take the bottom n% according to that "loneliness" probability n, and average those. As long as "loneliness" is not strongly dependent on spore size, this should be a fairly accurate measurement of average spore size. It works because, if lonely spores are uniformly distributed, the smallest n% of contours form a random sample in their own right, consisting mostly of lonely spores; shrinking n slightly lowers the chance of accidentally grabbing clumps. You would theoretically only need to do this once if you knew the zoom factor.
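One possible sketch of that estimate; lonely_fraction stands in for the "loneliness" probability n and is a hypothetical parameter you would estimate from your samples:

import cv2

def estimate_avg_spore_area(cnts, lonely_fraction=0.3):
    # If "loneliness" is independent of spore size, the smallest
    # lonely_fraction of contours are very likely single, clearly
    # separated spores, so their mean area approximates one spore
    areas = sorted(cv2.contourArea(c) for c in cnts)
    n = max(1, int(len(areas) * lonely_fraction))
    return sum(areas[:n]) / n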

Answered By: Krupip

First, some preliminary code that we’ll use below:

import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.segmentation import watershed as skwater  # skimage.morphology in older versions

def ShowImage(title,img,ctype):
  if ctype=='bgr':
    b,g,r = cv2.split(img)       # get b,g,r
    rgb_img = cv2.merge([r,g,b])     # switch it to rgb
    plt.imshow(rgb_img)
  elif ctype=='hsv':
    rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
    plt.imshow(rgb)
  elif ctype=='gray':
    plt.imshow(img,cmap='gray')
  elif ctype=='rgb':
    plt.imshow(img)
  else:
    raise Exception("Unknown colour type")
  plt.title(title)
  plt.show()

For reference, here’s your original image:

#Read in image
img         = cv2.imread('cells.jpg')
ShowImage('Original',img,'bgr')

Original image

Otsu’s method is one way to segment colours. It assumes that the pixel intensities of the image form a bimodal histogram and finds the optimal separator between the two modes. I apply the method below.

#Convert to a single, grayscale channel
gray        = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Threshold the image to binary using Otsu's method
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Grayscale',gray,'gray')
ShowImage('Applying Otsu',thresh,'gray')

Grayscale cells
Thresholded cells

All those little speckles are annoying; we can get rid of them by dilating:

#Adjust iterations until desired result is achieved
kernel  = np.ones((3,3),np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=5)
ShowImage('Dilated',dilated,'gray')

With noise eliminated

We now need to identify the peaks of the watershed and give them separate labels. The goal of this is to generate a set of pixels such that each of the cells has a pixel within it and no two cells have their identifier pixels touching.

To achieve this, we perform a distance transform and then keep only the pixels closest to the cells’ centers (those with the largest distance values).

#Calculate distance transformation
dist         = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
ShowImage('Distance',dist,'gray')

Distance Transformation

#Adjust this parameter until desired separation occurs
fraction_foreground = 0.6
ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
ShowImage('Surely Foreground',sure_fg,'gray')

Foreground isolation

Each area of white in the above image is, as far as the algorithm is concerned, a separate cell.

Now we identify unknown regions, the regions which will be labeled by the watershed algorithm, by subtracting off the maxima:

# Finding unknown region
unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
ShowImage('Unknown',unknown,'gray')

Unknown regions

The unknown regions should form complete donuts around each cell.

Next, we give each of the distinct regions resulting from the distance transform unique labels and then mark the unknown regions before finally performing the watershed transform:

# Marker labelling
ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
ShowImage('Connected Components',markers,'rgb')

# Add one to all labels so that sure background is not 0, but 1
markers = markers+1

# Now, mark the region of unknown with zero
markers[unknown==np.max(unknown)] = 0

ShowImage('markers',markers,'rgb')

dist    = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
markers = skwater(-dist,markers,watershed_line=True)

ShowImage('Watershed',markers,'rgb')

Connected components
Uncertain area
Separate cells

Now the total number of cells is the number of unique markers minus 1 (to ignore the background):

len(set(markers.flatten()))-1

In this case, we get 23.

You can make this more or less accurate by adjusting the distance threshold or the degree of dilation, or perhaps by using h-maxima (locally-thresholded maxima), sketched below. But beware of overfitting; that is, don’t assume that tuning for a single image will give you the best results everywhere.
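As a sketch of the h-maxima variant, reusing dist, dilated, extrema, and skwater from the code above (the h value is an illustrative guess):

from scipy import ndimage as ndi

# Seed markers from local maxima of the distance transform that rise at
# least h above their surroundings, rather than using one global fraction
h_seeds    = extrema.h_maxima(dist, 10)
markers, _ = ndi.label(h_seeds)
labels     = skwater(-dist, markers, mask=dilated.astype(bool), watershed_line=True)
print(len(set(labels.flatten())) - 1)  # subtract the zero background label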

Estimating uncertainty

You could also algorithmically vary the parameters slightly to get a sense of the uncertainty in the count. That might look like this:

import numpy as np
import cv2
import itertools
from matplotlib import pyplot as plt
from skimage.morphology import extrema
from skimage.segmentation import watershed as skwater  # skimage.morphology in older versions

def CountCells(dilation=5, fg_frac=0.6):
  #Read in image
  img         = cv2.imread('cells.jpg')
  #Convert to a single, grayscale channel
  gray        = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
  #Threshold the image to binary using Otsu's method
  ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
  #Adjust iterations until desired result is achieved
  kernel  = np.ones((3,3),np.uint8)
  dilated = cv2.dilate(thresh, kernel, iterations=dilation)
  #Calculate distance transformation
  dist         = cv2.distanceTransform(dilated,cv2.DIST_L2,5)
  #Adjust this parameter until desired separation occurs
  fraction_foreground = fg_frac
  ret, sure_fg = cv2.threshold(dist,fraction_foreground*dist.max(),255,0)
  # Finding unknown region
  unknown = cv2.subtract(dilated,sure_fg.astype(np.uint8))
  # Marker labelling
  ret, markers = cv2.connectedComponents(sure_fg.astype(np.uint8))
  # Add one to all labels so that sure background is not 0, but 1
  markers = markers+1
  # Now, mark the region of unknown with zero
  markers[unknown==np.max(unknown)] = 0    
  markers = skwater(-dist,markers,watershed_line=True)
  return len(set(markers.flatten()))-1

#Smaller numbers are noisier, which leads to many small blobs that get
#thresholded out (undercounting); larger numbers result in possibly fewer blobs,
#which can also cause undercounting.
dilations = [4,5,6] 
#Small numbers equal less separation, so undercounting; larger numbers equal
#more separation or drop-outs. This can lead to over-counting initially, but
#rapidly to under-counting.
fracs     = [0.5, 0.6, 0.7, 0.8] 

for params in itertools.product(dilations,fracs):
  print("Dilation={0}, FG frac={1}, Count={2}".format(*params,CountCells(*params)))

Giving the result:

Dilation=4, FG frac=0.5, Count=22
Dilation=4, FG frac=0.6, Count=23
Dilation=4, FG frac=0.7, Count=17
Dilation=4, FG frac=0.8, Count=12
Dilation=5, FG frac=0.5, Count=21
Dilation=5, FG frac=0.6, Count=23
Dilation=5, FG frac=0.7, Count=20
Dilation=5, FG frac=0.8, Count=13
Dilation=6, FG frac=0.5, Count=20
Dilation=6, FG frac=0.6, Count=23
Dilation=6, FG frac=0.7, Count=24
Dilation=6, FG frac=0.8, Count=14

Taking the median of the count values is one way of incorporating that uncertainty into a single number.
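For instance, extending the parameter sweep above:

import statistics

counts = [CountCells(d, f) for d, f in itertools.product(dilations, fracs)]
print(statistics.median(counts))  # 20.5 for the twelve runs shown above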

Remember that StackOverflow’s licensing requires that you give appropriate attribution. In academic work, this can be done via citation.

Answered By: Richard