Delete single pixels in binary image

Question:

Suppose I have a binary image, represented as a numpy matrix where each pixel is either background (0) or foreground (1). I’m looking for a way to delete all foreground pixels that don’t have any neighbouring foreground pixel.

Suppose, the image matrix is:

a = np.array([[0,0,1,1],[1,0,0,0]])

The resulting image after single pixel deletion should be

b = np.array([[0,0,1,1],[0,0,0,0]])

My approach so far is doing a combination of openings for all possible directions:

import numpy as np
from scipy import ndimage as ndi

# edge is the binary input image; one opening per neighbour direction
opening1 = ndi.binary_opening(edge, structure=np.array([[0,1,0],[0,1,0],[0,0,0]]))
opening2 = ndi.binary_opening(edge, structure=np.array([[0,0,0],[0,1,1],[0,0,0]]))
opening3 = ndi.binary_opening(edge, structure=np.array([[1,0,0],[0,1,0],[0,0,0]]))
opening4 = ndi.binary_opening(edge, structure=np.array([[0,0,0],[0,1,0],[0,0,1]]))

# Union of the four boolean results
opening = opening1 | opening2 | opening3 | opening4

An alternative would be to label connected components and delete them by index. However, both solutions feel sub-optimal when it comes to computational complexity.

Asked By: Dschoni

Answers:

Actually, labelling connected components seems to be the way to go. At least, this is how skimage does it in its remove_small_objects function.
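
A minimal sketch of the labelling approach, using scipy.ndimage.label instead of skimage (a swapped-in library, chosen only because the question already uses scipy.ndimage). It assumes that diagonal pixels count as neighbours (8-connectivity); the variable names structure, sizes and keep are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi

a = np.array([[0, 0, 1, 1], [1, 0, 0, 0]])

# Assumption: diagonal pixels count as neighbours (8-connectivity).
structure = np.ones((3, 3), dtype=int)

# Give every connected foreground component its own integer label (0 = background).
labels, num_components = ndi.label(a, structure=structure)

# sizes[k] is the number of pixels carrying label k.
sizes = np.bincount(labels.ravel())

# Keep only components with more than one pixel; never keep the background label.
keep = sizes > 1
keep[0] = False

b = keep[labels].astype(a.dtype)
print(b)
```

Omitting the structure argument makes ndi.label fall back to its default, where only pixels sharing a side are connected.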

Answered By: Dschoni

What about this?

The idea is to shift the image by one pixel in each direction and then determine whether a pixel has a neighbour by checking for a match in any of the shifts.

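import numpy as np

a = np.array([[0,0,1,1],[1,0,0,0]])

print(a)

# Pad with a border of zeros and slice instead of using np.roll,
# which would wrap around and treat opposite image edges as neighbours.
p = np.pad(a, 1, mode='constant')

# Only the four side neighbours are checked (no diagonals).
shift_bottom = p[2:, 1:-1]   # image shifted up: value of the pixel below
shift_top = p[:-2, 1:-1]     # image shifted down: value of the pixel above
shift_right = p[1:-1, :-2]   # image shifted right: value of the pixel to the left
shift_left = p[1:-1, 2:]     # image shifted left: value of the pixel to the right

b = (a * shift_bottom) | (a * shift_top) | (a * shift_right) | (a * shift_left)

print(b)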
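
A variant of the same shifting idea that also counts diagonal neighbours, sketched under the assumption that touching corners should count as adjacent. It pads the image with zeros so that the image borders do not wrap around. The helper name remove_isolated_pixels is hypothetical.

```python
import numpy as np

def remove_isolated_pixels(img):
    """Zero out foreground pixels that have no foreground pixel among their 8 neighbours."""
    img = np.asarray(img)
    h, w = img.shape
    # A one-pixel border of zeros means pixels outside the image count as background.
    padded = np.pad(img, 1, mode="constant")
    neighbours = np.zeros(img.shape, dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # skip the pixel itself
            # View of the image shifted by (di, dj), same shape as img.
            shifted = padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
            neighbours += shifted != 0
    # Keep a pixel only if at least one neighbour is foreground.
    return np.where(neighbours > 0, img, 0)

a = np.array([[0, 0, 1, 1], [1, 0, 0, 0]])
print(remove_isolated_pixels(a))
```
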
Answered By: pierresegonne

The following script seems to work. It removes all islands of size 1, then all islands up to size 3. It could be used to remove larger islands too, but removing islands of size 1 is enough to answer your question. It removes islands of both colors, so not only will you get rid of dirt around the text, but also inside the letters. The script considers pixels with touching corners to be adjacent (remove a few lines of code to only count touching sides as adjacent).

The script loops over all the rows and for each row, it loops over each pixel from left to right and checks if it is an island.

The reason to remove smaller islands first, and then larger, is a case like this:

□□□□
□□■■
□□■□

When searching for islands of size 1, it will find the white size 1 island in the bottom right corner and make it black. Then it will not find any more islands.

If it just searched for islands up to size 3 immediately, it would find the black island of size 3 and make it white.
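
The effect of the ordering can be reproduced with a small sketch. This is not the recursive search used by the script: it swaps in scipy.ndimage.label to find each island and is far too slow for real images, since it relabels the image at every pixel. The function name flip_small_islands is hypothetical, 1 stands for black (■), 0 for white (□), and touching corners are assumed to count as adjacent.

```python
import numpy as np
from scipy import ndimage as ndi

EIGHT = np.ones((3, 3), dtype=int)  # assumption: touching corners are adjacent

def flip_small_islands(img, max_size):
    """Scan in raster order and flip every island of at most max_size pixels."""
    out = img.copy()
    for i, j in np.ndindex(out.shape):
        colour = out[i, j]
        # Label all islands of the current pixel's colour, then pick its own island.
        labels, _ = ndi.label(out == colour, structure=EIGHT)
        island = labels == labels[i, j]
        if island.sum() <= max_size:
            out[island] = 1 - colour
    return out

img = np.array([[0, 0, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 0]])

two_pass = flip_small_islands(flip_small_islands(img, 1), 3)
direct = flip_small_islands(img, 3)
print(two_pass)  # the white corner pixel is filled in, the black shape survives
print(direct)    # the black shape is erased instead
```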

The script assumes that the first command line parameter is the name of a file that numpy can import into an array. The script also assumes that each array element is 0 or 1. The result is written to ‘out.tif’ (overwriting it if it exists).

Note that I don’t normally use Python and the script is not optimized at all. I tried it on a TIFF originating from an A4 page scanned at 300 DPI. It took a while, but the script is already well worth using. Before optimizing it, test cases should be created to detect regressions.

The value 7 is just used during the checking and could be any value of the data type other than 0 or 1.

#! /usr/bin/python
from PIL import Image
import sys
import numpy
imarray = numpy.array(Image.open(sys.argv[1]))
h = imarray.shape[0]
w = imarray.shape[1]
markColor = 7
maxSize = 0

def isIsland(color, i, j):
    global maxSize
    if color != imarray[i, j]:
        return True
    if 0 == maxSize:
        return False
    imarray[i, j] = markColor
    maxSize -= 1
    if 0 < i:
        if 0 < j:
            if not isIsland(color, i - 1, j - 1):
                return False
        if not isIsland(color, i - 1, j):
            return False
        if j + 1 < w:
            if not isIsland(color, i - 1, j + 1):
                return False
    if 0 < j:
        if not isIsland(color, i, j - 1):
            return False
    if j + 1 < w:
        if not isIsland(color, i, j + 1):
            return False
    if i + 1 < h:
        if 0 < j:
            if not isIsland(color, i + 1, j - 1):
                return False
        if not isIsland(color, i + 1, j):
            return False
        if j + 1 < w:
            if not isIsland(color, i + 1, j + 1):
                return False
    return True

def fill(color, i, j):
    if markColor != imarray[i, j]:
        return
    imarray[i, j] = color
    if 0 < i:
        if 0 < j:
            fill(color, i - 1, j - 1)
        fill(color, i - 1, j)
        if j + 1 < w:
            fill(color, i - 1, j + 1)
    if 0 < j:
        fill(color, i, j - 1)
    if j + 1 < w:
        fill(color, i, j + 1)
    if i + 1 < h:
        if 0 < j:
            fill(color, i + 1, j - 1)
        fill(color, i + 1, j)
        if j + 1 < w:
            fill(color, i + 1, j + 1)

for s in [1, 3]:
    for i in range(h):
        for j in range(w):
            islandColor = imarray[i, j]
            maxSize = s
            if isIsland(islandColor, i, j):
                fill((islandColor + 1) % 2, i, j)
            else:
                fill(islandColor, i, j)
Image.fromarray(imarray).save('out.tif')
Answered By: odalman