Detecting a moving object with a moving camera (camera mounted on a drone, monitoring one area)

Question:

def run(self):
    while True:
        _ret, frame = self.cam.read()
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        vis = frame.copy()

        if len(self.tracks) > 0:
            img0, img1 = self.prev_gray, frame_gray
            p0 = np.float32([tr[-1] for tr in self.tracks]).reshape(-1, 1, 2)
            p1, _st, _err = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None, **lk_params)
            p0r, _st, _err = cv2.calcOpticalFlowPyrLK(img1, img0, p1, None, **lk_params)
            d = abs(p0-p0r).reshape(-1, 2).max(-1)
            good = d < 1
            new_tracks = []
            # magnitudes of the optical flow vectors (frame-to-frame
            # displacements), collected for the camera-motion histogram
            A = []
            for i in range(len(p1)):
                dx = p1[i][0][0] - p0[i][0][0]
                dy = p1[i][0][1] - p0[i][0][1]
                A.append(math.sqrt(dx**2 + dy**2))
            counts, bins, bars = plt.hist(A)  # assumes matplotlib.pyplot as plt

            for tr, (x, y), good_flag in zip(self.tracks, p1.reshape(-1, 2), good):
                if not good_flag:
                    continue
                tr.append((x, y))
                if len(tr) > self.track_len:
                    del tr[0]
                new_tracks.append(tr)
                cv2.circle(vis, (int(x), int(y)), 2, (0, 255, 0), -1)
            self.tracks = new_tracks
            cv2.polylines(vis, [np.int32(tr) for tr in self.tracks], False, (0, 255, 0))
            draw_str(vis, (20, 20), 'track count: %d' % len(self.tracks))

        if self.frame_idx % self.detect_interval == 0:
            mask = np.zeros_like(frame_gray)
            mask[:] = 255
            for x, y in [np.int32(tr[-1]) for tr in self.tracks]:
                cv2.circle(mask, (x, y), 5, 0, -1)
            p = cv2.goodFeaturesToTrack(frame_gray, mask = mask, **feature_params)
            if p is not None:
                for x, y in np.float32(p).reshape(-1, 2):
                    self.tracks.append([(x, y)])


        self.frame_idx += 1
        self.prev_gray = frame_gray
        cv2.imshow('lk_track', vis)

        ch = cv2.waitKey(1)
        if ch == 27:
            break

I am using lk_track.py from the OpenCV samples to try to detect a moving object. I am trying to find the camera motion using the histogram of magnitudes of the optical flow vectors, and then calculate the average of the similar values, which should be directly proportional to the camera motion. I have calculated the magnitudes of the vectors and saved them in a list A. Can someone suggest how to find the highest similar values from it and calculate the average for only those values?

Asked By: Kiran Bayari


Answers:

I created a toy problem to model the approach of binarizing the images by optical flow. This is a massively simplified view of the problem, but it gives the general idea well. I’ll split the problem up into a few chunks and give functions for them. If you’re working directly with video, there will of course be a lot of additional code needed, and I just hardcoded a lot of values that you’ll need to turn into parameters.

The first function just generates the image sequence: the view translates through a scene that contains an object. Since the sequence simply translates through the scene while the object stays fixed in the frame, the object appears stationary in the sequence, which of course means the object is actually moving through the scene, against the apparent motion of the background.

import numpy as np
import cv2


def gen_seq():
    """Generate motion sequence with an object"""

    scene = cv2.GaussianBlur(np.uint8(255*np.random.rand(400, 500)), (21, 21), 3)

    h, w = 400, 400
    step = 4
    obj_mask = np.zeros((h, w), bool)
    obj_h, obj_w = 50, 50
    obj_x, obj_y = 175, 175
    obj_mask[obj_y:obj_y+obj_h, obj_x:obj_x+obj_w] = True
    obj_data = np.uint8(255*np.random.rand(obj_h, obj_w)).ravel()
    imgs = []
    for i in range(0, 1+w//step, step):
        img = scene[:, i:i+w].copy()
        img[obj_mask] = obj_data
        imgs.append(img)

    return imgs

# generate image sequence
imgs = gen_seq()

# display images
for img in imgs:
    cv2.imshow('Image', img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Image')

So here’s the basic image sequence visualized. I just used a random scene, translated through, and added a random object in the center.

Generated image sequence with object

Great! Now we need to calculate the flow between each frame. I used dense flow here, but sparse flow would be more robust for actual images.

def find_flows(imgs):
    """Finds the dense optical flows"""

    # Farneback params: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]
    prev = imgs[0]
    flows = []
    for img in imgs[1:]:
        flow = cv2.calcOpticalFlowFarneback(prev, img, None, *optflow_params)
        flows.append(flow)
        prev = img

    return flows

# find optical flows between images
flows = find_flows(imgs)

# display flows
h, w = imgs[0].shape[:2]
hsv = np.zeros((h, w, 3), dtype=np.uint8)
hsv[..., 1] = 255

for flow in flows:
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang*180/np.pi/2
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    cv2.imshow('Flow', rgb)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Flow')

Here I colorized the flow based on its angle and magnitude: the angle determines the hue and the magnitude determines the intensity/brightness of the color. This is the same visualization the OpenCV tutorial on dense optical flow uses.

Optical flow

Then we need to binarize this flow so that we get two distinct sets of pixels based on how they’re moving. In the sparse case this works out the same, except you get two distinct sets of features (a sketch of the sparse variant follows the dense code below).

def label_flows(flows):
    """Binarizes the flows by direction and magnitude"""

    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    flags = cv2.KMEANS_RANDOM_CENTERS
    h, w = flows[0].shape[:2]

    labeled_flows = []
    for flow in flows:
        flow = flow.reshape(h*w, -1)  # one 2D flow vector per pixel
        # cluster the flow vectors into two motion groups
        comp, labels, centers = cv2.kmeans(flow, 2, None, criteria, 10, flags)
        # the label assigned to more pixels is assumed to be the camera motion
        n = np.sum(labels == 1)
        camera_motion_label = np.argmax([labels.size-n, n])
        labeled = np.uint8(255*(labels.reshape(h, w) == camera_motion_label))
        labeled_flows.append(labeled)
    return labeled_flows

# binarize the flows
labeled_flows = label_flows(flows)

# display binarized flows
for labeled_flow in labeled_flows:
    cv2.imshow('Labeled Flow', labeled_flow)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break
cv2.destroyWindow('Labeled Flow')
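
For the sparse case from the question’s lk_track.py code, the same idea would look roughly like the sketch below. This is my own illustration rather than the original answer’s code: it assumes p0 and p1 are the (N, 1, 2) point arrays returned by cv2.calcOpticalFlowPyrLK, and it clusters the per-feature displacement vectors instead of per-pixel flow.

def label_sparse_flow(p0, p1):
    """Cluster sparse LK feature displacements into two motion groups"""

    d = (p1 - p0).reshape(-1, 2).astype(np.float32)  # per-feature flow vectors
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _comp, labels, centers = cv2.kmeans(d, 2, None, criteria, 10,
                                        cv2.KMEANS_RANDOM_CENTERS)
    labels = labels.ravel()
    # assume the larger cluster follows the camera motion
    camera_label = np.argmax(np.bincount(labels))
    return labels == camera_label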

The annoying thing here is that the labels are assigned randomly, i.e. they will differ from frame to frame. If you visualized the binary image, it would flip between black and white randomly. Since I’m only using binary labels, 0 and 1, what I did was consider the label that is assigned to more pixels to be the “camera motion label”, set that label to white in the resulting images, and set the other label to black; that way the camera motion label is always the same in each frame. This may need to be much more sophisticated for working on a video feed.

Binarized flow

But here we have it, a binarized flow where the color is just showing the two distinct sets of flow vectors.
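
One way to make the labels more stable over time (a hypothetical refinement, not something from the original sample) is to match the current k-means centers against the previous frame’s centers and flip the labels when they come out swapped. A minimal sketch, assuming label_flows is modified to also return the cluster centers:

def stabilize_labels(prev_centers, centers, labels):
    """Swap labels if the two clusters flipped relative to the previous frame"""

    if prev_centers is None:
        return labels, centers
    # compare the cost of keeping the label order vs. swapping it
    keep = np.linalg.norm(centers - prev_centers, axis=1).sum()
    swap = np.linalg.norm(centers[::-1] - prev_centers, axis=1).sum()
    if swap < keep:
        return 1 - labels, centers[::-1]
    return labels, centers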

Now if we want to find the target in this flow, we can invert the image and find the connected components of the binary image. The inversion makes the camera motion the background label (0). Each of the black blobs then becomes white and gets labeled, and we can find the largest component, which in this case will be the target. That gives a mask around the target, and we can draw the contours of that mask on the original images to see the target being detected. I’ll also cut the borders of the image off before finding the connected components, so that edge effects from dense flow are ignored.

def find_target_in_labeled_flow(labeled_flow):
    """Find the largest non-camera-motion component in a labeled flow"""

    labeled_flow = cv2.bitwise_not(labeled_flow)
    bw = 10  # border width to cut off, to ignore dense-flow edge effects
    h, w = labeled_flow.shape[:2]
    border_cut = labeled_flow[bw:h-bw, bw:w-bw]
    conncomp, stats = cv2.connectedComponentsWithStats(border_cut, connectivity=8)[1:3]
    target_label = np.argmax(stats[1:, cv2.CC_STAT_AREA]) + 1
    img = np.zeros_like(labeled_flow)
    img[bw:h-bw, bw:w-bw] = 255*(conncomp == target_label)
    return img

for labeled_flow, img in zip(labeled_flows, imgs[:-1]):
    target_mask = find_target_in_labeled_flow(labeled_flow)
    display_img = cv2.merge([img, img, img])
    contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]
    display_img = cv2.drawContours(display_img, contours, -1, (0, 255, 0), 2)

    cv2.imshow('Detected Target', display_img)
    k = cv2.waitKey(100) & 0xFF
    if k == ord('q'):
        break

And of course this could use some cleaning up, and you won’t be doing exactly this for sparse flow; there you could just define a region of interest around the tracked points (a sketch follows below).

Detected target
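
For the sparse version, one hypothetical way to get that region of interest (again my own sketch, building on the label_sparse_flow sketch above) is a bounding rectangle around the features that do not follow the camera motion:

def target_roi(p1, camera_mask):
    """Bounding box around features NOT following the camera motion"""

    pts = p1.reshape(-1, 2)[~camera_mask]
    if len(pts) == 0:
        return None  # no outlier features, so no detected object
    return cv2.boundingRect(np.float32(pts))  # (x, y, w, h)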

Now, there is still a lot of work to do. You have a binarized flow… you can probably safely assume that the label which occurs most frequently is the camera motion (as I did). However, you’ll have to make sure that the other label is the object you’re actually interested in tracking. You’ll have to keep track of it between flows so that if it stops moving, you’ll still know where it is as the camera moves. When you do the k-means step, you’ll want to make sure the centers from k-means are “far enough” apart that you know whether the object is moving or not.

The basic steps for that would be, from the starting frame of the video:

  1. If the two centers are “close”, then you can assume your object is either not in the scene or not moving in the scene (see the sketch after this list for a minimal separation check).
  2. Once the centers are split far enough apart, you’ll have found the object to track. Keep track of its location.
  3. While tracking the object, verify that its location is near a prediction. You can use the optical flow velocity vectors from the previous frame to predict the location of each pixel/feature in the new frame, and make sure your predictions agree with your tracking result.
  4. If the object stops moving, the centers from k-means should be close again. Keep track of the optical flow vectors around the object’s location and follow them so you have a prediction of where the object is once it resumes moving, and again verify the detected location against this prediction.
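
For steps 1 and 4, here is a minimal sketch of the “centers far enough apart” test; min_separation is a made-up threshold you would have to tune for your footage:

def object_is_moving(centers, min_separation=2.0):
    """Heuristic: the object moves if the two mean flow vectors from
    k-means are separated by more than min_separation pixels per frame"""

    return np.linalg.norm(centers[0] - centers[1]) > min_separation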

I’ve never used these methods before, so I’m not sure how robust they are. The typical approach for HOOF, or “Histogram of Oriented Optical Flow”, is much more advanced than this (see the seminal paper here). Instead of just binarizing, the idea is to use the histogram from each frame as a probability distribution, and the way this probability distribution changes over time can be analyzed with tools from time series analysis, which I assume gives a more robust framework for this approach.
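
As a rough illustration of that idea (my own HOOF-like sketch, not the paper’s exact formulation), you can bin the flow vectors by orientation, weight the bins by magnitude, and normalize so each frame yields a probability distribution:

def hoof_histogram(flow, n_bins=32):
    """Magnitude-weighted orientation histogram, normalized to sum to 1"""

    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in [0, 2*pi)
    hist, _ = np.histogram(ang.ravel(), bins=n_bins, range=(0, 2*np.pi),
                           weights=mag.ravel())
    total = hist.sum()
    return hist / total if total > 0 else hist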

Answered By: alkasm

To avoid the following error with @alkasm’s answer:

(-215:Assertion failed) npoints > 0 in function 'drawContours'

simply replace:

contours = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[1]

with

contours, _ = cv2.findContours(target_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

I can’t post this as a comment below the answer because my account is new and has low reputation.

Answered By: ChrisJ