What does real-time object detection really mean?

Question:

So here is the context.

I created a script in Python, using YOLOv4, OpenCV, CUDA and cuDNN, for object detection and object tracking to count the objects in a video. I intend to use it in real time, but what does real time really mean? The video I’m using is 1 min long and 60 FPS originally, but the processed video runs at 30 FPS on average and takes 3 min to finish. So comparing both videos side by side, one is clearly faster. 30 FPS is the industry standard for movies and such. I’m trying to wrap my head around what real time truly means.

Imagine I need to use this information for traffic light management or to lift a bridge for a passing boat; it should be done automatically. It’s time-sensitive, or the chaos would be visible. In these cases, what does it truly mean to be real time?

Asked By: Ivan Trigueiro

||

Answers:

Real-time refers to the fact that a system is able to process and respond to data as it is received, without any significant delay. In the context of your object detection and tracking script, real-time would mean that the system is able to process and respond to new frames of the video as they are received, without a significant delay. This would allow the system to accurately count the objects in the video in near-real-time as the video is being played.

In the case of traffic lights management or lifting a bridge for a passing boat, real-time would mean that the system is able to quickly and accurately process data from sensors and other sources, and use that information to make decisions and take actions in a timely manner. This is important in these scenarios because any significant delay in processing and responding to data could have serious consequences, such as traffic accidents or collisions.

Overall, real-time systems are designed to process and respond to data quickly and accurately, in order to support time-sensitive applications and scenarios.

Answered By: Mohsin Mehmood

First, learn what "real-time" means. Wikipedia: https://en.wikipedia.org/wiki/Real-time_computing

Understand the terms "hard" and "soft" real-time. Understand which aspects of your environment are soft and which require hard real-time.

Understand the response times that your environment requires. Understand the time scales.

This does not involve fuzzy terms like "quick" or "significant" or "accurate". It involves actual quantifiable time spans that depend on your task and its environment, acceptable error rates, …
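
As a rough illustration of what "quantifiable" can look like, here is a minimal sketch that compares processing throughput against the source frame rate, using the durations stated in the question (1 min of 60 FPS video, processed in 3 min). The function names are hypothetical, and the sketch assumes every source frame has to be processed, which a real deployment may not require.

```python
def frame_budget_s(source_fps: float) -> float:
    """Seconds available per frame if every frame of a live stream is processed."""
    return 1.0 / source_fps


def throughput_fps(frame_count: int, processing_seconds: float) -> float:
    """Average number of frames processed per second of wall-clock time."""
    return frame_count / processing_seconds


def keeps_up(source_fps: float, frame_count: int, processing_seconds: float) -> bool:
    """True if processing is at least as fast as the frames arrive."""
    return throughput_fps(frame_count, processing_seconds) >= source_fps


# Numbers taken from the question: 60 s of video at 60 FPS, processed in 180 s.
source_fps = 60.0
frame_count = int(60 * source_fps)   # 3600 frames
processing_seconds = 180.0

budget = frame_budget_s(source_fps)
achieved = throughput_fps(frame_count, processing_seconds)
ok = keeps_up(source_fps, frame_count, processing_seconds)

print(f"budget per frame: {budget * 1000:.1f} ms")
print(f"achieved throughput: {achieved:.1f} FPS")
print(f"keeps up with a live 60 FPS feed: {ok}")
```

With those durations the pipeline handles about 20 frames per second of wall-clock time, so it would fall behind a live 60 FPS feed unless frames are dropped or the required frame rate is lowered. Whether that matters depends on the response time the task actually needs.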

You did not share any details about your environment. I find it unlikely that you even need 30 fps for any application involving a road intersection.

You only need enough frame rate so you don’t miss objects of interest, and you have fine enough data to track multiple objects with identity without mistaking them for each other.

Example: assume a car moving at 200 km/h. If your camera takes a frame every 1/30 second, the car moves 1.85 meters between frames.

  • How’s your motion blur? What’s the camera’s exposure time? I’d recommend an exposure on the order of a millisecond or shorter, which gives a motion blur of roughly 0.05 m at that speed.
  • How’s your tracking? Can it deal with objects "jumping" that far between frames? Does it generate object identity information that is usable for matching (association)?
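
The arithmetic behind those numbers can be checked with a short sketch. The function names are hypothetical; the inputs (200 km/h, 30 frames per second, 1 ms exposure) come from the example above.

```python
def metres_per_frame(speed_kmh: float, fps: float) -> float:
    """Distance an object travels between two consecutive frames."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms / fps


def motion_blur_m(speed_kmh: float, exposure_s: float) -> float:
    """Distance an object travels while the shutter is open."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms * exposure_s


jump = metres_per_frame(200.0, 30.0)    # displacement between frames
blur = motion_blur_m(200.0, 0.001)      # blur for a 1 ms exposure

print(f"displacement per frame: {jump:.2f} m")
print(f"motion blur at 1 ms exposure: {blur:.3f} m")
```

Plugging in the speeds and frame rates of your own scene shows how far a tracker has to associate objects between frames, and how short the exposure has to be.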

Answered By: Christoph Rackwitz