Programmatically accessing PTS times in MP4 container

Question:

Background

For a research project, we are recording video data from two cameras and feeding a synchronization pulse directly into the microphone ADC every second.

Problem

We want to derive, for each camera frame, a timestamp in the clock of the pulse source so that we can relate the camera images temporally. With our current method (see below), we get an offset of around 2 frames between the cameras. Unfortunately, inspection of the video shows that we are clearly 6 frames off (at least at one point) between the cameras.
I assume that this is because we are relating audio and video signal wrong (see below).

Approach I think I need help with

I read that in the MP4 container, there should be PTS times for video and audio. How do we access those programmatically? Python would be perfect, but if we have to call ffmpeg via system calls, we may do that too …

What we currently fail with

The original idea was to find video and audio times as

audio_sample_times = np.arange(N_audiosamples) / audio_sampling_rate
video_frame_times = np.arange(N_videoframes) / video_frame_rate

then identify audio_pulse_times in audio_sample_times base, calculate the relative position of each video_time to the audio_pulse_times around it, and select the same relative value to the corresponding source_pulse_times.
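The relative-position mapping described above amounts to piecewise-linear interpolation between pulses, which can be sketched with `np.interp`. The pulse and frame times below are made-up illustrative values, not our recordings:

```python
import numpy as np

# Hypothetical pulses detected once per second in the audio track,
# and the matching timestamps in the source clock.
audio_pulse_times = np.array([0.0, 1.0, 2.0, 3.0])       # seconds, audio clock
source_pulse_times = np.array([10.0, 11.0, 12.0, 13.0])  # seconds, source clock

# Video frame times expressed in the audio clock (e.g. 25 fps).
video_frame_times = np.arange(0.0, 3.0, 0.04)

# Each frame keeps its relative position between the two pulses
# that bracket it, re-expressed in the source clock.
video_times_in_source_clock = np.interp(
    video_frame_times, audio_pulse_times, source_pulse_times)
```

With these example values the two clocks differ by a constant 10 s offset, so every frame time is simply shifted by 10; with real drifting clocks the interpolation re-stretches each one-second interval independently.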

However, a first indication that this approach is problematic is already that for some videos, N_audiosamples/audio_sampling_rate differs from N_videoframes/video_frame_rate by multiple frames.

What I have found by now

OpenCV’s cv2.CAP_PROP_POS_MSEC seems to do exactly what we already do, rather than access any PTS …

Edit: What I took from the winning answer

import av
import numpy as np
from tqdm import tqdm

container = av.open(video_path)
signal = []
audio_sample_times = []
video_sample_times = []

for frame in tqdm(container.decode(video=0, audio=0)):
    if isinstance(frame, av.audio.frame.AudioFrame):
        # Assumes the audio stream's time_base is 1/sample_rate,
        # so frame.pts counts samples.
        sample_times = (frame.pts + np.arange(frame.samples)) / frame.sample_rate
        audio_sample_times += list(sample_times)
        # First channel only; the reshape assumes a packed (interleaved) sample format.
        signal_f_ch0 = frame.to_ndarray().reshape((-1, len(frame.layout.channels))).T[0]
        signal += list(signal_f_ch0)
    elif isinstance(frame, av.video.frame.VideoFrame):
        video_sample_times.append(float(frame.pts * frame.time_base))

signal = np.abs(np.array(signal))
audio_sample_times = np.array(audio_sample_times)
video_sample_times = np.array(video_sample_times)

Unfortunately, in my particular case, all PTS are consecutive and gapless, so the result is the same as with the naive solution …
From picture clues, we identified a section of ~10 s in the videos somewhere in which they desync, but can’t find any trace of that in the data.
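One way to verify that the PTS really are gapless is to compare successive differences of the collected video_sample_times against the nominal frame duration. A sketch with made-up timestamps (perfectly regular except for one dropped frame), not the actual recordings:

```python
import numpy as np

fps = 25.0
# Made-up video frame times: regular 25 fps, with frame 100 missing.
video_sample_times = np.concatenate([
    np.arange(0, 100) / fps,    # frames 0..99
    np.arange(101, 200) / fps,  # frame 100 dropped
])

dt = np.diff(video_sample_times)
# Flag any inter-frame gap more than half a frame longer than nominal.
gaps = np.flatnonzero(dt > 1.5 / fps)
print(gaps)  # → [99]
```

If this stays empty on the real PTS while the pictures clearly desync, the container timing itself carries no trace of the drop.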

Asked By: mcandril


Answers:

You need to run ffprobe to retrieve the PTS times. I don’t know the exact command, but if you’re ok with another package, try ffmpegio:

pip install ffmpegio-core
# or, if you also want to use it to read video frames & audio samples:
pip install ffmpegio

If you’re on Windows, see this doc on where ffmpeg.exe can be found automatically.

Then you can run

import ffmpegio

frames = ffmpegio.probe.frames('video.mp4', intervals=10)

This will return the frame info as a list of dicts for the first 10 packets (of mixed streams, in PTS order). If you omit the intervals argument, it will retrieve every frame (which can take a long time).

Inspect each dict in frames and decide which entries you need (say 'media_type', 'stream_index', 'pts', and 'pts_time'). Then add an entries argument containing these:

frames = ffmpegio.probe.frames('video.mp4', intervals=10, 
                               entries=['media_type', 'stream_index', 'pts','pts_time'])

Once you’re happy with what it returns, incorporate it into your program.

The intervals argument accepts many different formats; please read the doc.
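The returned list can then be split per stream in plain Python. The dict below is a hand-made stand-in for what the probe returns (the keys match the ffprobe frame entries requested above, but the values here are illustrative):

```python
# Illustrative stand-in for the list returned by ffmpegio.probe.frames(...).
frames = [
    {'media_type': 'video', 'stream_index': 0, 'pts': 0,    'pts_time': '0.000000'},
    {'media_type': 'audio', 'stream_index': 1, 'pts': 0,    'pts_time': '0.000000'},
    {'media_type': 'video', 'stream_index': 0, 'pts': 512,  'pts_time': '0.040000'},
    {'media_type': 'audio', 'stream_index': 1, 'pts': 1024, 'pts_time': '0.021333'},
]

# ffprobe reports pts_time as a decimal string; convert to seconds.
video_pts = [float(f['pts_time']) for f in frames if f['media_type'] == 'video']
audio_pts = [float(f['pts_time']) for f in frames if f['media_type'] == 'audio']
```

This separates the mixed-stream list into one PTS timeline per media type, which is what the pulse-matching step needs.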

What this or any other ffprobe-based approach does not offer you is getting this timing info together with the decoded frames. You need to read the frame timing data separately and mesh it with the data yourself. If you prefer a solution with more control (but perhaps more coding), look into PyAV, which interfaces the underlying libraries of FFmpeg. I’m fairly certain you can retrieve pts simultaneously with the frame data.

Disclaimer: This function has not been tested extensively. So, you may encounter an issue. If you have, please report on GitHub and I’ll fix it asap.

Answered By: kesh