How to decode a video (memory file / byte string) and step through it frame by frame in python?

Question:

I am using python to do some basic image processing, and want to extend it to process a video frame by frame.

I get the video as a blob from a server – .webm encoded – and have it in python as a byte string (b'\x1aE\xdf\xa3\xa3B\x86\x81\x01B\xf7\x81\x01B\xf2\x81\x04B\xf3\x81\x08B\x82\x88matroskaB\x87\x81\x04B\x85\x81\x02\x18S\x80g\x01\xff\xff\xff\xff\xff\xff\xff\x15I\xa9f\x99*\xd7\xb1\x83\x0fB@M\x80\x86ChromeWA\x86Chrome\x16T\xaek\xad\xae\xab\xd7\x81\x01s\xc5\x87\x04\xe8\xfc\x16t^\x8c\x83\x81\x01\x86\x8fV_MPEG4/ISO/AVC\xe0\x88\xb0\x82\x02\x80\xba\x82\x01\xe0\x1fC\xb6u\x01\xff\xff\xff\xff\xff\xff ...).

I know that there is cv.VideoCapture, which can do almost what I need. The problem is that I would have to first write the file to disk, and then load it again. It seems much cleaner to wrap the string, e.g., into an IOStream, and feed it to some function that does the decoding.

Is there a clean way to do this in python, or is writing to disk and loading it again the way to go?

Asked By: FirefoxMetzger


Answers:

According to this post, you can't use cv.VideoCapture to decode an in-memory stream.
You may, however, decode the stream by "piping" it to FFmpeg.

The solution is a bit complicated; writing to disk is much simpler and probably the cleaner solution.
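If you do go the write-to-disk route, here is a minimal sketch (assuming video_bytes holds the WebM blob received from the server; the temporary file is closed before it is reopened, which is required on Windows):

import os
import tempfile
import cv2

# Write the in-memory blob to a temporary .webm file.
with tempfile.NamedTemporaryFile(suffix='.webm', delete=False) as f:
    f.write(video_bytes)
    temp_path = f.name

# Decode it frame by frame with OpenCV.
cap = cv2.VideoCapture(temp_path)
while True:
    ret, frame = cap.read()
    if not ret:
        break  # No more frames.
    # Process the frame here (NumPy array in BGR order).

cap.release()
os.remove(temp_path)  # Clean up the temporary file.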

I am posting a solution using FFmpeg (and FFprobe).
There are Python bindings for FFmpeg, but this solution executes FFmpeg as an external application using the subprocess module.
(The Python bindings work well for piping to FFmpeg, but not for piping to FFprobe.)
I am using Windows 10, and I put ffmpeg.exe and ffprobe.exe in the execution folder (you may set the execution path as well).
For Windows, download the latest (statically linked) stable version.

I created a standalone example that performs the following:

  • Generate a synthetic video and save it to a WebM file (used as input for testing).
  • Read the file into memory as binary data (replace this with your blob from the server).
  • Pipe the binary stream to FFprobe to find the video resolution.
    In case the resolution is known in advance, you may skip this part.
    Piping to FFprobe makes the solution more complicated than it needs to be.
  • Pipe the binary stream to FFmpeg's stdin for decoding, and read the decoded raw frames from the stdout pipe.
    Writing to stdin is done in chunks using a Python thread.
    (The reason for using stdin and stdout instead of named pipes is Windows compatibility.)

Piping architecture:

 --------------------  Encoded      ---------  Decoded      ------------
| Input WebM encoded | data        | ffmpeg  | raw frames  | reshape to |
| stream (VP9 codec) | ----------> | process | ----------> | NumPy array|
 --------------------  stdin PIPE   ---------  stdout PIPE  -------------

Here is the code:

import numpy as np
import cv2
import io
import subprocess as sp
import threading
import json
from functools import partial
import shlex

# Build synthetic video and read binary data into memory (for testing):
#########################################################################
width, height = 640, 480
sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size={}x{}:rate=1 -vcodec vp9 -crf 23 -t 50 test.webm'.format(width, height)))

with open('test.webm', 'rb') as binary_file:
    in_bytes = binary_file.read()
#########################################################################


# https://stackoverflow.com/questions/5911362/pipe-large-amount-of-data-to-stdin-while-using-subprocess-popen/14026178
# https://stackoverflow.com/questions/15599639/what-is-the-perfect-counterpart-in-python-for-while-not-eof
# Write to stdin in chunks of 1024 bytes.
def writer():
    for chunk in iter(partial(stream.read, 1024), b''):
        process.stdin.write(chunk)
    try:
        process.stdin.close()
    except BrokenPipeError:
        pass  # For an unknown reason, a BrokenPipeError is raised when executing FFprobe.


# Get resolution of video frames using FFprobe
# (in case the resolution is known, skip this part):
################################################################################
# Open In-memory binary streams
stream = io.BytesIO(in_bytes)

process = sp.Popen(shlex.split('ffprobe -v error -i pipe: -select_streams v -print_format json -show_streams'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

pthread = threading.Thread(target=writer)
pthread.start()

pthread.join()

in_bytes = process.stdout.read()

process.wait()

p = json.loads(in_bytes)

width = (p['streams'][0])['width']
height = (p['streams'][0])['height']
################################################################################


# Decoding the video using FFmpeg:
################################################################################
stream.seek(0)

# FFmpeg input PIPE: WebM encoded data as stream of bytes.
# FFmpeg output PIPE: decoded video frames in BGR format.
process = sp.Popen(shlex.split('ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)

thread = threading.Thread(target=writer)
thread.start()


# Read decoded video (frame by frame), and display each frame (using cv2.imshow)
while True:
    # Read raw video frame from stdout as bytes array.
    in_bytes = process.stdout.read(width * height * 3)

    if not in_bytes:
        break  # Break loop if no more bytes.

    # Transform the byte read into a NumPy array
    in_frame = (np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3]))

    # Display the frame (for testing)
    cv2.imshow('in_frame', in_frame)

    if cv2.waitKey(100) & 0xFF == ord('q'):
        break

if not in_bytes:
    # Wait for thread to end only if not exit loop by pressing 'q'
    thread.join()

try:
    process.wait(1)
except sp.TimeoutExpired:
    process.kill()  # In case 'q' is pressed.
################################################################################

cv2.destroyAllWindows()

Remark:

  • In case you are getting an error like "file not found: ffmpeg…", try using the full path.
    For example (in Linux): '/usr/bin/ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'
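  • One way to avoid hard-coding the path is to resolve it at run time, e.g. with shutil.which (a minimal sketch; assumes FFmpeg is installed somewhere Python can find):

import shutil
import shlex
import subprocess as sp

ffmpeg_path = shutil.which('ffmpeg')  # Full path to the executable, or None if it is not on PATH.
if ffmpeg_path is None:
    raise FileNotFoundError('ffmpeg executable not found - install it or add it to PATH')

cmd = '{} -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'.format(ffmpeg_path)
process = sp.Popen(shlex.split(cmd), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)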
Answered By: Rotem

Two years after Rotem wrote his answer, there is now a cleaner/easier way to do this using ImageIO.

Note: Assuming ffmpeg is in your path, you can generate a test video to try this example using: ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 testsrc.webm

import imageio.v3 as iio
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# read all frames from the bytes string
frames = iio.imread(webm_bytes, index=None, format_hint=".webm")
frames.shape
# Output:
#    (300, 720, 1280, 3)

for frame in iio.imiter(webm_bytes, format_hint=".webm"):
    print(frame.shape)

# Output:
#    (720, 1280, 3)
#    (720, 1280, 3)
#    (720, 1280, 3)
#    ...

To use this you’ll need the ffmpeg backend (which implements a solution similar to what Rotem proposed): pip install imageio[ffmpeg]
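As a small usage example: the v3 API can also pull out a single frame by its index instead of decoding the whole clip (a hedged sketch, assuming the same testsrc.webm bytes as above):

import imageio.v3 as iio
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# index=0 requests only the first frame as a (720, 1280, 3) array.
first_frame = iio.imread(webm_bytes, index=0, format_hint=".webm")
print(first_frame.shape)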


In response to Rotem's comment, a bit of explanation:

The above snippet uses imageio==2.16.0. The v3 API is an upcoming user-facing API that streamlines reading and writing. The API has been available since imageio==2.10.0; however, on versions older than 2.16.0 you will have to use import imageio as iio and call iio.v3.imiter and iio.v3.imread.
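For those older versions, the equivalent calls would look roughly like this (an untested sketch based on the note above; keyword arguments such as format_hint may differ slightly between releases):

import imageio as iio  # imageio >= 2.10.0 and < 2.16.0
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# Same calls as above, accessed through the iio.v3 namespace.
frames = iio.v3.imread(webm_bytes, index=None, format_hint=".webm")

for frame in iio.v3.imiter(webm_bytes, format_hint=".webm"):
    print(frame.shape)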

The ability to read video bytes has existed forever (>5 years and counting) but has (as I am just now realizing) never been documented directly … so I will add a PR for that soon™ 🙂

On older versions (tested on v2.9.0) of ImageIO (v2 API) you can still read video byte strings; however, this is slightly more verbose:

import imageio as iio
import numpy as np
from pathlib import Path

webm_bytes = Path("testsrc.webm").read_bytes()

# read all frames from the bytes string
frames = np.stack(iio.mimread(webm_bytes, format="FFMPEG", memtest=False))

# iterate over frames one by one
reader = iio.get_reader(webm_bytes, format="FFMPEG")
for frame in reader:
    print(frame.shape)
reader.close()
Answered By: FirefoxMetzger

There is a pythonic way to do this using the decord package.

import io
from decord import VideoReader

# video_str is the bytes object of your video (e.g. the blob received from the server).

# Load the video
file_obj = io.BytesIO(video_str)
container = VideoReader(file_obj)

# Get the total number of video frames
len(container)
# Access the NDArray of the (i+1)-th frame
container[i]
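As a small follow-up usage sketch (hedged: frames come back as decord NDArray objects; asnumpy() is the usual way to convert them to NumPy arrays in RGB, height x width x 3 order):

# Iterate over the decoded frames one by one.
for i in range(len(container)):
    frame = container[i].asnumpy()  # Convert decord's NDArray to a NumPy array.
    print(frame.shape)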

You can learn more about decord in the decord GitHub repo.

You can learn more about video IO in the mmaction repo. See DecordInit for an example of using decord for IO.

Answered By: X. Wang