Concatenate a video, image and audio using ffmpeg

Question:

I am trying to concatenate a group of images, each with its associated audio, with a video clip at the start and end of the video. Whenever I concatenate an image with its associated audio, the result doesn't play back correctly in VLC media player: the image is displayed for a single frame before the video cuts to black while the audio keeps playing. I came across this GitHub issue: https://github.com/kkroening/ffmpeg-python/issues/274, where the accepted solution is the one I implemented, but one of the comments mentions this same incorrect playback and an error on YouTube.

import os

import ffmpeg  # ffmpeg-python


def generate_clip(img):
    '''
    Generates a clip from an image and a wav file, helper function for export_video
    '''
    transition_cond = os.path.exists("static/transitions/" + img + ".mp4")
    chart_path = os.path.exists("charts/" + img + ".png")
    if transition_cond:
        clip = ffmpeg.input("static/transitions/" + img + ".mp4")
    elif chart_path:
        clip = ffmpeg.input("charts/" + img + ".png")
    else:
        clip = ffmpeg.input("static/transitions/Transition.jpg")
    audio_clip = ffmpeg.input("audio/" + img + ".wav")
    clip = ffmpeg.concat(clip, audio_clip, v=1, a=1)
    clip = ffmpeg.filter(clip, "setdar","16/9")
    return clip

def export_video(CHARTS):
    '''
    Combines the charts from charts/ and the audio from audio/ to generate one final video that will be uploaded to YouTube
    '''
    clips = []
    intro = generate_clip("Intro")
    clips.append(intro)

    for key in CHARTS.keys():
        value = CHARTS.get(key)
        value.insert(0, key)
        subclip = []
        for img in value:
            subclip.append(generate_clip(img))
        concat_clip = ffmpeg.concat(*subclip)
        clips.append(concat_clip)
    
    outro = generate_clip("Outro")
    clips.append(outro)

    concat_clip = ffmpeg.concat(*clips)
    concat_clip.output("export/export.mp4").run(overwrite_output=True)

Asked By: Santa

||

Answers:

Unfortunately, the concat filter does not offer a shortest option like overlay does. The issue here is that the image2 demuxer uses 25 fps by default, so a video stream made from a single image lasts only 1/25 of a second. There are several ways to address this, but you first need to get the duration of each paired audio file. To incorporate the duration into the ffmpeg command, you can:

  1. Use the tpad filter on each video (in series with setdar) to make the video duration match the audio. The padded amount should be 1/25 seconds less than the audio duration.
  2. Specify the -loop 1 input option so the image loops indefinitely, then add a -t {duration} input option to limit the looping. Be aware that the video duration may not be exact.
  3. Specify -r {1/duration} as an input option so the single frame lasts as long as the audio, and apply the fps filter to each input to convert it to the output frame rate.
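
As a rough illustration of the three options above, the sketch below measures a WAV file's duration and builds the matching ffmpeg arguments. It is not from the original answer: the helper names (wav_duration, tpad_filter, loop_input_args, rate_input_args) are hypothetical, and it assumes plain PCM WAV files that Python's standard wave module can read. It only builds strings and argument lists; it does not run ffmpeg.

```python
import wave
from fractions import Fraction

# Default frame rate of the image2 demuxer, as stated in the answer.
IMAGE_DEFAULT_FPS = 25


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds as an exact Fraction."""
    with wave.open(path, "rb") as wav:
        return Fraction(wav.getnframes(), wav.getframerate())


def tpad_filter(duration):
    """Option 1: repeat the last frame so the video lasts as long as the audio.

    The single decoded frame already covers 1/25 s, so pad by that much less.
    """
    pad = max(duration - Fraction(1, IMAGE_DEFAULT_FPS), Fraction(0))
    return f"tpad=stop_mode=clone:stop_duration={float(pad):.3f},setdar=16/9"


def loop_input_args(image, duration):
    """Option 2: loop the image and stop reading it after `duration` seconds."""
    return ["-loop", "1", "-t", f"{float(duration):.3f}", "-i", image]


def rate_input_args(image, duration):
    """Option 3: read the image at 1/duration fps so one frame spans the audio."""
    return ["-r", str(1 / duration), "-i", image]
```

For a 3-second audio file, rate_input_args("charts/a.png", Fraction(3)) gives ["-r", "1/3", "-i", "charts/a.png"]. With option 3 the fps filter is still needed afterwards to bring every input to a common output frame rate.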

I’m not familiar with ffmpeg-python, so I cannot provide a solution for it, but if you’re interested, I’d be happy to post equivalent code using my ffmpegio package.

[edit]
ffmpegio Solution

Here is how I’d code the 3rd solution with ffmpegio:

from os import path

import ffmpegio

def generate_clip(img):
    """
    Generates a clip from an image and a wav file, 
    helper function for export_video
    """

    transition_cond = path.exists("static/transitions/" + img + ".mp4")

    chart_path = path.exists("charts/" + img + ".png")
    if transition_cond:
        video_file = "static/transitions/" + img + ".mp4"
    elif chart_path:
        video_file = "charts/" + img + ".png"
    else:
        video_file = "static/transitions/Transition.jpg"
    audio_file = "audio/" + img + ".wav"

    video_opts = {}
    if not transition_cond:
        # audio_streams_basic() returns audio duration in seconds as Fraction
        # set the "framerate" of the video to be the reciprocal
        info = ffmpegio.probe.audio_streams_basic(audio_file)
        video_opts["r"] = 1 / info[0]["duration"]

    return [(video_file, video_opts), (audio_file, None)]


def export_video(CHARTS):
    """
    Combines the charts from charts/ and the audio from audio/ 
    to generate one final video that will be uploaded to Youtube
    """
    # get all input files (video/audio pairs)
    clips = [
        generate_clip("Intro"),
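        # each key gets its own clip, followed by the clips of its values (as in the question's code)
        *(generate_clip(img) for key, value in CHARTS.items() for img in (key, *value)),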
        generate_clip("Outro"),
    ]

    # number of clips
    nclips = len(clips)

    # filter chains to set DAR and fps of all video streams
    vfilters = (f"[{2*n}:v]setdar=16/9,fps=30[v{n}]" for n in range(nclips))

    # concatenation filter input: [v0][1:a][v1][3:a][v2][5:a]...
    concatfilter = "".join((f"[v{n}][{2*n+1}:a]" for n in range(nclips))) + f"concat=n={nclips}:v=1:a=1[vout][aout]"

    # form the full filtergraph
    fg = ";".join((*vfilters, concatfilter))

    # set output file and options
    output = ("export/export.mp4", {"map": ["[vout]", "[aout]"]})

    # run ffmpeg
    ffmpegio.ffmpegprocess.run(
        {
            "inputs": [item for pair in clips for item in pair],
            "outputs": [output],
            "global_options": {"filter_complex": fg},
        },
        overwrite=True,
    )

Since this code does not use the read/write features, the ffmpegio-core package suffices:

pip install ffmpegio-core

Make sure that the FFmpeg binary can be found by ffmpegio. See the installation doc.
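
To see what the filtergraph assembled in export_video looks like, the string-building steps can be run on their own, without FFmpeg. The sketch below repeats them in a hypothetical helper, build_filtergraph (not part of ffmpegio), assuming each clip contributes one video input followed by one audio input, so that clip n uses input 2n for video and 2n+1 for audio.

```python
def build_filtergraph(nclips, fps=30, dar="16/9"):
    """Build the setdar/fps chains and the concat filter for `nclips` clips.

    Hypothetical helper for illustration; it mirrors the string logic
    used in export_video and does not call ffmpeg or ffmpegio.
    """
    # one chain per video input: fix the aspect ratio and unify the frame rate
    vfilters = [f"[{2 * n}:v]setdar={dar},fps={fps}[v{n}]" for n in range(nclips)]

    # concat inputs alternate processed video and raw audio: [v0][1:a][v1][3:a]...
    pairs = "".join(f"[v{n}][{2 * n + 1}:a]" for n in range(nclips))
    concat = f"{pairs}concat=n={nclips}:v=1:a=1[vout][aout]"

    # chains are separated by semicolons in a complex filtergraph
    return ";".join([*vfilters, concat])


print(build_filtergraph(2))
# [0:v]setdar=16/9,fps=30[v0];[2:v]setdar=16/9,fps=30[v1];[v0][1:a][v1][3:a]concat=n=2:v=1:a=1[vout][aout]
```

Printing the string for a small number of clips is a quick way to check the stream labels before handing the filtergraph to FFmpeg.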

The functions used are ffmpegio.probe.audio_streams_basic and ffmpegio.ffmpegprocess.run; see the ffmpegio documentation for both.

The code has not been fully validated. If you encounter a problem, it is probably easiest to post it on the ffmpegio GitHub Discussions and continue from there.

Answered By: kesh