Check if a file type is a media file?

Question:

I am trying to loop through a list of files, and return those files that are media files (images, video, gif, audio, etc.).

Seeing as there are a lot of media types, is there a library, or perhaps a better way to check this, than listing all types and then checking each file against that list?

Here’s what I’m doing so far:

import os
types = [".mp3", ".mpeg", ".gif", ".jpg", ".jpeg"]
files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]

media_files = []
for file in files:
    root, extension = os.path.splitext(file)
    print(extension)
    if extension in types:
        media_files.append(file)

print("Found media files are:")
print(media_files)

But note it didn’t include filename.mpg, since I forgot to put .mpg in my types list. (Or, more likely, I didn’t expect that list to include a .mpg file, so didn’t think to list it out.)

Asked By: BruceWayne

Answers:

For this, get the file's internet media type (MIME type), split it on the / character, and check whether the first part is audio, video, or image.

Here is some sample code:

import mimetypes
mimetypes.init()

mimestart = mimetypes.guess_type("test.mp3")[0]

if mimestart is not None:
    mimestart = mimestart.split('/')[0]

    if mimestart in ['audio', 'video', 'image']:
        print("media types")

NOTE: This method guesses the file type from the file extension only; it does not open the file or inspect its contents.
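
As a minimal sketch, the same extension-based check can be applied to the list of files from the question. The helper name is_media_file and the MEDIA_PREFIXES tuple are illustrative names, not part of mimetypes, and the exact result can depend on the MIME tables available on your system:

```python
import mimetypes

# Top-level MIME types that are treated as "media" in this sketch
MEDIA_PREFIXES = ("audio", "video", "image")

def is_media_file(file_name):
    """Return True if the extension maps to an audio, video or image MIME type."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    # guess_type returns (None, None) when the extension is unknown or missing
    return mime_type is not None and mime_type.split("/")[0] in MEDIA_PREFIXES

files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]
media_files = [f for f in files if is_media_file(f)]
print(media_files)
```

Because the lookup goes through the MIME table rather than a hand-written list, filename.mpg should be picked up without having to list .mpg explicitly.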

Creating a module

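If you want to create a module that checks whether a file is a media file, you can call mimetypes.init() once at the top of the module. (The call is optional: the module-level functions in mimetypes initialize themselves on first use.)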

Here is an example of how to create the module:

ismediafile.py

import mimetypes
mimetypes.init()

def isMediaFile(fileName):
    mimestart = mimetypes.guess_type(fileName)[0]

    if mimestart is not None:
        mimestart = mimestart.split('/')[0]

        if mimestart in ['audio', 'video', 'image']:
            return True
    
    return False

And here is how to use it:

main.py

from ismediafile import isMediaFile

if __name__ == "__main__":
    if isMediaFile("test.mp3"):
        print("Media file")
    else:
        print("not media file")
Answered By: Cpp Forever

Another method relies on the file contents rather than the file extension, using the python-libmagic library (pypi.org/project/python-libmagic).

Here is the sample code for this library:

import magic

# use a separate name so the magic module itself is not shadowed
detector = magic.Magic()
mimestart = detector.from_file("test.mp3").split('/')[0]

if mimestart in ['audio', 'video', 'image']:
    print("media types")

NOTE: To use this code sample, you need to install python-libmagic with pip.
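
To illustrate what "based on the file contents" means, here is a simplified standard-library sketch that looks at the first bytes of a file. It is not how libmagic is implemented and it covers only a handful of formats; SIGNATURES and sniff_media_kind are hypothetical names made up for this example:

```python
# Leading byte signatures for a handful of common formats (deliberately incomplete)
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image"),  # PNG
    (b"\xff\xd8\xff", "image"),       # JPEG
    (b"GIF87a", "image"),             # GIF
    (b"GIF89a", "image"),             # GIF
    (b"ID3", "audio"),                # MP3 that starts with an ID3v2 tag
]

def sniff_media_kind(path):
    """Return 'image' or 'audio' if the file header matches a known signature, else None."""
    with open(path, "rb") as fh:
        header = fh.read(12)
    for signature, kind in SIGNATURES:
        if header.startswith(signature):
            return kind
    # WAV files are RIFF containers with "WAVE" at byte offset 8
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "audio"
    return None
```

A check like this still works when a file has a wrong or missing extension, which is the main advantage of content-based detection; a real library knows far more signatures than this sketch.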

Answered By: Cpp Forever

You may list media files as follows:

import os

def lsmedia(mypath):
    img_fm = (".tif", ".tiff", ".jpg", ".jpeg", ".gif", ".png", ".eps", 
          ".raw", ".cr2", ".nef", ".orf", ".sr2", ".bmp", ".ppm", ".heif")
    vid_fm = (".flv", ".avi", ".mp4", ".3gp", ".mov", ".webm", ".ogg", ".qt", ".avchd")
    aud_fm = (".flac", ".mp3", ".wav", ".wma", ".aac")
    media_fms = {"image": img_fm, "video": vid_fm, "audio": aud_fm}

    # str.endswith accepts a tuple of suffixes, so no inner loop is needed
    def fns(path, media):
        return [fn for fn in os.listdir(path) if fn.lower().endswith(media_fms[media])]
    img_fns, vid_fns, aud_fns = fns(mypath, "image"), fns(mypath, "video"), fns(mypath, "audio")

    print(f"State of media in '{mypath}'")
    print("Images: ", len(img_fns), " | Videos: ", len(vid_fns), "| Audios: ", len(aud_fns))
    
    return (img_fns, vid_fns, aud_fns)

mypath = "/home/DATA_Lia/data_02/sample" # define dir
(imgs, vids, auds) = lsmedia(mypath)

Output:

State of media in '/home/DATA_Lia/data_02/sample'
Images:  24  | Videos:  3 | Audios:  5
Answered By: San Askaruly

Another option would be to leverage FFmpeg, which supports most media formats in existence. This can be especially useful when wanting to know more about the media type of each file.

Using the ffprobe-python package (pip install ffprobe-python):

from ffprobe import FFProbe

# try probing the file with ffmpeg
# if no streams are found, it's not in a format that ffmpeg can read
# -> not considered media file
media_files = [file for file in files if len(FFProbe(file).streams)]

This approach may be considerably slower than just reading the file extensions or MIME types, as it may ingest the complete file. On the other hand, it would be possible to have more information on the type of media that is contained, and the metadata.

Selecting only files containing audio:

has_audio = [file for file in files if len(FFProbe(file).audio)]

Similarly for images and videos:

has_img_or_vid = [file for file in files if len(FFProbe(file).video)]

Or collecting the codec names:

codecs = {f: [s.codec_name for s in FFProbe(f).streams] for f in files}
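
If you would rather not depend on the wrapper package, a similar check can be sketched by calling the ffprobe command-line tool directly and parsing its JSON output. This assumes an ffprobe executable is available on your PATH; stream_types and probe_stream_types are hypothetical helper names made up for this example:

```python
import json
import subprocess

def stream_types(ffprobe_json):
    """Extract the set of codec types (e.g. 'audio', 'video') from ffprobe JSON output."""
    if not ffprobe_json.strip():
        return set()
    data = json.loads(ffprobe_json)
    # Drop streams that do not report a codec_type
    return {s.get("codec_type") for s in data.get("streams", [])} - {None}

def probe_stream_types(path):
    """Run ffprobe on path; return an empty set if ffprobe is missing or cannot read the file."""
    cmd = ["ffprobe", "-v", "error", "-show_entries", "stream=codec_type",
           "-of", "json", path]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return set()
    return stream_types(result.stdout)
```

With these helpers, a file could be treated as media when probe_stream_types(path) is non-empty, and as containing audio when "audio" is in the returned set.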
Answered By: w-m