Check if a file type is a media file?
Question:
I am trying to loop through a list of files, and return those files that are media files (images, video, gif, audio, etc.).
Seeing as there are a lot of media types, is there a library, or perhaps a better way to check this, than listing all the types and then checking each file against that list?
Here’s what I’m doing so far:
import os
types = [".mp3", ".mpeg", ".gif", ".jpg", ".jpeg"]
files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]
media_files = []
for file in files:
    root, extension = os.path.splitext(file)
    print(extension)
    if extension in types:
        media_files.append(file)
print("Found media files are:")
print(media_files)
But note it didn’t include filename.mpg, since I forgot to put .mpg in my types list. (Or, more likely, I didn’t expect that list to include a .mpg file, so didn’t think to list it out.)
Answers:
For this, you can get the Internet media (MIME) type of the file, split it on the / character, and check whether the first part is audio, video or image.
Here is some sample code:
import mimetypes
mimetypes.init()
mimestart = mimetypes.guess_type("test.mp3")[0]
if mimestart is not None:
    mimestart = mimestart.split('/')[0]
    if mimestart in ['audio', 'video', 'image']:
        print("media types")
NOTE: This method infers the file type from its extension only; it does not open the actual file.
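Applied to the file list from the question, the same check should pick up filename.mpg without having to list .mpg by hand, because the standard MIME database already maps it to video/mpeg. A minimal sketch; media_files_by_mimetype is a hypothetical helper name, and the exact result can vary slightly with the MIME files installed on your system:

```python
import mimetypes

MEDIA_PREFIXES = {"audio", "video", "image"}

def media_files_by_mimetype(files):
    """Return the files whose guessed MIME type is audio/*, video/* or image/*."""
    media = []
    for name in files:
        # guess_type looks only at the extension and returns (None, None) if it is unknown
        mime, _encoding = mimetypes.guess_type(name)
        if mime is not None and mime.split("/")[0] in MEDIA_PREFIXES:
            media.append(name)
    return media

files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]
print(media_files_by_mimetype(files))
```

guess_type also falls back to a lower-cased extension, so names such as PHOTO.JPG are recognized as well.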
Creating a module
If you want to create a module that checks whether a file is a media file, you can call mimetypes.init() once at the start of the module (guess_type() will also initialize the database itself on first use if you don't).
Here is an example of how to create the module:
ismediafile.py
import mimetypes
mimetypes.init()

def isMediaFile(fileName):
    mimestart = mimetypes.guess_type(fileName)[0]
    if mimestart is not None:
        mimestart = mimestart.split('/')[0]
        if mimestart in ['audio', 'video', 'image']:
            return True
    return False
And here is how to use it:
main.py
from ismediafile import isMediaFile

if __name__ == "__main__":
    if isMediaFile("test.mp3"):
        print("Media file")
    else:
        print("not media file")
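If an extension you care about is missing from the MIME database on your system, you can register it with mimetypes.add_type() before checking. A sketch in the same spirit as the module above; the Canon RAW extension .cr2 and the type string image/x-canon-cr2 are illustrative choices, and is_media_file is a hypothetical snake_case variant of isMediaFile:

```python
import mimetypes

mimetypes.init()
# Register (or override) a mapping the system database may lack.
# The type string below is an illustrative choice for Canon RAW files.
mimetypes.add_type("image/x-canon-cr2", ".cr2")

def is_media_file(file_name):
    """Same check as isMediaFile above, written in snake_case."""
    mime, _encoding = mimetypes.guess_type(file_name)
    return mime is not None and mime.split("/")[0] in ("audio", "video", "image")

print(is_media_file("IMG_0001.CR2"))
```

Because add_type only changes the in-memory database, it has to run in every process that does the check, which is why it sits at module level here.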
Another method is based not on the file extension but on the file contents, using the python-libmagic library (pypi.org/project/python-libmagic).
Here is some sample code for this library:
import magic
detector = magic.Magic()
mimestart = detector.from_file("test.mp3").split('/')[0]
if mimestart in ['audio', 'video', 'image']:
    print("media types")
NOTE: To use this code sample, you need to install python-libmagic using pip.
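If you can't install libmagic, a rough content-based check can be done with the standard library alone by reading the first few bytes of the file and comparing them against known file signatures. This is a minimal, hand-rolled sketch, not libmagic: it covers only a handful of formats (it will miss, for example, MP3 files without an ID3 tag), and classify_header, sniff_media_type and the signature tables are hypothetical names:

```python
from typing import Optional

# Signatures that appear at offset 0 of the file.
PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image"),   # PNG
    (b"\xff\xd8\xff", "image"),        # JPEG
    (b"GIF87a", "image"),
    (b"GIF89a", "image"),
    (b"ID3", "audio"),                 # MP3 with an ID3v2 tag
    (b"fLaC", "audio"),                # FLAC
    (b"\x1a\x45\xdf\xa3", "video"),    # Matroska / WebM (EBML header)
]

# RIFF containers name their format in bytes 8..12.
RIFF_FORMATS = {b"WAVE": "audio", b"AVI ": "video", b"WEBP": "image"}

def classify_header(header: bytes) -> Optional[str]:
    """Return 'image', 'audio' or 'video' for a recognized header, else None."""
    for signature, kind in PREFIX_SIGNATURES:
        if header.startswith(signature):
            return kind
    if header[:4] == b"RIFF":
        return RIFF_FORMATS.get(header[8:12])
    if header[4:8] == b"ftyp":
        # ISO base media (MP4, MOV, ...); may also be audio-only, e.g. .m4a
        return "video"
    return None

def sniff_media_type(path: str) -> Optional[str]:
    """Read the first 16 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(16))
```

Keeping the byte check in classify_header separate from the file I/O makes it easy to extend the tables and to test them without real media files.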
You can also list the media files in a directory by their extensions, as follows:
import os

def lsmedia(mypath):
    img_fm = (".tif", ".tiff", ".jpg", ".jpeg", ".gif", ".png", ".eps",
              ".raw", ".cr2", ".nef", ".orf", ".sr2", ".bmp", ".ppm", ".heif")
    vid_fm = (".flv", ".avi", ".mp4", ".3gp", ".mov", ".webm", ".ogg", ".qt", ".avchd")
    aud_fm = (".flac", ".mp3", ".wav", ".wma", ".aac")
    media_fms = {"image": img_fm, "video": vid_fm, "audio": aud_fm}

    def fns(path, media):
        # str.endswith accepts a tuple, so one call checks every extension
        return [fn for fn in os.listdir(path) if fn.lower().endswith(media_fms[media])]

    img_fns, vid_fns, aud_fns = fns(mypath, "image"), fns(mypath, "video"), fns(mypath, "audio")
    print(f"State of media in '{mypath}'")
    print(f"Images: {len(img_fns)} | Videos: {len(vid_fns)} | Audios: {len(aud_fns)}")
    return (img_fns, vid_fns, aud_fns)

mypath = "/home/DATA_Lia/data_02/sample"  # define dir
(imgs, vids, auds) = lsmedia(mypath)
output:
State of media in '/home/DATA_Lia/data_02/sample'
Images: 24 | Videos: 3 | Audios: 5
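The same extension-based idea can be sketched with pathlib and a single suffix-to-category dictionary, which turns each check into one dict lookup and optionally walks subdirectories too. A sketch with a shortened extension list; group_media and MEDIA_EXTENSIONS are hypothetical names:

```python
from collections import defaultdict
from pathlib import Path

# Shortened extension table; extend it with the formats you need.
MEDIA_EXTENSIONS = {
    **dict.fromkeys((".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp"), "image"),
    **dict.fromkeys((".mp4", ".mov", ".avi", ".mkv", ".webm", ".mpg", ".mpeg"), "video"),
    **dict.fromkeys((".mp3", ".wav", ".flac", ".aac"), "audio"),
}

def group_media(root, recursive=False):
    """Map 'image'/'video'/'audio' to sorted lists of matching file paths under root."""
    pattern = "**/*" if recursive else "*"
    groups = defaultdict(list)
    for path in Path(root).glob(pattern):
        # Lower-case the suffix so that e.g. "IMG.JPG" matches ".jpg"
        kind = MEDIA_EXTENSIONS.get(path.suffix.lower())
        if kind is not None and path.is_file():
            groups[kind].append(path)
    return {kind: sorted(paths) for kind, paths in groups.items()}
```

For example, group_media("/home/DATA_Lia/data_02/sample", recursive=True).get("video", []) would list the videos anywhere under that directory; categories with no matches are simply absent from the result.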
Another option is to use FFmpeg, which supports most media formats in existence. This is especially useful if you want to know more about the media in each file.
Using the ffprobe-python package (pip install ffprobe-python):
from ffprobe import FFProbe
# try probing the file with ffmpeg
# if no streams are found, it's not in a format that ffmpeg can read
# -> not considered media file
media_files = [file for file in files if len(FFProbe(file).streams)]
This approach may be considerably slower than checking file extensions or MIME types, since ffprobe has to open and parse each file. On the other hand, it gives you more information about the kind of media the file contains, as well as its metadata.
Selecting only files containing audio:
has_audio = [file for file in files if len(FFProbe(file).audio)]
Similarly for images and videos (ffprobe reports still images as video streams):
has_img_or_vid = [file for file in files if len(FFProbe(file).video)]
Or collecting the codec names:
codecs = {f: [s.codec_name for s in FFProbe(f).streams] for f in files}
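If you'd rather not depend on ffprobe-python, you can call the ffprobe command-line tool directly and parse its JSON output. A sketch, assuming ffprobe is installed and on your PATH; stream_types_from_json and probe_stream_types are hypothetical helper names:

```python
import json
import subprocess
from typing import Set

def stream_types_from_json(ffprobe_output: str) -> Set[str]:
    """Collect the codec_type of each stream (e.g. 'audio', 'video') from ffprobe JSON."""
    data = json.loads(ffprobe_output)
    return {s["codec_type"] for s in data.get("streams", []) if s.get("codec_type")}

def probe_stream_types(path: str) -> Set[str]:
    """Run ffprobe on path; return an empty set if ffprobe cannot read it as media."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return set()
    return stream_types_from_json(result.stdout)
```

With the question's list, media_files = [f for f in files if probe_stream_types(f)] keeps every file ffprobe can read. Note that an MP3 with embedded cover art typically also reports a video stream (the attached picture), so checking for "video" alone can misclassify it.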