Creating a multidimensional list of similarly named files with different extensions

Question:

I have a directory of files that follows this file naming pattern:

alice_01.mov
alice_01.mp4
alice_02.mp4
bob_01.avi

My goal is to find all files at a given path and build a "multidimensional" (nested) list in which each sublist holds one unique file name (without its extension) followed by a list of that name's extensions, like so:

resulting_list = [
    ['alice_01', ['mov','mp4']],
    ['alice_02', ['mp4']],
    ['bob_01', ['avi']]
]

I have gotten this far:

import os

path = "user_files/"

def user_files(path):
    files = []
    for file in os.listdir(path):
        files.append(file)
    return files

file_array = []
for file in user_files(path):
    file_name = file.split(".")[0]
    file_ext = file.split(".")[1]
    if file_name not in (sublist[0] for sublist in file_array):
        file_array.append([file_name,[file_ext]])
    else:
        file_array[file_array.index(file_name)].append([file_name,[file_ext]])

print(file_array)

My problem is in the else branch, but I’m struggling to get it right.
Any help is appreciated.

Asked By: equinoxe5


Answers:

Here’s how you can do it using a dict to store the results:

filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
]

file_dict = {}

for file in filenames:
    file_name, file_ext = file.split(".")[0:2]
    file_dict.setdefault(file_name, []).append(file_ext)

print(file_dict)

Result:

{'alice_01': ['mov', 'mp4'], 'alice_02': ['mp4'], 'bob_01': ['avi']}
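If you need exactly the nested-list shape from the question, the original loop can also be repaired. Its else branch fails because `file_array.index(file_name)` searches for the bare string among the sublists, which are lists, so it raises `ValueError`. Even with the right index, it would append a whole new `[name, [ext]]` pair instead of adding the extension to the existing inner list. Below is a minimal sketch of a fixed version. The function name `group_extensions` is hypothetical, and it assumes, like the question's code, that the name ends at the first dot.

```python
def group_extensions(filenames):
    """Group names into [[name, [ext, ...]], ...], keeping first-seen order."""
    file_array = []
    for file in filenames:
        # Split at the first dot, as the question's code does;
        # a name without a dot gets an empty extension.
        file_name, _, file_ext = file.partition(".")
        for sublist in file_array:
            if sublist[0] == file_name:
                # Add only the extension to the existing inner list.
                sublist[1].append(file_ext)
                break
        else:
            # No break happened, so this name is new.
            file_array.append([file_name, [file_ext]])
    return file_array


filenames = ["alice_01.mov", "alice_01.mp4", "alice_02.mp4", "bob_01.avi"]
print(group_extensions(filenames))
# [['alice_01', ['mov', 'mp4']], ['alice_02', ['mp4']], ['bob_01', ['avi']]]
```

Starting from the dict version instead, `[[name, exts] for name, exts in file_dict.items()]` produces the same nested list, because dicts keep insertion order in Python 3.7+. The dict is also the better choice for many files, since each lookup is constant time rather than a scan over `file_array`.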

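UPDATE: The first version above doesn’t handle names with several dots (it keeps only the text up to the second dot) or names with no extension at all (the two-value unpacking raises ValueError), so here’s a slightly more robust version that splits at the last dot.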

from pprint import pprint

filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
    "john_007.json.xz",
    "john_007.json.txt.xz",
    "john_007.json.txt.zip",
    "tom_and_jerry",
    "tom_and_jerry.dat",
]

file_dict = {}

for file in filenames:
    parts = file.split(".")
    if len(parts) > 1:
        file_name = ".".join(parts[0:-1])
        file_ext = parts[-1]
    else:
        file_name = parts[0]
        file_ext = ""
    file_dict.setdefault(file_name, []).append(file_ext)

pprint(file_dict)

Result:

{'alice_01': ['mov', 'mp4'],
 'alice_02': ['mp4'],
 'bob_01': ['avi'],
 'john_007.json': ['xz'],
 'john_007.json.txt': ['xz', 'zip'],
 'tom_and_jerry': ['', 'dat']}
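The standard library's `os.path.splitext` already splits at the last dot, so it can replace the manual `split`/`join`. Here is a minimal sketch that also reads the directory from the question and prints the nested-list shape. The helper names `group_by_stem` and `list_user_files` are hypothetical, and the directory path `user_files/` is taken from the question.

```python
import os
from pprint import pprint


def group_by_stem(filenames):
    """Map each base name to its extensions, splitting at the last dot."""
    file_dict = {}
    for file in filenames:
        file_name, file_ext = os.path.splitext(file)
        # splitext keeps the dot ('.mov'); drop it to match the output above.
        file_dict.setdefault(file_name, []).append(file_ext[1:])
    return file_dict


def list_user_files(path):
    """Return regular files (not subdirectories) directly under path, sorted."""
    return sorted(
        f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))
    )


if __name__ == "__main__":
    path = "user_files/"
    if os.path.isdir(path):
        file_dict = group_by_stem(list_user_files(path))
        pprint([[name, exts] for name, exts in file_dict.items()])
```

One behavioral difference: `splitext` treats leading dots as part of the name, so `.bashrc` becomes name `.bashrc` with an empty extension, whereas the manual split above gives an empty name with extension `bashrc`. Filtering with `os.path.isfile` also keeps subdirectories out of the result, which `os.listdir` alone would include.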

Answered By: Fractalism