Picking up files with specific names in Python

Question:

I’m designing a tool that should only pick up EXR image files from the input folder that follow the naming convention: u#_v#.exr or u#_v#_#.exr (where # denotes whole numbers or positive non-zero integers). All other files should be ignored. My working code is given below. However, is there a better or more efficient way to do this?

import argparse
import glob
import os

import numpy as np


def main():
    # Add and read command line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_folder', type=str, help='Directory where input images are located')
    parser.add_argument('--output_folder', type=str, help='Directory where output image should be written')
    args = parser.parse_args()

    # Change directory to input folder and check all filenames belonging to our convention
    os.chdir(args.input_folder)
    all_files = check_combinatons_of_numeric_characters('u_v.exr', 'u_v_.exr')
    print(all_files)


def check_combinatons_of_numeric_characters(convention1, convention2):
    # Combinations for first convention which was supplied
    split_convention1 = convention1.split('_')
    convention1_combinations_alpha = np.array([])

    convention1_combination1 = glob.glob(split_convention1[0] + '[0-9]_' +
                                         split_convention1[1].split('.')[0] + '[0-9].' +
                                         split_convention1[1].split('.')[1]
                                         )

    convention1_combination2 = glob.glob(split_convention1[0] + '[0-9][0-9]_' +
                                         split_convention1[1].split('.')[0] + '[0-9][0-9].' +
                                         split_convention1[1].split('.')[1]
                                         )

    convention1_combination3 = glob.glob(split_convention1[0] + '[0-9][0-9]_' +
                                         split_convention1[1].split('.')[0] + '[0-9].' +
                                         split_convention1[1].split('.')[1]
                                         )

    convention1_combination4 = glob.glob(split_convention1[0] + '[0-9]_' +
                                         split_convention1[1].split('.')[0] + '[0-9][0-9].' +
                                         split_convention1[1].split('.')[1]
                                         )

    convention1_combinations_alpha = np.concatenate((convention1_combination1,
                                                     convention1_combination2,
                                                     convention1_combination3,
                                                     convention1_combination4),
                                                    )

    # Combinations for second convention supplied
    split_convention2 = convention2.split('_')
    convention2_combinations_alpha = np.array([])

    convention2_combination1 = glob.glob(split_convention2[0] + '[0-9]_'+
                                         split_convention2[1] + '[0-9]_[0-9]' +
                                         split_convention2[2]
                                         )
    convention2_combination2 = glob.glob(split_convention2[0] + '[0-9][0-9]_' +
                                         split_convention2[1] + '[0-9]_[0-9]' +
                                         split_convention2[2]
                                         )
    convention2_combination3 = glob.glob(split_convention2[0] + '[0-9]_' +
                                         split_convention2[1] + '[0-9]_[0-9][0-9]' +
                                         split_convention2[2]
                                         )
    convention2_combination4 = glob.glob(split_convention2[0] + '[0-9][0-9]_' +
                                         split_convention2[1] + '[0-9][0-9]_[0-9]' +
                                         split_convention2[2]
                                         )
    convention2_combination5 = glob.glob(split_convention2[0] + '[0-9]_' +
                                         split_convention2[1] + '[0-9][0-9]_[0-9][0-9]' +
                                         split_convention2[2]
                                         )
    convention2_combination6 = glob.glob(split_convention2[0] + '[0-9][0-9]_' +
                                         split_convention2[1] + '[0-9]_[0-9][0-9]' +
                                         split_convention2[2]
                                         )

    convention2_combinations_alpha = np.concatenate((convention2_combination1,
                                                     convention2_combination2,
                                                     convention2_combination3,
                                                     convention2_combination4,
                                                     convention2_combination5,
                                                     convention2_combination6),
                                                    )

    list_of_files = np.concatenate((convention1_combinations_alpha, convention2_combinations_alpha))

    return list_of_files

if __name__ == '__main__':
    main()
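For comparison, the same glob-only strategy can be written more compactly by generating the digit patterns with itertools.product instead of spelling out each combination by hand. This is a minimal sketch, not the asker's code: the helper name find_convention_files and the TEMPLATES/DIGIT_PATTERNS constants are hypothetical, and it assumes each number has one or two digits. Generating the combinations also covers all eight one-/two-digit variants of u#_v#_#.exr, whereas the hand-written version above lists only six (it misses, for example, u1_v22_3.exr and u12_v34_56.exr).

```python
import glob
import itertools
import os

# Hypothetical templates for the two conventions; each "{}" stands for one number.
# glob has no "one or two digits" syntax, so every digit-length combination
# still has to be enumerated, just not by hand.
TEMPLATES = ("u{}_v{}.exr", "u{}_v{}_{}.exr")
DIGIT_PATTERNS = ("[0-9]", "[0-9][0-9]")


def find_convention_files(folder):
    """Return sorted file names in `folder` that match either convention."""
    matches = set()
    for template in TEMPLATES:
        n_fields = template.count("{}")
        # product() yields all 2**n_fields one-/two-digit combinations
        for digits in itertools.product(DIGIT_PATTERNS, repeat=n_fields):
            # escape the folder so glob metacharacters in its path are literal
            pattern = os.path.join(glob.escape(folder), template.format(*digits))
            matches.update(os.path.basename(path) for path in glob.glob(pattern))
    return sorted(matches)
```

Passing the folder in, rather than calling os.chdir, keeps the function free of side effects on the process's working directory.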
Asked By: Azzam


Answers:

I would simply match all *.exr files and then skip the ones which don’t follow the pattern.

import glob
import re

list_of_files = [file for file in glob.glob('*.exr')
                 if re.match(r'^u\d{1,2}_v\d{1,2}\.exr$', file)]
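The one-liner only covers the first convention. A minimal sketch extending the same idea to both conventions from the question, assuming each number has one or two digits, simply makes the trailing _# group optional. The function name list_exr_files is hypothetical.

```python
import glob
import os
import re

# Matches u#_v#.exr and u#_v#_#.exr, where each # is one or two digits.
# (?:...)? is a non-capturing, optional group for the third number.
# re.ASCII restricts \d to 0-9 (otherwise it accepts any Unicode decimal digit).
EXR_NAME = re.compile(r"u\d{1,2}_v\d{1,2}(?:_\d{1,2})?\.exr", re.ASCII)


def list_exr_files(folder):
    """Return sorted names of *.exr files in `folder` that follow either convention."""
    paths = glob.glob(os.path.join(glob.escape(folder), "*.exr"))
    names = (os.path.basename(path) for path in paths)
    # fullmatch must consume the whole name, so no ^ and $ anchors are needed
    return sorted(name for name in names if EXR_NAME.fullmatch(name))
```

Using re.fullmatch instead of re.match with ^...$ also avoids a small quirk: $ matches just before a trailing newline, so re.match(r'...$', name) would accept a name ending in "\n", while fullmatch does not.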

The regex will need to use (?!0)\d{1,2} instead of \d{1,2} (in both places) if you strictly need to exclude zeros, too; or (?!0\D)\d{1,2} if you want to permit leading zeros but not a zero followed by a non-digit.
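To see how the three variants differ, here is a small check against a few made-up names (illustrative only), using the first convention. The pattern constants and the matching helper are hypothetical names.

```python
import re

# \d{1,2}        accepts any one- or two-digit number, including 0 and 05
# (?!0)\d{1,2}   rejects any number that starts with 0
# (?!0\D)\d{1,2} rejects only a lone 0 (a zero followed by a non-digit)
ANY_NUMBER = re.compile(r"^u\d{1,2}_v\d{1,2}\.exr$")
NO_LEADING_ZERO = re.compile(r"^u(?!0)\d{1,2}_v(?!0)\d{1,2}\.exr$")
NO_LONE_ZERO = re.compile(r"^u(?!0\D)\d{1,2}_v(?!0\D)\d{1,2}\.exr$")


def matching(pattern, names):
    """Return the names the pattern accepts, in their original order."""
    return [name for name in names if pattern.match(name)]


names = ["u1_v2.exr", "u0_v2.exr", "u05_v2.exr", "u10_v2.exr"]
print(matching(ANY_NUMBER, names))       # ['u1_v2.exr', 'u0_v2.exr', 'u05_v2.exr', 'u10_v2.exr']
print(matching(NO_LEADING_ZERO, names))  # ['u1_v2.exr', 'u10_v2.exr']
print(matching(NO_LONE_ZERO, names))     # ['u1_v2.exr', 'u05_v2.exr', 'u10_v2.exr']
```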

In some more detail, \d matches a digit, {1,2} says between one and two occurrences of the previous expression, and \D matches a character which is not a digit. (?!something) is a negative lookahead which prevents a match if the text at this point matches the regular expression something. \. matches a literal dot (an unescaped . would match any character), ^ matches the beginning of the file name, and $ the end; most other characters simply match themselves. For a more detailed exposition, review the documentation for the Python re module and/or the beginner resources on the Stack Overflow regex tag info page.
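The same pattern can be written with the re.VERBOSE flag, which ignores unescaped whitespace and treats # as the start of a comment, so each token described above can be annotated in place. A sketch; the constant names are illustrative.

```python
import re

# The answer's pattern, one token per line, with re.VERBOSE comments.
VERBOSE_PATTERN = re.compile(
    r"""
    ^           # start of the file name
    u           # a literal "u"
    \d{1,2}     # one or two digits
    _v          # a literal "_v"
    \d{1,2}     # one or two digits
    \.exr       # a literal dot followed by "exr"
    $           # end of the file name
    """,
    re.VERBOSE,
)
COMPACT_PATTERN = re.compile(r"^u\d{1,2}_v\d{1,2}\.exr$")

# Both spellings accept and reject exactly the same names.
for name in ["u1_v2.exr", "u12_v34.exr", "u1_v2xexr", "u123_v4.exr"]:
    assert bool(VERBOSE_PATTERN.match(name)) == bool(COMPACT_PATTERN.match(name))

# Without the backslash, "." matches any character, so this name slips through.
UNESCAPED_DOT = re.compile(r"^u\d{1,2}_v\d{1,2}.exr$")
print(bool(UNESCAPED_DOT.match("u1_v2xexr")))    # True
print(bool(COMPACT_PATTERN.match("u1_v2xexr")))  # False
```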

Convert the resulting list to a data frame at your leisure.

Answered By: tripleee