Python list comprehension for subdirectories within numerical range

Question:

I have a set of directories which are named

my/directory/and/its/subdirectories0,
my/directory/and/its/subdirectories1,
...,
my/directory/and/its/subdirectoriesN

N could be anything from a 1-digit to a 3-digit number. These directories contain some data which is usually called MCData.root, but is occasionally called something different, like certain_conditions_MC1.root.

I want to be able to access a certain subset of these .root files in a Python script, where I can decide which numbers to access at a given time. I was therefore thinking of using a list comprehension, something like

DataAddress = "my/directory/and/its/subdirectories*/"
FileSelection = [myfile for myfile in glob.glob(DataAddress + "*MC*.root") if number1 < DataAddressFinalNumber < number2]

If I assume that all of the data files are called MCData.root, then I suppose I could split the string at the end of subdirectories and at MCData.root, and then take whatever comes before the / in the middle part. But first, this is going to become a very ugly list comprehension, and second, if I want to use data that isn't called MCData.root, I'll have to manually change the script and remember to change it back again. Not a big deal, but not an elegant solution.

Is there an intelligent way of doing this operation?

Asked By: Beth Long


Answers:

Edited/refactored based on comments


Assign the different parts of the file paths to variables and construct the paths from them. This lets you build paths on the fly. pathlib.Path is not strictly necessary; this could all be done with string formatting.

from pathlib import Path

base = 'my/directory/and/its'
dir_prefix = 'subdirectories'
filename = 'q.txt'
n_s = [1,2,6]

dirs = [dir_prefix + str(n) for n in n_s]
filepaths = [Path(base,d,filename) for d in dirs]

Using os.path to construct the filepaths:

import os.path
base = 'my/directory/and/its/subdirectories'
numbers = [1,201,3]
f = 'q.txt'
directories = [base+str(n) for n in numbers]
fpaths = [os.path.join(d,f) for d in directories]
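
To tie this back to the question, where the numbers come from a range and the file name can vary, the same construction can be combined with glob. This is only a minimal sketch: files_in_range and its parameters are hypothetical names, and it assumes the strict bounds number1 < n < number2 from the question.

```python
import glob
import os.path


def files_in_range(base, low, high, pattern="*MC*.root"):
    """Return files matching `pattern` in base<n> for low < n < high."""
    selected = []
    for n in range(low + 1, high):
        directory = base + str(n)
        # glob.glob returns an empty list if the directory does not exist,
        # so gaps in the numbering are skipped silently.
        selected.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return selected
```

For example, files_in_range('my/directory/and/its/subdirectories', 10, 20) would look in subdirectories11 through subdirectories19. Passing a different pattern avoids editing the script when a file is not called MCData.root.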

First version (before the edit):

  • Don't use a list comprehension if it gets too messy.
  • Use pathlib.Path.glob to take advantage of its pattern matching.
  • Create the pattern dynamically using string formatting.

In this example, each text file just contains the name of its folder:

from pathlib import Path

name = 'q.txt'
n_s = [1,2,6]
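# Formatting the list into the pattern yields the character class [1, 2, 6],
# which matches exactly one character, so this only works for 1-digit numbers.
pat = rf'pyProjects\a{n_s}\{name}'

for f in Path('..').glob(pat):
    print(f, end='\t')
    with open(f) as fh:
        print(fh.read())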

..\pyprojects\a1\q.txt  a1
..\pyprojects\a2\q.txt  a2
..\pyprojects\a6\q.txt  a6

Make it a function with root, directory-number, and file name/pattern parameters so it can be reused. The function could make use of other nice pathlib features.
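
One possible shape for such a function is sketched below. Instead of putting the numbers into the glob pattern, it globs every directory that starts with the prefix and then filters on the trailing number, so multi-digit numbers work too. find_files and its parameter names are hypothetical, and the sketch assumes dir_prefix contains no glob special characters.

```python
import re
from pathlib import Path


def find_files(root, dir_prefix, numbers, pattern="*MC*.root"):
    """Return files matching `pattern` in <root>/<dir_prefix><n> for n in `numbers`."""
    wanted = set(numbers)
    # The directory name must be the prefix followed only by digits.
    name_re = re.compile(re.escape(dir_prefix) + r"(\d+)")
    found = []
    for directory in Path(root).glob(dir_prefix + "*"):
        match = name_re.fullmatch(directory.name)
        if match and directory.is_dir() and int(match.group(1)) in wanted:
            found.append((int(match.group(1)), directory))
    # Sort numerically so that subdirectories2 comes before subdirectories12.
    return [f for _, d in sorted(found) for f in sorted(d.glob(pattern))]
```

A call such as find_files('my/directory/and/its', 'subdirectories', range(10, 20)) would then select directories 10 through 19, and the pattern argument can be changed per call for files that are not named MCData.root.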

Answered By: wwii