Python glob.glob: a wildcard for the number of directories between the root and the destination

Question:

Okay, I’m having trouble not only with the problem itself but even with explaining my question. I have a directory tree about 7 levels deep, like so: rootdir/a/b/c/d/e/f/destinationdir

The thing is some may have 5 subdirectory levels and some may have as many as ten, such as:

rootdir/a/b/c/d/destinationdir

or:

rootdir/a/b/c/d/e/f/g/h/destinationdir

The only thing they have in common is that the destination directory is always named the same thing. The way I’m using the glob function is as follows:

for path in glob.glob('/rootdir/*/*/*/*/*/*/destinationdir'):
    os.system('cd {0}; do whatever'.format(path))

However, this only works for directories with exactly that number of intermediate subdirectories. Is there any way to avoid specifying the number of subdirectories (asterisks)? In other words, I want the function to reach destinationdir no matter how many intermediate subdirectories there are, so that I can iterate through the results. Thanks a lot!

Asked By: Christopher Haddad


Answers:

You can create a pattern for each level of nesting (increase the 10 if needed):

import glob
import os

# range works on both Python 2 and 3 (xrange exists only on Python 2)
for i in range(10):
    pattern = '/rootdir/' + ('*/' * i) + 'destinationdir'
    for path in glob.glob(pattern):
        os.system('cd {0}; do whatever'.format(path))

This will iterate over:

'/rootdir/destinationdir'
'/rootdir/*/destinationdir'
'/rootdir/*/*/destinationdir'
'/rootdir/*/*/*/destinationdir'
'/rootdir/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/*/destinationdir'

If you have to iterate over directories of arbitrary depth, I suggest dividing the algorithm into two steps: a first phase in which you find where all the ‘destinationdir’ directories are located, and a second phase in which you perform your operations.
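
The two phases could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function names collect_destination_dirs and process_all are made up here, os.walk is one possible way to do the search, and the command to run is left to the caller.

```python
import os
import subprocess


def collect_destination_dirs(root, name="destinationdir"):
    """Phase 1: return every directory called `name` found below `root`."""
    matches = []
    for current, subdirs, _files in os.walk(root):
        if name in subdirs:
            matches.append(os.path.join(current, name))
    return sorted(matches)


def process_all(paths, command):
    """Phase 2: run `command` (an argument list) inside each collected directory."""
    for path in paths:
        # cwd= stands in for the 'cd {0}; ...' shell idiom and avoids quoting issues
        subprocess.call(command, cwd=path)
```

Something like process_all(collect_destination_dirs('/rootdir'), ['ls', '-l']) would then list the contents of each match; the command shown is just a placeholder.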

Answered By: Simeon Visser

This looks much easier to accomplish with a more versatile tool, like the find command (your os.system call suggests you’re on a Unix-like system, so this should work).

os.system('find /rootdir -mindepth 5 -maxdepth 10 -type d -name destinationdir | while read -r d; do ( cd "$d" && do whatever ); done')

Note that if you are going to put any user-supplied string into that command, this becomes drastically unsafe, and you should use subprocess.Popen instead, executing the shell and splitting the arguments yourself. It’s safe as shown, though.
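
One way to read that advice is to pass find its arguments as a list so that no shell ever parses the user-supplied values. The sketch below assumes that interpretation and uses subprocess.check_output rather than Popen directly; build_find_command and find_directories are hypothetical helper names, and the depth limits are copied from the command above.

```python
import subprocess


def build_find_command(root, name, mindepth=5, maxdepth=10):
    """Build the find invocation as an argument list, so no shell parses it."""
    return [
        "find", root,
        "-mindepth", str(mindepth),
        "-maxdepth", str(maxdepth),
        "-type", "d",
        "-name", name,
    ]


def find_directories(root, name):
    """Run find without a shell and return the matching paths as a list."""
    output = subprocess.check_output(build_find_command(root, name))
    # Assumes paths contain no newlines and decode with the default encoding
    return output.decode().splitlines()
```

Because each value is a separate list element, a root such as '/rootdir; rm -rf /' is handed to find as a single (nonexistent) path instead of being executed. Note that find itself still treats the -name value as a glob pattern.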

Answered By: the paul

I think this could be done more easily with os.walk:

import os

def find_files(root, filename):
    for directory, subdirs, files in os.walk(root):
        if filename in files:
            # directory already starts with root, so it is not joined again
            yield os.path.join(directory, filename)

Of course, this doesn’t allow you to have a glob expression in the filename portion, but you could check that stuff using regex or fnmatch.
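
For the fnmatch route, a minimal sketch might look like this. The name find_matching_files is invented for the example, and it matches only the file name, not the directory part of the path.

```python
import fnmatch
import os


def find_matching_files(root, pattern):
    """Yield the path of every file below `root` whose name matches `pattern`."""
    for directory, _subdirs, files in os.walk(root):
        # fnmatch.filter keeps only the names that match the shell-style pattern
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(directory, name)
```

For example, list(find_matching_files('/rootdir', '*.txt')) would collect every .txt file at any depth.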

EDIT

Or to find a directory:

import os

def find_files(root, d):
    for directory, subdirs, files in os.walk(root):
        if d in subdirs:
            # directory already starts with root, so it is not joined again
            yield os.path.join(directory, d)
Answered By: mgilson

If you are looking for files, you can use the Formic package (disclosure: I wrote it) – this implements Apache Ant’s FileSet Globs with the ‘**’ wildcard:

import formic
fileset = formic.FileSet(include="rootdir/**/destinationdir/*")

for file_name in fileset:
    # Do something with file_name
    print(file_name)
Answered By: Andrew Alcock

Since Python 3.5, glob.glob accepts the double wildcard ** to match any number of intermediate directories, as long as you also pass recursive=True:

>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['1.txt', 'foo/2.txt', 'foo/bar/3.txt', 'foo/bar/baz/4.txt']
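
Applied to the directory layout in the question, this might look like the sketch below. The helper name find_destination_dirs is made up for illustration, and the isdir filter is an added assumption so that a plain file that happens to be called destinationdir is skipped.

```python
import glob
import os


def find_destination_dirs(root, name="destinationdir"):
    """Return every directory called `name` at any depth below `root`."""
    # '**' matches zero or more directory levels when recursive=True;
    # glob.escape keeps special characters in root from acting as wildcards
    pattern = os.path.join(glob.escape(root), "**", name)
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isdir(p))
```

Keep in mind that by default ** does not descend into directories whose names start with a dot, and that on a very large tree the recursive search can take a while.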
Answered By: Tosha

Here’s a better solution that uses recursion to traverse a theoretically unlimited number of directories until the file is found:

import os

def find_file(root, filename):
    """
    Recursively search for a file with the given name starting from the
    specified root directory.
    """
    # Check the root directory for the file
    if filename in os.listdir(root):
        return os.path.join(root, filename)

    # Search the subdirectories
    for dirname in os.listdir(root):
        path = os.path.join(root, dirname)
        if os.path.isdir(path):
            result = find_file(path, filename)
            if result is not None:
                return result

    # Not found anywhere below root
    return None

Author: ChatGPT, prompt ("in python search for a file in a directory and traverse as many as directories until the file is found")

Answered By: whiletrue