Python glob.glob: a wildcard for the number of directories between the root and the destination
Question:
Okay I’m having trouble not only with the problem itself but even with trying to explain my question. I have a directory tree consisting of about 7 iterations, so: rootdir/a/b/c/d/e/f/destinationdir
The thing is some may have 5 subdirectory levels and some may have as many as ten, such as:
rootdir/a/b/c/d/destinationdir
or:
rootdir/a/b/c/d/e/f/g/h/destinationdir
The only thing they have in common is that the destination directory is always named the same thing. The way I’m using the glob function is as follows:
for path in glob.glob('/rootdir/*/*/*/*/*/*/destinationdir'):
    os.system('cd {0}; do whatever'.format(path))
However, this only works for the directories with that precise number of intermediate subdirectories. Is there any way for me not to have to specify the number of subdirectories (asterisks); in other words, can the function arrive at the destinationdir no matter how many intermediate subdirectories there are, so that I can iterate through them? Thanks a lot!
Answers:
You can create a pattern for each nesting depth (increase the 10 if needed):
import glob
import os

for i in range(10):
    pattern = '/rootdir/' + ('*/' * i) + 'destinationdir'
    for path in glob.glob(pattern):
        os.system('cd {0}; do whatever'.format(path))
This will iterate over:
'/rootdir/destinationdir'
'/rootdir/*/destinationdir'
'/rootdir/*/*/destinationdir'
'/rootdir/*/*/*/destinationdir'
'/rootdir/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/*/destinationdir'
If you have to iterate over directories of arbitrary depth, then I suggest dividing the algorithm into two phases: one where you find out where all the ‘destinationdir’ directories are located, and a second where you perform your operations on them.
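The two-phase idea could look roughly like the sketch below. It uses os.walk for the search phase, which is only one possible choice, and the names collect_destinations and process_all are made up for illustration.

```python
import os


def collect_destinations(root, name="destinationdir"):
    """Phase 1: return every directory called `name` below `root`."""
    found = []
    for current, subdirs, _files in os.walk(root):
        # `current` already starts with `root`, so one join is enough
        if name in subdirs:
            found.append(os.path.join(current, name))
    return sorted(found)


def process_all(paths, action):
    """Phase 2: apply `action` to each collected path and return the results."""
    return [action(path) for path in paths]
```

Keeping the two phases apart means the list of directories can be inspected or logged before anything is run against it.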
This looks much easier to accomplish with a more versatile tool, like the find command (your os.system call indicates you’re on a unix-like system, so this will work).
os.system('find /rootdir -mindepth 5 -maxdepth 10 -type d -name destinationdir | while read -r d; do ( cd "$d" && do whatever ); done')
Note that if you are going to put any user-supplied string into that command, this becomes drastically unsafe, and you should use subprocess.Popen instead, executing the shell and splitting the arguments yourself. It’s safe as shown, though.
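One way to keep user-supplied strings away from the shell is to pass find its arguments as a list, so nothing is parsed as shell syntax. This is a minimal sketch, assuming Python 3.7+ and a find that supports -mindepth/-maxdepth (GNU and BSD find do); build_find_command and find_directories are hypothetical helper names.

```python
import subprocess


def build_find_command(root, name, mindepth=None, maxdepth=None):
    """Build the argument list for `find`; no shell ever parses these strings."""
    cmd = ["find", root]
    # Depth options go before the tests (-type, -name)
    if mindepth is not None:
        cmd += ["-mindepth", str(mindepth)]
    if maxdepth is not None:
        cmd += ["-maxdepth", str(maxdepth)]
    cmd += ["-type", "d", "-name", name]
    return cmd


def find_directories(root, name, **depth):
    """Run `find` and return the matching directories, one per output line."""
    result = subprocess.run(
        build_find_command(root, name, **depth),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Each result can then be processed with subprocess.run([...], cwd=path) in place of the cd in the shell loop. Note that find still treats the name as a glob pattern, and that splitting on lines assumes no path contains a newline.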
I think this could be done more easily with os.walk:
import os

def find_files(root, filename):
    # os.walk already yields each directory path prefixed with root
    for directory, subdirs, files in os.walk(root):
        if filename in files:
            yield os.path.join(directory, filename)
Of course, this doesn’t allow you to have a glob expression in the filename portion, but you could check that stuff using regex or fnmatch.
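The fnmatch variant might look like this sketch, which applies a glob-style pattern to the file names at every level of the walk. The function name find_matching is just an illustrative choice.

```python
import fnmatch
import os


def find_matching(root, pattern):
    """Yield paths of files below `root` whose names match a glob `pattern`."""
    for directory, _subdirs, files in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(directory, name)
```

For example, list(find_matching('/rootdir', '*.txt')) would collect every .txt file at any depth.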
EDIT
Or to find a directory:
def find_dirs(root, d):
    for directory, subdirs, files in os.walk(root):
        if d in subdirs:
            yield os.path.join(directory, d)
If you are looking for files, you can use the Formic package (disclosure: I wrote it) – this implements Apache Ant’s FileSet Globs with the ‘**’ wildcard:
import formic
fileset = formic.FileSet(include="rootdir/**/destinationdir/*")
for file_name in fileset:
    # Do something with file_name
    pass
In Python 3.5 and later, glob.glob accepts the double wildcard ** to designate any number of intermediate directories, as long as you also pass recursive=True:
>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['1.txt', 'foo/2.txt', 'foo/bar/3.txt', 'foo/bar/baz/4.txt']
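Applied to the directory layout in the question, the recursive wildcard might be used as in the sketch below, shown alongside the pathlib equivalent. Both function names are made up for illustration, and the is-directory filter is an added assumption so that files named destinationdir are skipped.

```python
import glob
import os
from pathlib import Path


def destinations_with_glob(root, name="destinationdir"):
    """Match `name` at any depth below `root` using the `**` wildcard."""
    # glob.escape protects any wildcard characters that occur in root itself
    pattern = os.path.join(glob.escape(root), "**", name)
    matches = glob.glob(pattern, recursive=True)
    return sorted(p for p in matches if os.path.isdir(p))


def destinations_with_pathlib(root, name="destinationdir"):
    """Same search expressed with Path.rglob."""
    return sorted(str(p) for p in Path(root).rglob(name) if p.is_dir())
```

Since ** also matches zero directories, a destinationdir directly under the root is included. By default glob does not descend into hidden (dot) directories, so the two functions can differ on trees that contain them.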
Here’s another solution that recurses through any number of directory levels (in practice bounded by Python’s recursion limit) and returns the first match it finds:
import os

def find_file(root, filename):
    """
    Recursively search for a file with the given name starting from the
    specified root directory.
    """
    # Check the root directory for the file
    if filename in os.listdir(root):
        return os.path.join(root, filename)
    # Search the subdirectories
    for dirname in os.listdir(root):
        path = os.path.join(root, dirname)
        if os.path.isdir(path):
            result = find_file(path, filename)
            if result is not None:
                return result
    # No match anywhere below root
    return None
Author: ChatGPT, prompt ("in python search for a file in a directory and traverse as many as directories until the file is found")