os.walk: stop looking in subdirectories after the first match
Question:
I need to find the first occurrence of the repository.config file in each branch of a directory tree and stop looking in the subdirectories below it.
Here is my directory tree:
./WAS80/base/disk1/ad/repository.config
./WAS80/base/disk1/md/repository.config
./WAS80/base/disk2/ad/repository.config
./WAS80/base/disk3/ad/repository.config
./WAS80/base/disk4/ad/repository.config
./WAS80/base/repository.config
./WAS80/fixpack/fp5/repository.config
./WAS80/fixpack_suplements/fp5/repository.config
./WAS80/supplements/disk1/ad/repository.config
./WAS80/supplements/disk1/md/repository.config
./WAS80/supplements/disk2/ad/repository.config
./WAS80/supplements/disk3/ad/repository.config
./WAS80/supplements/disk4/ad/repository.config
./WAS80/supplements/repository.config
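I need only the topmost match in each branch (for example ./WAS80/base/repository.config, but none of the copies under ./WAS80/base/disk1/ad and the other diskN directories), and I want to stop looking in the subdirectories once a match is found.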
I started tinkering with this code, but I couldn’t figure it out.
import os

pattern = 'repository.config'
path = '/opt/was_binaries'

def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
            continue  # only moves on to the next directory; os.walk still descends into dirs
    return result
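A quick way to see why this attempt returns every match: `continue` ends the current loop iteration, but os.walk has already decided to visit the subdirectories listed in `dirs`. The sketch below runs the same function on a tiny throwaway tree; the helper `make_tree` and the temporary directory are hypothetical, used only for illustration.

```python
import os
import tempfile

def find_all(name, path):
    # The question's attempt, unchanged apart from indentation
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
            continue  # no effect on which directories os.walk visits next
    return result

def make_tree(top, rel_paths):
    # Hypothetical helper: create empty files at the given "/"-separated paths
    for rel in rel_paths:
        full = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

with tempfile.TemporaryDirectory() as top:
    make_tree(top, ["base/repository.config",
                    "base/disk1/ad/repository.config"])
    # Normalise to "/"-separated relative paths so the result is platform-independent
    found = sorted(os.path.relpath(p, top).replace(os.sep, "/")
                   for p in find_all("repository.config", top))

print(found)  # ['base/disk1/ad/repository.config', 'base/repository.config']
```

Both files are reported, including the nested one the question wants to skip.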
Answers:
This should do what you want:
import os

startdir = '/opt/was_binaries'  # the search root from the question
res = []
for here, dirs, files in os.walk(startdir, topdown=True):
    if 'repository.config' in files:
        res.append(os.path.join(here, 'repository.config'))
        dirs[:] = []
        # dirs.clear()  # also works (Python 3.3+): it empties the same list in place
print(res)
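To check this against the exact layout from the question, the sketch below wraps the loop in a function and runs it on a temporary directory that stands in for /opt/was_binaries. The names `find_first` and `TREE` are hypothetical; the pruning line `dirs[:] = []` is the one from this answer.

```python
import os
import tempfile

# The question's layout, relative to a temporary root
TREE = [
    "WAS80/base/disk1/ad/repository.config",
    "WAS80/base/disk1/md/repository.config",
    "WAS80/base/disk2/ad/repository.config",
    "WAS80/base/disk3/ad/repository.config",
    "WAS80/base/disk4/ad/repository.config",
    "WAS80/base/repository.config",
    "WAS80/fixpack/fp5/repository.config",
    "WAS80/fixpack_suplements/fp5/repository.config",
    "WAS80/supplements/disk1/ad/repository.config",
    "WAS80/supplements/disk1/md/repository.config",
    "WAS80/supplements/disk2/ad/repository.config",
    "WAS80/supplements/disk3/ad/repository.config",
    "WAS80/supplements/disk4/ad/repository.config",
    "WAS80/supplements/repository.config",
]

def find_first(name, top):
    # Collect the topmost `name` in every branch, pruning below each match
    res = []
    for here, dirs, files in os.walk(top, topdown=True):
        if name in files:
            res.append(os.path.join(here, name))
            dirs[:] = []  # in place: os.walk will not descend below `here`
    return res

with tempfile.TemporaryDirectory() as top:
    for rel in TREE:
        full = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = sorted(os.path.relpath(p, top).replace(os.sep, "/")
                   for p in find_first("repository.config", top))

for p in found:
    print(p)
```

Only four paths come back: WAS80/base/repository.config, WAS80/fixpack/fp5/repository.config, WAS80/fixpack_suplements/fp5/repository.config and WAS80/supplements/repository.config.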
Whenever you encounter a 'repository.config' file, set dirs to [] to prevent os.walk from descending further into that directory tree.
Note: for this to work, it is vital to change dirs in place (i.e. dirs[:] = []) rather than rebind it (dirs = []). os.walk keeps its own reference to that list and reads it after your loop body runs, so rebinding the local name has no effect on the walk.
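The difference between mutating and rebinding can be seen directly. This sketch (the function `visited_dirs` is a hypothetical name) clears `dirs` at the top level either way and records which directories os.walk still visits.

```python
import os
import tempfile

def visited_dirs(top, in_place):
    # Walk `top`, clearing `dirs` at the first level in place or by rebinding
    seen = []
    for here, dirs, files in os.walk(top):
        seen.append(os.path.relpath(here, top).replace(os.sep, "/"))
        if here == top:
            if in_place:
                dirs[:] = []  # mutates the list os.walk reads next
            else:
                dirs = []     # only rebinds the local name; os.walk keeps the old list
    return seen

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    pruned = visited_dirs(top, in_place=True)
    rebound = visited_dirs(top, in_place=False)

print(pruned)   # ['.']
print(rebound)  # ['.', 'a', 'a/b']
```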
First, you have to make sure that topdown is set to True (this is the default) so that parent directories are scanned before their child directories.
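To illustrate why the order matters: with topdown=False, os.walk yields a directory only after all of its subdirectories, so by the time a parent's match is seen its children have already been reported. A small sketch with a hypothetical helper `walk_order`:

```python
import os
import tempfile

def walk_order(top, topdown):
    # Directories in the order os.walk yields them, relative to `top`
    return [os.path.relpath(here, top).replace(os.sep, "/")
            for here, _dirs, _files in os.walk(top, topdown=topdown)]

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "parent", "child"))
    down = walk_order(top, topdown=True)
    up = walk_order(top, topdown=False)

print(down)  # ['.', 'parent', 'parent/child']
print(up)    # ['parent/child', 'parent', '.']
```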
Create a set called existing to remember the directories in which you have already found a config file.
Then, when you find your filename in the list of files:
- check whether the file's directory is a child of a directory you already registered;
- if it is not, note down the file's path and add its directory to existing, with os.sep appended so that you don't match sibling directories that merely share a name prefix at the same level. For example, path/to/dir2 should still be scanned even if path/to/dir is already in the set, whereas path/to/dir/subdir is correctly filtered out.
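The effect of appending os.sep can be checked with plain strings. In this sketch `is_under` is a hypothetical helper that applies the same startswith test as the code below; the paths are built with os.path.join so they use the platform's separator.

```python
import os

def is_under(path, registered):
    # True if `path` lies inside one of the registered directories.
    # Each registered entry ends with os.sep, so a sibling such as
    # path/to/dir2 does not match the prefix path/to/dir + os.sep.
    return any(path.startswith(r) for r in registered)

base = os.path.join("path", "to", "dir")
registered = {base + os.sep}

sibling = is_under(os.path.join("path", "to", "dir2"), registered)
child = is_under(os.path.join("path", "to", "dir", "subdir"), registered)
naive = os.path.join("path", "to", "dir2").startswith(base)  # without os.sep

print(sibling, child, naive)  # False True True
```

The last value shows the false positive the trailing separator avoids.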
Code:
import os

path = '/opt/was_binaries'  # the search root from the question
existing = set()
for root, dirs, files in os.walk(path, topdown=True):
    if any(root.startswith(r) for r in existing):
        # current directory lies below a directory that already had a match: skip
        continue
    if "repository.config" in files:
        # note down root dir (+ os.sep to avoid filtering siblings) and print the result
        existing.add(root + os.sep)
        print(os.path.join(root, "repository.config"))
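One consequence of this approach is that nothing is pruned: os.walk still enters every subdirectory, and matches below a registered directory are merely skipped. The sketch below (hypothetical names `find_first_by_prefix` and `TREE`, with a temporary directory standing in for /opt/was_binaries) returns the matches together with the number of directories visited on the question's layout.

```python
import os
import tempfile

# Same layout as in the question, generated compactly
TREE = ["WAS80/fixpack/fp5/repository.config",
        "WAS80/fixpack_suplements/fp5/repository.config"]
for area in ("base", "supplements"):
    TREE.append(f"WAS80/{area}/repository.config")
    TREE.append(f"WAS80/{area}/disk1/md/repository.config")
    TREE += [f"WAS80/{area}/disk{n}/ad/repository.config" for n in range(1, 5)]

def find_first_by_prefix(name, top):
    # Set-of-prefixes filtering; also counts every directory os.walk yields
    existing = set()
    found = []
    visited = 0
    for root, dirs, files in os.walk(top, topdown=True):
        visited += 1
        if any(root.startswith(r) for r in existing):
            continue  # below a directory that already had a match
        if name in files:
            existing.add(root + os.sep)
            found.append(os.path.join(root, name))
    return found, visited

with tempfile.TemporaryDirectory() as top:
    for rel in TREE:
        full = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    paths, visited = find_first_by_prefix("repository.config", top)
    found = sorted(os.path.relpath(p, top).replace(os.sep, "/") for p in paths)

print(found)
print(visited)  # 26: every directory in the tree
```

The result matches the four topmost files, but all 26 directories are still walked, whereas pruning with dirs[:] = [] never enters the subdirectories of a match.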