Search multiple patterns using glob only once

Question:

I want to use the glob function to find files located in folders whose names match one of two different patterns.

The solution I found would be simply:

import glob
files1 = glob.glob('*type1*/*')
files2 = glob.glob('*type2*/*')
files = files1 + files2

Is there any way of rewriting this using only one glob call? If so, would it be faster?

Something like

files = glob.glob('*[type1, type2]*/*') 
Asked By: prody


Answers:

glob understands shell-style path globbing, so you can simply do:

files = glob.glob('*type[12]*/*')

or if you needed to expand to more numbers, something like this (for 1 through 6):

files = glob.glob('*type[1-6]*/*')
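Square brackets in a glob pattern form a character class: they match exactly one character from the set or range. That is why '*type[12]*' works, and also why the question's '*[type1, type2]*' does not do what was intended; it matches any name containing a single t, y, p, e, 1, 2, comma or space. A small check with fnmatch.fnmatchcase, which uses the same wildcard syntax as glob (the folder names below are made up for illustration):

```python
from fnmatch import fnmatchcase

# Hypothetical folder names, used only to illustrate the pattern semantics
names = ["a_type1_x", "b_type2_y", "c_type3_z", "xyz", "abc"]

# [12] is a character class: exactly one character, either '1' or '2'
good = [n for n in names if fnmatchcase(n, "*type[12]*")]

# [type1, type2] is ALSO a single character class, not a list of alternatives:
# it matches any ONE of the characters t, y, p, e, 1, 2, ',' or ' '
naive = [n for n in names if fnmatchcase(n, "*[type1, type2]*")]

print(good)   # ['a_type1_x', 'b_type2_y']
print(naive)  # ['a_type1_x', 'b_type2_y', 'c_type3_z', 'xyz']
```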

Calling glob() only once will be somewhat faster. Each call lists the current directory (on a Unix system, via the readdir() function) to find the matching folder names, and then lists each matching folder. With two calls, the current directory is read twice and every name in it is compared against a pattern twice. The directory contents are probably cached by the OS, so they may not have to be read from disk again, but the system calls and the pattern comparisons still have to be repeated.
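To make the saving concrete, here is a minimal sketch, assuming the patterns only apply to the top-level folder names as in the question. It lists the current directory once with os.scandir, tests each folder name against every pattern with fnmatch, and lists each matching folder a single time. glob_dirs_once is a hypothetical helper, not part of the glob module, and it only approximates glob's behaviour (results are sorted here for determinism, whereas glob returns them in directory order).

```python
import fnmatch
import os


def glob_dirs_once(dir_patterns, root="."):
    """List the entries of every sub-folder of `root` whose name matches ANY of
    `dir_patterns`, reading `root` itself only once.

    Roughly what merging glob.glob(os.path.join(p, "*")) over each pattern p gives,
    except that a folder matching several patterns is listed only once.
    Returned paths are relative to `root`; names starting with "." are skipped,
    as glob does by default.
    """
    results = []
    with os.scandir(root) as top:  # the single read of the top-level directory
        folders = sorted(
            (entry for entry in top if entry.is_dir() and not entry.name.startswith(".")),
            key=lambda entry: entry.name,
        )
    for folder in folders:
        # Test the folder name against every pattern (fnmatch uses glob's wildcard syntax)
        if any(fnmatch.fnmatch(folder.name, p) for p in dir_patterns):
            with os.scandir(folder.path) as sub:  # one read per matching folder
                names = sorted(e.name for e in sub if not e.name.startswith("."))
            results.extend(os.path.join(folder.name, name) for name in names)
    return results
```

Called as glob_dirs_once(["*type1*", "*type2*"]) from the working directory, it returns roughly what files1 + files2 would, minus any duplicates.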

That said, practically speaking, the performance difference isn’t likely to be noticeable unless you have thousands of files and subdirectories.
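Python's glob has no brace alternation such as {alpha,beta} (that is a shell feature), so when the folder names share no stem that a character class can express, one fallback is the question's own approach generalized to a list of patterns. A minimal sketch; glob_any is a hypothetical helper name. It also removes the duplicates that files1 + files2 would contain when a folder name matches both patterns.

```python
import glob
from itertools import chain


def glob_any(patterns):
    """Run glob.glob() once per pattern and merge the results, dropping
    duplicates (a path can match several patterns) while keeping the
    first-seen order."""
    return list(dict.fromkeys(chain.from_iterable(glob.glob(p) for p in patterns)))
```

For example, glob_any(["*alpha*/*", "*beta*/*"]) with hypothetical folder names; this still costs one glob() call per pattern.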

Answered By: sj95126

This might help.
I needed to exclude files that did not all match a single pattern. I have been ‘forcing’ myself to use Kotlin instead of Java, and I have just discovered globbing, so I wrote the following:

    import android.util.Log
    import java.nio.file.FileSystems
    import java.nio.file.Path
    import java.time.Instant
    import kotlin.io.path.*

    // @JvmStatic is only allowed inside an object or companion object; the name is taken from the log tags
    object CheckListDB {
        /**
         * This method lists the contents of a file system directory
         *
         * @param path is either the directory to be listed or a file in said directory
         * @param secondsAge is the number of seconds that a file may live in the provided path. null means infinite
         * @param globExclude is a list of file-name patterns to protect from deletion (a 'glob')
         */
        @JvmStatic
        fun listDirectory(path: Path, secondsAge: Long?, globExclude: List<String>) {
            // List the regular files in either the provided Path (if it is a directory), or in its parent directory if it is a file
            val pathDir: Path = if (path.isDirectory()) path else path.toAbsolutePath().parent

            // List existing files in the folder
            Log.e("CheckListDB.listDirectory()", "Listing of $pathDir")
            pathDir.forEachDirectoryEntry { entry ->
                if (entry.isRegularFile()) {
                    Log.e("CheckListDB.listDirectory()", "\t$entry, size=${entry.fileSize()}, " +
                            "last mod=${entry.getLastModifiedTime()}")
                }
            }

            // Clean up old files
            if (secondsAge != null) {
                val timeCutoff = Instant.now().minusSeconds(secondsAge)
                Log.e("CheckListDB.listDirectory()", "Cut off time is: $timeCutoff")

                // A file is protected if ANY pattern matches it. The glob is matched against the
                // file name only: in a glob, '*' does not cross directory separators, so "*.db"
                // would not match a full path such as "dir/x.db".
                val matchers = globExclude.map { FileSystems.getDefault().getPathMatcher("glob:$it") }
                val listFiles = pathDir.listDirectoryEntries()
                    .filter { it.isRegularFile() }
                    .filterNot { entry -> matchers.any { it.matches(entry.fileName) } }
                Log.e("CheckListDB.listDirectory()", "Count: listFiles = ${listFiles.size}")

                // Delete all selected files that are older than the cutoff datetime
                listFiles
                    .filter { it.getLastModifiedTime().toInstant().isBefore(timeCutoff) }
                    .forEach {
                        Log.e("    deleting", "\t$it")
                        // it.deleteIfExists()   // commented out: this only logs what would be deleted
                    }
            }
        }
    }

As you can see, the globbing patterns are passed in to the method (via globExclude) as a List, so you can use as many as you like.

Although this method excludes the files that match the patterns, you can get the effect you want (i.e. selecting the matching files) by changing filterNot() to filter() where the entries returned by listDirectoryEntries() are filtered.
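Since the question is about Python, here is a rough Python analogue of the same idea, sketched with pathlib and fnmatch: test each file name against every pattern, then keep the files that match none of them (exclusion, as in the Kotlin method) or at least one of them (selection, as in the question). select_files and its parameters are hypothetical names; the optional age filter loosely mirrors secondsAge, and the sketch only returns paths rather than deleting anything.

```python
import fnmatch
import time
from pathlib import Path


def select_files(directory, patterns, exclude=True, older_than_seconds=None):
    """Return the regular files in `directory` whose *names* match none of
    `patterns` (exclude=True) or at least one of them (exclude=False),
    optionally keeping only files last modified more than
    `older_than_seconds` seconds ago."""
    cutoff = None if older_than_seconds is None else time.time() - older_than_seconds
    selected = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue  # skip sub-directories and other non-regular entries
        matched = any(fnmatch.fnmatch(entry.name, p) for p in patterns)
        if matched == exclude:
            continue  # dropped: matched while excluding, or unmatched while selecting
        if cutoff is not None and entry.stat().st_mtime >= cutoff:
            continue  # modified too recently
        selected.append(entry)
    return selected
```

For instance, select_files(".", ["*.db", "*.cfg"], older_than_seconds=86400) would list files older than a day that are not protected by either pattern.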

I hope that this helps!

Answered By: John