glob finds the same file 10 times

Question:

Why does recursive=True lead to finding the same file 10 times?

>>> for g in glob.glob("/home/result/test/**/**/**/*.xml", recursive=True):
...   if "testsystems" in g:
...     print(f"{g}")
... 
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml
/home/result/test/foo/bar/test_results/testsystems.xml

According to the docs, I expected to need recursive=True to support **:

If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an os.sep or os.altsep then files will not match.

source: https://docs.python.org/3.10/library/glob.html

I understand now that, to get the expected result, I should use either

glob.glob("/home/result/test/**/*.xml", recursive=True)

or

glob.glob("/home/result/test/**/**/**/*.xml")

My main question is: why does glob.glob("/home/result/test/**/**/**/*.xml", recursive=True) lead to duplicated files? And why exactly 10?

Asked By: Sir l33tname

||

Answers:

They are these 10 matches (a - means that this ** matched zero directories):

   **                    **                    **
 1 foo/bar/test_results  -                     -
 2 foo/bar               test_results          -
 3 foo/bar               -                     test_results
 4 foo                   bar/test_results      -
 5 foo                   bar                   test_results
 6 foo                   -                     bar/test_results
 7 -                     foo/bar/test_results  -
 8 -                     foo/bar               test_results
 9 -                     foo                   bar/test_results
10 -                     -                     foo/bar/test_results

In all 10 cases, *.xml of course just matches the file in the matched folder.
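
If you want to check the count yourself, here is a minimal sketch that enumerates the same table. It does not reproduce how glob works internally; it only lists every way to cut the three directory names into three consecutive, possibly empty, groups (one group per **). The helper name splits is made up for this illustration.

```python
from itertools import combinations_with_replacement

# Directory names between /home/result/test and the .xml file.
parts = ["foo", "bar", "test_results"]

def splits(parts, slots):
    """Yield every way to cut parts into `slots` consecutive, possibly empty, groups."""
    n = len(parts)
    # Pick slots-1 non-decreasing cut positions in 0..n.
    for cuts in combinations_with_replacement(range(n + 1), slots - 1):
        bounds = [0, *cuts, n]
        # An empty group is shown as "-", as in the table above.
        yield ["/".join(parts[a:b]) or "-" for a, b in zip(bounds, bounds[1:])]

all_splits = list(splits(parts, 3))
for row in all_splits:
    print(row)
print(len(all_splits))  # 10
```

Choosing 2 cut positions out of 4 with repetition allowed gives C(5, 2) = 10, which is where the number comes from.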

The **/**/** is not doing what you think it is: with recursive=True, each ** matches zero or more directory levels, so the three directories foo, bar and test_results can be divided among the three ** in 10 different ways, and glob reports the file once for each of them. Just /home/result/test/**/*.xml with recursive=True would do.
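
To try this without touching your real data, the following sketch rebuilds the layout from the question in a temporary directory (the temporary root stands in for /home/result/test, which is an assumption of this example). It compares the single ** pattern, the tripled pattern with duplicates dropped via dict.fromkeys, and the tripled pattern without recursive=True. The number of raw results for the tripled pattern is printed but not relied on here, since I have only the 10 from your output to go by.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate foo/bar/test_results/testsystems.xml under a throwaway root.
    target_dir = os.path.join(root, "foo", "bar", "test_results")
    os.makedirs(target_dir)
    target = os.path.join(target_dir, "testsystems.xml")
    open(target, "w").close()

    # One ** is enough to descend to any depth.
    single = glob.glob(os.path.join(root, "**", "*.xml"), recursive=True)

    # The tripled pattern may report the same path repeatedly;
    # dict.fromkeys drops duplicates while keeping first-seen order.
    triple = glob.glob(os.path.join(root, "**", "**", "**", "*.xml"), recursive=True)
    unique = list(dict.fromkeys(triple))

    # Without recursive=True each ** behaves like a plain *: exactly one level.
    plain = glob.glob(os.path.join(root, "**", "**", "**", "*.xml"))

print(len(single), len(triple), len(unique), len(plain))
```

If you ever have to keep a pattern that can produce repeats, de-duplicating the result like this is a cheap workaround, but simplifying the pattern to a single ** is the cleaner fix.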

Answered By: Grismar