I have a directory with a bunch of files inside:
asd3442 … and
I want to exclude all files that start with eph with the glob function. How can I do it?
You can’t exclude patterns with the glob function; globs only allow for inclusion patterns. Globbing syntax is very limited (even a [!..] character class must match a character, so it is an inclusion pattern for every character that is not in the class).
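To illustrate why a negated character class is not a real exclusion, here is a minimal sketch using fnmatch.filter, which applies shell-style patterns to a plain list of names. The list of names is made up for illustration.

```python
import fnmatch

# Hypothetical directory listing, used only for illustration.
names = ["asd3442", "eee2314", "eph334"]

# "[!e]" must still consume one character, so the pattern means
# "first character is anything except 'e'", not "does not start with 'eph'".
kept = fnmatch.filter(names, "[!e]*")
print(kept)  # ['asd3442']
```

Note that eee2314 is dropped as well, which is usually not what you want when the goal is to skip only the eph files.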
You’ll have to do your own filtering; a list comprehension usually works nicely here:
import os
from glob import glob

files = [fn for fn in glob('somepath/*.txt')
         if not os.path.basename(fn).startswith('eph')]
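The same idea can be wrapped in a small helper and tried out in a temporary directory. This is only a sketch: the helper name glob_without_prefix and the file names are assumptions made for the demo.

```python
import glob
import os
import tempfile


def glob_without_prefix(pattern, prefix):
    """Return glob matches whose file name does not start with `prefix`."""
    return [fn for fn in glob.glob(pattern)
            if not os.path.basename(fn).startswith(prefix)]


# Demo in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("asd3442.txt", "eee2314.txt", "eph334.txt"):
        open(os.path.join(tmp, name), "w").close()
    matches = glob_without_prefix(os.path.join(tmp, "*.txt"), "eph")
    demo = sorted(os.path.basename(fn) for fn in matches)

print(demo)  # ['asd3442.txt', 'eee2314.txt']
```

Checking os.path.basename rather than the full path keeps the test correct even when the glob pattern includes a directory.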
You can subtract one set from another and convert the result back to a list (note that sets do not preserve order):
list(set(glob("*")) - set(glob("eph*")))
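If a stable order matters, the set difference can be sorted before use. A minimal sketch, assuming a hypothetical helper glob_difference and demo files created in a temporary directory:

```python
import glob
import os
import tempfile


def glob_difference(directory, include, exclude):
    """Paths matching `include` but not `exclude`, sorted for a stable order."""
    included = set(glob.glob(os.path.join(directory, include)))
    excluded = set(glob.glob(os.path.join(directory, exclude)))
    return sorted(included - excluded)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("asd3442", "eee2314", "eph334"):
        open(os.path.join(tmp, name), "w").close()
    remaining = [os.path.basename(p) for p in glob_difference(tmp, "*", "eph*")]

print(remaining)  # ['asd3442', 'eee2314']
```

Both patterns are joined to the same directory so that the two sets contain comparable path strings.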
More generally, to exclude files that don’t comply with some shell pattern, you could use the fnmatch module:
import fnmatch
from glob import glob

file_list = glob('somepath')
# Build a new list rather than popping while iterating, which would skip entries.
file_list = [ii for ii in file_list if fnmatch.fnmatch(ii, 'bash_regexp_with_exclude')]
The above first generates a list from the given path and then drops the files that don’t satisfy the shell pattern with the desired constraint.
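For the original question, the test can be inverted so that names matching a shell pattern are dropped instead of kept. A small sketch; exclude_matching and the sample paths are hypothetical:

```python
import fnmatch
import os


def exclude_matching(paths, pattern):
    """Drop every path whose base name matches the shell-style `pattern`."""
    return [p for p in paths
            if not fnmatch.fnmatch(os.path.basename(p), pattern)]


# Hypothetical glob result, used only for illustration.
paths = ["somepath/asd3442", "somepath/eph334", "somepath/eee2314"]
kept = exclude_matching(paths, "eph*")
print(kept)  # ['somepath/asd3442', 'somepath/eee2314']
```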
Late to the game, but you could alternatively just apply a Python filter to the result of a glob.iglob:
import glob

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)
or replacing the lambda with an appropriate regex search, etc…
EDIT: I just realized that if you’re using full paths, the startswith won’t work, so you’d need a regex:
In : a
Out: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']
In : filter(lambda x: not re.search('/eph', x), a)
Out: ['/some/path/foo', 'some/path/bar']
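In Python 3, filter returns a lazy iterator, so wrap it in list() to see the result. The sketch below also anchors the regex to the last path component, so a bare file name with no directory is handled too; the extra 'eph_top' entry is an assumption added for the demo.

```python
import re

a = ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing', 'eph_top']

# Match "eph" only at the start of the last path component.
starts_with_eph = re.compile(r'(?:^|/)eph[^/]*$')

kept = list(filter(lambda x: not starts_with_eph.search(x), a))
print(kept)  # ['/some/path/foo', 'some/path/bar']
```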
As mentioned by the accepted answer, you can’t exclude patterns with glob, so the following is a method to filter your glob result.
The accepted answer is probably the best pythonic way to do things but if you think list comprehensions look a bit ugly and want to make your code maximally numpythonic anyway (like I did) then you can do this (but note that this is probably less efficient than the list comprehension method):
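import glob
import numpy as np

data_files = glob.glob("path_to_files/*.fits")
# Use the same directory prefix in every pattern so the paths compare equal.
light_files = np.setdiff1d(data_files, glob.glob("path_to_files/*BIAS*"))
light_files = np.setdiff1d(light_files, glob.glob("path_to_files/*FLAT*"))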
(In my case, I had some image frames, bias frames, and flat frames all in one directory and I just wanted the image frames)
The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wild-cards, and character ranges are supported [from pymotw: glob – Filename pattern matching].
So you can exclude some files with patterns, as long as the exclusion comes down to a single character. For example, to exclude manifest files (files starting with _) with glob, you can use:
files = glob.glob('files_path/[!_]*')
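A quick way to check this pattern is to run it against a temporary directory. The file names below are made up for the demo:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as files_path:
    # Hypothetical contents: one manifest file and two data files.
    for name in ("_manifest", "part-0001", "part-0002"):
        open(os.path.join(files_path, name), "w").close()
    matches = glob.glob(os.path.join(files_path, "[!_]*"))
    kept = sorted(os.path.basename(m) for m in matches)

print(kept)  # ['part-0001', 'part-0002']
```

This works because the exclusion depends on one character only; a multi-character prefix such as eph cannot be expressed this way.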
Instead of glob, I recommend pathlib. Filtering out one pattern is very simple.
from pathlib import Path

p = Path(YOUR_PATH)
filtered = [x for x in p.glob("**/*") if not x.name.startswith("eph")]
And if you want to filter a more complex pattern, you can define a function to do that, just like:
def not_in_pattern(x):
    return (not x.name.startswith("eph")) and not x.name.startswith("epi")

filtered = [x for x in p.glob("**/*") if not_in_pattern(x)]
Using that code, you can filter out all files that start with eph or start with epi.
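When several prefixes need to be excluded, str.startswith also accepts a tuple, which keeps the predicate short. A minimal sketch with made-up file names in a temporary directory:

```python
import tempfile
from pathlib import Path

# Prefixes to exclude; extend the tuple as needed.
EXCLUDED_PREFIXES = ("eph", "epi")


def not_in_pattern(x):
    # str.startswith returns True if any prefix in the tuple matches.
    return not x.name.startswith(EXCLUDED_PREFIXES)


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)
    for name in ("eph334", "epilog", "eee2314"):
        (p / name).touch()
    filtered = sorted(x.name for x in p.glob("**/*") if not_in_pattern(x))

print(filtered)  # ['eee2314']
```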
How about skipping particular files while iterating over all the files in the folder?
The code below skips all Excel files that start with ‘eph’:
import glob
import re

for file in glob.glob('*.xlsx'):
    # Escape the dot so it only matches a literal "." before the extension.
    if re.match(r'eph.*\.xlsx$', file):
        continue
    # do your stuff here
    print(file)
This way you can use more complex regex patterns to include/exclude a particular set of files in a folder.
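One way to keep such a filter reusable is to compile the pattern once and test whole file names with fullmatch. This is a sketch; the names SKIP and files_to_process and the sample list are assumptions:

```python
import re

# The dot is escaped so it matches only a literal "." before the extension.
SKIP = re.compile(r"eph.*\.xlsx")


def files_to_process(names):
    """Return the names that do not fully match the skip pattern."""
    return [n for n in names if not SKIP.fullmatch(n)]


# Hypothetical glob result, used only for illustration.
names = ["eph_2020.xlsx", "report.xlsx", "not_eph.xlsx"]
kept = files_to_process(names)
print(kept)  # ['report.xlsx', 'not_eph.xlsx']
```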
If the position of the character isn’t important, for example to exclude manifest files wherever _ is found in the name, with glob and re (regular expression operations), you can use:
import glob
import re

for file in glob.glob('*.txt'):
    if re.match(r'.*_.*', file):
        continue
    print(file)
Or, in a more elegant way:
filtered = [f for f in glob.glob('*.txt') if not re.match(r'.*_.*', f)]
for name in filtered:
    print(name)
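For a single literal character, a plain substring test gives the same result as the regex and may be easier to read. A tiny sketch on a made-up list of names:

```python
# Hypothetical glob result, used only for illustration.
names = ["data.txt", "_manifest.txt", "part_1.txt"]

# Keep only the names that contain no underscore anywhere.
filtered = [n for n in names if "_" not in n]
print(filtered)  # ['data.txt']
```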
Suppose you have this directory structure:
.
├── asd3442
├── eee2314
├── eph334
├── eph_dir
│   ├── asd330
│   ├── eph_file2
│   ├── exy123
│   └── file_with_eph
├── eph_file
├── not_eph_dir
│   ├── ephXXX
│   └── with_eph
└── not_eph_rest
You can use full globs to filter full path results with pathlib and a generator for the top level directory:
>>> i_want=(fn for fn in Path(path_to).glob('*') if not fn.match('**/*/eph*'))
>>> list(i_want)
[PosixPath('/tmp/test/eee2314'), PosixPath('/tmp/test/asd3442'), PosixPath('/tmp/test/not_eph_rest'), PosixPath('/tmp/test/not_eph_dir')]
The pathlib method match uses globs to match a path object; the glob '**/*/eph*' is any full path that leads to a file with a name starting with eph.
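The behaviour of match can be checked without touching the filesystem by using pure paths. In this sketch the simpler relative pattern 'eph*' is used; a relative pattern is matched from the right, so it is compared against the final path component only.

```python
from pathlib import PurePosixPath

candidates = [
    PurePosixPath("/tmp/test/eph334"),
    PurePosixPath("/tmp/test/not_eph_rest"),
    PurePosixPath("/tmp/test/eph_dir/asd330"),
]

# Map each path to whether its last component matches "eph*".
results = {str(p): p.match("eph*") for p in candidates}
for path, matched in results.items():
    print(path, matched)
```

Only /tmp/test/eph334 matches: the directory name eph_dir in the third path does not count, because the last component there is asd330.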
Alternatively, you can use the .name attribute with startswith:
i_want=(fn for fn in Path(path_to).glob('*') if not fn.name.startswith('eph'))
If you want only files, no directories:
i_want=(fn for fn in Path(path_to).glob('*') if fn.is_file() and not fn.match('**/*/eph*'))
# [PosixPath('/tmp/test/eee2314'), PosixPath('/tmp/test/asd3442'), PosixPath('/tmp/test/not_eph_rest')]
The same method works for recursive globs:
i_want=(fn for fn in Path(path_to).glob('**/*') if fn.is_file() and not fn.match('**/*/eph*'))
# [PosixPath('/tmp/test/eee2314'),
#  PosixPath('/tmp/test/asd3442'),
#  PosixPath('/tmp/test/not_eph_rest'),
#  PosixPath('/tmp/test/eph_dir/asd330'),
#  PosixPath('/tmp/test/eph_dir/file_with_eph'),
#  PosixPath('/tmp/test/eph_dir/exy123'),
#  PosixPath('/tmp/test/not_eph_dir/with_eph')]
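The recursive case can be reproduced end to end by building the directory structure shown earlier in a temporary directory. This sketch uses the .name.startswith variant rather than match; the helper name files_not_starting_with is an assumption.

```python
import tempfile
from pathlib import Path

# Files from the example directory structure, relative to its root.
TREE = [
    "asd3442", "eee2314", "eph334", "eph_file", "not_eph_rest",
    "eph_dir/asd330", "eph_dir/eph_file2", "eph_dir/exy123",
    "eph_dir/file_with_eph", "not_eph_dir/ephXXX", "not_eph_dir/with_eph",
]


def files_not_starting_with(root, prefix):
    """Recursively list files under `root` whose name does not start with `prefix`."""
    return sorted(
        fn.relative_to(root).as_posix()
        for fn in Path(root).rglob("*")
        if fn.is_file() and not fn.name.startswith(prefix)
    )


with tempfile.TemporaryDirectory() as tmp:
    for rel in TREE:
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = files_not_starting_with(tmp, "eph")

print(found)
```

Files inside eph_dir are still returned, because only the file name itself is tested, not the names of its parent directories.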
To exclude an exact word, you may want to implement a custom regex directive, which you then replace with an empty string before passing the path to glob:
#!/usr/bin/env python3
import glob
import re

# glob (or fnmatch) does not support exact word matching.
# This custom directive works around that.
glob_exact_match_regex = r"\[\^.*\]"
path = "[^exclude.py]*py"  # [^...] is a custom directive that excludes an exact match

# Process the custom directive
try:
    # Try to parse the exact-match directive
    exact_match = re.findall(glob_exact_match_regex, path)[0].replace('[^', '').replace(']', '')
except IndexError:
    exact_match = None
else:
    # Remove the custom directive
    path = re.sub(glob_exact_match_regex, "", path)

paths = glob.glob(path)

# Implement the custom directive
if exact_match is not None:
    # Exclude all paths containing the specified string
    paths = [p for p in paths if exact_match not in p]

print(paths)
import glob
import re

# This is a path that should be excluded
EXCLUDE = "/home/koosha/Documents/Excel"

files = glob.glob("/home/koosha/Documents/**/*.*", recursive=True)
for file in files:
    # Escape the path so its characters are matched literally.
    if re.search(re.escape(EXCLUDE), file):
        continue
    print(file)
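A substring or regex test on the path string would also skip sibling directories whose names merely begin with the excluded name (for example Excel2). A possible alternative, sketched here with pathlib, is to check whether the excluded directory is one of the file's parents. The helper name files_outside and the demo layout are assumptions.

```python
import tempfile
from pathlib import Path


def files_outside(root, excluded_dir):
    """List files under `root` that are not located inside `excluded_dir`."""
    root, excluded_dir = Path(root), Path(excluded_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.*")
        if p.is_file() and excluded_dir not in p.parents
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one excluded folder, one similarly named sibling.
    for rel in ("Excel/a.xlsx", "Excel2/b.xlsx", "notes.txt"):
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    kept = files_outside(tmp, Path(tmp) / "Excel")

print(kept)  # ['Excel2/b.xlsx', 'notes.txt']
```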