How to find files and skip directories in os.listdir
Question:
I use os.listdir
and it works fine, but I get sub-directories in the list also, which is not what I want: I need only files.
What function do I need to use for that?
I looked also at os.walk
and it seems to be what I want, but I’m not sure of how it works.
Answers:
You need to filter out directories; os.listdir()
lists all names in a given path. You can use os.path.isdir()
for this:
basepath = '/path/to/directory'
for fname in os.listdir(basepath):
path = os.path.join(basepath, fname)
if os.path.isdir(path):
# skip directories
continue
Note that this only filters out directories after following symlinks. fname
is not necessarily a regular file, it could also be a symlink to a file. If you need to filter out symlinks as well, you’d need to use not os.path.islink()
first.
On a modern Python version (3.5 or newer), an even better option is to use the os.scandir()
function; this produces DirEntry()
instances. In the common case, this is faster as the direntry loaded already has cached enough information to determine if an entry is a directory or not:
basepath = '/path/to/directory'
for entry in os.scandir(basepath):
if entry.is_dir():
# skip directories
continue
# use entry.path to get the full path of this entry, or use
# entry.name for the base filename
You can use entry.is_file(follow_symlinks=False)
if only regular files (and not symlinks) are needed.
os.walk()
does the same work under the hood; unless you need to recurse down subdirectories, you don’t need to use os.walk()
here.
for fname in os.listdir('.'):
if os.path.isdir(fname):
pass # do your stuff here for directory
else:
pass # do your stuff here for regular file
Here is a nice little one-liner in the form of a list comprehension:
[f for f in os.listdir(your_directory) if os.path.isfile(os.path.join(your_directory, f))]
This will return
a list
of filenames within the specified your_directory
.
import os
directoryOfChoice = "C:\" # Replace with a directory of choice!!!
filter(os.path.isfile, os.listdir(directoryOfChoice))
P.S: os.getcwd() returns the current directory.
Even though this is an older post, let me please add the pathlib library introduced in 3.4 which provides an OOP style of handling directories and files for sakes of completeness. To get all files in a directory, you can use
def get_list_of_files_in_dir(directory: str, file_types: str ='*') -> list:
return [f for f in Path(directory).glob(file_types) if f.is_file()]
Following your example, you could use it like this:
mypath = '/path/to/directory'
files = get_list_of_files_in_dir(mypath)
If you only want a subset of files depending on the file extension (e.g. "only csv files"), you can use:
files = get_list_of_files_in_dir(mypath, '*.csv')
The solution with os.walk() would be:
for r, d, f in os.walk('path/to/dir'):
for files in f:
# This will list all files given in a particular directory
Note PEP 471 DirEntry object attributes is: is_dir(*, follow_symlinks=True)
so…
from os import scandir
folder = '/home/myfolder/'
for entry in scandir(folder):
if entry.is_dir():
# do code or skip
continue
myfile = folder + entry.name
#do something with myfile
I use os.listdir
and it works fine, but I get sub-directories in the list also, which is not what I want: I need only files.
What function do I need to use for that?
I looked also at os.walk
and it seems to be what I want, but I’m not sure of how it works.
You need to filter out directories; os.listdir()
lists all names in a given path. You can use os.path.isdir()
for this:
basepath = '/path/to/directory'
for fname in os.listdir(basepath):
path = os.path.join(basepath, fname)
if os.path.isdir(path):
# skip directories
continue
Note that this only filters out directories after following symlinks. fname
is not necessarily a regular file, it could also be a symlink to a file. If you need to filter out symlinks as well, you’d need to use not os.path.islink()
first.
On a modern Python version (3.5 or newer), an even better option is to use the os.scandir()
function; this produces DirEntry()
instances. In the common case, this is faster as the direntry loaded already has cached enough information to determine if an entry is a directory or not:
basepath = '/path/to/directory'
for entry in os.scandir(basepath):
if entry.is_dir():
# skip directories
continue
# use entry.path to get the full path of this entry, or use
# entry.name for the base filename
You can use entry.is_file(follow_symlinks=False)
if only regular files (and not symlinks) are needed.
os.walk()
does the same work under the hood; unless you need to recurse down subdirectories, you don’t need to use os.walk()
here.
for fname in os.listdir('.'):
if os.path.isdir(fname):
pass # do your stuff here for directory
else:
pass # do your stuff here for regular file
Here is a nice little one-liner in the form of a list comprehension:
[f for f in os.listdir(your_directory) if os.path.isfile(os.path.join(your_directory, f))]
This will return
a list
of filenames within the specified your_directory
.
import os
directoryOfChoice = "C:\" # Replace with a directory of choice!!!
filter(os.path.isfile, os.listdir(directoryOfChoice))
P.S: os.getcwd() returns the current directory.
Even though this is an older post, let me please add the pathlib library introduced in 3.4 which provides an OOP style of handling directories and files for sakes of completeness. To get all files in a directory, you can use
def get_list_of_files_in_dir(directory: str, file_types: str ='*') -> list:
return [f for f in Path(directory).glob(file_types) if f.is_file()]
Following your example, you could use it like this:
mypath = '/path/to/directory'
files = get_list_of_files_in_dir(mypath)
If you only want a subset of files depending on the file extension (e.g. "only csv files"), you can use:
files = get_list_of_files_in_dir(mypath, '*.csv')
The solution with os.walk() would be:
for r, d, f in os.walk('path/to/dir'):
for files in f:
# This will list all files given in a particular directory
Note PEP 471 DirEntry object attributes is: is_dir(*, follow_symlinks=True)
so…
from os import scandir
folder = '/home/myfolder/'
for entry in scandir(folder):
if entry.is_dir():
# do code or skip
continue
myfile = folder + entry.name
#do something with myfile