Yield in a recursive function
Question:
I am trying to do something to all the files under a given path. I don’t want to collect all the file names beforehand then do something with them, so I tried this:
import os
import stat

def explore(p):
    s = ''
    list = os.listdir(p)
    for a in list:
        path = p + '/' + a
        stat_info = os.lstat(path )
        if stat.S_ISDIR(stat_info.st_mode):
            explore(path)
        else:
            yield path

if __name__ == "__main__":
    for x in explore('.'):
        print '-->', x
But this code skips over directories when it hits them, instead of yielding their contents. What am I doing wrong?
Answers:
Use os.walk instead of reinventing the wheel.
In particular, following the examples in the library documentation, here is an untested attempt:
import os
from os.path import join

def hellothere(somepath):
    for root, dirs, files in os.walk(somepath):
        for curfile in files:
            yield join(root, curfile)

# call and get full list of results:
allfiles = [ x for x in hellothere("...") ]

# iterate over results lazily:
for x in hellothere("..."):
    print(x)
Try this:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
Your code calls explore like an ordinary function and ignores the generator it returns. What you should do is iterate over it like a generator:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
else:
    yield path
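EDIT: Instead of the stat module, you could use os.path.isdir(path). Note that os.path.isdir follows symbolic links, whereas os.lstat does not, so the two checks differ for symlinked directories.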
Iterators do not work recursively like that. You have to re-yield each result, by replacing
explore(path)
with something like
for value in explore(path):
    yield value
Python 3.3 added the syntax yield from X, as proposed in PEP 380, to serve this purpose. With it you can do this instead:
yield from explore(path)
If you’re using generators as coroutines, this syntax also supports the use of generator.send() to pass values back into the recursively-invoked generators. The simple for loop above would not.
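The difference with send() can be seen in a small sketch. The names echo, delegate_with_yield_from and delegate_with_loop are made up for illustration and are not part of the question's code; the example assumes Python 3.3 or later.

```python
def echo():
    # Yields back whatever value was most recently sent in (None at first).
    received = None
    while True:
        received = yield received

def delegate_with_yield_from():
    # yield from forwards send() calls to the inner generator.
    yield from echo()

def delegate_with_loop():
    # The for loop only ever calls next() on echo(), so sent values are lost.
    for value in echo():
        yield value

gen = delegate_with_yield_from()
next(gen)                       # advance to the first yield
forwarded = gen.send("hello")   # the value reaches echo() and comes back

gen = delegate_with_loop()
next(gen)
dropped = gen.send("hello")     # echo() never sees the value
```

Here forwarded is "hello", while dropped is None, because the value sent into the for-loop version stops at the outer generator's own yield.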
Change this:
explore(path)
To this:
for subpath in explore(path):
    yield subpath
Or use os.walk, as phooji suggested (which is the better option).
The problem is this line of code:
explore(path)
What does it do?
- calls explore with the new path
- explore runs, creating a generator
- the generator is returned to the spot where explore(path) was executed . . .
- and is discarded
Why is it discarded? It wasn’t assigned to anything, it wasn’t iterated over — it was completely ignored.
If you want to do something with the results, well, you have to do something with them! 😉
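The discarded-generator behaviour is easy to see in isolation. The following is a minimal sketch with made-up names (numbers and log are not from the question): calling a generator function only builds a generator object, and the body does not start executing until something iterates over it.

```python
log = []  # records when the generator body actually executes

def numbers():
    log.append("body started")
    yield 1
    yield 2

numbers()                         # creates a generator object and throws it away
ran_after_bare_call = list(log)   # snapshot: still empty, the body never started

values = list(numbers())          # iterating drives the body to completion
```

After the bare call the log is still empty; only the iterated call appends to it and produces the values 1 and 2.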
The easiest way to fix your code is:
for name in explore(path):
    yield name
When you are confident you understand what’s going on, you’ll probably want to use os.walk() instead.
Once you are on Python 3.3 or later you can use the yield from syntax, and the easiest way to fix your code becomes:
yield from explore(path)
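Put together, the question's function might look like the sketch below on Python 3.3 or later. It keeps the original lstat check; the switch to os.path.join and the removal of the unused variable are small tidy-ups that were not part of the original code.

```python
import os
import stat

def explore(p):
    """Lazily yield the path of every non-directory entry under p."""
    for name in os.listdir(p):
        path = os.path.join(p, name)
        # lstat does not follow symlinks, so a symlinked directory is
        # yielded as a path rather than descended into.
        if stat.S_ISDIR(os.lstat(path).st_mode):
            yield from explore(path)   # re-yield everything from the subdirectory
        else:
            yield path
```

It is used exactly as in the question, for example by looping over explore('.') and printing each path; nothing is collected up front, so paths are produced as the loop asks for them.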
os.walk is great if you need to traverse all the folders and subfolders. If you don’t need that, it’s like using an elephant gun to kill a fly.
However, for this specific case, os.walk could be a better approach.
You can also implement the recursion using a stack.
There is not really any advantage in doing this, other than showing that it is possible. If you are using Python in the first place, any performance gain is probably not worth the extra complexity.
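import os
import stat

def explore(p):
    '''
    Perform a depth-first search and yield the file paths in DFS order.
    The recursion is replaced by an explicit stack, so every yield happens
    directly in this generator instead of inside a nested call.
    '''
    list_t = type(list())
    # Each stack entry is [item, index]: item is either a single path or a
    # list of paths, and index is the next position to visit in that list.
    st = [[p, 0]]
    while len(st) > 0:
        x = st[-1][0]
        i = st[-1][1]
        if type(x) == list_t:
            if i >= len(x):
                st.pop(-1)  # finished this directory listing
            else:
                st[-1][1] += 1
                st.append([x[i], 0])
        else:
            st.pop(-1)
            stat_info = os.lstat(x)
            if stat.S_ISDIR(stat_info.st_mode):
                # push the directory's entries as a new list to walk through
                st.append([['%s/%s' % (x, a) for a in os.listdir(x)], 0])
            else:
                yield x

print(list(explore('.')))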
To answer the original question as asked, the key is that the yield statement needs to be propagated back out of the recursion (just like, say, return). Here is a working reimplementation of os.walk(). I’m using this in a pseudo-VFS implementation, where I additionally replace os.listdir() and similar calls.
import os, os.path

def walk (top, topdown=False):
    items = ([], [])
    for name in os.listdir(top):
        isdir = os.path.isdir(os.path.join(top, name))
        items[isdir].append(name)
    result = (top, items[True], items[False])
    if topdown:
        yield result
    for folder in items[True]:
        for item in walk(os.path.join(top, folder), topdown=topdown):
            yield item
    if not topdown:
        yield result