How to let Pool.map take a lambda function
Question:
I have the following function:
def copy_file(source_file, target_dir):
    pass
Now I would like to use multiprocessing to execute this function in parallel:
p = Pool(12)
p.map(lambda x: copy_file(x,target_dir), file_list)
The problem is that lambdas can't be pickled, so this fails. What is the neatest (most Pythonic) way to fix this?
Answers:
Use a function object:
class Copier(object):
    def __init__(self, tgtdir):
        self.target_dir = tgtdir

    def __call__(self, src):
        copy_file(src, self.target_dir)
To run your Pool.map:
p.map(Copier(target_dir), file_list)
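For completeness, a minimal runnable sketch of the callable-object approach (the file names and target directory here are placeholder values, and copy_file just returns the destination path instead of copying anything):

```python
from multiprocessing import Pool


def copy_file(source_file, target_dir):
    # placeholder: return the destination path instead of copying
    return target_dir + source_file


class Copier(object):
    def __init__(self, tgtdir):
        self.target_dir = tgtdir

    def __call__(self, src):
        return copy_file(src, self.target_dir)


if __name__ == "__main__":
    with Pool(2) as p:
        # Copier instances pickle fine: they are ordinary objects
        # with plain attributes, unlike a lambda.
        print(p.map(Copier("/tmp/out/"), ["a.pdf", "b.pdf"]))
        # -> ['/tmp/out/a.pdf', '/tmp/out/b.pdf']
```

This works because instances of a module-level class are picklable as long as their attributes are, which is exactly the property a lambda lacks.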
For Python 2.7+ or Python 3, you could use functools.partial:
import functools
copier = functools.partial(copy_file, target_dir=target_dir)
p.map(copier, file_list)
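A self-contained sketch of the partial approach (again with placeholder names; copy_file just returns the destination path). The partial object pickles cleanly because copy_file is a top-level function:

```python
import functools
from multiprocessing import Pool


def copy_file(source_file, target_dir):
    # placeholder: return the destination path instead of copying
    return target_dir + source_file


if __name__ == "__main__":
    # bind target_dir once; the pool only supplies the source file
    copier = functools.partial(copy_file, target_dir="/tmp/out/")
    with Pool(2) as p:
        print(p.map(copier, ["a.pdf", "b.pdf"]))
        # -> ['/tmp/out/a.pdf', '/tmp/out/b.pdf']
```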
The question is a bit old, but if you are still using Python 2, my answer may be useful.
The trick is to use part of the pathos project: multiprocess, a fork of multiprocessing. It gets rid of an annoying limitation of the original multiprocessing: functions passed to Pool.map must be picklable.
Installation: pip install multiprocess
Usage:
>>> from multiprocess import Pool
>>> p = Pool(4)
>>> print p.map(lambda x: (lambda y:y**2)(x) + x, xrange(10))
[0, 2, 6, 12, 20, 30, 42, 56, 72, 90]
From this answer, pathos lets you run your lambda p.map(lambda x: copy_file(x, target_dir), file_list) directly, saving all the workarounds and hacks.
You can use starmap() to solve this problem with pooling.
Given that you have a list of files, say in your working directory, and a location you would like to copy those files to, you can import os and use os.system() to run terminal commands from Python. This lets you move the files over with ease.
However, before you start you will need to create a variable res = [(file, target_dir) for file in file_list] that pairs each file with the target directory. It will look like:
[('test1.pdf', '/home/mcurie/files/pdfs/'), ('test2.pdf', '/home/mcurie/files/pdfs/'), ('test3.pdf', '/home/mcurie/files/pdfs/'), ('test4.pdf', '/home/mcurie/files/pdfs/')]
Obviously, for this use case you could simplify things by storing each file and target directory in a single string to begin with, but that would make the method less instructive.
The idea is that starmap() will take each tuple in res, unpack it into the arguments of copy_file(source_file, target_dir), and execute the calls in parallel across worker processes (limited by the number of CPU cores); the call blocks until all of them finish.
Therefore, the first worker call will look like copy_file('test1.pdf', '/home/mcurie/files/pdfs/').
I hope this helps. The full code is below.
from multiprocessing.pool import Pool
import os

file_list = ["test1.pdf", "test2.pdf", "test3.pdf", "test4.pdf"]
target_dir = "/home/mcurie/files/pdfs/"

def copy_file(source_file, target_dir):
    os.system(f"cp {source_file} {target_dir + source_file}")

if __name__ == '__main__':
    with Pool() as p:
        # pair each file with the target directory
        res = [(file, target_dir) for file in file_list]
        for results in p.starmap(copy_file, res):
            pass
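As a side note, shutil.copy avoids shelling out entirely and copes with file names containing spaces, which the cp command line above would mishandle. Here is a variant of the same starmap pattern; it creates throwaway demo files with tempfile so the sketch runs anywhere (all paths are generated, not real):

```python
import os
import shutil
import tempfile
from multiprocessing.pool import Pool


def copy_file(source_file, target_dir):
    # shutil.copy is portable and safe with spaces in file names;
    # it returns the path of the new copy
    return shutil.copy(source_file, target_dir)


if __name__ == "__main__":
    # create throwaway demo files so the sketch is self-contained
    src_dir = tempfile.mkdtemp()
    target_dir = tempfile.mkdtemp()
    file_list = []
    for name in ("test1.pdf", "test2.pdf"):
        path = os.path.join(src_dir, name)
        with open(path, "w") as f:
            f.write("demo")
        file_list.append(path)

    with Pool(2) as p:
        copied = p.starmap(copy_file, [(f, target_dir) for f in file_list])
    print(copied)
```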