Python: AttributeError: Can't pickle local object 'writeBuf.<locals>.write'

Question:

I am not at all familiar with Python, I usually do Ruby or JS. But I need to write a benchmarking script on a system that runs Python. What I’m trying to do is create a small script that gets a file size and the number of threads and write a random buffer. This is what I got after 2 hours of fiddling:

from multiprocessing import Pool
import os, sys

def writeBuf(buf):
    def write(n):
        f = open(os.path.join(directory, 'n' + str(n)), 'w')
        try:
            f.write(buf)
            f.flush()
            os.fsync(f.fileno)
        finally:
            f.close()
    return write

if __name__ == '__main__':
    targetDir = sys.argv[1]
    numThreads = int(sys.argv[2])
    numKiloBytes = int(sys.argv[3])
    numFiles = int(102400 / numKiloBytes)

    buf = os.urandom(numKiloBytes * 1024)

    directory = os.path.join(targetDir, str(numKiloBytes) + 'k')
    if not os.path.exists(directory):
        os.makedirs(directory)

    with Pool(processes=numThreads) as pool:
        pool.map(writeBuf(buf), range(numFiles))

But it throws the error: AttributeError: Can't pickle local object 'writeBuf.<locals>.write'

I have tried before to use write without the closure, but I got an error when I tried to define the function inside the __name__ == '__main__' part. Omitting that if also lead to an error and I read that it is required for Pool to work.

What’s supposed to be just a tiny script turned into a huge ordeal, can anyone point me the right way?

Asked By: Lanbo

||

Answers:

In theory, python can’t pickle functions. (for details, see Can't pickle Function)

In practice, python pickles a function’s name and module so that passing a function will work. In your case though, the function that you’re trying to pass is a local variable returned by writeBuf.

Instead:

  1. Remove the writeBuf wrapper.
  2. Don’t use the write function’s closure (buf and directory), instead give write everything it needs as a parameter.

Result:

def write(args):
    directory, buf, n = args

    with open(os.path.join(directory, 'n' + str(n)), 'w') as f:
        # might as well use with-statements ;)
        f.write(buf)
        f.flush()
        os.fsync(f.fileno)

if __name__ == '__main__':
    ...

    with Pool(processes=numThreads) as pool:
        nargs = [(directory, buf, n) for n in range(numFiles)]
        pool.map(write, nargs)
Answered By: Hetzroni
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.