Python read/write vs shutil copy

Question:

I need to save files uploaded to my server (max file size is 10 MB) and found this answer, which works perfectly. However, I’m wondering what the point of using the shutil module is, and what the difference is between this:

file_location = f"files/{uploaded_file.filename}"
with open(file_location, "wb+") as file_object:
    file_object.write(uploaded_file.file.read())

and this:

import shutil

file_location = f"files/{uploaded_file.filename}"
with open(file_location, "wb+") as file_object:
    shutil.copyfileobj(uploaded_file.file, file_object) 

I have come across the shutil module multiple times, but I still can’t figure out what its benefits are over the read() and write() methods.

Asked By: salius


Answers:

Your method requires the whole file to be in memory. shutil copies in chunks, so you can copy files larger than memory. Also, shutil has routines to copy files by name, so you don’t have to open them at all, and it can preserve the permissions, ownership, and creation/modification/access timestamps.
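As a rough illustration of the by-name routines mentioned above, here is a minimal sketch (the file names are hypothetical) that copies a file with shutil.copy2() without opening either file yourself, and checks that the modification time was carried over:

```python
import os
import shutil
import tempfile

# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")  # hypothetical file names
    dst = os.path.join(tmp, "copy.bin")

    with open(src, "wb") as f:
        f.write(b"example payload")

    # Give the source a fixed, easily recognisable access/modification time.
    os.utime(src, (1_000_000_000, 1_000_000_000))

    # copy2() works on paths: it opens both files internally, copies the data,
    # then calls copystat() to carry over permission bits and timestamps.
    shutil.copy2(src, dst)

    with open(dst, "rb") as f:
        copied_data = f.read()
    copied_mtime = os.stat(dst).st_mtime

print(copied_data)   # b'example payload'
print(copied_mtime)  # 1000000000.0
```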

Answered By: Tim Roberts

I would like to highlight a few points with regard to OP’s question and the (currently accepted) answer by @Tim Roberts:

  1. "shutil copies in chunks so you can copy files larger than memory". You can also copy a file in chunks using read() (please
    have a look at the short example below, as well as this and this answer for more
    details), just as you can load the whole file into memory
    using shutil.copyfileobj() by passing a negative length value.

    with open(uploaded_file.filename, 'wb') as f:
        while contents := uploaded_file.file.read(1024 * 1024):  # adjust the chunk size as desired
            f.write(contents)
    

    Under the hood, copyfileobj() uses a very similar approach to the above, relying on the read() and write() methods of the file objects; hence, it makes little difference which of the two you use. The source code of copyfileobj() can be seen below. The default buffer size, i.e., COPY_BUFSIZE below, is set to 1MB (1024 * 1024 bytes) on Windows, or 64KB (64 * 1024 bytes) on other platforms (see here).

    def copyfileobj(fsrc, fdst, length=0):
        """copy data from file-like object fsrc to file-like object fdst"""
        if not length:
            length = COPY_BUFSIZE
        # Localize variable access to minimize overhead.
        fsrc_read = fsrc.read
        fdst_write = fdst.write
        while True:
            buf = fsrc_read(length)
            if not buf:
                break
            fdst_write(buf)
    
  2. "shutil has routines to copy files by name so you don’t have to open them at all…" Since OP seems to be using the FastAPI
    framework (which is actually Starlette underneath), UploadFile exposes an actual Python
    SpooledTemporaryFile (a file-like object) that you can get using the .file
    attribute (source code can be found here). When FastAPI/Starlette creates a new instance of UploadFile, it already creates the SpooledTemporaryFile behind the scenes, and that file remains open. Hence, since you are dealing with a temporary
    file that is already open and has no visible name in the file system (a name is what would otherwise let shutil copy the contents without you opening the file), it makes no
    difference whether you use read() or copyfileobj().

  3. "it can preserve the permissions, ownership, and creation/modification/access timestamps." Even though this is about saving a file uploaded through a web framework (and hence, most of this metadata would not be transferred along with the file anyway), the above statement is not entirely true, as per the documentation:

    Warning: Even the higher-level file copying functions (shutil.copy(), shutil.copy2()) cannot copy all file metadata.

    On POSIX platforms, this means that file owner and group are lost as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied.
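To make point 1 concrete, the sketch below records the size passed to every read() call. CountingReader and copy_in_chunks are hypothetical helpers written only for this illustration. A hand-written read()/write() loop and copyfileobj() with the same chunk size issue identical reads, while a negative length makes copyfileobj() fetch everything with a single read(-1) (plus one final empty read that ends the loop):

```python
import io
import shutil


class CountingReader(io.BytesIO):
    """In-memory source that remembers the size passed to each read() call."""

    def __init__(self, data):
        super().__init__(data)
        self.requests = []

    def read(self, size=-1):
        self.requests.append(size)
        return super().read(size)


def copy_in_chunks(src, dst, chunk_size):
    """Plain read()/write() loop, equivalent to the one shown in point 1."""
    while chunk := src.read(chunk_size):
        dst.write(chunk)


data = b"0123456789"

manual_src, manual_dst = CountingReader(data), io.BytesIO()
copy_in_chunks(manual_src, manual_dst, 4)

chunked_src, chunked_dst = CountingReader(data), io.BytesIO()
shutil.copyfileobj(chunked_src, chunked_dst, 4)

whole_src, whole_dst = CountingReader(data), io.BytesIO()
shutil.copyfileobj(whole_src, whole_dst, -1)  # negative length: no chunking

print(manual_src.requests)   # [4, 4, 4, 4]
print(chunked_src.requests)  # [4, 4, 4, 4]
print(whole_src.requests)    # [-1, -1]
```

All three destinations end up holding the same ten bytes; only the read pattern, and therefore the peak memory use for large inputs, differs.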

That being said, there is nothing wrong with using copyfileobj(). On the contrary, if you are dealing with large files and want to avoid loading the entire file into memory (as you may not have enough RAM to hold all the data), and you prefer copyfileobj() over a similar solution built on the read() method (as described in point 1 above), it is perfectly fine to use shutil.copyfileobj(fsrc, fdst). You can change the default buffer size by adjusting the length argument of copyfileobj(). Besides, since Python 3.8, higher-level functions such as copyfile() may use platform-dependent efficient copy operations, falling back to the less efficient copyfileobj() internally if such a fast copy fails.
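The following sketch mimics the situation from point 2 and the paragraph above without FastAPI. A tempfile.SpooledTemporaryFile stands in for UploadFile.file (an assumption made only for illustration), and a hypothetical save_upload() helper streams it to disk with copyfileobj() using an explicit chunk size. While its data is still held in memory, the spooled file reports no name, so shutil's by-name functions have nothing to work with:

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read(); an assumed value, tune as needed


def save_upload(fsrc, destination, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: stream an open binary file-like object to disk."""
    fsrc.seek(0)  # copyfileobj() copies from the current position onwards
    with open(destination, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, chunk_size)


# Stand-in for UploadFile.file: kept in memory until it grows past max_size.
spooled = tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024)
spooled.write(b"uploaded bytes" * 1000)
in_memory_name = spooled.name
print(in_memory_name)  # None

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "upload.bin")
    save_upload(spooled, target, chunk_size=4096)
    saved_size = os.path.getsize(target)

spooled.close()
print(saved_size)  # 14000
```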

Note

If copyfileobj() is used inside a FastAPI def (sync) endpoint, that is perfectly fine, as a normal def endpoint in FastAPI is run in an external threadpool that is then awaited, instead of being called directly (as that would block the server). On the other hand, async def endpoints run on the main (single) thread, and thus calling a method that performs blocking I/O operations, such as copyfileobj() (as shown in the source code), would block the entire server (for more information on def vs async def, please have a look at this answer). Hence, if you are about to call copyfileobj() from within an async def endpoint, you should make sure to run this operation, as well as all other file operations such as open() and close(), in a separate thread, so that the main thread (where coroutines are run) does not get blocked. You can do that using Starlette’s run_in_threadpool(), which is also used by FastAPI internally when you call the async methods of the UploadFile object, as shown here. For instance:

from starlette.concurrency import run_in_threadpool
await run_in_threadpool(shutil.copyfileobj, fsrc, fdst)
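If you would rather not depend on Starlette’s helper, the standard library’s asyncio.to_thread() (Python 3.9+) also runs a function in a worker thread and awaits the result. The sketch below swaps it in for run_in_threadpool() and keeps open(), copyfileobj() and close() together in one blocking function executed off the event loop. save_blocking() and handle_upload() are hypothetical names, and an in-memory BytesIO stands in for the uploaded file:

```python
import asyncio
import io
import os
import shutil
import tempfile


def save_blocking(fsrc, destination):
    """All blocking file I/O (open, copy, close) happens inside this function."""
    fsrc.seek(0)
    with open(destination, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
    return os.path.getsize(destination)


async def handle_upload(fsrc, destination):
    # Hypothetical async handler: the event loop stays free while a worker
    # thread from the loop's default executor performs the copy.
    return await asyncio.to_thread(save_blocking, fsrc, destination)


with tempfile.TemporaryDirectory() as tmp:
    upload = io.BytesIO(b"x" * 100_000)  # stand-in for UploadFile.file
    written = asyncio.run(handle_upload(upload, os.path.join(tmp, "out.bin")))

print(written)  # 100000
```

Bundling the whole save into one function means a single thread hop per upload instead of one for each of open(), the copy, and close().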

For more details and code examples, please have a look at this answer.

Answered By: Chris