Optimal way of storing stream content on disk using Python

Question:

I’d like to stream data directly to disk.

One way of doing that is simply to read the data and write it to a file, but I also want to minimize RAM usage.

with open("dummy.source", "br") as out, open("dummy.copy", "bw") as in_:
    in_.write(out.read())  # this causes reading the whole stream into memory

I’ve figured out a manual way of doing that:

with open("dummy.source", "br") as out, open("dummy.copy", "bw") as in_:
    while b := out.read(BUFFER_SIZE):
        in_.write(b)

Do I really have to load the stream manually, part by part?
If so, how can I determine the optimal value of BUFFER_SIZE?

Asked By: Jakub Kuszneruk

||

Answers:

The optimal buffer size is most likely the size of the buffer Python already reserves for the file, which is 8192 bytes on most systems. Any value below that is also fine, because Python buffers the I/O anyway.

You can change the buffer size with the buffering argument of open, but 8192 is the optimal size on a lot of systems.
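
For illustration, here is a minimal sketch of passing an explicit buffer size to open. The temporary file and the 64 KiB value are assumptions chosen only for the example, not recommendations:

```python
import os
import tempfile

BUFFER_SIZE = 64 * 1024  # assumed value, for illustration only

# Create a throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "dummy.source")
with open(path, "wb", buffering=BUFFER_SIZE) as f:
    f.write(b"x" * 100_000)

# In binary mode, a buffering value greater than 1 sets the size in bytes
# of the fixed-size buffer Python keeps for the file object.
with open(path, "rb", buffering=BUFFER_SIZE) as f:
    data = f.read(10)  # served from Python's buffer after one larger OS read
```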

You can read the default buffer size from the current Python interpreter with:

from io import DEFAULT_BUFFER_SIZE

This guards against the value changing in a future release or being different in a particular Python interpreter.
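
As a rough sketch, the loop from the question could use that constant as its chunk size. The helper name copy_in_chunks and its parameters are made up for this example and are not part of any library:

```python
from io import DEFAULT_BUFFER_SIZE


def copy_in_chunks(src_path, dst_path, chunk_size=DEFAULT_BUFFER_SIZE):
    """Copy src_path to dst_path, reading at most chunk_size bytes per call.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Calling copy_in_chunks("dummy.source", "dummy.copy") would copy the file from the question without loading it all into memory. The standard library's shutil.copyfileobj(fsrc, fdst, length) runs the same kind of chunked loop on two open file objects, so writing the loop by hand is optional.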

Answered By: Ahmed AEK