What is the use of buffering in python's built-in open() function?

Question:

Python Documentation : https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])  

The above documentation says “The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used.”
When I use

filedata = open("file.txt", "r", 0)

or

filedata = open("file.txt", "r", 1)

or

filedata = open("file.txt", "r", 2)

or

filedata = open("file.txt", "r", -1)

or

filedata = open("file.txt", "r")

The output does not change. Each of the calls shown above prints at the same speed.
Output:

Mr. Bean is a British television programme series of fifteen 25-minute episodes written by Robin Driscoll and starring Rowan Atkinson as the title character. Different episodes were also written by Robin Driscoll and Richard Curtis, and one by Ben Elton. Thirteen of the episodes were broadcast on ITV, from the pilot on 1 January 1990, until “Goodnight Mr. Bean” on 31 October 1995. A clip show, “The Best Bits of Mr. Bean”, was broadcast on 15 December 1995, and one episode, “Hair by Mr. Bean of London”, was not broadcast until 2006 on Nickelodeon.

Then how is the buffering parameter of the open() function useful? What value of the buffering parameter is best to use?

Asked By: Srivishnu


Answers:

Enabling buffering means that you’re not directly interfacing with the OS’s representation of a file, or its file system API. Instead, a chunk of data is read from the raw OS filestream into a buffer until it is consumed, at which point more data is fetched into the buffer. In terms of the objects you get, you’ll get a BufferedIOBase object wrapping an underlying RawIOBase (which represents the raw file stream).
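To see the wrapping described above, here is a minimal sketch. It assumes Python 3, where the built-in open() returns io objects (in Python 2 you would need io.open() to get the same classes). The helper name describe_open_modes is made up for illustration.

```python
import io
import os
import tempfile


def describe_open_modes():
    """Return the object types open() gives for buffered and unbuffered binary reads."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abc")

        # Default buffering: a buffered object wrapping the raw file stream.
        with open(path, "rb") as buffered:
            buffered_type = type(buffered)
            raw_type = type(buffered.raw)

        # buffering=0 (binary mode only in Python 3): the raw stream itself.
        with open(path, "rb", buffering=0) as unbuffered:
            unbuffered_type = type(unbuffered)

        return buffered_type, raw_type, unbuffered_type
    finally:
        os.remove(path)


if __name__ == "__main__":
    for t in describe_open_modes():
        print(t.__name__)
```

With buffering left at its default you get an io.BufferedReader whose .raw attribute is an io.FileIO. With buffering=0 you get the io.FileIO directly.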

What is the benefit of this? Well, interfacing with the raw stream might have high latency, because the operating system has to fool around with physical objects like the hard disk, and this may not be acceptable in all cases. Let’s say you want to read three letters from a file every 5ms and your file is on a crusty old hard disk, or even a network file system. Instead of trying to read from the raw filestream every 5ms, it is better to load a bunch of bytes from the file into a buffer in memory, then consume it at will.

What size of buffer you choose will depend on how you’re consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn’t cause a noticeable delay for your users would be ideal.
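A rough way to see the effect of the buffer size is to count how often the raw stream is actually hit. The sketch below assumes Python 3 and uses an in-memory stand-in for the slow file. CountingRaw and count_raw_reads are hypothetical names, not part of any library.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts how many times it is asked for data."""

    def __init__(self, data):
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here stands in for one trip to the slow device.
        self.calls += 1
        return self._src.readinto(b)


def count_raw_reads(buffer_size, total=3000, chunk=3):
    """Read `total` bytes in `chunk`-sized pieces and return the raw read count."""
    raw = CountingRaw(b"x" * total)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    while reader.read(chunk):
        pass
    return raw.calls


if __name__ == "__main__":
    print("buffer of 3 bytes:   ", count_raw_reads(3), "raw reads")
    print("buffer of 3000 bytes:", count_raw_reads(3000), "raw reads")
```

The consumer code is identical in both cases. Only the buffer size changes, and with the larger buffer the raw stream is touched far less often.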

Answered By: Asad Saeeduddin

You can also check the default buffer size by reading the read-only DEFAULT_BUFFER_SIZE attribute of the io module.

import io
print(io.DEFAULT_BUFFER_SIZE)

As described here

Answered By: N Randhawa

Buffering means holding a chunk of the file’s data in temporary memory instead of going to the disk for every read or write. In Python, different values can be given for it. If buffering is set to 0, buffering is off. If it is set to 1, the file is line buffered, and any larger positive value sets the approximate buffer size in bytes.

Answered By: joel.t.mathew

With buffering set to -1 my file write took 13 minutes. With buffering set to 2**10 my file write took 7 seconds. So, the purpose of buffering is to speed up your program.

Answered By: John Abraham

What is perhaps important from a practical point of view is that the buffering parameter determines when the data you are sending to the stream is actually saved to disk.

When you open a file without the buffering parameter, and write some stuff to it, you will see the data is written only after the with open(...) as foo: block is exited (or when the file’s close() method is called), or when some system-determined default buffer size is reached. But if you set the buffering parameter, it will write the data as soon as that size of the buffer is reached.

Thus using e.g. open('file.txt', 'w', buffering=1) is a useful thing to do when you have a long-running application, and you are sending some data to a file, and you want it to save after each line, and not only after the application quits. Otherwise a crash, or a power outage, etc. could cause the data to be lost.
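Here is a small sketch of that difference, assuming Python 3 text mode on CPython, where the default buffer is much larger than one short line. The helper name bytes_on_disk_before_close is made up for illustration. It writes a single line and checks the file size while the file is still open.

```python
import os
import tempfile


def bytes_on_disk_before_close(buffering):
    """Write one short line and return the file size while the file is still open."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        with open(path, "w", buffering=buffering) as log:
            log.write("step 1 done\n")
            # Measured before the with block closes (and flushes) the file.
            return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("default buffering:", bytes_on_disk_before_close(-1), "bytes visible")
    print("line buffering:   ", bytes_on_disk_before_close(1), "bytes visible")
```

With the default setting the line is still sitting in the buffer, so nothing has reached the file yet. With buffering=1 the line is handed to the operating system as soon as the newline is written. Note that this only gets the data out of the Python process. If you also need to survive a power outage, you would additionally look at os.fsync().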

See also: How often does python flush to a file?

Answered By: Simimic