Is it better to open/close a file every time vs keeping it open until the process is finished?

Question:

I have a text file that is about 50 GB in size, and I am checking the first few characters of each line and writing those to other files specified for that beginning text.

For example, my input contains:

cow_ilovecow
dog_whreismydog
cat_thatcatshouldgotoreddit
dog_gotitfromshelter
...............

I want to process them in cow, dog, cat, etc. (about 200) categories,

if writeflag==1:
    writefile1=open(writefile,"a") #writefile is somedir/dog.txt....
    writefile1.write(remline+"n")
    #writefile1.close()

What is the best way, should I close the file? Otherwise, if I keep it open, is writefile1=open(writefile,"a") doing the right thing?

Asked By: Ananta

||

Answers:

Keep it open the whole time! Otherwise you tell the system that you are done writing all the time and it might decide to flush it onto the disk instead of buffering it. And for obvious reasons n disk writes are much more expensive than 1 disk write.

If you want to append to the file and not overwrite it then yes, a is the correct mode.

Answered By: ThiefMaster

Use the with statement, it automatically closes the files for you, do all the operations inside the with block, so it’ll keep the files open for you and will close the files once you’re out of the with block.

with open(inputfile)as f1, open('dog.txt','a') as f2,open('cat.txt') as f3:
   #do something here

EDIT:
If you know all the possible filenames to be used before the compilation of your code then using with is a better option and if you don’t then you should use your approach but instead of closing the file you can flush the data to the file using writefile1.flush()

Answered By: Ashwini Chaudhary

IO operations consume too much time. Open and close the file, also.

It’s much faster if you open both files(the input and the output), use a memory buffer with, let’s say, 10MB size for your text processing and then write this to the output file. For example:

file = {} # just initializing dicts
filename = {}
with open(file) as f:
    file['dog'] = None
    buffer = ''
    ...
    #maybe there is a loop here
    if writeflag:
        if file['dog'] == None:
            file['dog'] = open(filename['dog'], 'a')
        buffer += remline + 'n'
    if len(buffer) > 1024*1000*10: # 10MB of text
       files['dog'].write(buffer)
       buffer = ''

for v in files.values():
    v.close()
Answered By: Felipe Tonello

You should definitely try to open/close the file as little as possible

Because even comparing with file read/write, file open/close is far more expensive

Consider two code blocks:

f=open('test1.txt', 'w')
for i in range(1000):
    f.write('n')
f.close()

and

for i in range(1000):
    f=open('test2.txt', 'a')
    f.write('n')
    f.close()

The first one takes 0.025s while the second one takes 0.309s

Answered By: xvatar
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.