Write thread-safe to file in python

Question:

How can I write data to a file in Python in a thread-safe way?
I want to save some variables to a file for every request, and every hour I want to do some grouping and write the result to MySQL.

In Java I currently put the data in a cached array, which is written to a file when the array is full.

How can I do this in Python? There are many concurrent requests so it has to be thread-safe.

EDIT:

We ended up using the logging module which works fine.

Asked By: user3605780


Answers:

Look at the Queue class; it is thread-safe.

from queue import Queue  # Python 2: from Queue import Queue
writeQueue = Queue()

In each worker thread:

writeQueue.put(repr(some_object))

Then, to dump it to a file:

with open(path, 'w') as outFile:
    # qsize() is only reliable here if no other thread is consuming the queue
    while writeQueue.qsize():
        outFile.write(writeQueue.get())

Queue will accept any Python object, so if you’re trying to do something other than print to a file, just store the objects from the worker threads via Queue.put.
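As a rough illustration of storing objects rather than strings, the sketch below has several worker threads put plain dicts on a shared queue and serializes them only when the queue is drained. The names worker and drain, the record layout, and the thread counts are all made up for the example, and it uses the Python 3 module name queue.

```python
import json
import threading
from queue import Queue, Empty

write_queue = Queue()

def worker(worker_id, n_records):
    # Queue.put is thread-safe, so workers need no extra locking here.
    for i in range(n_records):
        write_queue.put({"worker": worker_id, "seq": i})

def drain(q):
    # Collect everything currently queued without blocking on an empty queue.
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except Empty:
            return items

threads = [threading.Thread(target=worker, args=(w, 5)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

records = drain(write_queue)
lines = [json.dumps(r) for r in records]  # serialize only at write time
```

Draining with get_nowait avoids blocking if another thread empties the queue between a size check and the get call.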

If you need to split the commits across multiple invocations of the script, you’ll need a way to cache partially built commits to disk. To avoid multiple copies trying to write to the file at the same time, use the lockfile module, available via pip. I usually use json to encode data for these purposes; it supports serializing strings, unicode, lists, numbers, and dicts, and is safer than pickle.

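import json
import lockfile  # third-party: pip install lockfile

path = '/path/to/file'
# Lock the same path that is read and rewritten below.
with lockfile.LockFile(path):
    with open(path) as fin:
        data = json.load(fin)
    data.append(newdata)  # newdata: the record collected for this request
    with open(path, 'w') as fout:
        json.dump(data, fout)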

Note that depending on OS features, the time taken to lock and unlock the file as well as rewrite it for every request may be more than you expect. If possible, try to just append to the file, as that will be faster. Also, you may want to use a client/server model, where each ‘request’ launches a worker script which connects to a server process and forwards the data on via a network socket. This sidesteps the need for lockfiles; depending on how much data you’re talking about, the server process may be able to hold it all in memory, or it may need to serialize it to disk and pass it to the database that way.
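To make the "just append" suggestion concrete, here is a minimal sketch that writes one JSON document per line, so nothing already in the file has to be read back or rewritten. The helper names append_record and read_records and the temporary path are assumptions for the example. Whether concurrent appends from several processes can interleave depends on the OS and filesystem, so a lock may still be needed around the write.

```python
import json
import os
import tempfile

def append_record(path, record):
    # Append a single line; existing contents are never re-read or rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_records(path):
    # Later (for example in the hourly job) parse the file line by line.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

demo_path = os.path.join(tempfile.mkdtemp(), "requests.jsonl")
append_record(demo_path, {"user": "a", "value": 1})
append_record(demo_path, {"user": "b", "value": 2})
loaded = read_records(demo_path)
```

Compared with rewriting a single JSON list, the cost of each write stays constant as the file grows.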

WSGI server example:

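from queue import Queue, Empty  # Python 2: from Queue import Queue, Empty
from threading import Lock

path = '/path/to/file'  # placeholder: output file
q = Queue()
flush_lock = Lock()

def flushQueue():
    # Only one thread flushes at a time; append so earlier flushes are kept.
    with flush_lock, open(path, 'a') as f:
        while True:
            try:
                f.write(q.get_nowait())
            except Empty:
                break

def application(env, start_response):
    q.put("Hello World!")
    if q.qsize() > 999:
        flushQueue()
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b"Hello!"]  # WSGI bodies must be bytes on Python 3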
Answered By: Perkins

We used the logging module:

import logging

logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
ch = logging.FileHandler(logpath)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)


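def application(env, start_response):
    # Pass the arguments to logging; str.format() would leave the %s placeholders unfilled.
    logger.info("%s %s", "hello", "world!")
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b"Hello!"]  # bytes, as WSGI requires on Python 3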
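Since the question mentions processing the data every hour, one possible extension is the standard library's TimedRotatingFileHandler, which starts a new log file at a fixed interval so that each finished file can be grouped and loaded on its own. This is only a sketch and not something the answer above describes; the logger name and the temporary log path are assumptions.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # assumption: any writable directory works
logpath = os.path.join(log_dir, "requests.log")

logger = logging.getLogger("hourly_requests")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep records out of the root logger's handlers

# Roll over to a new file every hour; older files get a timestamp suffix.
handler = logging.handlers.TimedRotatingFileHandler(logpath, when="H", interval=1)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

logger.info("%s %s", "hello", "world!")
handler.flush()
```

The hourly job can then read the rotated files rather than the one still being written.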
Answered By: user3605780

I’ve made a simple writer that uses threading and Queue and works fine with multiple threads. Pros: theoretically it can accept data from multiple threads without blocking them, and it writes asynchronously in a separate thread. Cons: the additional writer thread consumes resources, and in CPython threads do not execute Python code truly in parallel.

from queue import Queue, Empty
from threading import Thread

class SafeWriter:
    def __init__(self, *args):
        self.filewriter = open(*args)
        self.queue = Queue()
        self.finished = False
        Thread(name = "SafeWriter", target=self.internal_writer).start()  
    
    def write(self, data):
        self.queue.put(data)
    
    def internal_writer(self):
        while not self.finished:
            try:
                data = self.queue.get(True, 1)
            except Empty:
                continue    
            self.filewriter.write(data)
            self.queue.task_done()
    
    def close(self):
        self.queue.join()
        self.finished = True
        self.filewriter.close()
                    
# Use it like an ordinary open():
w = SafeWriter("filename", "w")
w.write("can be used among multiple threads")
w.close()  # important: without close() the writer thread keeps running and the program never exits
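For comparison, here is a variant sketch of the same idea that stops the writer thread with a sentinel object instead of a polling timeout, followed by a small demonstration with several producer threads. The class name QueueWriter, the sentinel, and the temporary file path are hypothetical and not part of the code above.

```python
import os
import tempfile
import threading
from queue import Queue

_STOP = object()  # sentinel telling the writer thread to exit

class QueueWriter:
    """Hypothetical variant: a single background thread owns the file handle."""

    def __init__(self, path, mode="w"):
        self._file = open(path, mode)
        self._queue = Queue()
        self._thread = threading.Thread(target=self._run, name="QueueWriter")
        self._thread.start()

    def write(self, data):
        # Callers only enqueue; they never touch the file themselves.
        self._queue.put(data)

    def _run(self):
        while True:
            data = self._queue.get()  # blocks until an item arrives
            if data is _STOP:
                break
            self._file.write(data)

    def close(self):
        # Everything queued before the sentinel is written before the thread exits.
        self._queue.put(_STOP)
        self._thread.join()
        self._file.close()

demo_path = os.path.join(tempfile.mkdtemp(), "out.txt")
writer = QueueWriter(demo_path)

def produce(tag):
    for i in range(100):
        writer.write("%s-%d\n" % (tag, i))

producers = [threading.Thread(target=produce, args=(t,)) for t in "abcd"]
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.close()

with open(demo_path) as f:
    written = f.read().splitlines()
```

Because only the writer thread touches the file, lines from different producers cannot be interleaved mid-line.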
Answered By: Leonid Mednikov

How to Thread-Safe Write to File

Writing to a file can be made thread-safe by using a mutual exclusion (mutex) lock.

Any code that opens and writes to the file, or appends to the file can be treated as a critical section of code subject to race conditions.

This code can be protected from race conditions by requiring that the accessing thread first acquire a mutex lock before executing the critical section.

A mutex lock can only be acquired by one thread at a time, and once acquired prevents any other thread acquiring it until the lock is released.

This means that only a single thread will be able to write to the file at a time, making writing to file thread safe.

This can be achieved using the threading.Lock class.

First, a lock can be created and shared among all code that needs to access the same file.

import threading

# create a lock
lock = threading.Lock()

Once created, a thread can acquire the lock before writing to the file by calling the acquire() method. Once writing is finished, the lock can be released by calling the release() method.

For example:

# acquire the lock
lock.acquire()
# open file for appending
with open('path/to/file.txt', 'a') as file:
    # write text to the file
    file.write('Test')
# release the lock
lock.release()
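One caveat with explicit acquire() and release() calls: if the write raises an exception, release() is never reached and every other thread waits forever. A minimal sketch of guarding against that with try/finally follows; the helper name append_text and the deliberately invalid path are assumptions for the demonstration.

```python
import os
import tempfile
import threading

lock = threading.Lock()

def append_text(path, text):
    lock.acquire()
    try:
        with open(path, "a") as file:
            file.write(text)
    finally:
        # Runs even if open() or write() raises, so the lock is never left held.
        lock.release()

base_dir = tempfile.mkdtemp()
good_path = os.path.join(base_dir, "file.txt")
bad_path = os.path.join(base_dir, "missing_dir", "file.txt")  # directory does not exist

append_text(good_path, "Test")
try:
    append_text(bad_path, "Test")
except OSError:
    pass  # the write failed, but the lock was still released

lock_still_held = lock.locked()
```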

As with opening the file, the context manager interface can be used to ensure that the lock is always released once the enclosed block exits.

For example:

# acquire the lock
with lock:
    # open file for appending
    with open('path/to/file.txt', 'a') as file:
        # write text to the file
        file.write('Test')

If a thread attempts to acquire the lock while another thread holds it, it blocks until the lock is released. This waiting happens automatically when the thread attempts to acquire the lock; no additional code is required.
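Putting the pieces together, the sketch below starts several threads that all append to the same file through one shared lock. The function name log_request, the thread and line counts, and the temporary file are assumptions chosen for the example.

```python
import os
import tempfile
import threading

lock = threading.Lock()
demo_path = os.path.join(tempfile.mkdtemp(), "shared.txt")

def log_request(thread_id, count):
    for i in range(count):
        line = "thread %d request %d\n" % (thread_id, i)
        with lock:  # blocks here while another thread holds the lock
            with open(demo_path, "a") as file:
                file.write(line)

threads = [threading.Thread(target=log_request, args=(n, 50)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open(demo_path) as file:
    lines = file.read().splitlines()
```

Each line is written while the lock is held, so every line arrives intact even though the order across threads is not deterministic.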

Answered By: Ali hamza