Reading from a frequently updated file

Question:

I’m currently writing a program in Python on a Linux system. The objective is to read a log file and execute a bash command upon finding a particular string. The log file is being constantly written to by another program.

My question: If I open the file using the open() method will my Python file object be updated as the actual file gets written to by the other program or will I have to reopen the file at timed intervals?

UPDATE: Thanks for answers so far. I perhaps should have mentioned that the file is being written to by a Java EE app so I have no control over when data gets written to it. I’ve currently got a program that reopens the file every 10 seconds and tries to read from the byte position in the file that it last read up to. For the moment it just prints out the string that’s returned. I was hoping that the file did not need to be reopened but the read command would somehow have access to the data written to the file by the Java app.

#!/usr/bin/python
import time

fileBytePos = 0
while True:
    inFile = open('./server.log','r')
    inFile.seek(fileBytePos)
    data = inFile.read()
    print(data)
    fileBytePos = inFile.tell()
    print(fileBytePos)
    inFile.close()
    time.sleep(10)

Thanks for the tips on pyinotify and generators. I’m going to have a look at these for a nicer solution.

Asked By: JimS


Answers:

I am no expert here but I think you will have to use some kind of observer pattern to passively watch the file and then fire off an event that reopens the file when a change occurs. As for how to actually implement this, I have no idea.

I do not think that open() will open the file in realtime as you suggest.

Answered By: Adam Pointer

Since you’re targeting a Linux system, you can use pyinotify to notify you when the file changes.

There’s also this trick, which may work fine for you. It uses file.seek to do what tail -f does.

Answered By: nmichaels

I would recommend looking at David Beazley’s Generator Tricks for Python, especially Part 5: Processing Infinite Data. It will handle the Python equivalent of a tail -f logfile command in real-time.

# follow.py
#
# Follow a file like tail -f.

import time
def follow(thefile):
    thefile.seek(0,2)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

if __name__ == '__main__':
    logfile = open("run/foo/access-log","r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end="")
Answered By: Jeff Bauer

If the code reading the file runs in a loop like this:

import time
f = open('/tmp/workfile', 'r')
while True:
    line = f.readline()
    if not line:
        time.sleep(0.1)  # at end of file: wait briefly instead of spinning
        continue
    if "ONE" in line:
        print("Got it")

and another program writes to that same file (in append mode), then “Got it” is printed as soon as a line containing “ONE” has been appended and flushed to the file. You can take whatever action you want at that point. In short, you don’t have to reopen the file at regular intervals.

>>> f = open('/tmp/workfile', 'a')
>>> f.write("One\n")
>>> f.close()
>>> f = open('/tmp/workfile', 'a')
>>> f.write("ONE\n")
>>> f.close()
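
To tie this back to the question, here is a minimal sketch (not part of the original answer) of running a command whenever a matching line shows up. The function name run_on_match and its parameters are hypothetical; it assumes Python 3.5+ for subprocess.run and accepts any iterable of lines, such as an open file or a tail-style generator.

```python
import subprocess


def run_on_match(lines, needle, command):
    """Run `command` once for every line that contains `needle`.

    `lines` is any iterable of strings, for example an open file or a
    tail-style generator. Yields (line, exit_status) for each match.
    """
    for line in lines:
        if needle in line:
            # An argument list (no shell=True) keeps log content out of the shell.
            completed = subprocess.run(command, check=False)
            yield line, completed.returncode
```

With a generator such as follow() from Jeff Bauer's answer, usage could look like: for line, status in run_on_match(follow(logfile), "ONE", ["bash", "-c", "echo found"]): ...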
Answered By: w00t

“An interactive session is worth 1000 words”

>>> f1 = open("bla.txt", "wt")
>>> f2 = open("bla.txt", "rt")
>>> f1.write("bleh")
>>> f2.read()
''
>>> f1.flush()
>>> f2.read()
'bleh'
>>> f1.write("blargh")
>>> f1.flush()
>>> f2.read()
'blargh'

In other words – yes, a single “open” will do.
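
For readers on Python 3, the same experiment can be sketched as a script. This is an illustration rather than part of the original session: the temporary directory and the function name demo_shared_file are assumptions. The point it shows is that the reader only sees data once the writer has flushed its buffer.

```python
import os
import tempfile


def demo_shared_file():
    """Write and read the same file through two independent handles."""
    path = os.path.join(tempfile.mkdtemp(), "bla.txt")
    seen = []
    with open(path, "w") as writer, open(path, "r") as reader:
        writer.write("bleh")
        seen.append(reader.read())  # data is still in the writer's buffer
        writer.flush()
        seen.append(reader.read())  # now visible to the reader
        writer.write("blargh")
        writer.flush()
        seen.append(reader.read())  # only the newly appended part
    return seen


if __name__ == "__main__":
    print(demo_shared_file())
```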

Answered By: jsbueno

Here is a slightly modified version of Jeff Bauer’s answer that keeps working when the file is replaced, which it detects by a change of inode. Very useful if your file is being processed by logrotate.

import os
import time

def follow(name):
    current = open(name, "r")
    curino = os.fstat(current.fileno()).st_ino
    while True:
        while True:
            line = current.readline()
            if not line:
                break
            yield line

        try:
            if os.stat(name).st_ino != curino:
                new = open(name, "r")
                current.close()
                current = new
                curino = os.fstat(current.fileno()).st_ino
                continue
        except OSError:
            # The file may be briefly missing during rotation; retry after the sleep.
            pass
        time.sleep(1)


if __name__ == '__main__':
    fname = "test.log"
    for l in follow(fname):
        print("LINE: {}".format(l))
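
The inode check covers a file that is renamed and recreated. A file that is truncated in place (for example by logrotate's copytruncate option) keeps its inode, so a size check is needed as well. The following is a minimal sketch under stated assumptions, not the answer author's code: the class name Tailer and its poll() method are hypothetical, and the file is opened in binary mode so that tell() is a byte offset comparable to st_size.

```python
import os


class Tailer:
    """Hypothetical helper: poll() returns the lines added since the last call."""

    def __init__(self, name, encoding="utf-8"):
        self.name = name
        self.encoding = encoding
        # Binary mode so that tell() is a plain byte offset comparable to st_size.
        self.file = open(name, "rb")
        self.inode = os.fstat(self.file.fileno()).st_ino

    def _drain(self):
        """Read every line currently available from the open handle."""
        lines = []
        while True:
            line = self.file.readline()
            if not line:
                return lines
            lines.append(line.decode(self.encoding, errors="replace"))

    def poll(self):
        lines = self._drain()
        try:
            st = os.stat(self.name)
            if st.st_ino != self.inode:
                # Rotated: the name now points at a different file.
                new = open(self.name, "rb")
                self.file.close()
                self.file = new
                self.inode = os.fstat(new.fileno()).st_ino
                lines += self._drain()
            elif st.st_size < self.file.tell():
                # Truncated in place: start again from the beginning.
                self.file.seek(0)
                lines += self._drain()
        except OSError:
            pass  # name briefly missing during rotation; try again next poll
        return lines
```

Calling poll() in a loop with a sleep in between gives the same effect as the generator above. Two limits of this sketch: a truncation followed by enough new data to exceed the old offset before the next poll goes unnoticed, and a line the writer has not finished yet is returned in pieces.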
Answered By: Andrew Druchenko

I have a similar use case, and I have written the following snippet for it.
While some may argue that this is not the most ideal way to do it, this gets the job done and looks easy enough to understand.

import time


def reading_log_files(filename):
    with open(filename, "r") as f:
        data = f.read().splitlines()
    return data


def log_generator(filename, period=1):
    data = reading_log_files(filename)
    while True:
        time.sleep(period)
        new_data = reading_log_files(filename)
        yield new_data[len(data):]
        data = new_data


if __name__ == '__main__':
    x = log_generator('</path/to/log/file.log>')
    for lines in x:
        print(lines)
        # lines will be a list of new lines added at the end

Hope you find this useful

Answered By: noob_coder

It depends on what exactly you want to do with the file. There are two potential use cases:

  1. Reading appended contents from a continuously updated file, such as a log file.
  2. Reading contents from a file that is continuously overwritten (such as the network statistics files on *nix systems).

As other people have already explained in detail how to address scenario #1, I would like to help those who need scenario #2. Basically, you need to reset the file pointer to 0 using seek(0) (or to whichever position you want to read from) before every call to read() after the first.

Your code can look somewhat like the below function.

import time


def generate_network_statistics(iface='wlan0'):
    with open('/sys/class/net/' + iface + '/statistics/' + 'rx' + '_bytes', 'r') as rx:
        with open('/sys/class/net/' + iface + '/statistics/' + 'tx' + '_bytes', 'r') as tx:
            with open('/proc/uptime', 'r') as uptime:
                while True:
                    receive = int(rx.read())
                    rx.seek(0)
                    transmit = int(tx.read())
                    tx.seek(0)
                    # /proc/uptime holds two floats (uptime and idle time, in seconds).
                    uptime_seconds = float(uptime.read().split()[0])
                    uptime.seek(0)
                    print("Receive: %i, Transmit: %i" % (receive, transmit))
                    time.sleep(1)
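
The rewind-then-read step can also be isolated into a small helper, which makes it easy to try against an ordinary file. This is a sketch for illustration only: the name read_counter is hypothetical, and it assumes the file holds a single integer, as the rx_bytes and tx_bytes files do.

```python
def read_counter(f):
    """Rewind an already open text file and parse its contents as an integer."""
    f.seek(0)  # without this, a second read() would start at end of file
    return int(f.read())
```

Each call returns whatever value the file holds at that moment, as long as the writer overwrites the file in place rather than replacing it with a new one.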
Answered By: Dheeraj Pb

Keep the file handle open even if an empty string is returned at the end of the file, and try again to read it after some sleep time.

    import time

    syslog = '/var/log/syslog'
    sleep_time_in_seconds = 1

    try:
        with open(syslog, 'r', errors='ignore') as f:
            while True:
                for line in f:
                    if line:
                        print(line.strip())
                        # do whatever you want to do on the line
                time.sleep(sleep_time_in_seconds)
    except IOError as e:
        print('Cannot open the file {}. Error: {}'.format(syslog, e))
Answered By: Nasimuddin Ansari