Read file line by line with asyncio

Question:

I wish to read several log files as they are written and process their input with asyncio. The code will have to run on Windows. From what I understand from searching around both Stack Overflow and the web, asynchronous file I/O is tricky on most operating systems (select will not work as intended, for example). While I’m sure I could do this with other methods (e.g. threads), I thought I would try out asyncio to see what it is like. The most helpful answer would probably be one that describes what the “architecture” of a solution to this problem should look like, i.e. how different functions and coroutines should be called or scheduled.

The following function reads a file line by line (through polling, which is acceptable):

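import time

POLL_INTERVAL = 0.5  # seconds to wait at end of file (example value)

def process_line(line):
    print(line, end='')  # placeholder for the real processing

def line_reader(f):
    while True:
        line = f.readline()
        if not line:
            time.sleep(POLL_INTERVAL)  # blocks the whole thread
            continue
        process_line(line)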

With several files to monitor and process, this sort of code would require threads. I have modified it slightly to be more usable with asyncio:

import asyncio

POLL_INTERVAL = 0.5  # seconds to wait at end of file (example value)

async def line_reader(f):
    while True:
        line = f.readline()
        if not line:
            await asyncio.sleep(POLL_INTERVAL)  # yields to the event loop
            continue
        process_line(line)

This sort of works when I schedule it through the asyncio event loop, but if process_line blocks, that is of course not good. When starting out, I imagined the solution would look something like this:

async def process_data():
    ...
    while True:
        ...
        line = await line_reader()
        ...

but I could not figure out how to make that work (at least not without process_data managing quite a bit of state).

Any ideas on how I should structure this kind of code?

Asked By: josteinb


Answers:

asyncio doesn’t support file operations yet, sorry.

Thus it cannot help with your problem.

Answered By: Andrew Svetlov

Your code structure looks good to me; the following code runs fine on my machine:

import asyncio

PERIOD = 0.5  # seconds to wait before polling again at end of file

async def readline(f):
    # Return the next line, sleeping (without blocking the loop) until one appears.
    while True:
        data = f.readline()
        if data:
            return data
        await asyncio.sleep(PERIOD)

async def test():
    with open('test.txt') as f:
        while True:
            line = await readline(f)
            print('Got: {!r}'.format(line))

asyncio.run(test())
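Since the question asks about architecture, here is one way the same polling readline idea might scale to several files. This is a minimal sketch, assuming one producer task per file that pushes lines onto a shared asyncio.Queue and a single consumer that processes them. The names follow, consume, monitor and handle, and the poll interval, are illustrative and not part of the original answer.

```python
import asyncio

# Hypothetical helper names: a sketch of the architecture, not a library API.
POLL_INTERVAL = 0.5  # seconds to sleep at end of file (example value)


async def follow(path, queue, poll_interval=POLL_INTERVAL):
    """Producer: tail one file, putting (path, line) pairs on the shared queue."""
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                await queue.put((path, line))
            else:
                # At EOF: give control back to the event loop instead of blocking.
                await asyncio.sleep(poll_interval)


async def consume(queue, handle):
    """Consumer: take lines off the queue and pass them to handle(path, line)."""
    while True:
        path, line = await queue.get()
        handle(path, line)
        queue.task_done()


async def monitor(paths, handle, poll_interval=POLL_INTERVAL):
    """Run one producer per file plus a single consumer until cancelled."""
    queue = asyncio.Queue()
    producers = [asyncio.create_task(follow(p, queue, poll_interval)) for p in paths]
    consumer = asyncio.create_task(consume(queue, handle))
    try:
        await asyncio.gather(*producers, consumer)
    finally:
        # Make sure no task outlives monitor() if it is cancelled or fails.
        for task in (*producers, consumer):
            task.cancel()
```

Because each producer only awaits asyncio.sleep() at EOF, the loop can interleave any number of files. If handle() itself can block, it could be offloaded with loop.run_in_executor(), as Jashandeep Sohi's answer describes.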
Answered By: Vincent

From what I understand from searching around both Stack Overflow and the web, asynchronous file I/O is tricky on most operating systems (select will not work as intended, for example). While I’m sure I could do this with other methods (e.g. threads), I thought I would try out asyncio to see what it is like.

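On *nix systems, asyncio’s event loop is built on readiness APIs such as select or epoll under the hood, and those don’t work usefully on regular files, so you won’t be able to do non-blocking file I/O without the use of threads. On Windows, asyncio can use IOCP (through the ProactorEventLoop), which does support asynchronous file I/O, but asyncio doesn’t expose it for files.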

Your code is fine, except that you should make blocking I/O calls in threads so that you don’t block the event loop if the I/O is slow. Fortunately, it’s really simple to offload work to threads using the loop.run_in_executor() method.

First, set up a dedicated thread pool for your I/O:

from concurrent.futures import ThreadPoolExecutor
io_pool_exc = ThreadPoolExecutor()

And then simply offload any blocking I/O calls to the executor:

...
loop = asyncio.get_running_loop()
line = await loop.run_in_executor(io_pool_exc, f.readline)
...
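Combining this with the polling readline() from the previous answer gives roughly the following. It is a sketch assuming the io_pool_exc pool set up above and a poll_interval parameter that neither original answer has.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for blocking file I/O, as suggested above.
io_pool_exc = ThreadPoolExecutor()


async def readline(f, poll_interval=0.5):
    """Return the next line of f, polling at EOF (poll_interval is an assumption).

    The blocking f.readline() call runs in io_pool_exc, so a slow disk
    stalls only a worker thread, not the event loop.
    """
    loop = asyncio.get_running_loop()
    while True:
        line = await loop.run_in_executor(io_pool_exc, f.readline)
        if line:
            return line
        # Nothing new yet: wait without blocking the loop, then try again.
        await asyncio.sleep(poll_interval)
```

Each call awaits exactly one line, so a caller can write line = await readline(f) inside its own loop, which is close to the process_data() structure sketched in the question.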
Answered By: Jashandeep Sohi

Using the third-party aiofiles library (which runs the blocking file operations in a thread pool behind the scenes):

import aiofiles

async def main():  # must be awaited inside a running event loop
    async with aiofiles.open('filename', mode='r') as f:
        async for line in f:
            print(line)

EDIT 1

As @Jashandeep mentioned, you should be careful about blocking operations.

Another method is select and/or epoll:

from select import select

# f1, f2: file objects opened elsewhere; the timeout must be passed positionally
files_to_read, files_to_write, exceptions = select([f1, f2], [f1, f2], [f1, f2], 0.1)

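The timeout argument (in seconds) is important here: without it, select() blocks until one of the objects becomes ready. Note also that on Windows, select() only accepts sockets, not file objects.

See: https://docs.python.org/3/library/select.html#select.select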
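To see why select() does not help with tailing log files on *nix, the following sketch checks whether an empty regular file is reported as readable. POSIX specifies that regular files always select as ready, so on Linux or macOS this is expected to return True even though there is nothing to read; on Windows, select() rejects the file object instead. The helper name is hypothetical.

```python
import select
import tempfile


def regular_file_always_ready():
    """Hypothetical helper: does select() report an at-EOF regular file as readable?"""
    # An empty temporary file: any read would immediately hit EOF.
    with tempfile.TemporaryFile() as f:
        # Timeout passed positionally; select() accepts no keyword arguments.
        readable, _, _ = select.select([f], [], [], 0.1)
        return f in readable
```

In other words, a select()-driven loop would wake up constantly for regular files and still have to poll with readline(), so it gains nothing over the sleep-based approach.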

EDIT 2

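You can register a file descriptor for reading with loop.add_reader() (and for writing with loop.add_writer()).

On Linux, the default event loop implements this with epoll internally. These methods are not available on the ProactorEventLoop, which is the default on Windows.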

EDIT 3

But remember that epoll does not work with regular files: the kernel rejects them with EPERM.
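For comparison, loop.add_reader() does work with file descriptors that epoll supports, such as pipes and sockets. A minimal sketch, assuming a Unix selector-based event loop (it will not run on Windows’ default ProactorEventLoop); the function name is hypothetical.

```python
import asyncio
import os


async def read_pipe_once():
    """Hypothetical example: wait for data on a pipe via loop.add_reader()."""
    loop = asyncio.get_running_loop()
    rfd, wfd = os.pipe()
    got = loop.create_future()

    def on_readable():
        # Called by the event loop once rfd has data, so os.read() won't block.
        loop.remove_reader(rfd)
        got.set_result(os.read(rfd, 1024))

    loop.add_reader(rfd, on_readable)
    try:
        os.write(wfd, b"hello\n")  # makes rfd readable
        return await got
    finally:
        loop.remove_reader(rfd)  # no-op if the callback already removed it
        os.close(rfd)
        os.close(wfd)
```

Registering a regular file descriptor the same way would fail on Linux, which is why polling (or a thread pool) remains the practical option for log files.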

Answered By: pylover