ValueError: I/O operation on closed file (works on local machine but not on Google Colab)

Question:

I have some CSV files in a folder. I defined a function that reads one column from each CSV file, multiplies the values by a factor, finds the maximum, and prints it.

I’d like the output to be written into a text file.

The code works well on my local machine.

But when I run it on Google Colab, it produces an error and seems to keep running without stopping:

Exception in callback BaseAsyncIOLoop._handle_events(17, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(17, 1)>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 451, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 434, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 239, in dispatch_shell
    sys.stdout.flush()
ValueError: I/O operation on closed file.

What went wrong, and how can it be corrected?

from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
import numpy as np
import glob, sys

folder = "/content/drive/My Drive/Data folder/"

def to_cal(file_name, times):
  df['Result'] = df['Unit Price'] * times
  print (file_name, df['Result'].max())
  return

files = glob.glob(folder + "/*.csv")

with open(folder + 'output (testing).txt', 'a') as outfile:
  sys.stdout = outfile

  for f in files:
    df = pd.read_csv(f)
    file_name = f.replace(folder, "")
    to_cal(file_name, 10)
outfile.close()
Asked By: Mark K

||

Answers:

I ran it on Colab, and the full error message shows something very interesting: sys.stdout.flush().
This confirms that the problem is caused by sys.stdout = outfile.

On your local computer you probably run it as a Python script, so every run starts a new interpreter with a fresh sys.stdout, and closing the file causes no problem. On Colab (and probably in other Python shells) the same interpreter keeps running all the time. After the with block closes the file, sys.stdout still points to that closed file, so later executions (and the kernel's own sys.stdout.flush() call) may fail when they try to use it.
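A minimal sketch of this failure mode, assuming io.StringIO as a stand-in for the output file so that it runs anywhere (the names saved_stdout, buffer and error_message are only for illustration). It shows that sys.stdout keeps pointing to the closed object after the with block ends, and that restoring the saved stream brings print() back to normal:

```python
import io
import sys

saved_stdout = sys.stdout      # keep a reference to the real stream
buffer = io.StringIO()         # stand-in for the question's output file
error_message = None

try:
    with buffer:
        sys.stdout = buffer    # same pattern as in the question
        print("inside the with block")
    # The with block has closed `buffer`, but sys.stdout still points to it,
    # so the next print() tries to write to a closed object.
    try:
        print("after the with block")
    except ValueError as exc:
        error_message = str(exc)
finally:
    sys.stdout = saved_stdout  # always restore the original stream

print("print() raised ValueError:", error_message)
```

In a notebook the kernel itself calls sys.stdout.flush() after each cell, which is probably why the traceback in the question comes from ipykernel and not from the user code.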

If you want to redirect print() to a file, it is better to use

print(..., file=outfile)

Or write it in the normal way

text = '{} {}\n'.format(file_name, df['Result'].max())
outfile.write(text)
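Put together, the print(..., file=outfile) suggestion could look roughly like this. It is only a sketch under some assumptions: to keep it runnable without pandas or Google Drive, the 'Unit Price' column is replaced by plain lists in a made-up samples dict, the output goes to a temporary folder, and to_cal gets an extra outfile parameter that the question's version does not have.

```python
import os
import tempfile

def to_cal(file_name, prices, times, outfile):
    # Hypothetical variant of the question's function: `prices` is a plain
    # list standing in for df['Unit Price'], and the target file is passed in.
    result = [price * times for price in prices]
    # Write to the given file; sys.stdout is never touched.
    print(file_name, max(result), file=outfile)

# Made-up sample data standing in for the CSV files.
samples = {"a.csv": [1.5, 2.0], "b.csv": [3.0, 0.5]}

out_path = os.path.join(tempfile.mkdtemp(), "output (testing).txt")

with open(out_path, "a") as outfile:
    for file_name, prices in samples.items():
        to_cal(file_name, prices, 10, outfile)
# No outfile.close() is needed: the with block already closed the file.

# Normal print() still works, because sys.stdout was never replaced.
with open(out_path) as check:
    written = check.read()
print(written, end="")
```

If the print() calls cannot be changed, contextlib.redirect_stdout from the standard library is another option: it redirects sys.stdout only inside its with block and restores it on exit.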
Answered By: furas