If there is no way to put a timeout in pandas read_csv, how to proceed?

Question:

The CSV files are linked to Google Sheets. If by any chance there is a problem, the import never finishes and hangs in the same place forever, so I need to add a timeout to the attempt to import the CSV.

I am currently testing the situation with func-timeout:

from func_timeout import func_timeout, FunctionTimedOut
import pandas as pd

try:
  csv_file = 'https://docs.google.com/spreadsheets/d/e/XXXX/pub?gid=0&single=true&output=csv'
  df = func_timeout(30, pd.read_csv, args=(csv_file))
except FunctionTimedOut:
  print('timeout')
except Exception as e:
  print(e)

But it returns this error (besides not working, it will apparently become unusable in a future version, given the FutureWarning):

FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only.
  self._target(*self._args, **self._kwargs)
read_csv() takes from 1 to 52 positional arguments but 168 were given

My expected output is:

      SS_Id      SS_Match  xGoals_Id xGoals_Match       Bf_Id          Bf_Match
0  10341056  3219 x 65668        NaN            x    31539043  194508 x 5408226
1  10340808   3217 x 3205        NaN            x    31537759  220949 x 1213581
2  10114414   2022 x 1972        NaN            x    31535268  4525642 x 200603
3  10114275  1974 x 39634        NaN            x    31535452  198124 x 6219238

I would like some assistance in finding the best solution for my current situation and need.

Asked By: Digital Farmer


Answers:

There’s a bug here: args=(csv_file) is valid Python, but it is not a tuple, and that is what triggers both the FutureWarning and the TypeError down the line. You want a singleton (a tuple with 1 value) like this: args=(csv_file, )

The comma makes the tuple!

(Riddle: Why did it say you passed 168 arguments?)

# it should work with a proper argument tuple.
df = func_timeout(30, pd.read_csv, args=(csv_file, ))
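To see why the comma matters, here is a minimal sketch. The helper count_positional_args and the example URL are hypothetical and only stand in for pd.read_csv and the real sheet URL. The traceback in the question shows the target being called as self._target(*self._args, **self._kwargs), so whatever is passed as args gets unpacked with *.

```python
def count_positional_args(*args):
    """Return how many positional arguments were received."""
    return len(args)


url = "https://example.com/data.csv"  # hypothetical URL, not the real sheet

not_a_tuple = (url)   # parentheses only group the expression; this is still a str
one_tuple = (url,)    # the trailing comma makes a 1-element tuple

# Unpacking a string with * passes one argument per character.
n_from_string = count_positional_args(*not_a_tuple)
# Unpacking the 1-element tuple passes the whole URL as a single argument.
n_from_tuple = count_positional_args(*one_tuple)

print(type(not_a_tuple).__name__, n_from_string)
print(type(one_tuple).__name__, n_from_tuple)
```

This also suggests an answer to the riddle: the reported argument count is presumably the number of characters in the real URL string.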
Answered By: creanion

Using the library func_timeout is not strictly necessary.
Pandas uses urllib to fetch URLs, and that library wraps the lower-level socket library, which has a timeout parameter. Pandas doesn’t expose that parameter to the user, but you can set it globally through socket.setdefaulttimeout before launching the main program.

So at the beginning define:

TIMEOUT_SEC = 10 # default timeout in seconds
import socket
socket.setdefaulttimeout(TIMEOUT_SEC)
import pandas as pd

and then your code:

try:
  csv_file = 'https://docs.google.com/spreadsheets/d/e/XXXX/pub?gid=0&single=true&output=csv'
  df = pd.read_csv(csv_file)
except Exception as e:
  print(e)
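If changing the process-wide default is undesirable, one alternative (not part of the answer above, just a sketch) is to do the download yourself with urllib.request.urlopen, which accepts a per-call timeout, and hand the text to pandas. The function names fetch_csv_text and read_csv_with_timeout are hypothetical. Note that a socket timeout limits each blocking socket operation, not the total duration of the download.

```python
import io
import socket
import urllib.error
import urllib.request


def fetch_csv_text(url, timeout_sec=10):
    """Download url and return its body as text, or None if the request fails."""
    try:
        # timeout applies to this call only, not to every socket in the process
        with urllib.request.urlopen(url, timeout=timeout_sec) as response:
            return response.read().decode("utf-8")
    except (socket.timeout, urllib.error.URLError) as exc:
        # socket.timeout covers a stalled read; URLError covers connection failures
        print(f"download failed: {exc}")
        return None


def read_csv_with_timeout(url, timeout_sec=10):
    """Return a DataFrame parsed from url, or None if the download fails."""
    # Imported here so the download helper above has no pandas dependency.
    import pandas as pd

    text = fetch_csv_text(url, timeout_sec)
    if text is None:
        return None
    return pd.read_csv(io.StringIO(text))
```

With this in place, the call in the question would become something like df = read_csv_with_timeout(csv_file, timeout_sec=30), followed by a check for None.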
Answered By: Luca