How to skip reading empty files with pandas in Python

Question:

I read the files in one folder one by one into a pandas.DataFrame and then check each one for some conditions. There are a few thousand files, and I would like pandas to raise an exception when a file is empty, so that my reader function can skip that file.

I have something like:

import pandas as pd

# FileList and Result are my own classes, defined elsewhere.

class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.filename = str(self.dirname + "/" + filename)

    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep=",")
        if len(self.data) == 0:
            raise ValueError

class Run(object):
    def __init__(self, dirname):
        self.dirname = dirname
        self.file__list = FileList(dirname)
        self.result = Result()

    def run(self):
        for k in self.file__list.file_list[:]:
            self.b = StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass

A regular file that I’m searching for some condition looks like this:

"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,

But somehow a ValueError is never raised, and my functions end up searching empty files, which puts a lot of "Empty DataFrame …" lines into my results file. How can I skip empty files?
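For reference, a minimal sketch of how pd.read_csv treats two kinds of "empty" input, using in-memory io.StringIO buffers as stand-ins for real files (an illustrative assumption). A zero-byte input raises pandas.errors.EmptyDataError, which subclasses ValueError; a header-only input parses without error into a DataFrame with no rows, which DataFrame.empty reports.

```python
import io

import pandas as pd

# EmptyDataError subclasses ValueError, so "except ValueError" catches it too.
print(issubclass(pd.errors.EmptyDataError, ValueError))

# A zero-byte input gives read_csv no columns to parse, so it raises.
try:
    pd.read_csv(io.StringIO(""), header=None)
except pd.errors.EmptyDataError as exc:
    print("zero-byte input raised:", exc)

# A header-only input parses fine but yields a DataFrame with no rows.
header_only = pd.read_csv(io.StringIO("a,b\n"))
print(header_only.empty, len(header_only))
```

So an except ValueError clause already catches zero-byte files, while input that parses to zero rows has to be checked explicitly, for example with DataFrame.empty.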

Asked By: Leukonoe


Answers:

You should not use pandas for this; use the Python standard library directly. The answer is in the question "python how to check file empty or not".
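With only the standard library, a size check is the usual test, but it treats a file that contains just a newline or spaces as non-empty. A minimal sketch of a stricter check, assuming such whitespace-only files should also count as empty; the helper name has_data is hypothetical:

```python
def has_data(path):
    """Hypothetical helper: return True if the file contains any non-whitespace character.

    Unlike a size check, this also treats files holding only blank lines or
    spaces as empty. any() stops at the first line with content, so a large
    file is not read in full.
    """
    with open(path) as handle:
        return any(line.strip() for line in handle)
```

A loop over the folder can then call has_data(path) and skip the file before pandas ever sees it.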

Answered By: DevShark

I’d first check whether the file is empty, and only if it isn’t would I read it with pandas. This answer, https://stackoverflow.com/a/15924160/5088142, shows a nice way to check whether a file is empty:

import os

def is_non_zero_file(fpath):
    # True only if fpath is an existing regular file larger than zero bytes
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
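A minimal sketch of how is_non_zero_file could gate the reading loop. The helper name read_non_empty_csvs and its folder argument are hypothetical, and header=None assumes header-less files like the ones in the question. Note that a size check only catches zero-byte files: a file containing just a newline passes it.

```python
import glob
import os

import pandas as pd


def is_non_zero_file(fpath):
    # True only if fpath is an existing regular file larger than zero bytes
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0


def read_non_empty_csvs(folder, pattern="*.csv"):
    """Hypothetical helper: read every non-zero-byte CSV in folder.

    Returns a dict mapping each path to its DataFrame, plus a list of the
    paths that were skipped because they were zero bytes.
    """
    frames, skipped = {}, []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        if is_non_zero_file(path):
            frames[path] = pd.read_csv(path, header=None)
        else:
            skipped.append(path)
    return frames, skipped
```

Because the empty files never reach read_csv, no exception handling is needed for them in the loop.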
Answered By: Yaron

You can get this done with the following code: set the path variable to the folder that holds your CSVs and run it. Each non-empty file is read into raw_data, a pandas DataFrame, and each empty file is reported and skipped.

import glob
import os

import pandas as pd

path = "/home/username/data_folder"  # folder that holds the CSV files
files_list = glob.glob(os.path.join(path, "*.csv"))

for csv_file in files_list:
    try:
        raw_data = pd.read_csv(csv_file)
    except pd.errors.EmptyDataError:
        # read_csv raises EmptyDataError when a file has nothing to parse
        print(csv_file, "is empty and has been skipped.")
        continue
    # raw_data now holds this file's DataFrame; check it here
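A minimal sketch extending this answer so that the non-empty frames are kept in a list rather than overwritten on each iteration. load_csvs is a hypothetical helper name, and header=None assumes header-less files like those in the question. The DataFrame.empty check is a defensive measure for files that parse without error but yield no rows.

```python
import glob
import os

import pandas as pd


def load_csvs(paths):
    """Hypothetical helper: read each CSV, skipping files that yield no data.

    Returns (frames, skipped): a list of non-empty DataFrames and the list of
    paths that were skipped.
    """
    frames, skipped = [], []
    for path in paths:
        try:
            frame = pd.read_csv(path, header=None)
        except pd.errors.EmptyDataError:
            # nothing to parse, e.g. a zero-byte file
            skipped.append(path)
            continue
        if frame.empty:
            # parsed without error but produced no rows
            skipped.append(path)
            continue
        frames.append(frame)
    return frames, skipped


path = "/home/username/data_folder"  # placeholder folder, as in the answer above
frames, skipped = load_csvs(glob.glob(os.path.join(path, "*.csv")))
```

Collecting the frames keeps every non-empty file available after the loop, and skipped lists the files that were left out.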
Answered By: Ahmad M.

How about filtering out zero-byte files first:

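import glob, os
import pandas as pd

files = [f for f in glob.glob('*.csv') if os.stat(f).st_size > 0]  # drop zero-byte files
# read_csv takes a single path, so read each file and concatenate the results
data = pd.concat((pd.read_csv(f, header=None) for f in files), ignore_index=True)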
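One edge case: pd.concat raises ValueError when it receives no objects, which happens here if every matching file is empty or nothing matches. A minimal sketch that guards against that by returning an empty DataFrame instead; concat_non_empty_csvs is a hypothetical name, and header=None assumes header-less files like the question's.

```python
import glob
import os

import pandas as pd


def concat_non_empty_csvs(pattern="*.csv"):
    """Hypothetical helper: read and concatenate every non-zero-byte CSV matching pattern."""
    # keep only files larger than zero bytes; sorted for a stable row order
    files = sorted(f for f in glob.glob(pattern) if os.stat(f).st_size > 0)
    if not files:
        # pd.concat raises ValueError when given no objects, so return an empty frame
        return pd.DataFrame()
    return pd.concat((pd.read_csv(f, header=None) for f in files), ignore_index=True)
```

Callers can then test the result with data.empty instead of wrapping the call in a try/except.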
Answered By: Nick Mortimer