Reading multiple csv files into separate dataframes in Python

Question:

I have read multiple answers, but none have worked in my case so far. I want to read multiple CSV files (which may not be in the same directory as my Python file) without specifying their names, as I may have to read thousands of such files. I want to do something like the last example in the linked page, but I am not sure how to add my desktop path.

I tried the following, as given in the link:

import os
import pandas as pd

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)
# create empty list
dataframes_list = []

# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/"+files[i])
    dataframes_list.append(temp_df)

However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory:". I am using macOS. Can someone please help? Thank you!

Asked By: S C


Answers:

I think you need to pass the full path to read_csv by adding the path variable to the concatenated string. Something like:

for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)

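If your CSV files are directly in the Healthy directory, you can drop "/csv/", but keep a separator: use path + "/" + files[i] (or os.path.join(path, files[i])), because path has no trailing slash and path + files[i] would glue the folder name and the file name together.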
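To see why the separator matters, here is a small sketch that builds a throwaway temporary folder holding one made-up file, sample.csv, and compares plain string concatenation with os.path.join:

```python
import os
import tempfile

# Throwaway folder holding one CSV file (sample.csv is a made-up name)
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "sample.csv"), "w") as fh:
        fh.write("a,b\n1,2\n")

    # next(os.walk(...)) gives the folder path with no trailing separator
    path, dirs, files = next(os.walk(folder))

    glued = path + files[0]                # e.g. ".../tmpabc123sample.csv"
    joined = os.path.join(path, files[0])  # e.g. ".../tmpabc123/sample.csv"

    # check existence while the temporary folder still exists
    glued_exists = os.path.exists(glued)
    joined_exists = os.path.exists(joined)

print(glued_exists, joined_exists)  # False True
```

os.path.join inserts the platform's separator only when one is needed, which is why it is generally preferred over manual string concatenation for paths.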

Answered By: SWEEPY

You can use pathlib to do that easily:

import pandas as pd
import pathlib

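# The question says the "Healthy" folder itself holds the CSV files
DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy'

dataframes_list = []
# '**/*.csv' also matches CSV files in any subfolders of DATA_DIR
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)
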
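Since the question asks for separate dataframes, a dict keyed by file name may be handier than a list, because each frame can then be looked up by name instead of by position. A minimal sketch, run against a temporary folder with two made-up files (patient_a.csv, patient_b.csv) so it works anywhere; in practice you would point data_dir at your own folder:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    # Two hypothetical input files standing in for the real hypnograms
    (data_dir / "patient_a.csv").write_text("stage\nW\nN1\n")
    (data_dir / "patient_b.csv").write_text("stage\nN2\n")

    # sorted() gives a stable order; glob() itself makes no ordering promise
    dataframes = {
        csvfile.stem: pd.read_csv(csvfile)
        for csvfile in sorted(data_dir.glob("*.csv"))
    }

print(list(dataframes))              # ['patient_a', 'patient_b']
print(len(dataframes["patient_a"]))  # 2
```

This uses the non-recursive pattern "*.csv". With "**/*.csv" files in different subfolders could share a stem and silently overwrite each other in the dict, so you might key by the path relative to data_dir instead.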
Answered By: Corralien

In your example, path is the root of each file in files, so you can do

temp_df = pd.read_csv(os.path.join(path, files[i]))

But we really wouldn't do it this way. If the directory doesn't exist (or can't be read), os.walk silently yields nothing, so next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration error that you don't handle. (An empty directory is fine: it still yields one tuple with empty lists.) I think it would be more natural to use os.listdir, glob.glob or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is

from pathlib import Path 
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = [pd.read_csv(file) for file in healthy.iterdir()
    if file.is_file()]
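A quick way to see when next(os.walk(...)) raises StopIteration, sketched with a temporary empty folder and a made-up missing path; passing a default to next() sidesteps the exception:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as empty_dir:
    # An empty folder still yields one (path, dirs, files) tuple
    root, dirs, files = next(os.walk(empty_dir))
    print(dirs, files)  # [] []

    missing = os.path.join(empty_dir, "does_not_exist")  # hypothetical path
    # os.walk skips folders it cannot list, so it yields nothing here;
    # next() with a default returns the default instead of raising StopIteration
    result = next(os.walk(missing), None)
    print(result)  # None
```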

Many pandas parsing errors inherit from ValueError. If some files give you problems, you can wrap the read in an exception handler to find out which files fail:

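dataframes_list = []
error_files = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            # skiprows=18 skips the first 18 lines of each file; adjust it to your file layout
            dataframes_list.append(pd.read_csv(file, skiprows=18))
        except ValueError as e:
            # record the failing file and the reason, then keep going
            error_files.append(file)
            print(f"{file}: {e}")
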
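A self-contained run of that kind of handler against two made-up files, one of them empty (which makes read_csv raise pandas.errors.EmptyDataError, a ValueError subclass). The skiprows argument is left out here because the sample files have no header block:

```python
import tempfile
from pathlib import Path

import pandas as pd

dataframes_list = []
error_files = []

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "good.csv").write_text("x,y\n1,2\n")
    (folder / "empty.csv").write_text("")  # no columns to parse

    for file in sorted(folder.iterdir()):
        if file.is_file():
            try:
                dataframes_list.append(pd.read_csv(file))
            except ValueError as e:
                # EmptyDataError lands here because it subclasses ValueError
                error_files.append(file.name)
                print(f"{file.name}: {e}")

print(len(dataframes_list), error_files)  # 1 ['empty.csv']
```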
Answered By: tdelaney

Assuming you do want to filter the files list to exclude non-.csv files before calling the pandas method read_csv:

Proposed code:

Since you did not provide any data to work with, I deliberately left out pd.read_csv; in real code you would use pd.read_csv(os.path.join(path, f)).

import os
from pathlib import Path

# Suppose path and files have the following values
path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list = []

for f in files:
    # .suffix is the last extension ('' if there is none), so this never raises IndexError
    if Path(f).suffix == '.csv':
        # temp_df = pd.read_csv(os.path.join(path, f))
        temp_df = os.path.join(path, f)
        dataframes_list.append(temp_df)
print(dataframes_list)

Result:

['/home/Motors/engine.csv']
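Testing Path(f).suffix is safer than Path(f).suffixes[0]: suffixes[0] is the first dotted part of the name, so it raises IndexError for a name with no extension and returns '.2021' for a name like night.2021.csv. A quick comparison with made-up file names:

```python
from pathlib import Path

names = ["engine.csv", "engine.html", "night.2021.csv", "README"]

for name in names:
    p = Path(name)
    first = p.suffixes[0] if p.suffixes else None  # guard: "README" has no suffix
    print(f"{name!r}: suffix={p.suffix!r}, suffixes[0]={first!r}")

# Keep only names whose last extension is .csv
csv_names = [n for n in names if Path(n).suffix == ".csv"]
print(csv_names)  # ['engine.csv', 'night.2021.csv']
```

The comparison is case-sensitive, so a file named DATA.CSV would be skipped unless you compare Path(n).suffix.lower() instead.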

To answer S C's comment:

As a first step, collect all the file names; then iterate over them in chunks, so that each step only processes a short list of names.

filenames = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

def iterchunks(filenames, n):
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chk = iterchunks(filenames, n=2)

print(next(chk))       
# ['A', 'B']

print(next(chk))       
# ['C', 'D']
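iterchunks relies on len() and slicing, so it needs the full list of names in memory. If you would rather chunk a lazy iterator directly (for example the generator returned by Path.glob), itertools.islice can do it. A sketch; iter_batches is a hypothetical helper name:

```python
from itertools import islice

def iter_batches(iterable, n):
    """Yield lists of up to n items from any iterable, without calling len()."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

# A generator stands in for something like Path(...).glob("*.csv")
names = (letter for letter in "ABCDE")
print(list(iter_batches(names, 2)))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Each batch of paths can then be read with pd.read_csv in a loop, so only one batch of DataFrames needs to be held at a time. On Python 3.12 and later, itertools.batched does the same job but yields tuples.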
Answered By: Laurent B.