Run Python script for many files from different folders

Question:

I have a Python script that uses two files for running (they are excel files that the script takes as dataframes. Both excels are called in the same way but they are saved in different folders. It is defined in the script from which folders has to take the files and the name of the file). I have to execute this same script for more than 100 pair of files. The execution is the same for all of them. Is there a way to make that the script considers and be executed for each pair of excels automatically? Without having to write the same lines of code for each pair of excel files

folderA contains: fileA_1.xlsx fileA_30.xlsx …etc

folderB contains: fileB_1.xlsx fileB_30.xlsx …etc

I have, for fileA_1 and fileB_1, the following script:

import pandas as pd

writer1 = pd.ExcelWriter('file1.xlsx')

fileA_1 = pd.read_excel('folderA\fileA_1.xlsx')
fileB_1 = pd.read_excel('folderB\fileB_1.xlsx')

file1_merge = fileA_1.merge(fileB_1, how = 'left', indicator = True)
file1_merge_count = file1_merge.shape[0]
file1_merge_count

file1_merge.to_excel(writer1, 'file1_comparison', index = False)  

I need to replicate this same script for each pair of files I have to analize.

Asked By: Rfl

||

Answers:

You can use a loop to automate the process of running your script for multiple pairs of excel files. Here’s one way to go about it:

import os
import pandas

folderA = '/path/to/folderA'
folderB = '/path/to/folderB'

files_A = sorted([f for f in os.listdir(folderA) if f.endswith('.xlsx')])
files_B = sorted([f for f in os.listdir(folderB) if f.endswith('.xlsx')])

for fileA, fileB in zip(files_A, files_B):
    fileA_df = pd.read_excel(f"{folderA}/{fileA}")
    fileB_df = pd.read_excel(f"{folderB}/{fileB}")
    file_merge = fileA_df.merge(fileB_df, how = 'left', indicator = True)
    file_merge.to_excel(f'{fileA.split(".")[0]}_comparison.xlsx', index = False)

NOTE:
This code allows you to iterate in pairs of excel files in two different folders, it’ll work if both folders have the same amount of files and the pairs are in alphabetic order or both files have the same name

Example of folder architecture:

/path/to/
  |-- folderA/
  |    |-- fileA_01.xlsx
  |    |-- fileA_02.xlsx
  |    |-- ...
  |    |-- fileA_30.xlsx
  |-- folderB/
  |    |-- fileB_01.xlsx
  |    |-- fileB_02.xlsx
  |    |-- ...
  |    |-- fileB_30.xlsx
Answered By: Pedro Rocha

This solution uses pathlib.Path to join the folder and file paths, and path.iterdir() to loop over the files in FolderA.

from pathlib import Path

folderA = Path("path/to/folderA/")
folderB = Path("path/to/folderB/")

for file in folderA.iterdir():
        
    fileA = pd.read_excel(file)
    try:
        fileB = pd.read_excel(folderB / Path(file.name))
    except FileNotFoundError as e:
        print(f"File {file.name} was found in FolderA but not in FolderB")
        continue

    file_merge = fileA.merge(fileB, how='left', indicator=True)
    file_merge_count = file_merge.shape[0]

    writer = pd.ExcelWriter(f"file_{file.stem}.xlsx")
    file_merge.to_excel(writer, f'file_{file.stem}_comparison', index=False)

Edit: Re-written based on clarification that the target filenames don’t have numerically sequential filenames. Instead this will now loop over the files in FolderA, find the matching file in folderB and then name the merged file with the same file stem.

Answered By: nigh_anxiety
Categories: questions Tags:
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.