How can combine or merge all worksheets within an Excel file into one worksheet using python?

Question:

I’m trying to merge all worksheet tabs of each excel file within a provided file path into one worksheet. For Example if there’s 5 Excel files with multiple amounts worksheet tabs, each Excel file now only contains one merged worksheet tab. I would like to append each of these merged worksheet tabs to a created output file.

My Code below is able to appended all worksheet tabs from the provided source files to an output file but I can’t figure out how to merge the worksheet tabs from the source first first. Does anyone have an Idea how I can accomplish this? Please see my code below – Thanks!:

import glob
import os
import pandas as pd
import sys
import os.path
from openpyxl import load_workbook
from openpyxl import Workbook

#System Arguments
folder = sys.argv[1]
inputFile = sys.argv[2]
outputFile = sys.argv[3]

# specifying the path to xlsx files
path = r""+folder+""

#Create the new Excel Workbook with nameing convention provided by user
def create_file():
    wb = Workbook()
    wb.save(outputFile)

#Append the NEW or EXISTING Workbook with Input Files and Tabs to the already existing Excel File
def appened_file():
    outputPath = outputFile
    book = load_workbook(outputPath)
    writer = pd.ExcelWriter(outputPath, engine = 'openpyxl', mode="a", if_sheet_exists="new")
    writer.book = book
    for filename in glob.glob(path + "*" + inputFile + "*"):
            print(filename)
            excel_file = pd.ExcelFile(filename)
            (_, f_name) = os.path.split(filename)
            (f_short_name, _) = os.path.splitext(f_name)
            for sheet_name in excel_file.sheet_names:
                df_excel = pd.read_excel(filename, sheet_name=sheet_name,engine='openpyxl')
                df_newSheets = pd.DataFrame(df_excel)
                df_newSheets.to_excel(writer, sheet_name, index=False)                   
    writer.save()




Asked By: Nantourakis

||

Answers:

If you want to do what is said in the title, you could do this solely with pandas, as pd.read_excel(path_input, sheet_name=None) can read all worksheets of a workbook in one pass:

import pandas as pd

path_input = r"test.xlsx"
path_output = r"finished.xlsx"

df_lst = pd.read_excel(path_input, sheet_name=None).values()
df_res = pd.concat(
    [df_.transpose().reset_index().transpose() for df_ in df_lst])
df_res.to_excel(path_output, index=False, header=False)

It would also be possible to do this with xlwings or openpyxl, but usually pandas is fast.

Example with data
Assume an Excel workbook with three worksheets.

Worksheet1:

a   b   c
foo cor wal
bar gra plu
baz ult xyz
qux ply thu

Worksheet2:

u   v   w   x   y   z
12  92  86  22      80
23  29      74      21
16  10  75  67  61  99

Worksheet3:

I   II  III IV
1   5   9   1
2   6   0   6
3   7       3
4   8   2   0

Final output (after executing this snippet, i.e. after to_excel):

a   b   c
foo cor wal
bar gra plu
baz ult xyz
qux ply thu
u   v   w   x   y   z
12  92  86  22      80
23  29      74      21
16  10  75  67  61  99
I   II  III IV
1   5   9   1
2   6   0   6
3   7       3
4   8   2   0
Answered By: mouwsy