Multiple Excel Files as Separate Sheets Using Python

Question:

Most of the articles I’m seeing either:
a) combine multiple single-sheet Excel workbooks into one master workbook with just a single sheet; or
b) split a multi-sheet Excel workbook into individual workbooks.

However, my goal is to grab all the Excel files in a specific folder and save them as individual sheets within one new master Excel workbook. I’m trying to name each sheet after its original file.

import pandas as pd
import glob
import os

file = "C:\\File\\Path\\"
filename = 'Consolidated Files.xlsx'
pth = os.path.dirname(file)
extension = os.path.splitext(file)[1]
files = glob.glob(os.path.join(pth, '*xlsx'))

w = pd.ExcelWriter(file + filename)

for f in files:
    print(f)
    df = pd.read_excel(f, header = None)
    print(df)
    df.to_excel(w, sheet_name = f, index = False)
   
w.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the file

How do I adjust the names for each sheet? Also, if you see any opportunities to clean this up, please let me know.

Asked By: pilotmike327

||

Answers:

The sheet name is rejected because f is the full path plus file name, and a path contains characters that Excel does not allow in sheet names (such as \ and :). Use only the file name as the sheet name: os.path.basename gets the file name, and split separates the name from the extension.

for f in files:
    print(f)
    df = pd.read_excel(f, header = None)
    print(df)
    
    # basename() drops the directory and leaves the file name with its extension;
    # split('.')[0] then keeps the part before the first dot
    new_sheet_name = os.path.basename(f).split('.')[0]

    df.to_excel(w, sheet_name=new_sheet_name, index=False)
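
As a minimal sketch of the naming step described above, the hypothetical helper `sheet_name_for` (not part of the original answer) derives a sheet name from a full path. It assumes Windows-style paths like the one in the question and uses `pathlib.PureWindowsPath.stem`, which drops only the final extension, so a file such as `report.v2.xlsx` keeps its inner dot, whereas `split('.')[0]` would cut it to `report`. The truncation to 31 characters is an added assumption based on Excel's sheet-name length limit, and the sample file names are made up.

```python
from pathlib import PureWindowsPath

MAX_SHEET_NAME_LEN = 31  # Excel's limit on sheet-name length


def sheet_name_for(path: str) -> str:
    """Return the file name without directory or final extension, cut to 31 chars."""
    # PureWindowsPath parses Windows-style paths on any OS and accepts both \ and /.
    stem = PureWindowsPath(path).stem
    return stem[:MAX_SHEET_NAME_LEN]


# Hypothetical file names, for illustration only.
print(sheet_name_for(r"C:\File\Path\Sales 2023.xlsx"))  # Sales 2023
print(sheet_name_for(r"C:\File\Path\report.v2.xlsx"))   # report.v2
```

In the loop, `sheet_name=sheet_name_for(f)` could then be passed to `df.to_excel`. Truncation can make two long names collide, so this only suits folders whose file names differ within their first 31 characters.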

I decided to post my solution here as well, in case it is useful to anyone.

I wanted to be able to trace each resulting sheet back to where it came from. However, source workbooks can (and likely will) often share sheet names such as "Sheet 1", so I couldn’t just reuse the sheet names from the original workbooks. I also couldn’t use the source file names as sheet names, since they might be longer than 31 characters, which is the maximum sheet name length Excel allows.

Therefore, I ended up assigning incremental numbers as the resulting sheet names, while inserting a new column named "source" at the start of each sheet and populating it with the file name concatenated with the sheet name. Hope it helps someone 🙂

from glob import glob
import pandas as pd
import os

files_input = glob(r'C:\Path\to\folder\*.xlsx')

result_DFs = []

for xlsx_file in files_input:
    file_DFs = pd.read_excel(xlsx_file, sheet_name=None)
    # collect every sheet from every file as a DataFrame in one list
    for sheet_DF in file_DFs:
        source_name = os.path.basename(xlsx_file) + ":" + sheet_DF
        file_DFs[sheet_DF].insert(0, 'source', source_name)
        result_DFs.append(file_DFs[sheet_DF])

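# set_column (used below) is an XlsxWriter method, so request that engine explicitly
with pd.ExcelWriter(r'C:\Path\to\resulting\file.xlsx', engine='xlsxwriter') as writer: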
    for df_index in range(len(result_DFs)):
        # write dataframe to file using simple incremental number as a new sheet name
        result_DFs[df_index].to_excel(writer, sheet_name=str(df_index), index=False)
        # auto-adjust column width (can be omitted if not needed)
        for i, col in enumerate(result_DFs[df_index].columns):
            column_len = max(result_DFs[df_index][col].astype(str).str.len().max(), len(str(col)) + 3)
            _ = writer.sheets[str(df_index)].set_column(i, i, column_len)
Answered By: Alex
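
The two constraints raised in this answer (duplicate sheet names and the 31-character limit) can also be handled by generating names instead of numbering the sheets. The following is only a sketch of that alternative, not the answer's method: the hypothetical helper `unique_sheet_names` replaces the characters Excel rejects in sheet names, truncates to 31 characters, and appends a numeric suffix on collision. The labels are made-up examples in the same `file:sheet` form as the "source" column, and the character list is not claimed to cover every Excel naming rule.

```python
import re

INVALID_CHARS = re.compile(r"[\\/*?:\[\]]")  # characters Excel rejects in sheet names
MAX_LEN = 31  # Excel's limit on sheet-name length


def unique_sheet_names(labels):
    """Map each label to a sheet name that is valid, at most 31 chars, and unique."""
    used = set()
    result = []
    for label in labels:
        base = INVALID_CHARS.sub("_", label)[:MAX_LEN] or "Sheet"
        name = base
        counter = 2
        # Excel compares sheet names case-insensitively, so track lowercase forms.
        while name.lower() in used:
            suffix = f"_{counter}"
            # Shorten the base so that the suffixed name still fits in 31 chars.
            name = base[: MAX_LEN - len(suffix)] + suffix
            counter += 1
        used.add(name.lower())
        result.append(name)
    return result


# Hypothetical labels, for illustration only.
print(unique_sheet_names(["budget.xlsx:Sheet 1", "budget.xlsx:Sheet 1"]))
# ['budget.xlsx_Sheet 1', 'budget.xlsx_Sheet 1_2']
```

The trade-off is that long names are cut off, so keeping a "source" column as in the answer above is still the more reliable way to record where a sheet came from.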