Copying data from multiple excel files to specified columns using python

Question:

I’m trying to create a script which would run through excel files in a folder and copy the contents to a workbook. The aim is to copy the contents of each file onto different columns where the spacing between the columns is a set difference, ie. columns: A, D(A+3) & G(D+3). For my example I am running my code with 3 base datasets.

When I run the code, the final dataset ends up copying across the final excel document 3 times across the specified columns, instead of copying the 3 unique documents to the specified columns.

What I want: A B C
What I get: C C C

Code:

import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string     

for file in os.listdir(file_path):
    if file.endswith('.xlsx'):
        print(f'Loading file {file}...')
        wb = load_workbook(file_path+file)
        ws = wb.worksheets[0]
        wb1 = load_workbook(new_path+'data.xlsx')
        ws1 = wb1.active
        #calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        m = [0,3,6]
        #copying data to new sheet
        for i in range(1,mr+1):
            for j in range(1,mc+1):
                for y in range(0,3):
                    #reading cell value from source
                    c = ws.cell(row = i, column = j)
                    #writing read value to destination
                    ws1.cell(row = i, column = j+int(m[y])).value = c.value
        wb1.save(new_path+'data.xlsx')

Thank you for your help.

Edit:

The data is all in the same format and looks like:https://ibb.co/TMStH9j Current output: https://ibb.co/dmcbSJ1 Desired output: https://ibb.co/C1nqKJv

Asked By: rhynohorn

||

Answers:

You need to move the creation and saving of the new workbook out of the for loop so that it is not overwritten each time a new file is looped over.

Also you need a way to count how many files you have looped over, so that you can increment the columns where the new data is copied to in the new workbook. Please see below:

Edit:

To get your expected output, I also removed the inner-most for loop and m list to rather use a single variable to space the columns of each new excel data apart.

import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string

# Create new workbook outside of for loop so that it is not overwritten each loop
wb1 = Workbook()
ws1 = wb1.active

# count variable so each loop increments the column where the data is posted
count = 0

# how many columns to space data apart
col_spacing = 2

for file in os.listdir(file_path):
    if file.endswith(".xlsx"):
        print(f"Loading file {file}...")
        wb = load_workbook(file_path + file)
        ws = wb.worksheets[0]
        # calculate max rows and columns in source dataset
        mr = ws.max_row
        mc = ws.max_column
        
        # copying data to new sheet
        for i in range(1, mr + 1):
            for j in range(1, mc + 1):
                # reading cell value from source
                c = ws.cell(row=i, column=j)
                # writing read value to destination
                ws1.cell(row=i, column=count + j + (count * col_spacing)).value = c.value
        # increment column count
        count += 1
# save new workbook after all files have been looped through
wb1.save(new_path + "data.xlsx")
Answered By: keipala
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.