List of list of DataFrame overwrites previous values (pandas, python)

Question

I have a set of Excel files from which I want to summarize some data. The data in each Excel file is spread over 5 sheets. I want to create a new Excel file with 5 sheets, where each sheet summarizes the data from the corresponding sheet of all input files.

My approach is to create a list of lists of DataFrames, where each row collects the data from one sheet across all files. I then concatenate each row, so I end up with 5 DataFrames that I can write to the 5 sheets of a new Excel file. The code I wrote for this looks like this:

import glob
import pandas as pd
from tkinter import filedialog


def select_base_path():

    root = filedialog.askdirectory(
        title='Select base path',
        mustexist=True)

    return root


if __name__ == '__main__':

    base_path = select_base_path()

    files = []
    for file in glob.glob(str(base_path) + '**\\10x 0.45 SFR average .xlsx', recursive=True):
        files.append(file)


    sheets = ['Center', 'north west', 'south west', 'north east', 'south east']
    Frames = [[pd.DataFrame()] * len(files)] * len(sheets)
    data_frames = [[]] * len(sheets)
    ids = []
    for k in range(len(files)):
        ids.append('Adapter ' + files[k][files[k].find('#'):files[k].find('#')+3])

    for i, file in enumerate(files):
        if i == 0:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K,M,Q,S,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K')
        else:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K,Q,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K')

    for m in range(len(sheets)):
        data_frames[m] = pd.concat(Frames[m], axis=1, keys=ids)

The problem is that when the loop iterates through Frames, it does not write to the single location Frames[j, i] in the list of lists of DataFrames. Instead, it writes the data to Frames[:, i], that is, to position i of every row, so the data is overwritten on every iteration over the sheets. As a result, all rows end up identical at each position i.

In the debugger, after the first pass (i=0, j=0) there is already data in Frames[:, i], while I expected data only in Frames[j, i]. What am I misunderstanding here?

Asked By: JMP

Answer 1

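In [[pd.DataFrame()] * len(files)] * len(sheets), the outer multiplication does not copy the inner list. It stores the same list object len(sheets) times, so assigning to Frames[j][i] changes position i in every row. A list comprehension builds a new, independent inner list for each sheet and avoids the observed problem: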

Frames2 = [[pd.DataFrame() for i in range(len(files))]
           for j in range(len(sheets))]
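
The difference can be checked without any Excel files. The following is a minimal sketch, assuming placeholder values (None and a string) stand in for the DataFrames, and the hypothetical names n_sheets and n_files stand in for len(sheets) and len(files):

```python
# Stand-in sizes; the real code would use len(sheets) and len(files).
n_sheets, n_files = 3, 2

# Multiplying the outer list repeats a reference to ONE inner list.
aliased = [[None] * n_files] * n_sheets
aliased[0][0] = "sheet 0, file 0"

# Every "row" is the same list object, so the write shows up in all rows.
rows_are_same_object = all(row is aliased[0] for row in aliased)
print(rows_are_same_object)
print(aliased)

# A comprehension evaluates the inner expression once per row,
# so each row is a separate list and only one cell changes.
independent = [[None] * n_files for _ in range(n_sheets)]
independent[0][0] = "sheet 0, file 0"
print(independent)
```

The same reasoning applies to data_frames = [[]] * len(sheets) in the question, but there it does no harm: data_frames[m] = ... replaces the slot in the outer list instead of modifying the shared inner list.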

Answered By: P.Jo
