List of list of DataFrame overwrites previous values (pandas, python)
Question:
I have a set of Excel files whose data I want to summarize. The data in each Excel file is spread over 5 sheets. I want to create a new Excel file with 5 sheets, where each sheet summarizes the data from the corresponding sheet of all input files.
My approach is to create a list of lists of DataFrames, where each row collects the data from one sheet across all files. I then concatenate each row, so I end up with 5 DataFrames that I can write to the 5 sheets of a new Excel file. The code I wrote for this looks like:
import glob
import pandas as pd
from tkinter import filedialog

def select_base_path():
    root = filedialog.askdirectory(
        title='Select base path',
        mustexist=True)
    return root

if __name__ == '__main__':
    base_path = select_base_path()
    files = []
    for file in glob.glob(str(base_path) + '**\10x 0.45 SFR average .xlsx', recursive=True):
        files.append(file)
    sheets = ['Center', 'north west', 'south west', 'north east', 'south east']
    Frames = [[pd.DataFrame()] * len(files)] * len(sheets)
    data_frames = [[]] * len(sheets)
    ids = []
    for k in range(len(files)):
        ids.append('Adapter ' + files[k][files[k].find('#'):files[k].find('#')+3])
    for i, file in enumerate(files):
        if i == 0:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K,M,Q,S,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='A,E,G,K')
        else:
            for j, sheet in enumerate(sheets):
                if sheet == 'Center':
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K,Q,W')
                else:
                    Frames[j][i] = pd.read_excel(io=file, sheet_name=sheet, header=2, usecols='E,K')
    for m in range(len(sheets)):
        data_frames[m] = pd.concat(Frames[m], axis=1, keys=ids)
The problem is that when the loop iterates through Frames, it does not write to the single location Frames[j][i] in the list of lists of DataFrames. Instead it writes the data to Frames[:, i] (the same position in every row), so the data is overwritten every time it iterates through the sheets. As a result, for each i the entries in Frames[:, i] are all identical at the end.
In the debugger, after the first pass (i=0, j=0) I already see data in Frames[:, i]. I expected data only in Frames[j][i]. Where is my misconception?
Answers:
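Multiplying the outer list ([...] * len(sheets)) does not copy the inner list. It repeats a reference to the same inner list, so every row of Frames is the same object and an assignment to Frames[j][i] shows up in every row. A list comprehension produces new, unrelated objects for each row and avoids the observed problem: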
Frames2 = [[pd.DataFrame() for i in range(len(files))]
           for j in range(len(sheets))]
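The difference can be checked without pandas or any Excel files. The following is a minimal sketch that uses plain integers in place of DataFrames; the sizes n_files and n_sheets are hypothetical values chosen for illustration, not taken from the question.

```python
# Minimal reproduction with integers standing in for DataFrames.
n_files, n_sheets = 3, 2  # hypothetical sizes, for illustration only

# Multiplying the outer list repeats a reference to the SAME inner list.
aliased = [[0] * n_files] * n_sheets
aliased[0][1] = 99  # intended to change row 0 only
rows_are_same_object = aliased[0] is aliased[1]

# A comprehension evaluates the inner expression once per row,
# so each row is a separate list object.
independent = [[0] * n_files for _ in range(n_sheets)]
independent[0][1] = 99  # changes row 0 only

print(aliased)               # both rows show the change
print(rows_are_same_object)  # True
print(independent)           # only the first row shows the change
```

The inner multiplication ([pd.DataFrame()] * len(files)) also repeats one object, but that is harmless here because each slot is rebound to a new DataFrame by Frames[j][i] = ... rather than modified in place. The same reasoning applies to data_frames = [[]] * len(sheets), since each data_frames[m] is reassigned.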