How to convert a worksheet to a Data frame in Pandas?

Question:

I am trying to read several worksheets from an Excel workbook in Python with pandas. When I read the entire workbook and then call .merge(), only the first worksheet is used and the others are ignored. I also tried reading each worksheet individually, but I suspect they were not converted to data frames correctly, because calling .merge() fails with the following error: ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

This is what I have done so far:

This code converts the workbook to a data frame, but only the data from the first worksheet is processed:

from datetime import date

import pandas as pd
import pypyodbc

#sql extractor
start_date = date.today()
retrieve_values = "[DEV].[CS].[QT_KPIExport] @start_date='{start_date:%Y-%m-%d}'".format(
    start_date=start_date)
connection = pypyodbc.connect(driver="{SQL Server}", server="xxx.xxx.xxx.xxx", uid="X",pwd="xxx", Trusted_Connection="No")
data_frame_sql = pd.read_sql(retrieve_values, connection)

#Read the entire workbook 
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
#Convert to a dataframe the entire workbook
data_frame_excel = pd.read_excel(wb_data,index_col=None,na_values=['NA'],parse_cols="J")

#apply merge
merged_df   = data_frame_sql.merge(data_frame_excel,how="inner",on="sectorname")

This code tries to read the individual worksheets and convert them to data frames, without success so far (see my answer below):

data_frame_sql = pd.read_sql(retrieve_values, connection)

#Method 1: Tried to parse worksheet 2
#Read the entire workbook and select the specific worksheet
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx", sheetname="Sheet-2")
data_frame_excel = pd.read_excel(wb_data,index_col=None,na_values=['NA'],parse_cols="J")

#apply merge
merged_df   = data_frame_sql.merge(data_frame_excel,how="inner",on="sectorname")
#No success... the data of the first sheet is read

#Method 2: Tried to parse worksheet 2
#Read the entire workbook
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
data_frame_excel = pd.read_excel(wb_data,index_col=None,na_values=['NA'],parse_cols="J")

#select one specific sheet
ws_sheet_2 = wb_data.parse("Sheet-2")

#apply merge
merged_df   = data_frame_sql.merge(ws_sheet_2,how="inner",on="sectorname")
# No success.... ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

Any help or advice is greatly appreciated.

Asked By: abautista

Answers:

I found a solution that did the trick.

#Method 1: Add the sheetname once you have read the entire workbook
#Read the entire workbook
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
#Select your sheetname to read
#(pandas 0.21+ spells these arguments sheet_name and usecols)
data_frame_excel = pd.read_excel(wb_data,index_col=None,na_values=['NA'],parse_cols="J",sheetname="Sheet-2")

#apply merge
merged_df   = data_frame_sql.merge(data_frame_excel,how="inner",on="sectorname")
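
To see what the inner merge on sectorname does once the right sheet has been loaded, here is a minimal sketch with small in-memory frames standing in for the SQL result and the worksheet. The kpi and site columns and all the values are made up for illustration; only sectorname comes from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_sql(...)
data_frame_sql = pd.DataFrame(
    {"sectorname": ["S1", "S2", "S3"], "kpi": [10, 20, 30]}
)

# Hypothetical stand-in for the worksheet loaded with pd.read_excel(...)
data_frame_excel = pd.DataFrame(
    {"sectorname": ["S2", "S3", "S4"], "site": ["north", "south", "east"]}
)

# An inner merge keeps only the sectorname values present in both frames
merged_df = data_frame_sql.merge(data_frame_excel, how="inner", on="sectorname")
print(merged_df)
```

If the merged frame comes back empty or smaller than expected, comparing the sectorname values of the two inputs is usually the first thing to check.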
Answered By: abautista

You can get all worksheets from a workbook into a dictionary by using the sheetname=None argument with the read_excel method. Key/value pairs will be ws name/dataframe.

ws_dict = pd.read_excel('excel_file.xlsx', sheetname=None)

Note: sheetname is the spelling used by older pandas. It was renamed to sheet_name in pandas 0.21, and current versions accept only sheet_name.
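
As a rough sketch of what can be done with such a dictionary, the snippet below picks one sheet by name and also stacks all sheets into a single frame. The helper combine_sheets and the sample data are hypothetical; the hand-built ws_dict stands in for what read_excel would return with sheet_name=None.

```python
import pandas as pd


def combine_sheets(ws_dict):
    """Stack every worksheet frame into one, tagging each row with its sheet name."""
    frames = [df.assign(sheet=name) for name, df in ws_dict.items()]
    return pd.concat(frames, ignore_index=True)


# Stand-in for: ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None)
ws_dict = {
    "Sheet-1": pd.DataFrame({"sectorname": ["A", "B"]}),
    "Sheet-2": pd.DataFrame({"sectorname": ["C"]}),
}

# A single worksheet is just a dictionary lookup by sheet name
sheet_2 = ws_dict["Sheet-2"]

# All worksheets combined, with a 'sheet' column recording the origin
combined = combine_sheets(ws_dict)
```

The extra sheet column is a design choice for this sketch: it keeps track of which worksheet each row came from after concatenation.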

Answered By: b2002

To read a specific sheet from a multi-sheet .xlsx file in pandas, specify the sheet name and use the openpyxl engine.

Step 1 (install the openpyxl package):

! pip install openpyxl

Step 2 (use the openpyxl engine):

data_df = pd.read_excel(<ARCHIVE_PATH>, sheet_name=<sheet_name>, engine='openpyxl')

Here is the official documentation.

Another solution using openpyxl directly:

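from openpyxl import load_workbook

wb = load_workbook(ARCHIVE_PATH)
ws = wb[<sheet-name>]
# ws.values yields one tuple per row; the header row ends up as the first data row
data_df = pd.DataFrame(ws.values)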
Answered By: Gennaro
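# sheet is presumably an openpyxl worksheet, e.g. wb[<sheet-name>]
df_tm = sheet.values
# take the first row as the column names
coluna_tm = next(df_tm)[0:]
df = pd.DataFrame(df_tm, columns=coluna_tm)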
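
The fragment above appears to promote the first worksheet row to column headers. A self-contained sketch of that idea follows, assuming the rows arrive as an iterable of tuples, which is what the values attribute of an openpyxl worksheet yields. The helper rows_to_frame and the sample rows are hypothetical, and plain tuples are used so the snippet runs without a workbook.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from row tuples, using the first row as the header."""
    rows = iter(rows)
    header = next(rows)  # first row holds the column names
    return pd.DataFrame(list(rows), columns=list(header))


# Plain tuples standing in for ws.values from an openpyxl worksheet
sample_rows = [
    ("sectorname", "value"),
    ("S1", 1),
    ("S2", 2),
]

df = rows_to_frame(sample_rows)
```

With a real workbook the call would presumably be rows_to_frame(ws.values). An empty sheet would raise StopIteration here, so a production version would need to handle that case.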