How to convert a worksheet to a Data frame in Pandas?

Question

I am trying to read different worksheets from an Excel workbook in Python with Pandas. When I read the entire workbook and then apply .merge(), only the first worksheet is read and the others are ignored. I tried reading each worksheet individually, but I suspect they were not successfully converted to data frames, because when I apply .merge() I get the following error: ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

This is what I have done so far:

This code works for converting the entire workbook to a data frame, but only the data of the first worksheet is processed:

import pandas as pd
import pypyodbc
from datetime import date

#sql extractor
start_date = date.today()
retrieve_values = "[DEV].[CS].[QT_KPIExport] @start_date='{start_date:%Y-%m-%d}'".format(
    start_date=start_date)
connection = pypyodbc.connect(driver="{SQL Server}", server="xxx.xxx.xxx.xxx", uid="X",pwd="xxx", Trusted_Connection="No")
data_frame_sql = pd.read_sql(retrieve_values, connection)

#Read the entire workbook (raw string so the backslashes in the Windows path are not treated as escapes)
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
#Convert the workbook to a dataframe
data_frame_excel = pd.read_excel(wb_data, index_col=None, na_values=['NA'], usecols="J")

#apply merge
merged_df   = data_frame_sql.merge(data_frame_excel,how="inner",on="sectorname")

This code tries to read the individual worksheets and convert them to data frames, without success…yet! (see the answer below)

data_frame_sql = pd.read_sql(retrieve_values, connection)

#Method 1: Tried to parse worksheet 2
#Read the entire workbook and select the specific worksheet
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx", sheetname="Sheet-2")
data_frame_excel = pd.read_excel(wb_data, index_col=None, na_values=['NA'], usecols="J")

#apply merge
merged_df   = data_frame_sql.merge(data_frame_excel,how="inner",on="sectorname")
#No success... the data of the first sheet is read

#Method 2: Tried to parse worksheet 2
#Read the entire workbook
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
data_frame_excel = pd.read_excel(wb_data, index_col=None, na_values=['NA'], usecols="J")

#select one specific sheet
ws_sheet_2 = wb_data.parse("Sheet-2")

#apply merge
merged_df   = data_frame_sql.merge(ws_sheet_2,how="inner",on="sectorname")
# No success.... ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>

Any help or advice is greatly appreciated.

Asked By: abautista


Answer 1

I found a solution that did the trick.

#Method 1: Pass the sheet name to read_excel after reading the entire workbook
#Read the entire workbook
wb_data = pd.ExcelFile(r"C:\Users\Dev\Testing\Daily_Data\NSN-Daily Data Report.xlsx")
#Select the sheet to read
data_frame_excel = pd.read_excel(wb_data, index_col=None, na_values=['NA'], usecols="J", sheet_name="Sheet-2")

#apply merge
merged_df = data_frame_sql.merge(data_frame_excel, how="inner", on="sectorname")

Answered By: abautista
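A minimal, self-contained sketch of the behavior behind this fix, assuming a recent pandas with openpyxl installed: read_excel reads only the first worksheet unless sheet_name says otherwise. The two-sheet workbook is built in memory, and its sheet names, the kpi/alarms columns, and the stand-in for data_frame_sql are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical two-sheet workbook, written to an in-memory buffer.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"sectorname": ["A1", "B1"], "kpi": [1, 2]}).to_excel(
        writer, sheet_name="Sheet-1", index=False)
    pd.DataFrame({"sectorname": ["A2", "B2"], "kpi": [3, 4]}).to_excel(
        writer, sheet_name="Sheet-2", index=False)
buffer.seek(0)

wb_data = pd.ExcelFile(buffer)

# Default sheet_name=0: only the first worksheet is read.
first_sheet = pd.read_excel(wb_data)

# Naming the sheet selects the second worksheet instead.
data_frame_excel = pd.read_excel(wb_data, sheet_name="Sheet-2")

# Hypothetical stand-in for the SQL query result.
data_frame_sql = pd.DataFrame(
    {"sectorname": ["A2", "B2", "C2"], "alarms": [5, 6, 7]})

merged_df = data_frame_sql.merge(data_frame_excel, how="inner", on="sectorname")
print(merged_df)
```

To apply the same pattern to a real file, replace the in-memory buffer with the workbook path (as a raw string on Windows).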

Answer 2

You can get all worksheets from a workbook into a dictionary by passing sheet_name=None to read_excel. The keys are the worksheet names and the values are the corresponding DataFrames.

ws_dict = pd.read_excel('excel_file.xlsx', sheet_name=None)

Note: pandas versions before 0.21 spelled this argument sheetname; that spelling was deprecated in 0.21 in favor of sheet_name and later removed.

Answered By: b2002
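A sketch of how the dictionary returned by sheet_name=None could be stacked into a single frame before merging, assuming every sheet shares the same columns and openpyxl is installed. The workbook and its contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical workbook with two sheets that share the same columns.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"sectorname": ["A1"], "kpi": [1]}).to_excel(
        writer, sheet_name="Sheet-1", index=False)
    pd.DataFrame({"sectorname": ["A2"], "kpi": [2]}).to_excel(
        writer, sheet_name="Sheet-2", index=False)
buffer.seek(0)

# sheet_name=None returns {sheet name: DataFrame} for every worksheet.
ws_dict = pd.read_excel(buffer, sheet_name=None)

# Stack all sheets into one frame, recording the origin in a "sheet" column.
all_sheets = (
    pd.concat(ws_dict, names=["sheet", "row"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)
print(all_sheets)
```

The combined all_sheets frame can then be merged on "sectorname" exactly like a single-sheet frame.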

Answer 3

To read a specific sheet from a multi-sheet .xlsx file in Pandas, specify the sheet name and, if needed, select the openpyxl engine explicitly (recent pandas versions already use openpyxl by default for .xlsx files).

Step 1 (install the openpyxl package):

! pip install openpyxl

Step 2 (use the openpyxl engine):

data_df = pd.read_excel(<ARCHIVE_PATH>, sheet_name=<sheet_name>, engine='openpyxl')

The official pandas documentation for read_excel describes the sheet_name and engine parameters.
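When the exact worksheet name is uncertain, listing the names first may help. A sketch, assuming openpyxl is installed, that builds a hypothetical two-sheet workbook in memory, reads ExcelFile.sheet_names, and then parses one sheet with the openpyxl engine:

```python
import io

import pandas as pd

# Hypothetical workbook; in practice pass the path to your .xlsx file instead.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"sectorname": ["A1"]}).to_excel(
        writer, sheet_name="Sheet-1", index=False)
    pd.DataFrame({"sectorname": ["A2"]}).to_excel(
        writer, sheet_name="Sheet-2", index=False)
buffer.seek(0)

with pd.ExcelFile(buffer, engine="openpyxl") as xls:
    names = xls.sheet_names  # worksheet names in workbook order
    print(names)
    data_df = xls.parse(sheet_name="Sheet-2")

print(data_df)
```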

Another solution is to use openpyxl directly:

from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(ARCHIVE_PATH)
ws = wb[<sheet_name>]
data_df = pd.DataFrame(ws.values)

Answered By: Gennaro
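A sketch of what the openpyxl-only approach produces, using a hypothetical one-sheet workbook built in memory (openpyxl assumed installed): pd.DataFrame(ws.values) labels the columns 0, 1 and so on, and keeps the header row as an ordinary data row.

```python
import io

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical workbook built with openpyxl and saved to memory.
wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "Sheet-2"
ws_out.append(["sectorname", "kpi"])
ws_out.append(["A2", 3])
buffer = io.BytesIO()
wb_out.save(buffer)
buffer.seek(0)

wb = load_workbook(buffer)
ws = wb["Sheet-2"]
data_df = pd.DataFrame(ws.values)

# The header row is treated as data; column labels are integers.
print(data_df)
```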

Answer 4

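Using openpyxl, take the first row of the worksheet as the column names:

import pandas as pd

# sheet: an openpyxl worksheet, e.g. load_workbook(path)["Sheet-2"]
df_tm = sheet.values                          # generator of row tuples
coluna_tm = next(df_tm)                       # first row -> column names
df = pd.DataFrame(df_tm, columns=coluna_tm)   # remaining rows -> data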

Answered By: Wanderson Bittencourt
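The same first-row-as-header idea wrapped in a small function. The name worksheet_to_dataframe and the in-memory workbook are hypothetical, and openpyxl is assumed to be installed. An empty worksheet would make next() raise StopIteration, which this sketch does not handle.

```python
import io

import pandas as pd
from openpyxl import Workbook, load_workbook


def worksheet_to_dataframe(sheet):
    """Hypothetical helper: the first worksheet row becomes the column labels."""
    rows = sheet.values        # generator of row tuples
    header = next(rows)        # consume the header row
    return pd.DataFrame(rows, columns=header)  # remaining rows are the data


# Hypothetical workbook built in memory for the demonstration.
wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "Sheet-2"
for row in (["sectorname", "kpi"], ["A2", 3], ["B2", 4]):
    ws_out.append(row)
buffer = io.BytesIO()
wb_out.save(buffer)
buffer.seek(0)

df = worksheet_to_dataframe(load_workbook(buffer)["Sheet-2"])
print(df)
```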
