How to read an existing worksheet table with Openpyxl?
Question:
A range of cells in an Excel worksheet may be formatted as a table. Openpyxl provides, in the documentation, an example of how to write such a table.
How would Openpyxl be used to read an existing Excel sheet table?
Ideally, this would be a simple openpyxl statement that, when provided with the table name, reads the table into an openpyxl Table object.
Answers:
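Openpyxl keeps all the tables of a worksheet in one collection. In older releases this is a plain list, which can be read through the private attribute:
tables = sheet._tables
The desired table can then be found by its displayName, which gives access to its range:
for table in tables:
    if table.displayName == 'Table1':
        table_range = table.ref
        break
On releases where sheet.tables is dict-like, iterate over sheet.tables.values() instead.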
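Below is a minimal working example (MWE):
from openpyxl import load_workbook

book = load_workbook('table.xlsx')
sheet = book.active
# sheet.tables is dict-like (name -> Table) in recent openpyxl releases;
# on older releases that keep tables in a list, use sheet._tables instead
tables = sheet.tables.values()
table_name = 'Table1'

def find_table(table_name, tables):
    """Return the cell range (e.g. 'A1:D5') of the named table, or None."""
    for table in tables:
        if table.displayName == table_name:
            return table.ref
    return None

table_range = find_table(table_name, tables)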
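As a sketch of the same lookup without the private attribute: on openpyxl releases where `sheet.tables` is dict-like, `items()` is documented to return (table name, range) pairs. The workbook below is built in memory purely for illustration; the table name and data are made up.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

# Build a small in-memory sheet so the example does not need a file on disk
wb = Workbook()
ws = wb.active
for row in [["Fruit", "Qty"], ["Apple", 3], ["Pear", 5]]:
    ws.append(row)
ws.add_table(Table(displayName="Table1", ref="A1:B3"))

# items() yields (table name, cell range) pairs for every table on the sheet
table_ranges = dict(ws.tables.items())
table_range = table_ranges.get("Table1")  # None if no such table
```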
The following function reads the cell values from the range defined by the table name and returns a tuple containing a list of column headers and a dict of the data. This is useful to then create a Pandas DataFrame:
from openpyxl import load_workbook
import pandas as pd

def read_excel_table(sheet, table_name):
    """
    This function will read an Excel table
    and return a tuple of columns and data
    This function assumes that tables have column headers
    :param sheet: the sheet
    :param table_name: the name of the table
    :return: columns (list) and data (dict)
    """
    table = sheet.tables[table_name]
    table_range = table.ref
    table_head = sheet[table_range][0]
    table_data = sheet[table_range][1:]
    columns = [column.value for column in table_head]
    data = {column: [] for column in columns}
    for row in table_data:
        row_val = [cell.value for cell in row]
        for key, val in zip(columns, row_val):
            data[key].append(val)
    return columns, data

book = load_workbook('table.xlsx')
ws = book.active
columns, data = read_excel_table(ws, 'Table1')
df = pd.DataFrame(data=data, columns=columns)
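If pandas is not needed, a variation on the same idea can return plain records. This is only a sketch: the helper name `table_records` and the sample data are hypothetical, and it assumes a dict-like `ws.tables` plus a header row, as above. It uses `range_boundaries` and `iter_rows(values_only=True)` so that tables not starting at A1 are handled too.

```python
from openpyxl import Workbook
from openpyxl.utils import range_boundaries
from openpyxl.worksheet.table import Table

def table_records(ws, table_name):
    """Return the table body as a list of {header: value} dicts."""
    # range_boundaries returns (min_col, min_row, max_col, max_row), all 1-based
    min_col, min_row, max_col, max_row = range_boundaries(ws.tables[table_name].ref)
    rows = ws.iter_rows(min_row=min_row, max_row=max_row,
                        min_col=min_col, max_col=max_col, values_only=True)
    header = next(rows)  # first row of the range holds the column headers
    return [dict(zip(header, row)) for row in rows]

# In-memory sheet with the table offset from A1 (illustrative data)
wb = Workbook()
ws = wb.active
ws.append(["a note outside the table", None, None])
ws.append([None, "Fruit", "Qty"])
ws.append([None, "Apple", 3])
ws.append([None, "Pear", 5])
ws.add_table(Table(displayName="Table1", ref="B2:C4"))

records = table_records(ws, "Table1")
```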
The answer from @So_tourist provides the way to get the range of cells in the table, not the Table object as asked.
To get the openpyxl.worksheet.table.Table object you can do this:
sheet.tables.get('MyTable')
NOTE: this answer holds for openpyxl 3.0.6, not sure about previous or later versions.
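A small self-contained sketch of that lookup, assuming a release with the dict-like `sheet.tables` (the workbook, table name and data are made up and built in memory):

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

wb = Workbook()
sheet = wb.active
sheet.append(["Name", "Score"])
sheet.append(["Ann", 90])
sheet.add_table(Table(displayName="MyTable", ref="A1:B2"))

table = sheet.tables.get("MyTable")        # Table object, or None if absent
missing = sheet.tables.get("NoSuchTable")  # no table by that name
same_table = sheet.tables["MyTable"]       # raises KeyError if absent
```

The returned object exposes the range as `table.ref`, which is what the other answers use to read the cells.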
A simple variation on @So_tourist’s code that leverages the pd.read_excel() function:
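from openpyxl import load_workbook
from openpyxl.utils import range_boundaries
import pandas as pd

def tblname2df(filename, sheetname, tablename):
    wb = load_workbook(filename, data_only=True)
    ws = wb[sheetname]
    # range of table, e.g. 'B12:D20'
    cellrange = ws.tables[tablename].ref
    # column headers of table
    cols = [column.value for column in ws[cellrange][0]]
    # number of data rows in table (header excluded)
    n_rows = len(ws[cellrange][1:])
    # number of rows to skip: everything above the header row
    # (range_boundaries also copes with multi-digit rows and multi-letter columns)
    min_col, min_row, max_col, max_row = range_boundaries(cellrange)
    skip = min_row - 1
    # return the dataframe
    return pd.read_excel(filename, sheet_name=sheetname, usecols=cols, skiprows=skip, nrows=n_rows)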
You can then load the table into a DataFrame df by calling tblname2df:
df = tblname2df('workbook.xlsx', 'Sheet', 'Table')
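The skiprows value in tblname2df depends on the first row of the table's ref string. As a stdlib-only sketch of that step (the helper name `first_row_of_ref` is hypothetical), a regular expression can pull the row number out of refs with multi-letter columns or multi-digit rows:

```python
import re

def first_row_of_ref(ref):
    """Return the 1-based row number of the top-left cell of a range like 'AB12:AD20'."""
    # Optional '$' markers, then column letters, then the row digits we want
    match = re.match(r"\$?[A-Za-z]+\$?(\d+)", ref)
    if match is None:
        raise ValueError(f"not a cell range: {ref!r}")
    return int(match.group(1))

# Rows to skip = rows above the header row
skip_small = first_row_of_ref("A3:D9") - 1      # single-letter column, single-digit row
skip_large = first_row_of_ref("AB12:AD20") - 1  # multi-letter column, multi-digit row
```

openpyxl.utils.range_boundaries gives the same information along with the column bounds, so the regex is only worth it when openpyxl is not at hand.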