How to read an existing worksheet table with Openpyxl?

Question

A range of cells in an Excel worksheet may be formatted as a table. The openpyxl documentation provides an example of how to write such a table.

How would Openpyxl be used to read an existing Excel sheet table?

Ideally, I am looking for a simple openpyxl statement that, when given the table name, reads the table into an openpyxl Table object.

Asked By: SO_tourist

Answer 1

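Openpyxl stores all of a worksheet's tables in a list (at least in the version this answer was written for; newer releases expose them as the dict-like sheet.tables, as used in Answers 2 and 3). The list can be read with: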

tables = sheet._tables

Then it is possible to search for the desired table by its displayName and get its range:

for table in tables:
    if table.displayName == 'Table1':
        table_range = table.ref  # e.g. 'A1:C5'
        break

Below is a minimal working example:

from openpyxl import load_workbook
book = load_workbook('table.xlsx')
sheet = book.active

tables = sheet._tables
table_name = 'Table1'

def find_table(table_name, tables):
    for table in tables:
        if table.displayName == table_name:
            return table.ref


table_range = find_table(table_name, tables)

Answered By: SO_tourist

Answer 2

The following function reads the cell values from the range defined by the table name and returns a tuple containing a list of column headers and a dict of the data. This is useful for then creating a pandas DataFrame:

from openpyxl import load_workbook
import pandas as pd


def read_excel_table(sheet, table_name):
    """
    This function will read an Excel table
    and return a tuple of columns and data

    This function assumes that tables have column headers
    :param sheet: the sheet
    :param table_name: the name of the table
    :return: columns (list) and data (dict)
    """
    table = sheet.tables[table_name]
    table_range = table.ref

    table_head = sheet[table_range][0]
    table_data = sheet[table_range][1:]

    columns = [column.value for column in table_head]
    data = {column: [] for column in columns}

    for row in table_data:
        row_val = [cell.value for cell in row]
        for key, val in zip(columns, row_val):
            data[key].append(val)

    return columns, data

book = load_workbook('table.xlsx')
ws = book.active

columns, data = read_excel_table(ws, 'Table1')
df = pd.DataFrame(data=data, columns=columns)
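
If pandas is not needed, the same table can be read straight into plain Python rows. The following is only a sketch, assuming a reasonably recent openpyxl (dict-like sheet.tables, iter_rows with values_only). The helper name read_table_rows and the in-memory demo table 'Demo' are made up for illustration and are not part of openpyxl:

```python
from openpyxl import Workbook
from openpyxl.utils import range_boundaries
from openpyxl.worksheet.table import Table


def read_table_rows(sheet, table_name):
    """Return the table's data rows as a list of dicts keyed by header.

    Hypothetical helper; assumes the table has a header row.
    """
    # Convert a ref such as 'B2:C4' into numeric bounds
    min_col, min_row, max_col, max_row = range_boundaries(
        sheet.tables[table_name].ref
    )
    rows = sheet.iter_rows(min_row=min_row, max_row=max_row,
                           min_col=min_col, max_col=max_col,
                           values_only=True)
    header = next(rows)  # first row of the range holds the column names
    return [dict(zip(header, row)) for row in rows]


# Build a small in-memory workbook so the sketch runs without a file
wb = Workbook()
ws = wb.active
ws["B2"], ws["C2"] = "name", "qty"
ws["B3"], ws["C3"] = "apple", 3
ws["B4"], ws["C4"] = "pear", 5
ws.add_table(Table(displayName="Demo", ref="B2:C4"))

records = read_table_rows(ws, "Demo")
```

Each entry in records maps a column header to the cell value of that row, which is often enough when the data is only being iterated over.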

Answered By: SO_tourist

Answer 3

The answer from @So_tourist provides the way to get the range of cells in the table, not the Table object as asked.

To get the openpyxl.worksheet.table.Table object you can do this:

sheet.tables.get('MyTable')

NOTE: this works with openpyxl 3.0.6; I have not checked earlier or later versions.
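
To try this without an existing file, here is a small sketch that builds a workbook in memory, registers a table, and looks it up by name. It assumes an openpyxl release where sheet.tables is dict-like (as in 3.0.6 above); the table name 'MyTable' and the cell contents are just illustrative:

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

# Small in-memory sheet with a header row and two data rows
wb = Workbook()
ws = wb.active
ws.append(["id", "value"])
ws.append([1, "a"])
ws.append([2, "b"])
ws.add_table(Table(displayName="MyTable", ref="A1:B3"))

table = ws.tables.get("MyTable")        # the Table object
missing = ws.tables.get("NoSuchTable")  # None when no table has that name

table_names = list(ws.tables.keys())    # names of all tables on the sheet
table_ref = table.ref                   # cell range covered by the table
```

Using get() rather than sheet.tables['MyTable'] avoids a KeyError when the name does not exist, at the cost of having to check for None.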

Answered By: Krakowski mudrac

Answer 4

A simple variation on @So_tourist's code that leverages the pd.read_excel() function:

from openpyxl import load_workbook
import pandas as pd

def tblname2df(filename, sheetname, tablename):
    wb = load_workbook(filename, data_only=True)
    ws = wb[sheetname]
    # cell range of the table, e.g. 'B3:D10'
    cellrange = ws.tables[tablename].ref
    rows = ws[cellrange]
    # column headers of the table
    cols = [cell.value for cell in rows[0]]
    # number of data rows in the table (header excluded)
    n_rows = len(rows) - 1
    # number of sheet rows above the table header to skip
    # (taken from the header cell, so multi-digit rows also work)
    skip = rows[0][0].row - 1
    # return the dataframe
    return pd.read_excel(filename, sheet_name=sheetname, usecols=cols, skiprows=skip, nrows=n_rows)

You can then load the DataFrame into df by calling tblname2df:

df = tblname2df('workbook.xlsx','Sheet','Table')
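
The skiprows value is the number of sheet rows above the table header, so it depends on the table's first row number, which may have more than one digit or follow a multi-letter column. One way to get it from the range string is openpyxl.utils.range_boundaries, which parses the whole reference. A short sketch (the helper name rows_to_skip is illustrative, not part of openpyxl or pandas):

```python
from openpyxl.utils import range_boundaries


def rows_to_skip(cellrange):
    """Number of sheet rows above the table header (hypothetical helper)."""
    # range_boundaries returns (min_col, min_row, max_col, max_row), 1-based
    min_col, min_row, max_col, max_row = range_boundaries(cellrange)
    return min_row - 1


skip_small = rows_to_skip("A1:C5")       # table starts on the first row
skip_large = rows_to_skip("AA12:AC40")   # multi-letter column, two-digit row
```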

Answered By: Stefano Verugi
