Python: Convert xlrd sheet to numpy matrix (ndarray)

Question:

What is the conversion syntax to convert a successfully loaded xlrd excel sheet to a numpy matrix (that represents that sheet)?

Right now I’m trying to take each row of the spreadsheet and add it to the numpy matrix. I can’t figure out the syntax for converting a Sheet.row into a numpy.ndarray. Here’s what I’ve tried so far:

import numpy
import xlrd
workbook = xlrd.open_workbook('input.xlsx')
worksheet = workbook.sheet_by_name('Sheet1')
num_rows = worksheet.nrows - 1
num_cells = worksheet.ncols - 1
inputData = numpy.empty([worksheet.nrows - 1, worksheet.ncols])
curr_row = -1
while curr_row < num_rows: # for each row
    curr_row += 1
    row = worksheet.row(curr_row)
    if curr_row > 0: # don't want the first row because those are labels
        inputData[curr_row - 1] = numpy.array(row)

I’ve tried all sorts of things on that last line to try to convert the row to something numpy will accept and add to the inputData matrix. What is the correct conversion syntax?

Answers:

You are trying to convert row, which is a list of xlrd.sheet.Cell objects, directly to a numpy array. That won’t work the way you want it to, because numpy needs the values stored in the cells rather than the Cell objects themselves. You’ll have to do this the long way and go over each of the columns too:

while curr_row < num_rows: # for each row
    curr_row += 1
    row = worksheet.row(curr_row)
    if curr_row > 0: # don't want the first row because those are labels
        for col_ind, el in enumerate(row):
            inputData[curr_row - 1, col_ind] = el.value
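
For reference, the same idea can be written a bit more compactly with Sheet.row_values(), which returns the plain values of a row instead of Cell objects. This is only a sketch: the function name sheet_to_array and its header_rows parameter are made up for illustration, and it assumes every cell below the header holds a number.

```python
import numpy as np

def sheet_to_array(sheet, header_rows=1):
    """Copy the numeric body of an xlrd-style sheet into a float ndarray.

    `sheet` only needs `nrows`, `ncols` and `row_values(i)`, all of which
    xlrd's Sheet provides. `header_rows` is a hypothetical parameter of
    this sketch, not part of xlrd.
    """
    n_data_rows = max(sheet.nrows - header_rows, 0)
    data = np.empty((n_data_rows, sheet.ncols), dtype=float)
    for out_row, sheet_row in enumerate(range(header_rows, sheet.nrows)):
        # row_values() yields plain Python values, so numpy can store them directly
        data[out_row, :] = sheet.row_values(sheet_row)
    return data
```

With the question's variables this would be called as inputData = sheet_to_array(worksheet). Note that xlrd 2.0 and later only read .xls files, so opening input.xlsx needs an older xlrd release or a different reader.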

There seems to exist a function for this in pandas though, as suggested elsewhere on SO. And pandas DataFrames store their data in numpy arrays, so they can be converted to them too. Probably best not to reinvent the wheel…

Answered By: Oliver W.

I am wondering if you are aware of the Pandas library which features xlsx loading:

import pandas as pd
df = pd.read_excel('input.xlsx')

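You can control which sheet to read with the sheet_name argument (spelled sheetname in older pandas releases), and you can get a numpy array from the pandas DataFrame through its values attribute or, in newer versions, its to_numpy() method.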
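
A small sketch of the DataFrame-to-array step, assuming a reasonably recent pandas. The DataFrame below is a made-up stand-in for whatever pd.read_excel('input.xlsx') would return, so the column names and numbers are purely illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame read_excel would return; values are invented.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# The header row has already become the column labels, so only the
# data rows end up in the array.
arr = df.to_numpy()  # df.values gives the same result on older pandas

print(arr.shape, arr.dtype)
```

Because read_excel treats the first row as column labels by default, there is no need to skip the label row by hand as in the xlrd loop.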

Answered By: Finn Årup Nielsen

To convert the xlrd sheet to a numpy matrix, we need to iterate over the sheet. Here is a function that converts an xlrd workbook sheet to a numpy 2D array:

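import numpy
import xlrd

def to_numpy(book, sheet_no=0):
    # Look up the sheet by position, then collect every cell value row by row
    # (the label row, if any, is included in the result)
    sheet = book.sheet_by_index(sheet_no)
    return numpy.array([[cell.value for cell in sheet.row(i)] for i in range(sheet.nrows)])

book = xlrd.open_workbook('input.xlsx')
arr = to_numpy(book, 0)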
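
One thing to watch for with to_numpy: it keeps the first row, so a sheet with text labels on top will likely come back as an array of strings rather than numbers. The sketch below shows the effect on a made-up nested list standing in for the sheet contents, and one possible way to get floats back.

```python
import numpy as np

# Invented stand-in for the nested list built from a sheet whose
# first row holds text labels.
rows = [["x", "y"], [1.0, 2.0], [3.0, 4.0]]

mixed = np.array(rows)             # the text labels force a string dtype
numbers = mixed[1:].astype(float)  # drop the label row, then convert back

print(mixed.dtype.kind)  # 'U' means a unicode string array
print(numbers)
```

Skipping the label row before building the array avoids the round trip through strings.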

Answered By: Rushikesh