Python Pandas dataframe reading exact specified range in an excel sheet

Question

I have many different tables (and other unstructured data) in an Excel sheet. I need to create a DataFrame from the range 'A3:D20' on 'Sheet2' of the Excel workbook 'data'.

All the examples I have come across drill down only to the sheet level, and none show how to pick out an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, so I switched to pandas.read_excel('data.xlsx','Sheet2') instead, which is much faster, at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

removed all other info
added column names
applied index_col to my leftmost column
then used wb.loc[]
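
The lookup described above (find a key in the leftmost column, return the matching value from the next column) can be sketched as follows. This is only an illustration: the small DataFrame stands in for whatever was read from the sheet, and the column names "name" and "price" are made up for the example.

```python
import pandas as pd

# Stand-in for the frame read from the sheet; the names and values are hypothetical.
df = pd.DataFrame(
    {"name": ["alpha", "beta", "gamma"], "price": [10.5, 20.0, 7.25]}
)

# Use the leftmost column as the index so that .loc can look rows up by key.
lookup = df.set_index("name")

# Single lookup: the "price" value for the row whose key is "beta".
price_of_beta = lookup.loc["beta", "price"]

# If many lookups are needed, a plain dict of key -> value may be convenient.
mapping = lookup["price"].to_dict()

print(price_of_beta)
print(mapping["gamma"])
```

Passing index_col=0 to read_excel should give the same kind of index directly at read time, without the separate set_index step.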

Asked By: spiff

Answer 1

Use the following arguments from the pandas read_excel documentation:

skiprows : list-like

Rows to skip at the beginning (0-indexed)

nrows: int, default None

Number of rows to parse.

parse_cols : int or list, default None

If None then parse all columns,

If int then indicates last column to be parsed

If list of ints then indicates list of column numbers to be parsed

If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)

I imagine the call will look like:

df = pd.read_excel(filename, 'Sheet2', skiprows=2, nrows=18, parse_cols='A:D')
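
To make the row arithmetic explicit, here is a minimal sketch that turns an A1-style range into keyword arguments for read_excel. The helper name range_to_read_excel_kwargs is hypothetical, not part of pandas. It assumes a recent pandas version, where the column argument is called usecols rather than parse_cols, and it assumes the range contains only data, so it sets header=None.

```python
import re


def range_to_read_excel_kwargs(cell_range):
    """Translate a range such as 'A3:D20' into read_excel keyword arguments.

    Hypothetical helper for illustration only; it is not a pandas function.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range.strip())
    if match is None:
        raise ValueError(f"not an A1-style range: {cell_range!r}")

    first_col, first_row, last_col, last_row = match.groups()
    first_row, last_row = int(first_row), int(last_row)
    if first_row < 1 or last_row < first_row:
        raise ValueError(f"invalid row bounds in range: {cell_range!r}")

    return {
        # Excel column letters are passed through as an 'A:D' style string.
        "usecols": f"{first_col.upper()}:{last_col.upper()}",
        # Rows above the range are skipped (row 3 means skipping 2 rows).
        "skiprows": first_row - 1,
        # Number of rows in the range, both ends included.
        "nrows": last_row - first_row + 1,
        # Treat every row in the range as data, not as column names.
        "header": None,
    }


kwargs = range_to_read_excel_kwargs("A3:D20")
print(kwargs)
```

The result would then be used as pd.read_excel('data.xlsx', sheet_name='Sheet2', **kwargs). If row 3 holds the column names instead of data, header=0 with nrows=17 should cover the same cells, since nrows counts data rows and not the header row.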

Answered By: shane

Answer 2

One way to do this is to use the openpyxl module.

Here’s an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
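
A variation on the same idea, as a sketch only: recent openpyxl versions let iter_rows return plain values through values_only=True, which removes the inner loop, and the first row of the range can be promoted to column names. To keep the example runnable without data.xlsx, it builds a small in-memory workbook first; the sheet contents and the column names "key" and "value" are invented for the illustration.

```python
from openpyxl import Workbook
import pandas as pd

# Build a tiny in-memory workbook as a stand-in for data.xlsx (hypothetical data).
wb = Workbook()
ws = wb.create_sheet("Sheet2")
ws["A1"] = "unrelated notes above the table"

table = [("key", "value"), ("a", 1), ("b", 2)]
for row_number, row in enumerate(table, start=3):  # the table starts at row 3
    for col_number, cell_value in enumerate(row, start=1):
        ws.cell(row=row_number, column=col_number, value=cell_value)

# Read the range A3:B5 as tuples of plain values instead of cell objects.
rows = list(
    ws.iter_rows(min_row=3, max_row=5, min_col=1, max_col=2, values_only=True)
)

# Treat the first row of the range as the header and the rest as data.
header, *body = rows
df = pd.DataFrame(body, columns=list(header))

print(df)
```

For the range in the question, the bounds would be min_row=3, max_row=20, min_col=1, max_col=4 on the sheet loaded from the file.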

Answered By: DocZerø

Answer 3

My answer, tested with pandas 0.25, works well:

pd.read_excel('resultat-elections-2012.xls', sheet_name = 'France entière T1T2', skiprows = 2,  nrows= 5, usecols = 'A:H')
pd.read_excel('resultat-elections-2012.xls', index_col = None, skiprows= 2, nrows= 5, sheet_name='France entière T1T2', usecols=range(0,8))

So: I needed the data after the first two lines, the desired number of lines (5), and columns A to H.
Be careful: @shane's answer needs to be updated for the newer pandas parameters (usecols is used here instead of parse_cols).

Answered By: ddnsimplon
