Python Pandas dataframe reading exact specified range in an excel sheet

Question:

I have a lot of different tables (and other unstructured data) in an Excel sheet. I need to create a dataframe out of the range ‘A3:D20’ from ‘Sheet2’ of the Excel file ‘data’.

All the examples I have come across drill down only to the sheet level; none show how to pick an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2') instead, and it is much faster at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

  • removed all other info
  • added column names,
  • applied index_col on my leftmost column
  • then used wb.loc[]
Asked By: spiff


Answers:

Use the following arguments from the pandas read_excel documentation:

  • skiprows : list-like
    • Rows to skip at the beginning (0-indexed)
  • nrows: int, default None
    • Number of rows to parse.
  • parse_cols : int or list, default None
    • If None then parse all columns,
    • If int then indicates last column to be parsed
    • If list of ints then indicates list of column numbers to be parsed
    • If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)

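I imagine the call will look like the line below. Two caveats: parse_cols has been deprecated since pandas 0.21 in favor of usecols, and by default the first row read (row 3) is used as the header, so pass header=None if row 3 holds data rather than column names.

df = pd.read_excel(filename, 'Sheet2', skiprows=2, nrows=18, parse_cols='A:D')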
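
The mapping from an A1-style range to these arguments can be sketched as a small helper. This is only a sketch under assumptions: range_to_read_excel_kwargs and its first_row_is_header flag are hypothetical names, not part of pandas, and the helper emits usecols (the newer name for parse_cols). It also assumes that nrows counts data rows only, so one row is subtracted when the first row of the range serves as the header.

```python
import re


def range_to_read_excel_kwargs(cell_range, first_row_is_header=False):
    """Translate an A1-style range such as 'A3:D20' into read_excel keyword arguments.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    match = re.fullmatch(r"([A-Za-z]{1,3})(\d+):([A-Za-z]{1,3})(\d+)", cell_range)
    if match is None:
        raise ValueError(f"not a simple A1-style range: {cell_range!r}")

    first_col, first_row, last_col, last_row = match.groups()
    first_row, last_row = int(first_row), int(last_row)
    if first_row < 1 or last_row < first_row:
        raise ValueError(f"invalid row bounds in range: {cell_range!r}")

    # Total number of sheet rows covered by the range, e.g. rows 3..20 -> 18.
    n_rows = last_row - first_row + 1
    if first_row_is_header:
        # The header row is read in addition to the nrows data rows.
        n_rows -= 1

    return {
        "usecols": f"{first_col.upper()}:{last_col.upper()}",
        "skiprows": first_row - 1,  # rows above the range
        "nrows": n_rows,
        "header": 0 if first_row_is_header else None,
    }


kwargs = range_to_read_excel_kwargs("A3:D20")
```

The result could then be passed along as pd.read_excel('data.xlsx', sheet_name='Sheet2', **kwargs), assuming the file and sheet names from the question.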
Answered By: shane

One way to do this is to use the openpyxl module.

Here’s an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
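
If the first row of the range holds column names, and the goal is the lookup mentioned in the question (find a value in column A and return the matching value in column B), one possible continuation is sketched below. The sample data_rows values and the column names are made up for illustration; in practice the list would come from the loop above.

```python
import pandas as pd

# Hypothetical stand-in for the data_rows list built from ws['A3':'D20'].
data_rows = [
    ["code", "price", "qty", "region"],
    ["AAA", 10.5, 3, "north"],
    ["BBB", 7.25, 8, "south"],
]

# Use the first row as column names and the remaining rows as data.
header, *body = data_rows
df = pd.DataFrame(body, columns=header)

# Index by the first column so its values can be looked up with .loc,
# then keep only the second column as the lookup target.
lookup = df.set_index(header[0])[header[1]]
price_bbb = lookup.loc["BBB"]
```

Note that if a key appears more than once in the first column, .loc returns a Series of all matches rather than a single value.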
Answered By: DocZerø

My answer, tested and working with pandas 0.25:

pd.read_excel('resultat-elections-2012.xls', sheet_name = 'France entière T1T2', skiprows = 2,  nrows= 5, usecols = 'A:H')
pd.read_excel('resultat-elections-2012.xls', index_col = None, skiprows= 2, nrows= 5, sheet_name='France entière T1T2', usecols=range(0,8))

So: I need the data after the first two lines, then the desired number of lines (5), and columns A to H.
Be careful: @shane’s answer needs to be updated to use the new pandas parameters.

[Image: my original Excel sheet]

[Image: my process with pandas read_excel]

Answered By: ddnsimplon