How can I read a range('A5:B10') and place these values into a dataframe using openpyxl

Question:

Being able to define the ranges in a manner similar to excel, i.e. ‘A5:B10’ is important to what I need so reading the entire sheet to a dataframe isn’t very useful.

So what I need to do is read the values from multiple ranges in the Excel sheet to multiple different dataframes.

valuerange1 = ['a5:b10']
valuerange2 = ['z10:z20']
df = pd.DataFrame(values from valuerange)
df = pd.DataFrame(values from valuerange1)

or

df = pd.DataFrame(values from ['A5:B10'])

I have searched but either I have done a very poor job of searching or everyone else has gotten around this problem but I really can’t.

Thanks.

Asked By: Chris Thomas

||

Answers:

Using openpyxl

Since you have indicated, that you are looking into a very user friendly way to specify the range (like the excel-syntax) and as Charlie Clark already suggested, you can use openpyxl.

The following utility function takes a workbook and a column/row range and returns a pandas DataFrame:

from openpyxl import load_workbook
from openpyxl.utils import get_column_interval
import re

def load_workbook_range(range_string, ws):
    col_start, col_end = re.findall("[A-Z]+", range_string)

    data_rows = []
    for row in ws[range_string]:
        data_rows.append([cell.value for cell in row])

    return pd.DataFrame(data_rows, columns=get_column_interval(col_start, col_end))

Usage:

wb = load_workbook(filename='excel-sheet.xlsx', 
                   read_only=True)
ws = wb.active
load_workbook_range('B1:C2', ws)

Output:

   B  C
0  5  6
1  8  9

Pandas only Solution

Given the following data in an excel sheet:

    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12

You can load it with the following command:
pd.read_excel('excel-sheet.xlsx')

If you were to limit the data being read, the pandas.read_excel method offers a number of options. Use the parse_cols, skiprows and skip_footer to select the specific subset that you want to load:

pd.read_excel(
    'excel-sheet.xlsx',    # name of excel sheet
    names=['B','C'],       # new column header
    skiprows=range(0,1),   # list of rows you want to omit at the beginning
    skip_footer=1,         # number of rows you want to skip at the end
    parse_cols='B:C'       # columns to parse (note the excel-like syntax)
)

Output:

   B  C
0  5  6
1  8  9

Some notes:

The API of the read_excel method is not meant to support more complex selections. In case you require a complex filter it is much easier (and cleaner) to load the whole data into a DataFrame and use the excellent slicing and indexing mechanisms provided by pandas.

Answered By: Jan Trienes

The most easiest way is to use pandas for getting the range of values from excel.

import pandas as pd

#if you want to choose single range, you can use the below method
src=pd.read_excel(r'August.xlsx',usecols='A:C',sheet_name='S')

#if you have multirange, which means a dataframe with A:S and as well some other range
src=pd.read_excel(r'August.xlsx',usecols='A:C,G:I',sheet_name='S')
Answered By: Raghav

If you want to use particular range, for example "B3:E5", you can use the following structure.

src=pd.read_excel(r'August.xlsx',usecols='B:E',sheet_name='S',header=2)[0:2]
Answered By: Gürkan ÖZTÜRK