How to find the last row in a column using openpyxl normal workbook?
Question:
I’m using openpyxl to add data validation to all rows that have “Default” in them. To do that, I need to know how many rows there are.
I know there is a way to do this in iterable workbook mode, but I also add a new sheet to the workbook, which is not possible in iterable mode.
Answers:
ws.max_row
will give you the number of rows in a worksheet.
Since openpyxl 2.4 you can also access individual rows and columns and use their length to answer the question.
len(ws['A'])
Though it’s worth noting that, for data validation on a single column, Excel uses rows 1:1048576, i.e. the whole column.
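Both ws.max_row and len(ws['A']) describe the sheet as a whole, so they can overshoot when another column is longer than the one you care about. A minimal sketch of one way to get the last filled row of a single column, assuming "last row" means the last cell whose value is not None; the helper name last_filled_row is made up for this example and is not part of openpyxl:

```python
from openpyxl import Workbook


def last_filled_row(ws, column_letter):
    """Return the index of the last row in the column that holds a value, or 0 if there is none."""
    # Walk upwards from the bottom of the sheet until a non-empty cell is found.
    for row in range(ws.max_row, 0, -1):
        if ws[f"{column_letter}{row}"].value is not None:
            return row
    return 0


# Small in-memory workbook so the example does not depend on a file.
wb = Workbook()
ws = wb.active
ws["A1"] = "Default"
ws["A3"] = "Default"
ws["B5"] = "other"

print(ws.max_row)                # highest row used anywhere in the sheet
print(last_filled_row(ws, "A"))  # last row with a value in column A
```

Here column B pushes ws.max_row past the end of the data in column A, which is the case the helper is meant to handle.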
Find the length of a column and the length of a row.
Column:
column = sheet['A']
output tuple --> (A1, A2, A3, ..., An)
len(column)
output length --> 18
Row length:
for row in sheet.iter_rows(min_row=1, max_row=1):
    print(len(row))
This gives you the length of the header row, where the feature names are.
If you want the length of every row, use max_row=len(column) instead.
This works well for me. It gives the number of non-empty cells in each column, which equals the last row as long as there are no empty rows in between.
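from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

# your_xlsx_file and sheet_name are placeholders for your own path and sheet title
wb = load_workbook(your_xlsx_file)
ws = wb[sheet_name]
for col in range(1, ws.max_column + 1):
    col_letter = get_column_letter(col)
    # count the cells in this column that hold a value (0 and '' still count)
    max_col_row = len([cell for cell in ws[col_letter] if cell.value is not None])
    print("Column: {}, Row numbers: {}".format(col_letter, max_col_row))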
Here is another solution that might be helpful. Because openpyxl's max_row and max_column also count empty cells that have styles applied, I think pandas is the better choice in that case:
import pandas as pd

def get_max_row_column(df, sheet_name):
    # df is the dict of DataFrames returned by read_excel(..., sheet_name=None)
    max_row = 1
    max_col = 1
    for sh_name, sh_content in df.items():
        if sh_name == sheet_name:
            # + 1 accounts for the header row, which pandas does not count as data
            max_row = len(sh_content) + 1
            max_col = len(sh_content.columns)
            break
    coordinates = {'max_row': max_row, 'max_col': max_col}
    return coordinates

df = pd.read_excel('xls_path', sheet_name=None)
max_row = get_max_row_column(df, 'Test_sheet')['max_row']
max_col = get_max_row_column(df, 'Test_sheet')['max_col']
By passing sheet_name=None I get a dictionary of all worksheets, where each key is a sheet name and each value is the sheet content (a pandas DataFrame).
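Coming back to the original goal, the row count can feed directly into the data validation step. The sketch below is only an illustration under assumptions the question does not state: the “Default” markers are taken to be in column A, the validated cells in column B, and the Yes/No list rule is a hypothetical placeholder for whatever rule is actually needed.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

# In-memory sheet with sample data in column A (assumed layout).
wb = Workbook()
ws = wb.active
for value in ["Default", "Custom", "Default"]:
    ws.append([value])

# Hypothetical rule: the validated cell must be "Yes" or "No".
dv = DataValidation(type="list", formula1='"Yes,No"', allow_blank=True)
ws.add_data_validation(dv)

# Scan column A up to ws.max_row and attach the rule to column B of matching rows.
default_rows = []
for (cell,) in ws.iter_rows(min_col=1, max_col=1, max_row=ws.max_row):
    if cell.value == "Default":
        dv.add(f"B{cell.row}")
        default_rows.append(cell.row)

print(default_rows)
```

Applying the rule only to the matching rows keeps the validated range small, as opposed to covering the whole column the way Excel does.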