Split one excel file into multiple with specific number of rows in Pandas

Question:

Let’s say I have an excel file with 101 rows, I need to split and write into 11 excel files with equivalent row number 10 for each new file, except the last one since there is only one row left.

This is code I have tried, but I get KeyError: 11:

df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(int(len(df.index)/10) + 1)
for i, g in groups:
    g.to_excel("%s.xlsx" % i, index = False, index_lable = False)

Someone could help with this issue? Thanks a lot.

Reference related: Split pandas dataframe into multiple dataframes with equal numbers of rows

Asked By: ah bon

||

Answers:

I think you need np.arange:

df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(np.arange(len(df.index))//10)
for i, g in groups:
    print(g)
Answered By: jezrael

I solved a similar problem as follows. Backstory to my issue was that I have created an Azure Function with an HTTP trigger, but was overwhelming the endpoint when iterating through 2k rows of requests. So chunked up the origin file into rows of 50:

import pandas as pd
import logging

INXL = pd.read_excel('split/031022.xlsx', engine="openpyxl")


row_count = (len(INXL.index))
#make sure we are dealing with a table bigger than 50    
if row_count >= 51:
    row_start = (row_count -50)
else:
   row_start = 1


def extract(rs, rc):
   while rc >= 51: #loop body
        # set the extraction to be between the row start and ending index
        row_extract = INXL.iloc[rs:rc]
        with pd.ExcelWriter(f'output_{rc}.xlsx') as writer: 
            row_extract.to_excel(writer,index=False)
        rc -= 50
        rs -= 50
        

extract(row_start, row_count)
if row_count < 51:
    row_extract = INXL.iloc[row_start:row_count]
    with pd.ExcelWriter(f'output_{row_count}.xlsx') as writer: 
        row_extract.to_excel(writer,index=False) 
        logging.info("extract completed")       
Answered By: James Stanbridge