How to get rows between two rows with specific text?

Question:

I have this dataframe, which I extracted from an image using PyTesseract. But it also extracted all the irrelevant data like signatures and stamps. I only want the data from the ‘ASSETS’ row to the ‘Total Liabilities’ row. I tried

bs = bs[(bs['Purticulars'] == 'ASSETS') & (bs['Purticulars'] == 'TOTAL LIABILITIES')]

but it doesn't seem to work.

Asked By: Saad Saleem

||

Answers:

df.loc[df['Purticulars'].isin(['ASSETS','TOTAL LIABILITIES']).cumsum().eq(1) | df['Purticulars'].eq('TOTAL LIABILITIES')]

or

d = {'ASSETS':True,'TOTAL LIABILITIES':False}

m = df['Purticulars'].map(d).ffill().fillna(False).astype(bool) | df['Purticulars'].eq('TOTAL LIABILITIES')

df.loc[m]
Answered By: rhug123
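
To see why the cumulative-sum mask above keeps the right rows, here is a minimal sketch on a made-up frame. The sample rows and the names col, counter and mask are assumptions for illustration only, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data standing in for the OCR output.
df = pd.DataFrame({
    'Purticulars': ['Stamp', 'ASSETS', 'Cash', 'Inventory',
                    'TOTAL LIABILITIES', 'Signature'],
    'Amount': [None, None, 100, 250, 350, None],
})

col = df['Purticulars']

# Running count of marker rows seen so far: 0 before ASSETS,
# 1 from ASSETS up to the row before TOTAL LIABILITIES, 2 from there on.
counter = col.isin(['ASSETS', 'TOTAL LIABILITIES']).cumsum()

# Keep the rows where the count is 1, plus the closing marker row itself.
mask = counter.eq(1) | col.eq('TOTAL LIABILITIES')

result = df.loc[mask]
print(result)
```

This relies on each marker appearing exactly once in the column; if either text shows up again further down, the counter no longer lines up with the block you want.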

You can first find the indices of the rows that have the values "ASSETS" and "TOTAL LIABILITIES" in the "Purticulars" column.

Once you have the two row indices, you can slice everything in between with df.loc. Label-based slicing with df.loc includes both endpoints, so the "TOTAL LIABILITIES" row is kept.

For example:

assets_index = df.index[df['Purticulars'] == 'ASSETS'].tolist()[0]
liabilities_index = df.index[df['Purticulars'] == 'TOTAL LIABILITIES'].tolist()[0]
result = df.loc[assets_index:liabilities_index]
print(result)
Answered By: Ajeet Verma
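
If the exact comparison finds nothing, one possible cause is that the OCR text carries stray spaces or different casing. Below is a hedged sketch of the same index-based idea with normalization and a check for missing markers. The helper name slice_between and the sample frame are hypothetical, and it works by position rather than by label so it does not depend on the index being sorted.

```python
import numpy as np
import pandas as pd

def slice_between(df, column, start, end):
    """Return rows from the first `start` row through the first `end` row after it."""
    # Normalize so ' assets ' and 'ASSETS' compare equal (assumed OCR noise).
    text = df[column].astype(str).str.strip().str.upper()

    start_pos = np.flatnonzero((text == start.upper()).to_numpy())
    if start_pos.size == 0:
        raise ValueError(f"start marker {start!r} not found in {column!r}")
    first = start_pos[0]

    # Only consider end markers at or after the start marker.
    end_pos = np.flatnonzero((text == end.upper()).to_numpy())
    end_pos = end_pos[end_pos >= first]
    if end_pos.size == 0:
        raise ValueError(f"end marker {end!r} not found after {start!r}")

    # iloc excludes the stop position, so add 1 to keep the end row.
    return df.iloc[first:end_pos[0] + 1]

# Hypothetical sample data with noisy marker text.
sample = pd.DataFrame({
    'Purticulars': ['Stamp', ' assets ', 'Cash', 'Loans',
                    'Total Liabilities', 'Signature'],
})
between = slice_between(sample, 'Purticulars', 'ASSETS', 'TOTAL LIABILITIES')
print(between)
```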