Drop row in a for loop Python

Question:

I have a (very large) pandas dataframe like the following:

Sequence
AAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAAAAAAAAAAAAAAAAAAT
AAAACAGAAGGTGTCCCAATACTAT
AAAACAGATCTCGGCAGATTGGATG
AAAACAGATCTCGGTAGACTGGACG

I want to remove the rows where the fraction of A is greater than 0.80.
Here is my code:

sequences = file[['Sequence']]

seq_A = 'A' * 25

for row in range(len(file)):
    par1 =  file.iloc[row,0]
    
    # compare sequence with homopolymer and check ratio of match
    ratioA = difflib.SequenceMatcher(None, par1, seq_A).ratio()
        
    if ratioA >= 0.80:
        sequences.drop(row, axis=0, inplace=True)
        # lista.append(row)

However, when I instead collect the indices of the matching rows in a new list (without deleting any rows), the number of indices does not match the number of rows deleted by the loop.
Thank you very much!

Asked By: Denise Lavezzari

||

Answers:

You should generally avoid loops with pandas. Here is a vectorized way to do it, keeping only the rows where the fraction of A is at most 0.8 (df is your dataframe):

df.loc[df['Sequence'].str.count('A') / df['Sequence'].str.len() <= 0.8]

produces:

                    Sequence
4  AAAACAGAAGGTGTCCCAATACTAT
5  AAAACAGATCTCGGCAGATTGGATG
6  AAAACAGATCTCGGTAGACTGGACG
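
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same boolean mask. The names `frac_a`, `filtered` and `toy` are illustrative only and do not come from the original posts. The last few lines also compute `difflib.SequenceMatcher(...).ratio()` for a short made-up string, to show that this ratio is a block-matching similarity score and need not equal the fraction of `A` characters. That difference may be one reason the loop in the question selects a different set of rows, but this is an assumption, not something confirmed in the thread.

```python
import difflib

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sequence": [
        "AAAAAAAAAAAAAAAAAAAAAAAAA",
        "AAAAAAAAAAAAAAAAAAAAAAAAC",
        "AAAAAAAAAAAAAAAAAAAAAAAAG",
        "AAAAAAAAAAAAAAAAAAAAAAAAT",
        "AAAACAGAAGGTGTCCCAATACTAT",
        "AAAACAGATCTCGGCAGATTGGATG",
        "AAAACAGATCTCGGTAGACTGGACG",
    ]
})

# Fraction of 'A' characters in each sequence (vectorized, no Python loop).
frac_a = df["Sequence"].str.count("A") / df["Sequence"].str.len()

# Keep only the rows whose fraction of 'A' is at most 0.8.
filtered = df.loc[frac_a <= 0.8]
print(filtered)

# Illustration with a made-up 5-character string (hypothetical input):
# SequenceMatcher.ratio() is 2 * matches / (len(a) + len(b)), where the
# matches come from aligned matching blocks, so it can differ from the
# plain fraction of 'A' characters in the string.
toy = "ACAAA"
ratio = difflib.SequenceMatcher(None, toy, "A" * len(toy)).ratio()
fraction = toy.count("A") / len(toy)
print(ratio, fraction)
```

If the goal is strictly "fraction of A above a threshold", counting characters as in the mask above measures that directly, whereas the `difflib` ratio measures something related but different.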
Answered By: Vladimir Fokow