Different ways to conditionally drop rows in Pandas

Question:

I have a DataFrame that has a column (AE) that could contain: nothing (""), "X", "A" or "E".

I want to drop all the rows that have the value "X" in that column.

I searched and found 2 ways of doing it:

df = df.drop(df[df.AE == "X"].index)

or

df = df[df["AE"] != "X"]

But for some reason, the first way of doing it drops more rows than it should.

Do the two lines of code do the same thing?

There seems to be a mistake I’m making when trying to do this "drop" using the first approach.

Asked By: Hélio dos Santos


Answers:

They are not the same.

df = df.drop(df[df.AE == "X"].index)

This drops rows by their index labels. If the index is not unique, the labels of the rows where df["AE"] == "X" may also belong to other rows, and those rows are dropped as well.
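A minimal sketch of how this can happen, using made-up data in which the index label 0 appears twice (for example after concatenating two frames without resetting the index). The frame and the variable names here are assumptions for illustration, not the asker's actual data:

```python
import pandas as pd

# Hypothetical data: the index label 0 is repeated on purpose
df = pd.DataFrame(
    {"AE": ["X", "A", "E", ""]},
    index=[0, 1, 0, 2],
)

# Labels of the rows where AE == "X": only the label 0
labels = df[df.AE == "X"].index

# drop() removes every row carrying one of those labels,
# so the "E" row that shares label 0 is removed too
dropped = df.drop(labels)
remaining = dropped["AE"].tolist()
print(remaining)
```

Only one row has "X", but two rows are removed because both carry the label 0.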

df = df[df["AE"] != "X"]

Here we filter the DataFrame with a boolean mask and keep every row where df["AE"] is different from "X". The index values play no part in this, and strictly speaking we are not dropping rows but keeping those that meet a criterion.
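As a rough illustration with made-up data containing a repeated index label (an assumption, since the question does not show the real index), the mask keeps exactly the non-"X" rows. Checking df.index.is_unique is one way to see whether the drop approach is at risk, and resetting the index first is one possible way to make it behave the same:

```python
import pandas as pd

# Hypothetical data with a duplicated index label (0)
df = pd.DataFrame({"AE": ["X", "A", "E", ""]}, index=[0, 1, 0, 2])

# Boolean mask: evaluated row by row, independent of the index labels
mask = df["AE"] != "X"
kept = df[mask]

# A quick check for whether drop-by-label could remove extra rows
has_unique_index = df.index.is_unique

# With a fresh unique index, drop-by-label removes only the "X" row
df_reset = df.reset_index(drop=True)
dropped = df_reset.drop(df_reset[df_reset.AE == "X"].index)

print(kept["AE"].tolist())
print(has_unique_index)
print(dropped["AE"].tolist())
```

Note that reset_index(drop=True) discards the original labels, so it only fits cases where those labels carry no meaning.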

Answered By: Celius Stingher