Condense dataset pandas

Question:

I wish to condense my dataset. Essentially it is a groupby.

Data

id  box     status
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
bb  box8    no
bb  box8    no

Desired

id  box     status
aa  box11   hey
aa  box5    hello
bb  box8    no

Doing

df1 = df.groupby(["id"])["box"].agg()
Asked By: Lynn


Answers:

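Use DataFrame.drop_duplicates(). With no arguments it removes every row that exactly repeats an earlier row and keeps the first occurrence.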

If you want to be careful and leave "id" out of the comparison, you can use the subset keyword:

df1 = df.drop_duplicates(subset = ['box', 'status'])
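For reference, here is a minimal sketch of the plain drop_duplicates() call run against the data from the question. The frame is rebuilt by hand from the table above, and the reset_index(drop=True) step is an optional extra (not part of the original answer) that renumbers the surviving rows.

```python
import pandas as pd

# Rebuild the sample data from the question (11 rows, 3 distinct combinations).
df = pd.DataFrame({
    "id": ["aa"] * 9 + ["bb"] * 2,
    "box": ["box11"] * 4 + ["box5"] * 5 + ["box8"] * 2,
    "status": ["hey"] * 4 + ["hello"] * 5 + ["no"] * 2,
})

# Drop rows that exactly repeat an earlier row, keeping the first occurrence.
# reset_index(drop=True) is optional: it renumbers the remaining rows from 0.
df1 = df.drop_duplicates().reset_index(drop=True)

print(df1)
```

This should leave one row per distinct (id, box, status) combination, matching the desired output in the question.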

EDIT:
To clarify, drop_duplicates() only drops a row if every compared column matches an earlier row. subset just tells it which columns to compare. If you had a row where box='box8' and status='hey', that row would not be dropped: each value already appears elsewhere in its own column, but the combination is unique.
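A small sketch of that behaviour, assuming a made-up frame with one extra row (id "cc", box "box8", status "hey") that is not in the original data:

```python
import pandas as pd

# Hypothetical data: the last row reuses 'box8' and 'hey',
# but never in that combination before.
df = pd.DataFrame({
    "id": ["aa", "aa", "bb", "bb", "cc"],
    "box": ["box11", "box11", "box8", "box8", "box8"],
    "status": ["hey", "hey", "no", "no", "hey"],
})

# Compare only 'box' and 'status'; 'id' is ignored when looking for duplicates.
deduped = df.drop_duplicates(subset=["box", "status"])

print(deduped)
```

Rows 1 and 3 repeat an earlier (box, status) pair and are dropped. The "cc" row is kept because the pair ('box8', 'hey') has not appeared before.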

Answered By: psychicesp