Condense dataset pandas

Question

I wish to condense my dataset so that each repeated row appears only once. Essentially it is a groupby.

Data

id  box     status
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
bb  box8    no
bb  box8    no

Desired

id  box     status
aa  box11   hey
aa  box5    hello
bb  box8    no

Doing

df1 = df.groupby(["id"])["box"].agg()

Asked By: Lynn


Answer 1

Use DataFrame.drop_duplicates(). By default it keeps the first occurrence of each duplicated row and drops the rest.

If you want to be careful and have duplicates judged only on "box" and "status", ignoring "id", you can use the subset keyword:

df1 = df.drop_duplicates(subset = ['box', 'status'])
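As a minimal sketch, here is the question's data rebuilt by hand and passed through both calls. The DataFrame construction and the names df2 and condensed are assumptions for illustration; reset_index(drop=True) is optional and only renumbers the surviving rows.

```python
import pandas as pd

# Rebuild the sample data from the question (11 rows, 3 distinct combinations).
df = pd.DataFrame({
    "id": ["aa"] * 9 + ["bb"] * 2,
    "box": ["box11"] * 4 + ["box5"] * 5 + ["box8"] * 2,
    "status": ["hey"] * 4 + ["hello"] * 5 + ["no"] * 2,
})

# Drop rows that are duplicated across every column.
df1 = df.drop_duplicates()

# Same idea, but only "box" and "status" are compared.
df2 = df.drop_duplicates(subset=["box", "status"])

# Optional: renumber the remaining rows from 0.
condensed = df1.reset_index(drop=True)
print(condensed)
```

For this data both calls keep the same three rows, because "id" never varies within a given box/status pair.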

EDIT:
To clarify, drop_duplicates() will only drop a row if its whole combination of values is duplicated. subset just tells it which columns to consider. If you had a row where box='box8' and status='hey', this row would not drop. Both values are duplicates individually, but they form a unique combination.
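A small sketch of that case, assuming a made-up row with box 'box8' and status 'hey' (it is not in the question's data) appended to a shortened version of the sample:

```python
import pandas as pd

# Shortened sample plus one hypothetical row: box8 paired with "hey".
df = pd.DataFrame({
    "id": ["aa", "aa", "bb", "bb", "bb"],
    "box": ["box11", "box11", "box8", "box8", "box8"],
    "status": ["hey", "hey", "no", "no", "hey"],
})

# Duplicates are judged on the (box, status) pair, not on each column alone.
result = df.drop_duplicates(subset=["box", "status"])
print(result)
```

The last row survives even though 'box8' and 'hey' each appear earlier, because the pair ('box8', 'hey') occurs only once.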

Answered By: psychicesp
