Condense dataset pandas
Question:
I wish to condense my dataset. Essentially it is a groupby.
Data
id box status
aa box11 hey
aa box11 hey
aa box11 hey
aa box11 hey
aa box5 hello
aa box5 hello
aa box5 hello
aa box5 hello
aa box5 hello
bb box8 no
bb box8 no
Desired
id box status
aa box11 hey
aa box5 hello
bb box8 no
Doing
df1 = df.groupby(["id"])["box"].agg()
Answers:
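If you want to ignore "id" when checking for duplicates, you can use the subset keyword. Note that rows sharing the same box and status under different ids would then be collapsed into one: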
df1 = df.drop_duplicates(subset = ['box', 'status'])
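As a rough check against the sample in the question, here is a minimal sketch. The DataFrame construction is reconstructed from the posted data and is an assumption about how df was built; df_all and df_sub are names chosen here for illustration.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed layout).
df = pd.DataFrame({
    "id": ["aa"] * 9 + ["bb"] * 2,
    "box": ["box11"] * 4 + ["box5"] * 5 + ["box8"] * 2,
    "status": ["hey"] * 4 + ["hello"] * 5 + ["no"] * 2,
})

# Compare every column and keep the first occurrence of each distinct row.
df_all = df.drop_duplicates().reset_index(drop=True)

# Compare only "box" and "status"; "id" is carried along but not compared.
df_sub = df.drop_duplicates(subset=["box", "status"]).reset_index(drop=True)

print(df_all)
```

On this particular sample both calls leave the same three rows, because each box/status pair belongs to a single id.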
EDIT:
To clarify, drop_duplicates() only drops a row when every column it compares matches an earlier row. By default it compares all columns; subset just tells it which columns to consider. If you had a row where box='box8' and status='hey', that row would not be dropped: each value already appears in its own column, but the combination is unique.
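A small sketch of that case, using a shortened version of the data plus the hypothetical box8/hey row (that row is not in the original question, and giving it the id "bb" is an assumption made here):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["aa", "aa", "bb", "bb", "bb"],
    "box": ["box11", "box11", "box8", "box8", "box8"],
    # The last row pairs box8 with "hey": a hypothetical, unique combination.
    "status": ["hey", "hey", "no", "no", "hey"],
})

# Rows are compared on the (box, status) pair, not on each column separately.
result = df.drop_duplicates(subset=["box", "status"])
print(result)
```

The box8/hey row stays in result even though "box8" and "hey" each occur in earlier rows, since no earlier row has both together.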