Tricky Multiple Groupings and Transformations using Pandas

Question

I have a dataset where I would like to:

group by location and box and take distinct count of the box
create column headers with the values in the status column and include its count based on the box

Data

ID  location    type    box     status          
aa  NY          no      box55   hey         
aa  NY          no      box55   hi          
aa  NY          yes     box66   hello           
aa  NY          yes     box66   goodbye         
aa  CA          no      box11   hey         
aa  CA          no      box11   hi          
aa  CA          yes     box11   hello           
aa  CA          yes     box11   goodbye         
aa  CA          no      box86   hey         
aa  CA          no      box86   hi          
aa  CA          yes     box86   hello           
aa  CA          yes     box99   goodbye         
aa  CA          no      box99   hey         
aa  CA          no      box99   hi

Desired

location    box count   box     hey hi  hello   goodbye 
NY          2           box55   1   1   0       0   
NY          2           box66   0   0   1       1   
CA          3           box11   1   1   1       1   
CA          3           box86   1   1   1       0   
CA          3           box99   1   1   0       1

Doing

df['box count'] = df.groupby(['location','box'])['box'].size()


t = pd.get_dummies(df, prefix_sep='', prefix='', 
columns=['status']).groupby(['box', 'location'], 
as_index=False).sum().assign(count=df.groupby(['box', 'location'],
as_index=False)['status'].size()['size'])

Any suggestion is appreciated.

Asked By: Lynn

||

Source

Answer 1

Try making two dataframes: first with .groupby(), second with pd.crosstab. Then just pd.concat them:

df1 = df.groupby(["location", "box"]).agg(**{"box count": ("box", "size")})
df2 = pd.crosstab([df["location"], df["box"]], df["status"])

df_out = pd.concat([df1, df2], axis=1)
print(df_out.reset_index())

Prints:

  location    box  box count  goodbye  hello  hey  hi
0       CA  box11          4        1      1    1   1
1       CA  box86          3        0      1    1   1
2       CA  box99          3        1      0    1   1
3       NY  box55          2        0      0    1   1
4       NY  box66          2        1      1    0   0

EDIT:

m = df.groupby(["location"])["box"].nunique()
df1 = df.groupby(["location", "box"]).agg(
    **{
        "box count": (
            "location",
            lambda x: m[x.iat[0]],
        )
    }
)
df2 = pd.crosstab([df["location"], df["box"]], df["status"])

df_out = pd.concat([df1, df2], axis=1)
print(df_out.reset_index())

Prints:

  location    box  box count  goodbye  hello  hey  hi
0       CA  box11          3        1      1    1   1
1       CA  box86          3        0      1    1   1
2       CA  box99          3        1      0    1   1
3       NY  box55          2        0      0    1   1
4       NY  box66          2        1      1    0   0

Answered By: Andrej Kesely

Tricky Multiple Groupings and Transformations using Pandas

Question:

Data

Desired

Doing

Answers: