Groupby/aggregation shows groups which were supposed to be filtered out before

Question

I have a pandas DataFrame with a column Size, on which I filter first and then group by and count records per group. The result contains also rows for the groups which were filtered out before, but with a count of 0:

(
    df[df["Size"].isin(("XXS", "XS", "S", "M", "L", "XL", "XXL"))]
    .groupby("Size")
    .agg(
        count=("OID", "count"),
    )
    .sort_values("count", ascending=False)
)

The result DataFrame is shown in the figure below. In my understanding of the groupby function, the groups which were filtered out (I double checked, they are really not anymore in the dataframe) should no longer occur in the aggregated dataframe. Even copying and resetting the index before grouping by does not change the output.

Unfortunately, I was not able to reproduce the issue with a simple example dataframe, so I assume that there is something strange happening. Does anybody have an idea why this could happen?

Result dataframe:

Asked By: Yannic

||

Source

Answer 1

df[df["Size"].isin(["XXS", "XS", "S", "M", "L", "XL", "XXL"])]
    .groupby("Size")
    .agg(
        count=("OID", "count"),
    )
    .sort_values("count", ascending=False)

====================================================
isin(["XXS", "XS", "S", "M", "L", "XL", "XXL"])

Answered By: Coco Kuang

Answer 2

Sometimes it helps to wait a weekend and think about on Monday again:
The behavior occurred due to categorical datatype of Size column:

>>> df.dtypes

Size                     category
...

>>> df["Size"].unique()

['S', 'M', 'L', 'XL', 'XXL', 'XS', 'XXS']
Categories (80, object): ['100 CM', '105 CM', '24', '25', ..., 'XS/S', 'XXL', 'XXS', 'XXS/XS']

Answered By: Yannic

Groupby/aggregation shows groups which were supposed to be filtered out before

Question:

Answers: