How to ignore EMPTY/NULL value columns while grouping in python polars?

Question:

I have a dataframe.

import polars as pl

df_X = pl.DataFrame({'last_name':['James','Warner','Marino','James','Warner','Marino','James'],
             'first_name':['Horn','Bro','Kach','Horn','Bro','Kach','Horn'],
             'dob':['03/06/1990','09/16/1990','03/06/1990','','03/06/1990','','']}
            )


I'm grouping on the last_name, first_name and dob columns to get the counts:

df_X.groupby(['last_name','first_name','dob']).agg(pl.count())


Here I would like to ignore NULL/EMPTY values in the grouping columns: James Horn has two empty DOBs, and these rows should not be included in the grouping operation.

In the expected output, the rows with an empty dob are not counted.

Of course, we can filter on the column before passing the data to the grouping:

df_X.filter(pl.col('dob')!="").groupby(['last_name','first_name','dob']).agg(pl.count())

But what if I have 10 columns to check in the filter operation? I would need to write a condition for each of them, one after another.

Is there any other solution?

Asked By: myamulla_ciencia


Answers:

First replace empty strings with null values, then use drop_nulls on the grouping columns before grouping:

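# Columns to group on; rows with an empty or null value in any of them are dropped.
group_columns = ['last_name', 'first_name', 'dob']

(
    df_X
    .with_columns(
        [
            # Turn empty strings into nulls, keeping the original column names.
            pl.when(pl.col(group_columns).str.lengths() == 0)
            .then(None)
            .otherwise(pl.col(group_columns))
            .keep_name()
        ]
    )
    .drop_nulls(group_columns)
    .groupby(group_columns)
    .count()
)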
shape: (4, 4)
┌───────────┬────────────┬────────────┬───────┐
│ last_name ┆ first_name ┆ dob        ┆ count │
│ ---       ┆ ---        ┆ ---        ┆ ---   │
│ str       ┆ str        ┆ str        ┆ u32   │
╞═══════════╪════════════╪════════════╪═══════╡
│ Warner    ┆ Bro        ┆ 09/16/1990 ┆ 1     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Marino    ┆ Kach       ┆ 03/06/1990 ┆ 1     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Warner    ┆ Bro        ┆ 03/06/1990 ┆ 1     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ James     ┆ Horn       ┆ 03/06/1990 ┆ 1     │
└───────────┴────────────┴────────────┴───────┘
Answered By: braaannigan