How to ignore EMPTY/NULL value columns while grouping in python polars?
Question:
I have a dataframe.
df_X = pl.DataFrame({
    'last_name': ['James', 'Warner', 'Marino', 'James', 'Warner', 'Marino', 'James'],
    'first_name': ['Horn', 'Bro', 'Kach', 'Horn', 'Bro', 'Kach', 'Horn'],
    'dob': ['03/06/1990', '09/16/1990', '03/06/1990', '', '03/06/1990', '', ''],
})
I'm grouping on the last_name, first_name and dob columns to get the counts:
df_X.groupby(['last_name','first_name','dob']).agg(pl.count())
Here I would like to ignore null/empty values in the grouping columns. For example,
James Horn has two empty DOBs; those rows should not be included in the grouping operation.
Here is the expected output.
Of course, we can filter the column before passing it to the grouping:
df_X.filter(pl.col('dob') != "").groupby(['last_name','first_name','dob']).agg(pl.count())
But what if I have 10 columns to specify in the filter? I'd need to write them out one after another.
Is there another solution?
Answers:
First replace the empty strings with null values, then use drop_nulls:
group_columns = ['last_name', 'first_name', 'dob']

(
    df_X
    .with_columns(
        # Turn empty strings in the grouping columns into nulls,
        # keeping the original column names
        pl.when(pl.col(group_columns).str.lengths() == 0)
        .then(None)
        .otherwise(pl.col(group_columns))
        .keep_name()
    )
    # Drop any row with a null in a grouping column, then group
    .drop_nulls(group_columns)
    .groupby(group_columns)
    .count()
)
shape: (4, 4)
┌───────────┬────────────┬────────────┬───────┐
│ last_name ┆ first_name ┆ dob ┆ count │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ u32 │
╞═══════════╪════════════╪════════════╪═══════╡
│ Warner ┆ Bro ┆ 09/16/1990 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Marino ┆ Kach ┆ 03/06/1990 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Warner ┆ Bro ┆ 03/06/1990 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ James ┆ Horn ┆ 03/06/1990 ┆ 1 │
└───────────┴────────────┴────────────┴───────┘