How to add a new column with per-group counts in Python Polars?


I have a small use case with the following Polars DataFrame:

import polars as pl

df_names = pl.DataFrame({'LN': ['Mallesham', 'Bhavik', 'Mallesham', 'Bhavik', 'Mahesh', 'Naresh', 'Sharath', 'Rakesh', 'Mallesham'], ...

Here I would like to group on LN, FN and SSN and create a new column holding the number of observations for each group combination.


('Mallesham', 'Yamulla', '123') appears 3 times, so the LN_FN_SSN_count field is filled with 3.

Asked By: myamulla_ciencia



You can use a window expression with over (which is like grouping, aggregating and joining the result back onto the original rows in other libraries, but without the need for the join):

# recent Polars; older releases wrote this as with_column(pl.count().over(...))
df_names.with_columns(pl.len().over(['LN', 'FN', 'SSN']).alias('LN_FN_SSN_count'))
│ LN        ┆ FN      ┆ SSN ┆ Address ┆ LN_FN_SSN_count │
│ ---       ┆ ---     ┆ --- ┆ ---     ┆ ---             │
│ str       ┆ str     ┆ str ┆ str     ┆ u32             │
│ Mallesham ┆ Yamulla ┆ 123 ┆ A       ┆ 3               │
│ Bhavik    ┆ Yamulla ┆ 456 ┆ B       ┆ 2               │
│ Mallesham ┆ Yamulla ┆ 123 ┆ C       ┆ 3               │
│ Bhavik    ┆ Yamulla ┆ 456 ┆ D       ┆ 2               │
│ ...       ┆ ...     ┆ ... ┆ ...     ┆ ...             │
│ Naresh    ┆ Burre   ┆ 111 ┆ F       ┆ 1               │
│ Sharath   ┆ Velmala ┆ 222 ┆ G       ┆ 1               │
│ Rakesh    ┆ Uppu    ┆ 333 ┆ H       ┆ 1               │
│ Mallesham ┆ Yamulla ┆ 123 ┆ S       ┆ 3               │