How to filter duplicates based on multiple columns in Polars?

Question:

I was previously able to filter duplicates based on multiple columns using df.filter(pl.col(['A','C']).is_duplicated()), but after updating to the latest version this no longer works.

import polars as pl


df = pl.DataFrame(
    {
        "A": [1,4,4,7,7,10,10,13,16],
        "B": [2,5,5,8,18,11,11,14,17],
        "C": [3,6,6,9,9,12,12,15,18]        
    }
)
df.filter(pl.col(['A','C']).is_duplicated())

This gives an error. I also tried:

df.filter(df.select(
    pl.col(['A','C']).is_duplicated()
    )
)

This also gives an error.

Asked By: ViSa

Answers:

Polars 0.16.10 flagged this behavior as ambiguous, and it now raises this error:

exceptions.ComputeError: The predicate passed to 'LazyFrame.filter' expanded to multiple expressions: 

        col("A").is_duplicated(),
        col("C").is_duplicated(),
This is ambiguous. Try to combine the predicates with the 'all' or `any' expression.

However, 0.19.0 removed the deprecated all/any behavior, which has been replaced by all_horizontal and any_horizontal. To get the same behavior as before 0.16.10, use df.filter(pl.all_horizontal(pl.col(['A','C']).is_duplicated())).

I’ve modified the input slightly to show the difference between any_horizontal and all_horizontal:

import polars as pl

df = pl.DataFrame(
    {
        "A": [1,3,4,7,7,10,10,13,16],
        "B": [2,5,5,8,18,11,11,14,17],
        "C": [3,6,6,9,9,12,12,15,18]        
    }
)

# Legacy call, shown for comparison only; it ran on 0.16.9 and errors on later versions.
# print("legacy run in 0.16.9: ", df.filter(pl.col(['A','C']).is_duplicated()))
print("all_horizontal: ", df.filter(pl.all_horizontal(pl.col(['A','C']).is_duplicated())))
print("any_horizontal: ", df.filter(pl.any_horizontal(pl.col(['A','C']).is_duplicated())))

Output:

legacy run in 0.16.9:  shape: (4, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 7   ┆ 8   ┆ 9   │
│ 7   ┆ 18  ┆ 9   │
│ 10  ┆ 11  ┆ 12  │
│ 10  ┆ 11  ┆ 12  │
└─────┴─────┴─────┘

all_horizontal:  shape: (4, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 7   ┆ 8   ┆ 9   │
│ 7   ┆ 18  ┆ 9   │
│ 10  ┆ 11  ┆ 12  │
│ 10  ┆ 11  ┆ 12  │
└─────┴─────┴─────┘

any_horizontal:  shape: (6, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 3   ┆ 5   ┆ 6   │
│ 4   ┆ 5   ┆ 6   │
│ 7   ┆ 8   ┆ 9   │
│ 7   ┆ 18  ┆ 9   │
│ 10  ┆ 11  ┆ 12  │
│ 10  ┆ 11  ┆ 12  │
└─────┴─────┴─────┘
Answered By: keraion