Polars dataframe drop nans

Question:

I need to drop rows that have a NaN value in any column, the same way drop_nulls() does for null values:

df.drop_nulls()

but for NaNs. I have found that a drop_nans method exists for Series but not for DataFrames:

df['A'].drop_nans()

Pandas code that I’m using:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 0, 0, 1, None, 1],
        'B': [1, 2, 2, 1, 1, np.nan]
    }
)
# drops rows containing NaN/None in any column
df.dropna()
Asked By: EnesZ


Answers:

Try this:

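import polars as pl
import numpy as np

# create a DataFrame with some NaN values
df = pl.DataFrame({
    'A': [1.0, 2.0, np.nan, 4.0, 5.0],
    'B': ['foo', 'bar', 'app', 'ctx', 'mpq']
})

# pandas' dropna() removes rows containing NaN or None in any column;
# the result is a pandas DataFrame, so convert it back to Polars
df = pl.from_pandas(df.to_pandas().dropna())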
Answered By: code_adithya

Not sure why it currently only exists as a Series method.

You can use .filter() to emulate the behaviour and then call .drop_nulls():

>>> df.filter(pl.all_horizontal(pl.col(pl.Float32, pl.Float64).is_not_nan())).drop_nulls()
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 0   ┆ 1.0 │
│ 0   ┆ 2.0 │
│ 0   ┆ 2.0 │
│ 1   ┆ 1.0 │
└─────┴─────┘
Answered By: jqurious

If you have mixed nulls and NaNs, the easiest thing to do is to replace the NaNs with nulls and then use drop_nulls():

df.with_columns(pl.col(pl.Float32, pl.Float64).fill_nan(None)).drop_nulls()

From inside out:

pl.col(pl.Float32, pl.Float64) selects all float columns, which are the only columns that can hold NaN.

fill_nan(None) replaces every NaN value with, in this case, None, which Polars stores as a proper null.

drop_nulls() does exactly what it seems like it does.

Answered By: Dean MacGregor

As @jqurious suggested, but with explicit column names:

import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        'A': [0.0, 1.0, 1.0, np.nan, 2.0],
        'B': ['1', '1', '1', '1', None]
    }
)

# get the names of all columns that have a float type
float_col = [c for c in df.columns if df[c].dtype in [pl.Float64, pl.Float32]]

# recent Polars releases use pl.all_horizontal for the row-wise "all"
df.filter(pl.all_horizontal(pl.col(float_col).is_not_nan())).drop_nulls()
Answered By: EnesZ