How to replace empty lists in polars (python)?

Question:

Given a polars dataframe with empty lists in some column values, how can I replace them with null (None) so that they are counted as missing values?

line_items  contacts
list[null]  list[str]
[]          []
[]          []
[]          ["1081"]
[]          ["1313"]
[]          ["3657"]

df.with_columns(
   pl.when(pl.col(pl.List(pl.Null))
     .then(None)
     .otherwise(pl.col(pl.List()))
     .keep_name()
)

I can't seem to target the [] content with the when() selector, and I'm not sure that the otherwise block will keep the original value if the list is not empty (e.g. ['1246']).

Asked By: Valentine


Answers:

As @Dean MacGregor mentioned, you can do this with .arr.lengths() == 0.

Here is the code for it:

import polars as pl

# example dataframe
df = pl.DataFrame(
    {
        'contacts': [[], [], ['1081'], ['1313'], ['3657']],
        'line_items': [[], [], [], [], []],
    },
    schema=[('contacts', pl.List(pl.Utf8)), ('line_items', pl.List(pl.Utf8))],
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ []        ┆ []         │
│ []        ┆ []         │
│ ["1081"]  ┆ []         │
│ ["1313"]  ┆ []         │
│ ["3657"]  ┆ []         │
└───────────┴────────────┘

# transformation for 1 column

df.with_columns(
    pl.when(pl.col('contacts').arr.lengths() == 0)
        .then(None)
        .otherwise(pl.col('contacts')).keep_name(),
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ null      ┆ []         │
│ null      ┆ []         │
│ ["1081"]  ┆ []         │
│ ["1313"]  ┆ []         │
│ ["3657"]  ┆ []         │
└───────────┴────────────┘

# EDIT: added transformation for all columns of datatype list[str]

df.with_columns(
    pl.when(pl.col(column.name).arr.lengths() == 0)
        .then(None)
        .otherwise(pl.col(column.name))
        .keep_name()
    for column in df if column.dtype == pl.List(pl.Utf8)
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ null      ┆ null       │
│ null      ┆ null       │
│ ["1081"]  ┆ null       │
│ ["1313"]  ┆ null       │
│ ["3657"]  ┆ null       │
└───────────┴────────────┘

Answered By: Luca