How to replace empty lists in polars (Python)?
Question:
Given a polars DataFrame that has empty lists as column values, how can I replace them with pl.Null or None so that they are counted as missing values?
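┌────────────┬───────────┐
│ line_items ┆ contacts  │
│ ---        ┆ ---       │
│ list[null] ┆ list[str] │
╞════════════╪═══════════╡
│ []         ┆ []        │
│ []         ┆ []        │
│ []         ┆ ["1081"]  │
│ []         ┆ ["1313"]  │
│ []         ┆ ["3657"]  │
└────────────┴───────────┘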
df.with_columns(
pl.when(pl.col(pl.List(pl.Null))
.then(None)
.otherwise(pl.col(pl.List()))
.keep_name()
)
With the attempt above I can't seem to target the [] values with the when() selector, and I'm not sure whether the otherwise() branch keeps the original value when the list is not empty (e.g. ['1246']).
Answers:
As @Dean MacGregor mentioned, you can do this by checking .arr.lengths() == 0.
Here is the code for it:
import polars as pl

# example dataframe
df = pl.DataFrame(
    {
        'contacts': [[], [], ['1081'], ['1313'], ['3657']],
        'line_items': [[], [], [], [], []],
    },
    schema=[('contacts', pl.List(pl.Utf8)), ('line_items', pl.List(pl.Utf8))],
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ []        ┆ []         │
│ []        ┆ []         │
│ ["1081"]  ┆ []         │
│ ["1313"]  ┆ []         │
│ ["3657"]  ┆ []         │
└───────────┴────────────┘
# transformation for 1 column
df.with_columns(
    pl.when(pl.col('contacts').arr.lengths() == 0)
    .then(None)
    .otherwise(pl.col('contacts'))
    .keep_name(),
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ null      ┆ []         │
│ null      ┆ []         │
│ ["1081"]  ┆ []         │
│ ["1313"]  ┆ []         │
│ ["3657"]  ┆ []         │
└───────────┴────────────┘
# EDIT: added transformation for all columns of datatype List(Utf8)
df.with_columns(
    pl.when(pl.col(column.name).arr.lengths() == 0)
    .then(None)
    .otherwise(pl.col(column.name))
    .keep_name()
    for column in df
    if column.dtype == pl.List(pl.Utf8)
)

shape: (5, 2)
┌───────────┬────────────┐
│ contacts  ┆ line_items │
│ ---       ┆ ---        │
│ list[str] ┆ list[str]  │
╞═══════════╪════════════╡
│ null      ┆ null       │
│ null      ┆ null       │
│ ["1081"]  ┆ null       │
│ ["1313"]  ┆ null       │
│ ["3657"]  ┆ null       │
└───────────┴────────────┘