How to filter empty values from a list column of a Python Polars DataFrame?
Question:
I have a Python Polars DataFrame:
import polars as pl
df_pol = pl.DataFrame({'test_names': [['Mallesham', '', 'Bhavik', 'Jagarini', 'Jose', 'Fernando'],
                                      ['', '', '', 'ABC', '', 'XYZ']]})
I would like to count the elements of each list in the test_names column, ignoring the empty strings.
df_pol.with_column(pl.col('test_names').arr.lengths().alias('tot_names'))
This counts the empty strings as well, which is why the second list reports 6 names when it actually contains only two.
Required output:
Answers:
You can use arr.eval to run any Polars expression on the elements of a list. Inside an arr.eval expression, you can use pl.element() to refer to the list's elements and then apply an expression to them. Next, we simply use a filter expression to prune the values we don't need.
import polars as pl

df = pl.DataFrame({
"test_names":[
["Mallesham","","Bhavik","Jagarini","Jose","Fernando"],
["","","","ABC","","XYZ"]
]
})
df.with_column(
pl.col("test_names").arr.eval(pl.element().filter(pl.element() != ""))
)
shape: (2, 1)
┌─────────────────────────────────────┐
│ test_names │
│ --- │
│ list[str] │
╞═════════════════════════════════════╡
│ ["Mallesham", "Bhavik", ... "Fer... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["ABC", "XYZ"] │
└─────────────────────────────────────┘
Good question. Basically, we want to apply a filter within each list. We do this with arr.eval, which lets us run operations on each row's list as if it were a Series, with pl.element acting as a proxy for that Series.
(
df_pol
.with_column(
pl.col('test_names').arr.eval(
pl.element().filter(pl.element().str.lengths()>0)
)
.arr.lengths()
.alias('tot_names')
)
)