Looking to apply a string replace for all columns of a Polars DataFrame without specifying each column
Question:
I’m trying to apply a string replace to a Polars DataFrame, similar to how you would apply it to a Pandas DataFrame.
I would like the equivalent to the following:
df = df.apply(lambda x: x.str.replace("A", "B"))
I understand with Polars you can use the following:
df = df.with_columns(
    pl.col("col1").str.replace("A", "B"),
    pl.col("col2").str.replace("A", "B"),
)
Is there a way to apply it to all columns? I have a very large DataFrame (100+ columns) and would prefer to not have to specify all columns.
I’ve tried using:
df = df.select([
pl.all().map(lambda x: x.replace("A", "B"), df)
])
and I get an error: ValueError: Cannot infer dtype from 'shape: (150000, 150)
Is there something that I may be missing?
Answers:
You can select all columns of a certain type, or multiple columns. See Selectors for more.
df = df.with_columns(pl.col(pl.Utf8).str.replace("A", "B"))
pl.all().str… is fine too if the DataFrame contains only string columns.
Instead of your initial solution:
df = df.select([
pl.all().map(lambda x: x.replace("A", "B"), df)
])
Try:
df = df.with_columns([
pl.all().str.replace("A", "B")
])