PySpark: multiply only some column values when a condition is met, otherwise keep the same value

Question:

I have the following dataset

id  col1 ... col10  quantity
0    2        3        0
1    1        4        2
2    0        4        2
3    2        2        0

I would like to multiply the values of col1 to col10 by 2 only when quantity is equal 2, otherwise I would like to keep the previous value. Here is an example of the result:

id  col1 ... col10  quantity
0    2        3        0
1    2        8        2
2    0        8        2
3    2        2        0
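
(For reference, a minimal sketch to reproduce this sample, using only col1 and col10 as stand-ins for the ten columns and assuming a local SparkSession:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(0, 2, 3, 0), (1, 1, 4, 2), (2, 0, 4, 2), (3, 2, 2, 0)],
    ["id", "col1", "col10", "quantity"],
)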

I wrote the following code for now:

import pyspark.sql.functions as F

cols_names = df.drop('id', 'quantity').columns
df = df.withColumn("arr", F.when(F.col('quantity') == 2,
        F.struct(*[(F.col(x) * 2).alias(x) for x in cols_names]))
    ).select("id", "quantity", "arr.*")

The only problem with this approach is that when the condition is not met I get null values instead of keeping the old ones. How can I keep the old values when the condition is not met? Or, if there is an easier way to do this, that would be great too.

Asked By: Marco


Answers:

You need to use the otherwise clause with the when clause. If you don't provide an otherwise clause, the column defaults to None (null) whenever the condition is not met.

df = df.withColumn("arr", F.when(F.col('quantity') == 2, F.struct(*[(F.col(x)* 2).alias(x) for x in cols_names])).otherwise(F.struct(*[(F.col(x)).alias(x) for x in cols_names]))).select("id","quantity","arr.*")
Answered By: frosty
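
As an alternative sketch (not from the original answer): since when/otherwise can be applied per column, a simple loop of withColumn calls avoids the intermediate struct and the final select entirely. This assumes the same df and cols_names as above.

import pyspark.sql.functions as F

# Rewrite each column in place: double it when quantity == 2,
# otherwise keep the existing value.
for c in cols_names:
    df = df.withColumn(
        c,
        F.when(F.col('quantity') == 2, F.col(c) * 2).otherwise(F.col(c))
    )

This produces the same result and keeps the original column order; Spark's optimizer typically collapses the chained projections, so the cost is comparable to the struct-based version.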