I want to use when() on a PySpark DataFrame, but I have multiple columns to update with df.withColumn

Question:

Dataframe Schema:

root
 |-- LAST_UPDATE_DATE
 |-- ADDR_1
 |-- ADDR_2
 |-- ERROR

If the "ERROR" col is null i want to change df like :

df = df.withColumn("LAST_UPDATE_DATE", current_timestamp()) 
   .withColumn("ADDR_1", lit("ADDR_1")) 
   .withColumn("ADDR_2", lit("ADDR_2"))

else :

df = df.withColumn("ADDR_1", lit("0"))

I have checked "when-otherwise", but only one column can be changed in that scenario.

Desired output :

//+----------------+------+------+-----+
//|LAST_UPDATE_DATE|ADDR_1|ADDR_2|ERROR|
//+----------------+------+------+-----+
//|2022-06-17 07:54|ADDR_1|ADDR_2| null| 
//|            null|  null|  null|    1|
//+----------------+------+------+-----+  
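For reference, a minimal DataFrame matching this schema could be built like the following (the column types here are assumed, since the schema above only lists names):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Assumed types: timestamp date, string addresses, integer error flag
schema = StructType([
    StructField("LAST_UPDATE_DATE", TimestampType(), True),
    StructField("ADDR_1", StringType(), True),
    StructField("ADDR_2", StringType(), True),
    StructField("ERROR", IntegerType(), True),
])

df = spark.createDataFrame(
    [(None, "addr one", "addr two", None),
     (None, "addr one", "addr two", 1)],
    schema,
)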
Asked By: Sonali Bisht


Answers:

Why not use when-otherwise inside each withColumn? The condition can be factored out for convenience.
Example:

from pyspark.sql import functions as F

# Shared condition: apply the updates only when ERROR is null
error_event = F.col('ERROR').isNull()

df = (
    df
    .withColumn('LAST_UPDATE_DATE', F.when(error_event, F.current_timestamp()))
    .withColumn('ADDR_1', F.when(error_event, F.lit('ADDR_1'))
                           .otherwise(F.lit('0')))
)
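A sketch of the same pattern extended to all three columns from the question: F.when() without .otherwise() leaves non-matching rows as null, which matches the desired output above (add .otherwise(F.lit('0')) on ADDR_1 if the "0" fallback is wanted instead).

from pyspark.sql import functions as F

error_event = F.col('ERROR').isNull()

df = (
    df
    .withColumn('LAST_UPDATE_DATE', F.when(error_event, F.current_timestamp()))
    .withColumn('ADDR_1', F.when(error_event, F.lit('ADDR_1')))   # stays null when ERROR is set
    .withColumn('ADDR_2', F.when(error_event, F.lit('ADDR_2')))
)

df.show(truncate=False)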
Answered By: metravod