I want to use when on a PySpark DataFrame, but I have multiple columns to change in df.withColumn
Question:
DataFrame schema:
root
 |-- LAST_UPDATE_DATE
 |-- ADDR_1
 |-- ADDR_2
 |-- ERROR
If the "ERROR" column is null, I want to change the df like this:
df = (df.withColumn("LAST_UPDATE_DATE", current_timestamp())
        .withColumn("ADDR_1", lit("ADDR_1"))
        .withColumn("ADDR_2", lit("ADDR_2")))
else:
df = df.withColumn("ADDR_1", lit("0"))
I have looked at "when-otherwise", but it seems only one column can be changed per call in that scenario.
Desired output:
//+----------------+------+------+-----+
//|LAST_UPDATE_DATE|ADDR_1|ADDR_2|ERROR|
//+----------------+------+------+-----+
//|2022-06-17 07:54|ADDR_1|ADDR_2| null|
//| null| null| null| 1|
//+----------------+------+------+-----+
Answers:
Why not use when-otherwise inside each withColumn? The condition can be factored out for convenience.
Example:
import pyspark.sql.functions as F

# Build the condition once and reuse it in each withColumn call.
error_event = F.col('ERROR').isNull()
df = (
    df
    # when() without otherwise() yields null when the condition is false.
    .withColumn('LAST_UPDATE_DATE', F.when(error_event, F.current_timestamp()))
    .withColumn('ADDR_1', F.when(error_event, F.lit('ADDR_1')).otherwise(F.lit('0')))
    .withColumn('ADDR_2', F.when(error_event, F.lit('ADDR_2')))
)
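For completeness, here is a minimal end-to-end sketch of the same approach. The SparkSession setup and the sample rows are made up to mirror the question's schema and desired output, and ERROR is assumed to be string-typed:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: one row without an error, one row with ERROR set.
schema = StructType([
    StructField("LAST_UPDATE_DATE", TimestampType(), True),
    StructField("ADDR_1", StringType(), True),
    StructField("ADDR_2", StringType(), True),
    StructField("ERROR", StringType(), True),
])
df = spark.createDataFrame(
    [(None, None, None, None), (None, None, None, "1")],
    schema,
)

error_event = F.col("ERROR").isNull()
df = (
    df
    .withColumn("LAST_UPDATE_DATE", F.when(error_event, F.current_timestamp()))
    .withColumn("ADDR_1", F.when(error_event, F.lit("ADDR_1")).otherwise(F.lit("0")))
    .withColumn("ADDR_2", F.when(error_event, F.lit("ADDR_2")))
)
df.show(truncate=False)

This evaluates the ERROR check independently in each withColumn call, so all three columns are updated in a single pass without needing a join or a UDF.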