PySpark – Conditional Statements

Question:

I am a newbie to PySpark and was wondering if you can guide me on how I can convert the following SAS code to PySpark.

SAS Code:

If ColA > 0 Then Do;
    If ColB Not In ('B') and ColC <= 0 Then Do;
         New_Col = Sum(ColA, ColR, ColP);
    End;
    Else Do;
         New_Col = Sum(ColA, ColR);
    End;
End;
Else Do;
   If ColB Not In ('B') and ColC <= 0 Then Do;
     New_Col = Sum(ColR, ColP);
   End;
   Else Do;
     New_Col = ColR;
   End;
End;

Currently, this is the PySpark logic that I am using:

df.withColumn('New_Col', when(ColA > 0 & ColB.isin(['B']) == False & ColC <= 0, col('ColA') + Col('ColR') + Col('ColP'))
...
...

Is this the optimal approach, or is there a better way to code this?

Thank you for your guidance!

Asked By: ZestStat


Answers:

Your code is on the right track; the one real problem is that each condition must be wrapped in parentheses, because Python's & operator binds more tightly than comparisons such as > and <=, so the unparenthesized version is parsed as ColA > (0 & ...). Also, the idiomatic way to negate isin is the ~ operator rather than comparing the result to False:

from pyspark.sql import functions as F

(df
    .withColumn(
        'New_Col',
        F.when((F.col('ColA') > 0)
               & (~F.col('ColB').isin(['B']))
               & (F.col('ColC') <= 0),
               F.col('ColA') + F.col('ColR') + F.col('ColP'))
        # Rows that match no condition get null unless .otherwise(...) is added.
    )
)
Answered By: pltc

May I please have a full conversion of the code above? I am struggling with something similar.

Answered By: Henry Mazhokota
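
For completeness, here is a minimal sketch of the full conversion as a single when/otherwise chain, assuming ColA > 0 is the intended outer condition (as in the asker's own attempt):

from pyspark.sql import functions as F

# Shared inner condition from the SAS code: ColB Not In ('B') and ColC <= 0.
inner = (~F.col('ColB').isin(['B'])) & (F.col('ColC') <= 0)

df = df.withColumn(
    'New_Col',
    F.when((F.col('ColA') > 0) & inner,           # outer IF true, inner IF true
           F.col('ColA') + F.col('ColR') + F.col('ColP'))
     .when(F.col('ColA') > 0,                     # outer IF true, inner IF false
           F.col('ColA') + F.col('ColR'))
     .when(inner,                                 # outer IF false, inner IF true
           F.col('ColR') + F.col('ColP'))
     .otherwise(F.col('ColR'))                    # both IFs false
)

The when clauses are evaluated in order, so each one only applies to rows that failed the earlier conditions, which mirrors the nested IF/ELSE structure. One behavioral difference to keep in mind: SAS SUM() ignores missing values, while + on Spark columns returns null if any operand is null; wrap each operand in F.coalesce(..., F.lit(0)) if the SAS null handling must be preserved.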