PySpark – Conditional Statements
Question:
I am new to PySpark and was wondering if you could guide me on how to convert the following SAS code to PySpark.
SAS Code:
If ColA > 0 Then Do;
If ColB Not In ('B') and ColC <= 0 Then Do;
New_Col = Sum(ColA, ColR, ColP);
End;
Else Do;
New_Col = Sum(ColA + ColR);
End;
End;
Else Do;
If ColB Not in ('B') and ColC <= 0 then do;
New_Col = Sum(ColR, ColP);
end;
Else Do;
New_Col = ColR;
End;
End;
Below is the PySpark logic that I am currently using:
df.withColumn('New_Col', when(ColA > 0 & ColB.isin(['B']) == False & ColC <= 0, col('ColA') + Col('ColR') + Col('ColP'))
...
...
Is this the optimal approach, or is there a better way to code it?
Thank you for your guidance!
Answers:
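Your approach is fine; however, each condition must be wrapped in parentheses, because in Python the & operator binds more tightly than comparison operators such as > and <=. Note also that the function is F.col (lowercase), not F.Col.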
from pyspark.sql import functions as F
# Each comparison is parenthesized; ~ negates the isin() test
(df
    .withColumn('New_Col', F
        .when((F.col('ColA') > 0) & (~F.col('ColB').isin(['B'])) & (F.col('ColC') <= 0), F.col('ColA') + F.col('ColR') + F.col('ColP'))
    )
)
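To see why the parentheses matter, here is a small sketch using Python's standard ast module. It only inspects how Python groups the expression; the names a and b are placeholders standing in for column expressions, not anything from the question.

```python
import ast

# Without parentheses, & is applied before the comparisons,
# so Python reads this as the chained comparison: a > (0 & b) <= 0
unparenthesized = ast.parse("a > 0 & b <= 0", mode="eval").body
print(type(unparenthesized).__name__)               # Compare
print(ast.unparse(unparenthesized.comparators[0]))  # 0 & b

# With parentheses, the two comparisons are evaluated first
# and & combines their results, which is what was intended.
parenthesized = ast.parse("(a > 0) & (b <= 0)", mode="eval").body
print(type(parenthesized).__name__)                 # BinOp
```

With PySpark columns, the unparenthesized form therefore does not build the intended condition.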
Could I please have a full conversion of the code above? I am struggling with something like this.