Databricks Koalas Column Assignment Based on Another Column Value Lambda Function

Question:

Given a koalas Dataframe:

import databricks.koalas as ks

df = ks.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]
                   })

Running a lambda function to get a new column based on the existing column values:

df = df.assign(risk=lambda x: "High" if x.high_risk else ("Medium" if x.medium_risk else "Low"))
df
Out[72]: 
   high_risk  medium_risk  risk
0          0            1  High
4          1            0  High
1          1            0  High
2          0            0  High
3          1            0  High

Expected return:

       high_risk  medium_risk  risk
    0          0            1  Medium
    4          1            0  High
    1          1            0  High
    2          0            0  Low
    3          1            0  High

Why does this assign “High” to every row? The intent is to operate on each row; is the comparison looking at the whole column instead?

Asked By: ratchet

||

Answers:

Using assign with a row-wise condition on a koalas DataFrame does not seem easy to me. For your case, I would multiply the ‘high_risk’ column by 2, add the ‘medium_risk’ column, and finally map the result: 2 becomes ‘high’ (because the column was multiplied by 2), 1 becomes ‘medium’, and 0 becomes ‘low’, such as:

df = df.assign(risk= df.high_risk.mul(2).add(df.medium_risk)
                       .map({0:'low', 1:'medium', 2:'high'}))
df
   high_risk  medium_risk    risk
0          0            1  medium
1          1            0    high
2          0            0     low
3          1            0    high
4          1            0    high

Note: this would fail if a row has 1 in both the high_risk and medium_risk columns, because the sum would then be 3, which the mapping does not cover.
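
The encoding used in this answer can be checked with a few lines of plain Python. This is only a sketch: lists stand in for the koalas columns, and `encode` and `LABELS` are hypothetical names used for illustration. It also shows one possible way to handle a row with both flags set, by adding the key 3 to the mapping so that high risk takes precedence (an assumption about the desired behaviour):

```python
# Plain-Python illustration of the 2 * high_risk + medium_risk encoding.
# Lists stand in for the koalas columns; encode and LABELS are made-up names.
high_risk = [0, 1, 0, 1, 1]
medium_risk = [1, 0, 0, 0, 0]

# 3 only occurs when both flags are 1; mapping it to 'high' is an assumption
# about the desired precedence, not part of the original answer.
LABELS = {0: "low", 1: "medium", 2: "high", 3: "high"}


def encode(high, medium):
    """Return the integer code for one row: 0, 1, 2, or 3."""
    return 2 * high + medium


risk = [LABELS[encode(h, m)] for h, m in zip(high_risk, medium_risk)]
print(risk)
```

With koalas, the same idea would presumably amount to adding `3: 'high'` to the dictionary passed to `.map`.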

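Answered By: Ben.T

import databricks.koalas as ks

# Called once per row because of axis=1 below; ss holds one row's values.
def function1(ss: ks.Series):
    if ss.high_risk == 1:
        return "High"
    elif ss.medium_risk == 1:
        return "Medium"
    else:
        return "Low"

# Apply row-wise, then attach the result to df as the "risk" column.
col1 = df.apply(function1, axis=1)
df.join(col1.rename("risk"))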

out:

       high_risk  medium_risk  risk
    0          0            1  Medium
    4          1            0  High
    1          1            0  High
    2          0            0  Low
    3          1            0  High
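
As to why the lambda in the question labels every row "High": a callable passed to assign is computed on the whole DataFrame, so `x.high_risk` is an entire column rather than a single value. The output in the question suggests that this column object is treated as truthy as a whole. A plain-Python analogy shows the difference between testing the column once and testing each row. This is not koalas code: lists stand in for the columns, and `label` is a hypothetical helper.

```python
# Plain-Python analogy; lists stand in for the koalas columns.
high_risk = [0, 1, 0, 1, 1]
medium_risk = [1, 0, 0, 0, 0]


def label(high, medium):
    """Hypothetical helper: pick the label from two truthy/falsy values."""
    return "High" if high else ("Medium" if medium else "Low")


# Testing the whole column once: a non-empty list is truthy, so the
# result is "High" no matter what the individual values are.
whole_column = label(high_risk, medium_risk)

# Testing row by row, which is what apply(..., axis=1) does above.
per_row = [label(h, m) for h, m in zip(high_risk, medium_risk)]

print(whole_column)
print(per_row)
```

The first result is a single string that gets repeated for every row, while the second has one label per row.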
Answered By: G.G