Lambda function with if else clause with Python

Question:

I have a dataframe that looks like:

A   B   C   D   SUM 
2   5   -4  12  15

I tried to run:

df.apply((lambda x: x / x.sum() if x/x.sum() >= 0 else None), axis=1).fillna(0)

to get x/total for each cell that has the same sign as the total, and 0 otherwise:

A         B     C   D
2/15    5/15    0   12/15

I get:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I improve the code?

Asked By: Mutenda Tshipala

||

Answers:

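You are mixing up pd.Series.apply and pd.DataFrame.apply. They are different methods: one works on a series and operates on each element; the other operates across the dataframe along an axis. In the latter case, axis=1 means each row is passed to the function in turn. Inside your lambda, x is therefore a whole row (a Series), so x / x.sum() >= 0 is a Boolean Series rather than a single True or False, and the if clause cannot evaluate it. That is what the error message is telling you.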
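To make the difference concrete, here is a minimal sketch that rebuilds the one-row dataframe from the question (the construction is an assumption based on the table shown) and checks what each apply variant hands to the function. It also reproduces the error from the original call.

```python
import pandas as pd

# Assumed reconstruction of the dataframe shown in the question.
df = pd.DataFrame({"A": [2], "B": [5], "C": [-4], "D": [12], "SUM": [15]})

# Series.apply: the function receives one element at a time.
scalar_is_series = df["A"].apply(lambda v: isinstance(v, pd.Series))

# DataFrame.apply with axis=1: the function receives a whole row as a Series.
row_is_series = df.apply(lambda row: isinstance(row, pd.Series), axis=1)

# The original lambda puts a Boolean Series into an `if`, which raises ValueError.
try:
    df.apply(lambda x: x / x.sum() if x / x.sum() >= 0 else None, axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(scalar_is_series.tolist(), row_is_series.tolist())
print(error_message)
```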

Both apply methods are just thinly veiled loops. If you compute one column at a time and write each result back into df, later columns would be calculated from values that have already been overwritten. Therefore, you will need to write the results to a copy of the dataframe:

df2 = df.copy()

# Compute each column's share of the row total, excluding the trailing SUM column.
for col in df.columns[:-1]:
    df2[col] = df.iloc[:, :-1].apply(
        lambda x: x[col] / x.sum() if x[col] / x.sum() >= 0 else None,
        axis=1,
    ).fillna(0)

print(df2)

          A         B  C    D  SUM
0  0.133333  0.333333  0  0.8   15
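If you want to stay with a single row-wise apply, one possible variation (not part of the loop above) is to replace the scalar if/else with Series.where, which applies the condition element by element. The helper name share_of_total is hypothetical, and the dataframe is again an assumed reconstruction of the one in the question.

```python
import pandas as pd

# Assumed reconstruction of the dataframe shown in the question.
df = pd.DataFrame({"A": [2], "B": [5], "C": [-4], "D": [12], "SUM": [15]})

def share_of_total(row):
    """Return each value's share of the row total, with negative shares set to 0."""
    ratio = row / row.sum()
    # Keep the ratio where it is non-negative, otherwise use 0.
    return ratio.where(ratio >= 0, 0)

# Drop the trailing SUM column so it does not inflate the row total.
shares = df.iloc[:, :-1].apply(share_of_total, axis=1)
print(shares)
```

This still loops over rows in Python, so it shares the performance cost of the loop version; it only removes the need for a column loop and a copy.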

However, this is all very inefficient, because we aren't utilising the underlying NumPy arrays. Instead, you can use vectorised operations:

res = df.iloc[:, :-1].div(df.iloc[:, :-1].sum(1), axis=0)
res.mask(res < 0, 0, inplace=True)

print(res)

          A         B    C    D
0  0.133333  0.333333  0.0  0.8
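As a rough, self-contained variant of the same idea, the sketch below divides by the existing SUM column instead of recomputing the row totals, and uses clip(lower=0) in place of mask. It assumes SUM really is the sum of the other columns; the second row is made up purely to show that the approach works on more than one row.

```python
import pandas as pd

# Hypothetical two-row frame; the first row matches the question, the second is invented.
df = pd.DataFrame({"A": [2, 1], "B": [5, -3], "C": [-4, 4], "D": [12, 8]})
df["SUM"] = df[["A", "B", "C", "D"]].sum(axis=1)

# Divide every value column by its row total, aligning on the row index.
values = df.drop(columns="SUM")
res = values.div(df["SUM"], axis=0).clip(lower=0)  # negative shares become 0
print(res)
```

clip returns a new dataframe, so no inplace modification is needed.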
Answered By: jpp