For loop with conditional statements not working as expected (Pandas)

Question:

#Loop through example dataset 
for id in df['val_id']:
        #total assigned value 
        df['tot_val'] = 0
        #loop through facilities 
        for fac in df['fac_id']:
            if df['product'].isin(['XL', 'CL', 'DL']).all():
                df['row'] = min(df['our_val_amt'] - df['tot_val'], df['val_against'])
            else:
                df['row'] = 0
                
            df['tot_val'] = df['tot_val'] + df['row']
            df['row'] = df['row'] + 1


Sample to test val_against > val_amt change
# df = pd.DataFrame(data=[["compx","xx1","yy1",424,418,"XL"],["compx","xx1","yy2",424,134,"CL"],["compx","xx2","yy3",472,60,"DL"],["compx","xx2","yy4",472,104,"CL"], ["compx", "xx3", "yy5", 490, 50, "XL"], ["compx", "xx3", "yy6", 490, 500, "CL"], ["compx", "xx3", "yy7", 490, 200, "DL"], ["compx", "xx4", "yy8", 510, 200, "CL"], ["compx", "xx4", "yy9", 510, 300, "CL"], ["compx", "xx4", "yy10", 510, 50, "XL"]], columns=["name","val_id","fac_id","our_val_amt","val_against","product"])


I am trying to in Pandas double loop through the ‘val_id’ and ‘fac_id’ column and create a new field from the following conditions.

1.Within each ‘val_id’ loop if ‘product’ == ‘CL’ then min of ‘val_against’ and ‘our_val_amt’ e.g. min( val_against (134), our_val_amt (424)) therefore ‘NEW FIELD’ = 134. Also if the sum of NEW FIELD exceeds "our_val_amt", then subtract it from "our_val_amt". e.g. for val_id "xx4", (200 + 300 + 50) = 550 which exceeds our_val_amt=510, so NEW FILED = 550 – 510 = 10.

2.If product != ‘CL’ and is in the same ‘val_id’ group. The remainder to be subtracted from ‘our_val_amt’ to be inserted in ‘NEW FIELD’. e.g ‘our_val_amt’ (424) – from step 1 (134) = 290. This inserted above ‘NEW FIELD’.

3.Repeat steps for val_id xx2. NEW FIELD calculation for CL = 104 and XL = 472 – 104 = 368.

Dataset with computed fields labeled with **

|  name | val_id | fac_id  | our_val_amt | val_against | product | **row** | **totval**|**Field Want**
| compx |  xx1   |  yy1    |   424       |    418      |   XL    |   290   |     0     |   290
| compx |  xx1   |  yy2    |   424       |    134      |   CL    |   134   |    134    |   134
| compx |  xx2   |  yy3    |   472       |    60       |   DL    |   0     |     0     |   368
| compx |  xx2   |  yy4    |   472       |    104      |   CL    |   104   |   104     |   104
| compx |  xx3   |  yy5    |   490       |    50       |   XL    | 
| compx |  xx3   |  yy6    |   490       |    500      |   CL    | 
| compx |  xx3   |  yy7    |   490       |    200      |   DL    | 
| compx |  xx4   |  yy8    |   510       |    200      |   CL    | 
| compx |  xx4   |  yy9    |   510       |    300      |   CL    | 
| compx |  xx4   |  yy10   |   510       |    50       |   CL    | 

Expected Output

|  name | val_id | fac_id  | our_val_amt | val_against | product | new field |
| compx |  xx1   |  yy1    |   424       |    418      |   XL    | 290       | 
| compx |  xx1   |  yy2    |   424       |    134      |   CL    |  134      |  
| compx |  xx2   |  yy3    |   472       |    60       |   DL    | 368       |  
| compx |  xx2   |  yy4    |   472       |    104      |   CL    |  104      |
| compx |  xx3   |  yy5    |   490       |    50       |   XL    | 0         | 
| compx |  xx3   |  yy6    |   490       |    500      |   CL    |  490      |  
| compx |  xx3   |  yy7    |   490       |    200      |   DL    | 0         |  
| compx |  xx4   |  yy8    |   510       |    200      |   CL    |  200      |
| compx |  xx4   |  yy9    |   510       |    300      |   CL    |  300      |
| compx |  xx4   |  yy10   |   510       |    50       |   CL    |  10       |

The **row** and ** totval** are fields i created trying to create the field that i want with the logic above.

I’ve tried grouping the variables together and having the ‘product’ as columns but unfortunately i receive 6000+ columns and have no idea on how to compute the calculation for this scenario.

I’ve also attempted to loop through but received incorrect outputs / errors

error: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Asked By: svenvuko

||

Answers:

You can solve this in 2 steps:

  • In Step 1, compute the new field values for "CL". Following solution assumes 0 value for new_field if computing condition is not met.
  • In Step 2, process non-CL records and use values computed in step 1. To handle case where multiple "CL" records may be present in same val_id group, the following code sums them up.

The comments are added before each step to explain what it does.

df = pd.DataFrame(data=[["compx","xx1","yy1",424,418,"XL"],["compx","xx1","yy2",424,134,"CL"],["compx","xx2","yy3",472,60,"DL"],["compx","xx2","yy4",472,104,"CL"],
                        ["compx","xx3","yy5",490,50,"XL"],["compx","xx3","yy6",490,500,"CL"],["compx","xx3","yy7",490,200,"DL"],
                        ["compx","xx4","yy8",510,200,"CL"],["compx","xx4","yy9",510,300,"CL"],["compx","xx4","yy10",510,50,"CL"],
                        ], columns=["name","val_id","fac_id","our_val_amt","val_against","product"])

# Compute tuple of "our_val_amt", "val_against" and "product" for easy processing as one column. It is hard to process multiple columns with "transform()".
df["the_tuple"] = df[["our_val_amt", "val_against", "product"]].apply(tuple, axis=1)

def compute_new_field_for_cl(g):
  # df_g is a tuple ("our_val_amt", "val_against", "product") indexed as (0, 1, 2).
  df_g = g.apply(pd.Series)
  df_g["new_field"] = df_g.apply(lambda row: min(row[0], row[1]) if row[2] == "CL" else 0, axis=1)
  df_g["cumsum"] = df_g["new_field"].cumsum()
  df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum"] > row[0] else row["new_field"], axis=1)
  df_g["max_cumsum"] = df_g["new_field"].cumsum()
  df_g["new_field"] = df_g.apply(lambda row: row[0] - row["max_cumsum"] if row["cumsum"] > row[0] else row["new_field"], axis=1)
  return df_g["new_field"]

# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_cl)

# Re-compute tuple of "our_val_amt", "new_field" and "product".
df["the_tuple"] = df[["our_val_amt", "new_field", "product"]].apply(tuple, axis=1)

def compute_new_field_for_not_cl(g):
  # df_g is a tuple ("our_val_amt", "new_field", "product") indexed as (0, 1, 2).
  df_g = g.apply(pd.Series)
  result_sr = df_g.where(df_g[2] != "CL")[0] - df_g[df_g[2] == "CL"][1].sum()
  result_sr = result_sr.fillna(0) + df_g[1]
  return result_sr

# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_not_cl)

df = df.drop("the_tuple", axis=1)

print(df)

Output:

    name val_id fac_id  our_val_amt  val_against product  new_field
0  compx    xx1    yy1          424          418      XL      290.0
1  compx    xx1    yy2          424          134      CL      134.0
2  compx    xx2    yy3          472           60      DL      368.0
3  compx    xx2    yy4          472          104      CL      104.0
4  compx    xx3    yy5          490           50      XL        0.0
5  compx    xx3    yy6          490          500      CL      490.0
6  compx    xx3    yy7          490          200      DL        0.0
7  compx    xx4    yy8          510          200      CL      200.0
8  compx    xx4    yy9          510          300      CL      300.0
9  compx    xx4   yy10          510           50      CL       10.0
Answered By: Azhar Khan