How can I fill an empty DataFrame based on conditions?

Question:

I have the following DataFrame called condition:

     [0]   [1]   [2]   [3]
1     0     0     1     0     
2     0     1     0     0     
3     0     0     0     1     
4     0     0     0     1     

For easier reproduction:

import numpy as np
import pandas as pd
n=4
t=3   
condition = pd.DataFrame([[0,0,1,0], [0,1,0,0], [0,0,0, 1], [0,0,0, 1]], columns=['0','1', '2', '3'])
condition.index=np.arange(1,n+1)

Further, I have several DataFrames that should be filled in a for loop:

df = pd.DataFrame([],index = range(1,n+1),columns= range(t+1) ) #NaN DataFrame
df_2 = pd.DataFrame([],index = range(1,n+1),columns= range(t+1) ) 
df_3 = pd.DataFrame(3,index = range(1,n+1),columns= range(t+1) )

for i,t in range(t,-1,-1):
    
    if condition[t]==1: 
        df.loc[:,t] = df_3.loc[:,t]**2
        df_2.loc[:,t]=0
    elif (condition == 0 and no 1 in any column after t)
        df.loc[:,t] = 2.5
       ....
    else:
        df.loc[:,t] = 5
        df_2.loc[:,t]= df.loc[:,t+1] 

I am aware that this for loop is not correct. What I want to do is check condition elementwise (recursively): if the value in condition is 1, fill df with the squared value of df_3. If the value is 0, I need to distinguish two cases.

In the first case, there is no 1 after the 0 (rows 1 and 2 in condition), and df should be 2.5.

In the second case, there is a 1 after the 0 (rows 3 and 4), and df should be filled with 5.

So the DataFrame df should look something like this:

     [0]   [1]   [2]   [3]
1     5     5     9    2.5     
2     5     9    2.5   2.5     
3     5     5     5     9     
4     5     5     5     9   

The code should include a for loop.
Thanks!

Asked By: Boom

||

Answers:

I am not sure if this is what you want, but based on your desired output you can do this with only masking operations (which is more efficient than looping over the rows anyway). Your code could look like this:

is_one = condition.astype(bool)
is_after_one = (condition.cumsum(axis=1) - condition).astype(bool)

df = pd.DataFrame(5, index=condition.index, columns=condition.columns)
df_2 = pd.DataFrame(2.5, index=condition.index, columns=condition.columns)
df_3 = pd.DataFrame(3, index=condition.index, columns=condition.columns)

df.where(~is_one, other=df_3 * df_3, inplace=True)
df.where(~is_after_one, other=df_2, inplace=True)

which yields:

   0  1    2    3
1  5  5  9.0  2.5
2  5  9  2.5  2.5
3  5  5  5.0  9.0
4  5  5  5.0  9.0
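
To make the masking logic easier to follow, here is a minimal self-contained sketch that builds the same two masks and combines them in a single step with np.select. The use of np.select and the names ones_before, values and result are assumptions for illustration, not part of the answer above. The sketch also assumes each row contains at most one 1, as in the example data. A row without any 1 would be filled with 5 here, whereas the rule stated in the question would give 2.5.

```python
import numpy as np
import pandas as pd

condition = pd.DataFrame(
    [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]],
    index=range(1, 5),
    columns=["0", "1", "2", "3"],
)
df_3 = pd.DataFrame(3, index=condition.index, columns=condition.columns)

# True where the condition itself is 1.
is_one = condition.astype(bool)

# Exclusive cumulative sum: number of 1s strictly to the left of each cell.
ones_before = condition.cumsum(axis=1) - condition
is_after_one = ones_before.astype(bool)

# The first matching mask wins; cells matching neither mask get the default.
values = np.select(
    [is_one.to_numpy(), is_after_one.to_numpy()],
    [(df_3 ** 2).to_numpy(), 2.5],
    default=5.0,
)
result = pd.DataFrame(values, index=condition.index, columns=condition.columns)
print(result)
```

Subtracting condition from its cumulative sum is what makes the sum exclusive, so the cell holding the 1 itself is not flagged as "after a 1".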

EDIT after comment:

If you really want to loop explicitly over the rows and columns, you could do it like this with the same result:

n_rows = condition.index.size
n_cols = condition.columns.size

for row_index in range(n_rows):
    for col_index in range(n_cols):
        cond = condition.iloc[row_index, col_index]
        if col_index < n_cols - 1:
            rest_row = condition.iloc[row_index, col_index + 1:].to_list()
        else:
            rest_row = []
        if cond == 1:
            df.iloc[row_index, col_index] = df_3.iloc[row_index, col_index] ** 2
        elif cond == 0 and 1 not in rest_row:
            # no 1 to the right: fill the rest of the row at once
            df.iloc[row_index, col_index:] = 2.5
            # nothing left to do in this row
            break
        else:
            df.iloc[row_index, col_index] = 5
            # positional indexing, because the column labels here are strings
            df_2.iloc[:, col_index] = df.iloc[:, col_index + 1]

The result is the same, but this version is much less efficient and harder to read, so I would not recommend it.
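
If a for loop is required, a possible middle ground is to loop over the columns only, from right to left as in the question's range(t,-1,-1), and to handle all rows of a column at once. The following is a sketch under assumptions, not the method from the answer above: the helper Series one_to_the_right is a hypothetical name, and df is initialised with NaN as a float frame so that 2.5 can be stored without a dtype change. It follows the rule as stated in the question (a 0 with no 1 in any later column becomes 2.5).

```python
import numpy as np
import pandas as pd

condition = pd.DataFrame(
    [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]],
    index=range(1, 5),
    columns=["0", "1", "2", "3"],
)
df_3 = pd.DataFrame(3, index=condition.index, columns=condition.columns)
df = pd.DataFrame(np.nan, index=condition.index, columns=condition.columns)

# Per row: has a 1 been seen in any column to the right of the current one?
one_to_the_right = pd.Series(False, index=condition.index)

# Walk the columns from last to first.
for col in condition.columns[::-1]:
    is_one = condition[col] == 1
    df[col] = np.where(
        is_one,
        df_3[col] ** 2,                       # condition is 1: squared value of df_3
        np.where(one_to_the_right, 5.0, 2.5)  # condition is 0: 5 if a 1 follows, else 2.5
    )
    # Remember the 1s of this column for the columns further left.
    one_to_the_right = one_to_the_right | is_one

print(df)
```

For the example data this gives the same values as the desired output in the question, while only iterating once per column.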

Answered By: Eelco van Vliet