For every row of a DataFrame, find which columns from a column list are True and record them in a "Test" column, with extra rules based on a flag column

Question:

I have a DataFrame in which certain columns, listed in a column list (col_list), contain True/False values.

I also have a key column, "Flag".

I need to create a new "test" column and fill it row by row according to the scenarios below, based on the columns in col_list and the "Flag" column:

1st scenario: the key column is False and all the columns in col_list are False -> write XET

2nd scenario: the key column is True and all the columns in col_list are False -> write Recheck

3rd scenario: the key column is False and at least one column in col_list is True -> write XET followed by the names of the True columns, all separated by underscores

4th scenario: the key column is True and at least one column in col_list is True -> write the names of the True columns, separated by underscores
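The four scenarios reduce to two questions per row: is any column in col_list True, and is Flag True? A minimal sketch of the rule as a plain function, independent of pandas; `label_row` is a hypothetical helper name, and it assumes the names of the row's True columns have already been collected in col_list order.

```python
def label_row(flag, true_cols):
    """Return the test label for one row (hypothetical helper).

    flag      -- value of the "Flag" column in this row
    true_cols -- names of the col_list columns that are True, in col_list order
    """
    if not true_cols:
        # scenarios 1 and 2: no column in col_list is True
        return 'Recheck' if flag else 'XET'
    joined = '_'.join(true_cols)
    # scenarios 3 and 4: prefix with XET only when Flag is False
    return joined if flag else f'XET_{joined}'


print(label_row(False, []))                # XET
print(label_row(True, []))                 # Recheck
print(label_row(False, ['col2', 'col4']))  # XET_col2_col4
print(label_row(True, ['col1', 'col8']))   # col1_col8
```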

Example

col_list = ['col1', 'col2', …, 'col10']

key column = df["Flag"]

In the first row, if col1 and col8 are True and "Flag" is True, write col1_col8 in the test column.

In the second row, if col2 and col4 are True and "Flag" is False, write XET_col2_col4 in the test column.

In the third row, if all the columns in col_list are False (and "Flag" is False), write XET in the test column.
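The three example rows can be written down as a small fixture for checking any solution. A minimal sketch, assuming col_list is exactly col1 … col10 and every value is a real boolean; `make_row` is a hypothetical helper, and the labelling here uses pandas `Series.where`/`Series.mask`, which is a different route from either answer below.

```python
import pandas as pd

col_list = [f'col{i}' for i in range(1, 11)]  # col1 ... col10


def make_row(flag, true_cols):
    # hypothetical helper: one row with the given columns True, all others False
    row = {col: col in true_cols for col in col_list}
    row['Flag'] = flag
    return row


df = pd.DataFrame([
    make_row(True, {'col1', 'col8'}),   # first example row
    make_row(False, {'col2', 'col4'}),  # second example row
    make_row(False, set()),             # third example row
])

# names of the True columns in each row, joined by '_' ('' when none are True)
names = df[col_list].apply(lambda r: '_'.join(r.index[r.to_numpy()]), axis=1)
any_true = df[col_list].any(axis=1)

labels = names.where(df['Flag'], 'XET_' + names)          # XET_ prefix when Flag is False
labels = labels.str.rstrip('_')                           # 'XET_' -> 'XET' (scenario 1)
labels = labels.mask(~any_true & df['Flag'], 'Recheck')   # scenario 2
df['test'] = labels

print(df['test'].tolist())  # ['col1_col8', 'XET_col2_col4', 'XET']
```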

Answers:

Use apply with axis=1, so the function receives one row at a time:

def generate_test(row):
    # names of the col_list columns that are True in this row (in col_list order)
    true_cols = [col for col in col_list if row[col]]

    if not row['Flag'] and not true_cols:
        return 'XET'                              # scenario 1
    elif row['Flag'] and not true_cols:
        return 'Recheck'                          # scenario 2
    elif not row['Flag']:
        return f"XET_{'_'.join(true_cols)}"       # scenario 3
    else:
        return '_'.join(true_cols)                # scenario 4

df['test'] = df.apply(generate_test, axis=1)
Answered By: Abdulmajeed

Use numpy.select to generate the output conditionally on the Flag column:

import numpy as np
import pandas as pd

np.random.seed(123)

# sample data: boolean columns col1..col10 (about 20% True) plus a boolean Flag column
f = lambda x: f'col{x+1}'
df = (pd.DataFrame(np.random.choice([True, False], size=(5,10), p=(0.2, 0.8)))
        .rename(columns=f)
        .assign(Flag=[False, True, True, True, False]))

col_list = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9','col10']

df1 = df[col_list]
# names of the True columns joined by '_' (dot product of the booleans with 'colN_' strings,
# then .str[:-1] removes the trailing underscore); empty string if no column is True
s = df1.dot(pd.Index(col_list) + '_').str[:-1]
# True where at least one column in col_list is True
m1 = df1.any(axis=1)

# conditions are checked in order; rows matching none of them (Flag True, m1 True) get s
df['Test'] = np.select([~m1 & ~df['Flag'], 
                        ~m1 & df['Flag'],
                        m1 & ~df['Flag']], 
                       ['XET', 'Recheck', 'XET_' + s], s)

print(df)
    col1   col2   col3   col4   col5   col6   col7   col8   col9  col10  
0  False  False  False  False  False  False  False  False  False  False   
1  False  False  False   True  False  False   True   True  False  False   
2  False  False  False  False  False  False  False  False  False  False   
3   True  False  False  False  False  False  False  False  False  False   
4  False   True  False  False  False  False  False  False  False  False   

    Flag            Test  
0  False             XET  
1   True  col4_col7_col8  
2   True         Recheck  
3   True            col1  
4  False        XET_col2  
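The least obvious line above is the `dot` call. Multiplying a Python bool by a string repeats the string 0 or 1 times, so the matrix product of the boolean frame with the suffixed column names concatenates the names of the True columns from left to right. A small sketch on a made-up three-column frame (the data and the shortened col_list are hypothetical):

```python
import pandas as pd

col_list = ['col1', 'col2', 'col3']  # shortened, hypothetical column list

df1 = pd.DataFrame({'col1': [True, False, False],
                    'col2': [False, False, True],
                    'col3': [True, False, True]})

# True * 'colN_' == 'colN_' and False * 'colN_' == '', and the dot product sums
# (concatenates) those strings across each row
s = df1.dot(pd.Index(col_list) + '_')
print(s.tolist())           # ['col1_col3_', '', 'col2_col3_']

# drop the trailing underscore; an all-False row stays an empty string
print(s.str[:-1].tolist())  # ['col1_col3', '', 'col2_col3']
```

Because an all-False row yields an empty string, the answer still needs the separate `m1` mask to tell the XET and Recheck cases apart.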

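EDIT: With the real data, some preprocessing is needed first, because the column names are stored in the first row and the values are the strings 'True'/'False':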

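df1 = df.copy()
# use the first row as the column names, then drop that row
df1.columns = df1.iloc[0,:]
df1 = df1.iloc[1:,:].copy()
# convert the 'True'/'False' strings to real booleans
df1[col_list + ['Flag']] = df1[col_list + ['Flag']].eq('True')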
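To see what this preprocessing does, here is a sketch on a tiny raw frame. It assumes, as the EDIT implies, that the header sits in the first data row and the booleans are stored as strings; the frame and the shortened col_list are hypothetical.

```python
import pandas as pd

col_list = ['col1', 'col2']  # shortened, hypothetical column list

# hypothetical raw data: header stored as the first row, booleans stored as strings
df = pd.DataFrame([['col1', 'col2', 'Flag'],
                   ['True', 'False', 'True'],
                   ['False', 'False', 'False']])

df1 = df.copy()
df1.columns = df1.iloc[0, :]   # first row becomes the column names
df1 = df1.iloc[1:, :].copy()   # drop the header row; copy avoids chained-assignment warnings
# only the exact string 'True' becomes True; anything else becomes False
df1[col_list + ['Flag']] = df1[col_list + ['Flag']].eq('True')

print(df1)
```

Note that `.eq('True')` is case-sensitive, so values such as 'true' or 'TRUE' would silently become False.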
Answered By: jezrael