Check in every row of a df which columns of a column list have the value True and mark them in a "Test" column with some additional parameters
Question:
I have a dataframe that contains True/False values in certain columns, and those columns are named in a column list.
I also have a key column "Flag".
I need to create a new "test" column and fill it according to the scenarios below, checking every row across all the columns in col_list and the column "Flag":
1st scenario: The key column is False and all the columns in col_list are False -> write XET
2nd scenario: The key column is True and all the columns in col_list are False -> write Recheck
3rd scenario: The key column is False and at least one column in col_list is True -> write XET + the name(s) of the True column(s), separated by underscores
4th scenario: The key column is True and at least one column in col_list is True -> write the name(s) of the True column(s), separated by underscores
Example:
col_list = ['col1', 'col2', …, 'col10']
key column = df["Flag"]
In the first row of the df, if col1 and col8 in col_list are True and the "Flag" column is True, write col1_col8 in the test column.
In the second row of the df, if col2 and col4 in col_list are True and the "Flag" column is False, write XET_col2_col4 in the test column.
In the third row of the df, if all the columns in col_list are False and the "Flag" column is False, write XET in the test column.
Answers:
Use apply:
def generate_test(row):
    # Names of the columns in col_list that are True in this row
    true_cols = [col for col in col_list if row[col]]
    if not true_cols:
        # Scenarios 1 and 2: nothing is True, so the result depends only on Flag
        return 'Recheck' if row['Flag'] else 'XET'
    if not row['Flag']:
        # Scenario 3: prefix the True column names with XET
        return f"XET_{'_'.join(true_cols)}"
    # Scenario 4: only the True column names
    return '_'.join(true_cols)

df['test'] = df.apply(generate_test, axis=1)
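A row-wise function of this kind can be checked against all four scenarios on a tiny hand-made frame. The sketch below is only an illustration: the three-column col_list, the sample values, and the helper name label_row are made up for the check and are not part of the question's data.

```python
import pandas as pd

# Hypothetical three-column list; one sample row per scenario (1st to 4th).
col_list = ['col1', 'col2', 'col3']
df = pd.DataFrame({
    'col1': [False, False, True, True],
    'col2': [False, False, False, True],
    'col3': [False, False, True, False],
    'Flag': [False, True, False, True],
})

def label_row(row):
    # Names of the columns in col_list that are True in this row.
    true_cols = [col for col in col_list if row[col]]
    if not true_cols:
        # Nothing is True: the label depends only on Flag.
        return 'Recheck' if row['Flag'] else 'XET'
    # Prepend XET only when Flag is False, then join with underscores.
    prefix = [] if row['Flag'] else ['XET']
    return '_'.join(prefix + true_cols)

df['test'] = df.apply(label_row, axis=1)
print(df['test'].tolist())
# ['XET', 'Recheck', 'XET_col1_col3', 'col1_col2']
```

apply with axis=1 calls the function once per row, which is easy to read but slower than a vectorized approach on large frames.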
Alternatively, use numpy.select to generate the output from conditions on the Flag column:
import numpy as np
import pandas as pd

# Sample data: 10 boolean columns plus the Flag column
np.random.seed(123)
f = lambda x: f'col{x+1}'
df = (pd.DataFrame(np.random.choice([True, False], size=(5,10), p=(0.2, 0.8)))
        .rename(columns=f)
        .assign(Flag=[False, True, True, True, False]))
# print (df)

col_list = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9','col10']
df1 = df[col_list]
# Join the names of the True columns with underscores (empty string if none are True)
s = df1.dot(pd.Index(col_list) + '_').str[:-1]
# True where at least one column in col_list is True
m1 = df[col_list].any(axis=1)

# Scenarios 1 to 3 are the conditions; scenario 4 is the default (s)
df['Test'] = np.select([~m1 & ~df['Flag'],
                        ~m1 & df['Flag'],
                        m1 & ~df['Flag']],
                       ['XET', 'Recheck', 'XET_' + s], s)
print (df)
    col1   col2   col3   col4   col5   col6   col7   col8   col9  col10
0  False  False  False  False  False  False  False  False  False  False
1  False  False  False   True  False  False   True   True  False  False
2  False  False  False  False  False  False  False  False  False  False
3   True  False  False  False  False  False  False  False  False  False
4  False   True  False  False  False  False  False  False  False  False

    Flag            Test
0  False             XET
1   True  col4_col7_col8
2   True         Recheck
3   True            col1
4  False        XET_col2
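The expression that builds s may look cryptic, so here is a minimal sketch of how it works, using a made-up three-column frame (the sample values are assumptions for illustration only). Taking the dot product of a boolean frame with the column names multiplies each name by True or False (giving the name or an empty string) and then "sums" the pieces in each row, which for strings means concatenation.

```python
import pandas as pd

# Hypothetical three-column frame, for illustration only.
col_list = ['col1', 'col2', 'col3']
df = pd.DataFrame({
    'col1': [True, False, False],
    'col2': [False, False, True],
    'col3': [True, False, True],
})

# Append an underscore to every name so the names stay separated once concatenated.
labels = pd.Index(col_list) + '_'

# True * 'col1_' gives 'col1_', False * 'col1_' gives '';
# the row-wise sum of the dot product concatenates these pieces.
joined = df[col_list].dot(labels)

# Strip the single trailing underscore left after the last name.
s = joined.str[:-1]
print(s.tolist())
# ['col1_col3', '', 'col2_col3']
```

A row with no True values yields an empty string, which is why the all-False cases are handled by separate conditions in numpy.select.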
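EDIT: With the real data, some preprocessing is needed first:
df1 = df.copy()
# Use the first row as the column names, then drop it from the data
df1.columns = df1.iloc[0,:]
df1 = df1.iloc[1:,:]
# Convert the string 'True' to boolean True (any other value becomes False)
df1[col_list + ['Flag']] = df1[col_list + ['Flag']].eq('True')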
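To see what this preprocessing does, here is a small sketch on a made-up raw frame in which the column names sit in the first data row and the values are the strings 'True' and 'False'. The shape of the raw data is an assumption, since the real file is not shown; the reset_index call is an optional addition that is not in the answer.

```python
import pandas as pd

col_list = ['col1', 'col2']  # hypothetical, shortened for the example

# Assumed raw layout: header in row 0, all values stored as text.
raw = pd.DataFrame([
    ['col1', 'col2', 'Flag'],
    ['True', 'False', 'False'],
    ['False', 'False', 'True'],
])

df1 = raw.copy()
# Promote the first row to column names, then drop it from the data.
df1.columns = df1.iloc[0, :]
# reset_index is an optional extra so the rows are numbered from 0 again.
df1 = df1.iloc[1:, :].reset_index(drop=True)

# Compare against the text 'True' to obtain real booleans.
bool_cols = col_list + ['Flag']
df1[bool_cols] = df1[bool_cols].eq('True')
print(df1)
```

Without this conversion, the non-empty string 'False' is truthy in Python, so a row-wise check such as `if row[col]` would treat every cell as True.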