Python script to highlight data mismatch from predefined list
Question:
I have a predefined list of Service and CI in Excel:
Service
CI
Non-Financial Risk Management South Africa
Aravo
Non-Financial Risk Management South Africa
Business Resilience
Non-Financial Risk Management South Africa
Change Risk Management
First Line Control Attestation South Africa
Control First
Group Audit Assurance South Africa
DigiAud
Group Governance Advisory and Support South Africa
Diligent Boardbooks
I get an extract of data from our call logging system which also has the Service and CI columns.
I need to highlight if the Service and CI in the call extract does not match the predefined list.
My code so far works for one Service in the predefined list, I need to figure out how to add the rest of the listed Services in my predefined list. IF I run it as is it works for the Service named Non-Financial Risk Management South Africa but highlights all the other Services in RED.
from pathlib import Path
import pandas as pd
import numpy as np
import os
extract = Path.cwd() / "/extract.xlsx"
df_extract = pd.read_excel(extract)
m = (df_extract['Service'] == 'Non-Financial Risk Management South Africa') & (df_extract['CI'].isin(['Aravo', 'Business Resilience', 'Change Risk Management']))
(df_extract.style.apply(lambda x: np.where(m, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
I tried adding a 2nd boolean mask but I cant figure out how to integrate with the np.where:
n = (df_extract['Service'] == 'First Line Control Attestation South Africa') & (df_extract['CI'].isin(['Control First']))
Answers:
I think you need chain both masks by |
for bitwise OR
:
(df_extract.style.apply(lambda x: np.where(m | n, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
If there is more masks is possible create dictionary and pass to mask
for more readable code:
#add more key:values if necessary
d = {'Non-Financial Risk Management South Africa':['Aravo', 'Business Resilience',
'Change Risk Management'],
'First Line Control Attestation South Africa':['Control First']}
mask = np.logical_or.reduce([df_extract['Service'].eq(k) & df_extract['CI'].isin(v)
for k, v in d.items()])
(df_extract.style.apply(lambda x: np.where(mask, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
# Predefined list of data
predefined_list = ['apple', 'banana', 'orange', 'grape', 'pear']
# Data to be checked for match with the predefined list
data_to_check = ['apple', 'banana', 'orange', 'kiwi', 'pear']
# Iterate through data to be checked and highlight mismatches
for data in data_to_check:
if data in predefined_list:
print(data)
else:
print(f'*{data}*')
I have a predefined list of Service and CI in Excel:
Service | CI |
---|---|
Non-Financial Risk Management South Africa | Aravo |
Non-Financial Risk Management South Africa | Business Resilience |
Non-Financial Risk Management South Africa | Change Risk Management |
First Line Control Attestation South Africa | Control First |
Group Audit Assurance South Africa | DigiAud |
Group Governance Advisory and Support South Africa | Diligent Boardbooks |
I get an extract of data from our call logging system which also has the Service and CI columns.
I need to highlight if the Service and CI in the call extract does not match the predefined list.
My code so far works for one Service in the predefined list, I need to figure out how to add the rest of the listed Services in my predefined list. IF I run it as is it works for the Service named Non-Financial Risk Management South Africa but highlights all the other Services in RED.
from pathlib import Path
import pandas as pd
import numpy as np
import os
extract = Path.cwd() / "/extract.xlsx"
df_extract = pd.read_excel(extract)
m = (df_extract['Service'] == 'Non-Financial Risk Management South Africa') & (df_extract['CI'].isin(['Aravo', 'Business Resilience', 'Change Risk Management']))
(df_extract.style.apply(lambda x: np.where(m, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
I tried adding a 2nd boolean mask but I cant figure out how to integrate with the np.where:
n = (df_extract['Service'] == 'First Line Control Attestation South Africa') & (df_extract['CI'].isin(['Control First']))
I think you need chain both masks by |
for bitwise OR
:
(df_extract.style.apply(lambda x: np.where(m | n, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
If there is more masks is possible create dictionary and pass to mask
for more readable code:
#add more key:values if necessary
d = {'Non-Financial Risk Management South Africa':['Aravo', 'Business Resilience',
'Change Risk Management'],
'First Line Control Attestation South Africa':['Control First']}
mask = np.logical_or.reduce([df_extract['Service'].eq(k) & df_extract['CI'].isin(v)
for k, v in d.items()])
(df_extract.style.apply(lambda x: np.where(mask, '', 'background-color: red'))
.to_excel('/output.xlsx', index=False))
# Predefined list of data
predefined_list = ['apple', 'banana', 'orange', 'grape', 'pear']
# Data to be checked for match with the predefined list
data_to_check = ['apple', 'banana', 'orange', 'kiwi', 'pear']
# Iterate through data to be checked and highlight mismatches
for data in data_to_check:
if data in predefined_list:
print(data)
else:
print(f'*{data}*')