Python script to highlight data mismatch from predefined list

Question:

I have a predefined list of Service and CI in Excel:

Service                                            | CI
Non-Financial Risk Management South Africa         | Aravo
Non-Financial Risk Management South Africa         | Business Resilience
Non-Financial Risk Management South Africa         | Change Risk Management
First Line Control Attestation South Africa        | Control First
Group Audit Assurance South Africa                 | DigiAud
Group Governance Advisory and Support South Africa | Diligent Boardbooks

I get an extract of data from our call logging system which also has the Service and CI columns.
I need to highlight the rows where the Service and CI in the call extract do not match the predefined list.

My code so far works for one Service in the predefined list, but I need to figure out how to add the rest of the Services from that list. If I run it as is, it works for the Service named Non-Financial Risk Management South Africa but highlights all the other Services in red.

from pathlib import Path
import pandas as pd
import numpy as np
import os

# no leading slash: an absolute right-hand path would discard Path.cwd()
extract = Path.cwd() / "extract.xlsx"
df_extract = pd.read_excel(extract)


m = (df_extract['Service'] == 'Non-Financial Risk Management South Africa') & (df_extract['CI'].isin(['Aravo', 'Business Resilience', 'Change Risk Management']))


(df_extract.style.apply(lambda x: np.where(m, '', 'background-color: red'))
   .to_excel('/output.xlsx', index=False))

I tried adding a second boolean mask, but I can't figure out how to integrate it with the np.where:

n = (df_extract['Service'] == 'First Line Control Attestation South Africa') & (df_extract['CI'].isin(['Control First']))
Asked By: Dinerz


Answers:

I think you need to chain both masks with | (bitwise OR):

(df_extract.style.apply(lambda x: np.where(m | n, '', 'background-color: red'))
           .to_excel('/output.xlsx', index=False))
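
As a minimal sketch of how the combined mask behaves, here is the same idea on three made-up rows. The sample data and the names valid and styles are assumptions for illustration only; m and n are built as in the question.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the call extract (not real data).
df_extract = pd.DataFrame({
    "Service": [
        "Non-Financial Risk Management South Africa",
        "First Line Control Attestation South Africa",
        "First Line Control Attestation South Africa",
    ],
    "CI": ["Aravo", "Control First", "Aravo"],
})

m = (df_extract["Service"] == "Non-Financial Risk Management South Africa") & (
    df_extract["CI"].isin(["Aravo", "Business Resilience", "Change Risk Management"])
)
n = (df_extract["Service"] == "First Line Control Attestation South Africa") & (
    df_extract["CI"].isin(["Control First"])
)

# A row is valid if it satisfies either mask.
valid = m | n

# The per-row style strings that the lambda passed to style.apply produces.
styles = np.where(valid, "", "background-color: red")
print(list(styles))
```

Only the third row (a known Service paired with a CI that does not belong to it) gets the red background.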

If there are more masks, you can create a dictionary and build the mask from it for more readable code:

# add more key: value pairs if necessary
d = {'Non-Financial Risk Management South Africa': ['Aravo', 'Business Resilience',
                                                    'Change Risk Management'],
     'First Line Control Attestation South Africa': ['Control First']}

# one mask per Service, combined with a logical OR across all of them
mask = np.logical_or.reduce([df_extract['Service'].eq(k) & df_extract['CI'].isin(v)
                             for k, v in d.items()])

(df_extract.style.apply(lambda x: np.where(mask, '', 'background-color: red'))
           .to_excel('/output.xlsx', index=False))
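
Since the predefined list already lives in Excel, the dictionary could probably be built from it rather than typed by hand. The following is a rough sketch under that assumption: df_predefined is a hypothetical in-memory stand-in for the predefined sheet (in practice it would be loaded with pd.read_excel), and the extract rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the predefined Excel sheet.
df_predefined = pd.DataFrame({
    "Service": [
        "Non-Financial Risk Management South Africa",
        "Non-Financial Risk Management South Africa",
        "Non-Financial Risk Management South Africa",
        "First Line Control Attestation South Africa",
        "Group Audit Assurance South Africa",
        "Group Governance Advisory and Support South Africa",
    ],
    "CI": [
        "Aravo",
        "Business Resilience",
        "Change Risk Management",
        "Control First",
        "DigiAud",
        "Diligent Boardbooks",
    ],
})

# Collect the allowed CIs per Service: {service: [ci, ci, ...]}.
d = df_predefined.groupby("Service")["CI"].agg(list).to_dict()

# Made-up rows standing in for the call extract.
df_extract = pd.DataFrame({
    "Service": [
        "Group Audit Assurance South Africa",
        "Group Audit Assurance South Africa",
        "Non-Financial Risk Management South Africa",
        "Unknown Service",
    ],
    "CI": ["DigiAud", "Aravo", "Business Resilience", "Aravo"],
})

# Same reduction as above: one mask per Service, OR-ed together.
mask = np.logical_or.reduce([df_extract["Service"].eq(k) & df_extract["CI"].isin(v)
                             for k, v in d.items()])
print(mask.tolist())
```

The resulting mask can be passed to np.where inside style.apply exactly as in the snippet above.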
Answered By: jezrael
# Predefined list of data
predefined_list = ['apple', 'banana', 'orange', 'grape', 'pear']

# Data to be checked for match with the predefined list
data_to_check = ['apple', 'banana', 'orange', 'kiwi', 'pear']

# Iterate through data to be checked and highlight mismatches
for data in data_to_check:
    if data in predefined_list:
        print(data)
    else:
        print(f'*{data}*')
Answered By: Enggar R Hariawan