Python script to highlight data mismatch from predefined list

Question

I have a predefined list of Service and CI in Excel:

Service	CI
Non-Financial Risk Management South Africa	Aravo
Non-Financial Risk Management South Africa	Business Resilience
Non-Financial Risk Management South Africa	Change Risk Management
First Line Control Attestation South Africa	Control First
Group Audit Assurance South Africa	DigiAud
Group Governance Advisory and Support South Africa	Diligent Boardbooks

I get an extract of data from our call logging system which also has the Service and CI columns.
I need to highlight if the Service and CI in the call extract does not match the predefined list.

My code so far works for one Service in the predefined list, I need to figure out how to add the rest of the listed Services in my predefined list. IF I run it as is it works for the Service named Non-Financial Risk Management South Africa but highlights all the other Services in RED.

from pathlib import Path
import pandas as pd
import numpy as np
import os

extract = Path.cwd() / "/extract.xlsx"
df_extract = pd.read_excel(extract)


m = (df_extract['Service'] == 'Non-Financial Risk Management South Africa') & (df_extract['CI'].isin(['Aravo', 'Business Resilience', 'Change Risk Management']))


(df_extract.style.apply(lambda x: np.where(m, '', 'background-color: red'))
   .to_excel('/output.xlsx', index=False))

I tried adding a 2nd boolean mask but I cant figure out how to integrate with the np.where:

n = (df_extract['Service'] == 'First Line Control Attestation South Africa') & (df_extract['CI'].isin(['Control First']))

Asked By: Dinerz

||

Source

Answer 1

I think you need chain both masks by | for bitwise OR:

(df_extract.style.apply(lambda x: np.where(m | n, '', 'background-color: red'))
           .to_excel('/output.xlsx', index=False))

If there is more masks is possible create dictionary and pass to mask for more readable code:

#add more key:values if necessary
d = {'Non-Financial Risk Management South Africa':['Aravo', 'Business Resilience',
                                                   'Change Risk Management'],
     'First Line Control Attestation South Africa':['Control First']}

mask = np.logical_or.reduce([df_extract['Service'].eq(k) &  df_extract['CI'].isin(v)  
                             for k, v in d.items()])

(df_extract.style.apply(lambda x: np.where(mask, '', 'background-color: red'))
           .to_excel('/output.xlsx', index=False))

Answered By: jezrael

Answer 2

# Predefined list of data
predefined_list = ['apple', 'banana', 'orange', 'grape', 'pear']

# Data to be checked for match with the predefined list
data_to_check = ['apple', 'banana', 'orange', 'kiwi', 'pear']

# Iterate through data to be checked and highlight mismatches
for data in data_to_check:
    if data in predefined_list:
        print(data)
    else:
        print(f'*{data}*')

Answered By: Enggar R Hariawan

Python script to highlight data mismatch from predefined list

Question:

Answers: