Boolean mask unexpected behavior when applying style

Question:

I am processing data where values may be of the format ‘<x’ I want to return ‘x/2’. So <5 would be returned as ‘2.5’. I have columns of mixed numbers and text. The problem is that I want to style the values that have been changed. Dummy data and code:

dummy={'Location': {0: 'Perth', 1: 'Perth', 2: 'Perth', 3: 'Perth', 4: 'Perth', 5: 'Perth', 6: 'Perth', 7: 'Perth', 8: 'Perth', 9: 'Perth', 10: 'Perth', 11: 'Perth', 12: 'Perth', 13: 'Perth', 14: 'Perth', 15: 'Perth', 16: 'Perth', 17: 'Perth'}, 'Date': {0: '11/01/2012 0:00', 1: '11/01/2012 0:00', 2: '20/03/2012 0:00', 3: '6/06/2012 0:00', 4: '14/09/2012 0:00', 5: '17/12/2013 0:00', 6: '1/02/2014 0:00', 7: '1/02/2014 0:00', 8: '1/02/2014 0:00', 9: '1/02/2014 0:00', 10: '1/02/2014 0:00', 11: '1/02/2014 0:00', 12: '1/02/2014 0:00', 13: '1/02/2014 0:00', 14: '1/02/2014 0:00', 15: '1/02/2014 0:00', 16: '1/02/2014 0:00', 17: '1/02/2014 0:00'}, 'As µg/L': {0: '9630', 1: '9630', 2: '8580', 3: '4990', 4: '6100', 5: '282', 6: '21', 7: '<1', 8: '<1', 9: '<1', 10: '<1', 11: '<1', 12: '<1', 13: '<1', 14: '<1', 15: '<1', 16: '<1', 17: '<1'}, 'As': {0: '9.63', 1: '9.63', 2: '8.58', 3: '4.99', 4: '6.1', 5: '0.282', 6: '0.021', 7: '<1', 8: '<1', 9: '<1', 10: '<1', 11: '<1', 12: '<1', 13: '<1', 14: '<1', 15: '<1', 16: '<1', 17: '10'}, 'Ba': {0: 1000.0, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'HCO3': {0: '10.00', 1: '0.50', 2: '0.50', 3: '<22', 4: '0.50', 5: '0.50', 6: '0.50', 7: np.nan, 8: np.nan, 9: np.nan, 10: '0.50', 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Cd': {0: 0.0094, 1: 0.0094, 2: 0.011, 3: 0.0035, 4: 0.004, 5: 0.002, 6: 0.0019, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Ca': {0: 248.0, 1: 248.0, 2: 232.0, 3: 108.0, 4: 150.0, 5: 396.0, 6: 472.0, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: 472.0, 15: np.nan, 16: np.nan, 17: np.nan}, 'CO3': {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5, 6: 0.5, 7: np.nan, 8: np.nan, 9: 0.5, 10: np.nan, 11: np.nan, 12: np.nan, 13: np.nan, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}, 'Cl': {0: 2.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 0.5, 5: 2.0, 6: 5.0, 7: np.nan, 8: np.nan, 9: np.nan, 10: np.nan, 11: np.nan, 12: np.nan, 13: 5.0, 14: np.nan, 15: np.nan, 16: np.nan, 17: np.nan}}

df=pd.DataFrame(dummy)
import pandas a pd
import numpy as np

mask = df.applymap(lambda x: (isinstance(x, str) and x.startswith('<')))
def remove_less_thans(x):

    if type(x) is int:
        return x
    elif type(x) is float:
        return x
    elif type(x) is str and x[0]=="<":
        try:
            return float(x[1:])/2
        except:
            return x
    elif type(x) is str and len(x)<10:
        try:
            return float(x)
        except:
            return x
    else:
        return x
def colour_mask(val):
    
    colour='color: red; font-weight: bold' if val in df.values[mask] else ''
    
    return colour

#perform remove less-thans and divide the remainder by two
df=df.applymap(remove_less_thans)

styled_df= df.style.applymap(colour_mask)
styled_df

the mask looks correct, the remove < function works ok but I get values formatted when they shouldn’t be. In the dummy data the HCO3 column has the 0.5 values reformatted even though they do no start with < and are not appearing as True in the mask. I know that they are numbers stored as text but that is how the real data might appear and given the mask is being constructed as expected (i.e. the one True is there and the rest of the values in the column are False) I don’t know why they are being formatted. Same for column CO3, all the non-nan values are formatted when none should be. Why is this happening and how do I fix it? Dataframe
input data showing <values
Output

output styled df

Asked By: flashliquid

||

Answers:

Idea is pass mask to Styler.apply with numpy.where:

def colour_mask(x):
    
    arr = np.where(mask, 'color: red; font-weight: bold', '')
    return pd.DataFrame(arr, index=x.index, columns=x.columns)

styled_df = df.style.apply(colour_mask, axis=None)

Or:

def colour_mask(x, props=''):

        return np.where(mask, props, '')

styled_df = df.style.apply(colour_mask, props='color: red; font-weight: bold', axis=None)
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.