How can I highlight cells with categorical variables?
Question:
I have a pandas dataframe called value_matrix_classification which looks as follows:
{('wind_on_share', 'Wind-onshore power generation'): {
    ('AIM/CGE 2.0', 'ADVANCE_2020_WB2C'): 'high',
    ('AIM/CGE 2.0', 'ADVANCE_2030_Price1.5C'): 'high',
    ('AIM/CGE 2.0', 'ADVANCE_2030_WB2C'): 'high',
    ('IMAGE 3.0.1', 'ADVANCE_2020_WB2C'): 'low',
    ('IMAGE 3.0.1', 'ADVANCE_2030_WB2C'): 'low',
    ('MESSAGE-GLOBIOM 1.0', 'ADVANCE_2020_WB2C'): 'low'},
 ('wind_off_share', 'Wind-offshore power generation'): {
    ('AIM/CGE 2.0', 'ADVANCE_2020_WB2C'): nan,
    ('AIM/CGE 2.0', 'ADVANCE_2030_Price1.5C'): nan,
    ('AIM/CGE 2.0', 'ADVANCE_2030_WB2C'): nan,
    ('IMAGE 3.0.1', 'ADVANCE_2020_WB2C'): 'low',
    ('IMAGE 3.0.1', 'ADVANCE_2030_WB2C'): 'low',
    ('MESSAGE-GLOBIOM 1.0', 'ADVANCE_2020_WB2C'): 'low'}}
The two columns on the right contain low, medium and high, which are categorical values. I created them using pd.cut(value_matrix_classification, bins=3, labels=["low", "medium", "high"]).
I’d like to highlight the dataframe so that high, medium and low cells are red, orange and yellow respectively, and NaN cells get a background color (gray in my attempt below).
I wrote the following function
def highlight_cells(x):
    if x == "high":
        color = "red"
    elif x == "medium":
        color = "orange"
    elif x == "low":
        color = "yellow"
    else:
        color = "gray"
    return [f"background-color: {color}"]
and applied it to the dataframe
value_matrix_classification.style.apply(highlight_cells)
However, this gives ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). What would be the appropriate way to do the highlighting here?
I was able to highlight the cells with null values only using
value_matrix_classification.style.highlight_null(null_color = "gray")
I am attaching the screenshot here just for the convenience of the reader.
How can I highlight all the cells based on the given categories: low, medium and high?
Answers:
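Styler.apply takes an entire row or column as input (a Series), so x == "high" compares a whole Series rather than a single value. Use Styler.applymap instead, which calls the function once per cell. (In pandas 2.1 and later, Styler.applymap is named Styler.map.)
See this Pandas documentation section.
Edit: you’ll also want highlight_cells to return just f"background-color: {color}", not wrapped in a list.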
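A minimal sketch of the element-wise approach described above, assuming the categories are plain strings and missing cells are NaN. The names COLORS, highlight_cell, style_elementwise and frame are illustrative and not from the question. The helper picks Styler.map when it exists and falls back to Styler.applymap on older pandas versions:

```python
import pandas as pd

# Colour for each category; anything else (including NaN) falls back to gray.
COLORS = {"high": "red", "medium": "orange", "low": "yellow"}


def highlight_cell(value):
    """Return a CSS string for a single cell (a scalar, not a Series)."""
    color = COLORS.get(value, "gray")
    return f"background-color: {color}"


def style_elementwise(df):
    """Apply highlight_cell to every cell of df and return the Styler."""
    styler = df.style
    # Styler.applymap was renamed to Styler.map in pandas 2.1.
    elementwise = getattr(styler, "map", None) or styler.applymap
    return elementwise(highlight_cell)


# Small stand-in frame; the per-cell function can be checked without rendering.
frame = pd.DataFrame({"a": ["high", "low"], "b": [float("nan"), "medium"]})
for value in frame["b"]:
    print(repr(value), "->", highlight_cell(value))
```

In a notebook, style_elementwise(value_matrix_classification) would then display the coloured table.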
To add more detail, suppose you have
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(4, 2), columns=list('AB'))
>>> df
          A         B
0  1.764052  0.400157
1  0.978738  2.240893
2  1.867558 -0.977278
3  0.950088 -0.151357
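To understand what is happening, compare a column against a value, here 0.2. The result is a column (Series) of booleans rather than a single True or False. The error appears whenever such a Series is used where Python needs one truth value: with and, or, not, if, or while. With multiple criteria you get multiple boolean columns back, one per comparison.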
>>> df.B > 0.2
0 True
1 True
2 False
3 False
Name: B, dtype: bool
Now let's use that comparison in an if statement:
if df.B > 0.2:
    print("do something")
>>> ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The check above is equivalent to the case below, where it is not clear what the result should be. Should it be True because the Series is not zero-length? False because there are False values? It is unclear, so pandas raises a ValueError instead:
if pd.Series([True, True, False, False]):
    print("do something")
So we need to reduce those multiple values to a single bool, depending on what we want to check.
if pd.Series([True, True, False, False]).any():  # evaluates to True
    print("I checked if there was any True value in the Series!")
>>> I checked if there was any True value in the Series!
if pd.Series([True, True, False, False]).all():  # evaluates to False, so nothing is printed
    print("I checked if there were all True values in the Series!")
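The difference between the ambiguous check and the explicit reductions can be reproduced in a few lines. This is a small sketch using a hand-written boolean Series; the variable name mask is illustrative:

```python
import pandas as pd

mask = pd.Series([True, True, False, False])

# Using the Series directly as a condition is ambiguous, so pandas raises.
try:
    if mask:
        print("unreachable")
except ValueError as err:
    print("ValueError:", err)

# Reduce the Series to one bool, choosing the semantics explicitly.
print(mask.any())  # is at least one element True?
print(mask.all())  # is every element True?
```

Which reduction is right depends on the question being asked; for per-cell styling neither is needed, because the function should receive one value at a time.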
Series.map + fillna to create a Series of styles for each column is a more common approach to this type of problem:
def highlight_cells(x):
    return 'background-color: ' + x.map(
        # Associate Values to a given colour code
        {'high': 'red', 'medium': 'orange', 'low': 'yellow'}
    ).fillna('gray')  # Fill unmapped values with default

value_matrix_classification.style.apply(highlight_cells)
Each column is mapped to a new set of colour codes.
This is how the styles are determined using just the second column as a reference, but Styler.apply will call on all columns in the subset:
value_matrix_classification.iloc[:, 1].map(
    {'high': 'red', 'medium': 'orange', 'low': 'yellow'}
)
AIM/CGE 2.0          ADVANCE_2020_WB2C            NaN
                     ADVANCE_2030_Price1.5C       NaN
                     ADVANCE_2030_WB2C            NaN
IMAGE 3.0.1          ADVANCE_2020_WB2C         yellow
                     ADVANCE_2030_WB2C         yellow
MESSAGE-GLOBIOM 1.0  ADVANCE_2020_WB2C         yellow
Name: (wind_off_share, Wind-offshore power generation), dtype: object
Then fillna is used to replace any unmapped values with a default. Note that this is not a NaN representation; it is the default for any value that does not appear in the mapping dictionary:
value_matrix_classification.iloc[:, 1].map(
    {'high': 'red', 'medium': 'orange', 'low': 'yellow'}
).fillna('gray')
AIM/CGE 2.0          ADVANCE_2020_WB2C           gray  # NaN replaced with gray
                     ADVANCE_2030_Price1.5C      gray
                     ADVANCE_2030_WB2C           gray
IMAGE 3.0.1          ADVANCE_2020_WB2C         yellow
                     ADVANCE_2030_WB2C         yellow
MESSAGE-GLOBIOM 1.0  ADVANCE_2020_WB2C         yellow
Name: (wind_off_share, Wind-offshore power generation), dtype: object
Lastly, add the CSS property:
'background-color: ' + value_matrix_classification.iloc[:, 1].map(
    {'high': 'red', 'medium': 'orange', 'low': 'yellow'}
).fillna('gray')
AIM/CGE 2.0          ADVANCE_2020_WB2C           background-color: gray  # valid css style
                     ADVANCE_2030_Price1.5C      background-color: gray
                     ADVANCE_2030_WB2C           background-color: gray
IMAGE 3.0.1          ADVANCE_2020_WB2C         background-color: yellow
                     ADVANCE_2030_WB2C         background-color: yellow
MESSAGE-GLOBIOM 1.0  ADVANCE_2020_WB2C         background-color: yellow
Name: (wind_off_share, Wind-offshore power generation), dtype: object
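To check the column-wise function end to end without rendering HTML, the CSS strings can be inspected with DataFrame.apply, which also passes one column at a time. The sketch below rebuilds a frame from the data in the question; the variable names index, columns and css are illustrative, and DataFrame.apply is used here only as a stand-in for looking at what Styler.apply would get back from the function:

```python
import numpy as np
import pandas as pd

# Row and column labels copied from the question's data.
index = pd.MultiIndex.from_tuples([
    ("AIM/CGE 2.0", "ADVANCE_2020_WB2C"),
    ("AIM/CGE 2.0", "ADVANCE_2030_Price1.5C"),
    ("AIM/CGE 2.0", "ADVANCE_2030_WB2C"),
    ("IMAGE 3.0.1", "ADVANCE_2020_WB2C"),
    ("IMAGE 3.0.1", "ADVANCE_2030_WB2C"),
    ("MESSAGE-GLOBIOM 1.0", "ADVANCE_2020_WB2C"),
])
columns = pd.MultiIndex.from_tuples([
    ("wind_on_share", "Wind-onshore power generation"),
    ("wind_off_share", "Wind-offshore power generation"),
])
value_matrix_classification = pd.DataFrame(
    [["high", np.nan], ["high", np.nan], ["high", np.nan],
     ["low", "low"], ["low", "low"], ["low", "low"]],
    index=index,
    columns=columns,
)


def highlight_cells(x):
    # x is one column (a Series); map categories to colours, default to gray.
    return "background-color: " + x.map(
        {"high": "red", "medium": "orange", "low": "yellow"}
    ).fillna("gray")


# One CSS string per cell, with the same shape and labels as the input frame.
css = value_matrix_classification.apply(highlight_cells)
print(css.iloc[:, 1].tolist())
```

Passing the same function to value_matrix_classification.style.apply(highlight_cells) then renders these styles.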