Is it possible to highlight groups of rows in a pandas DataFrame?
Question:
I was wondering if there is a way to highlight rows in a pandas DataFrame based on the values in a specific column. For example:
As can be seen above, the values in Col_4 differ. Is it possible to highlight the rows belonging to each distinct value? Or, to make it more complex, to highlight rows based on distinct values in multiple columns?
Answers:
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
    {
        "col1": ["A", "F", "A", "F", "A", "A", "A", "A"],
        "col2": ["B", "B", "B", "B", "B", "G", "G", "B"],
        "col3": ["C", "H", "C", "H", "C", "I", "C", "I"],
        "col4": ["D", "E", "D", "E", "E", "D", "D", "E"],
    }
)
Here is one way to do it:
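# Sort the rows and add a unique identifier shared by identical rows
df = df.sort_values(["col1", "col2", "col3", "col4"]).reset_index(drop=True)
# Hashing the tuple of values (rather than the joined string) keeps ("AB", "C")
# distinct from ("A", "BC") and also works for non-string columns
df["hash"] = df.apply(lambda x: hash(tuple(x)), axis=1)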
# Assign a random color to each unique identifier
import random
colors = [
    f"#{random.randint(0, 255):02X}{random.randint(0, 255):02X}{random.randint(0, 255):02X}"
    for _ in range(df.shape[0])
]
color_mapping = {}
for value in df["hash"].unique():
    # unique() yields each identifier once, so each one takes the next color
    color_mapping[value] = colors.pop(0)
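# Color rows (run in a Jupyter notebook)
# Styler.hide_columns was removed in pandas 2.0; Styler.hide (pandas >= 1.4) replaces it
df.style.apply(
    lambda v: [f"background-color: {color_mapping.get(v['hash'], '')}"] * df.shape[1],
    axis=1,
).hide(axis="columns", subset=["hash"])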
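Random colors can come out nearly identical or too dark to read text on. As a possible alternative for the color-assignment step, here is a minimal sketch that spreads hues evenly and keeps the backgrounds light. The helper name assign_group_colors and its saturation and value defaults are my own assumptions, not part of pandas; it only relies on the standard-library colorsys module.

```python
import colorsys


def assign_group_colors(keys, saturation=0.35, value=1.0):
    """Map each distinct key to a light background color with evenly spaced hues.

    Hypothetical helper for illustration; the defaults are arbitrary choices.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    distinct = list(dict.fromkeys(keys))
    mapping = {}
    for i, key in enumerate(distinct):
        hue = i / len(distinct)  # evenly spaced positions on the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        mapping[key] = f"#{round(r * 255):02X}{round(g * 255):02X}{round(b * 255):02X}"
    return mapping


# Keys can be single values (one column) or tuples (several columns)
col4 = ["D", "E", "D", "E", "E", "D", "D", "E"]
color_mapping = assign_group_colors(col4)
```

To group on one column only, as in the question, this should work by passing df["col4"] as the keys and looking up color_mapping[v["col4"]] inside the style function, with no hash column needed. For several columns, tuples such as those from df[["col3", "col4"]].itertuples(index=False, name=None) can serve as keys, with (v["col3"], v["col4"]) as the lookup.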