How to display pandas.io.formats.style.Styler objects on top of each other
Question:
Here’s some data:
import numpy as np
import pandas as pd

# Seed NumPy's generator; random.seed() does not affect np.random
np.random.seed(365)

duration = np.random.exponential(scale=5, size=100).round(1)
numbers = np.random.normal(loc=50, scale=2, size=100).round(2)
group = np.random.choice(["A", "B", "C", "D"], size=len(duration))
gender = np.random.choice(["Male", "Female"], p=[0.7, 0.3], size=len(duration))
provider = np.random.choice(["2Degrees", "Skinny", "Vodafone", "Spark"], p=[0.25, 0.25, 0.25, 0.25], size=len(duration))

df = pd.DataFrame({
    "Duration": duration,
    "Numbers": numbers,
    "Group": group,
    "Gender": gender,
    "Provider": provider
})
I am attempting to combine multiple pandas Styler objects (pandas.io.formats.style.Styler) into one figure.
I have all the "pieces" of the figure as individual Styler objects. I created each one as a data frame and styled it to have its own caption.
Here is the code I used to generate the first two "pieces" of this figure (the code for the other pieces is very similar):
# Number of rows and columns
pd.DataFrame({
    "Number of Rows": df.shape[0],
    "Number of Columns": df.shape[1]
}, index=[""])

# Summary of the data set's categorical columns
data = []
for column in df:
    if df[column].dtype == "object":
        freq = df[column].value_counts(ascending=False)
        data.append({
            "Column Name": column,
            "Unique Values": len(df[column].unique()),
            "Missing Values": df[column].isna().sum(),
            "Most Frequently Occurring": freq.index[0],
            # .iloc[0] selects by position; freq[0] would look up the label 0
            "Occurrences": freq.iloc[0],
            "% of Total": freq.iloc[0] / freq.sum() * 100
        })

pd.DataFrame(data).style.format(precision=1).set_caption("Categorical Columns").set_table_styles([{
    "selector": "caption",
    "props": [("font-size", "16px")]
}])
The figure I am attempting to create looks something like this (I made this mock-up in an Excel spreadsheet):
Note that the Styler objects (apart from the first data frame, which states the number of rows and columns in the data set) are stacked on top of each other with enough padding between them.
Ideally, this entire figure would be exportable to an Excel spreadsheet.
I have pretty much all the code I need; it's just putting this final part together that I need help with. Any ideas how to tackle this?
Answers:
After some experimenting, I found that each "piece" of the figure must first be rendered to HTML. The pieces, which are then plain HTML strings, can be concatenated with a padding element between them.
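The concatenation step on its own can be sketched as below. This is only an illustration: the helper name stack_html and its gap_px parameter are made up here, and the two table strings are placeholders standing in for the output of Styler.to_html().

```python
def stack_html(pieces, gap_px=20):
    """Join HTML fragments with an empty spacer <div> between each pair."""
    # The spacer has no content; its padding alone creates the vertical gap.
    spacer = f"<div style='padding: {gap_px}px;'></div>"
    return spacer.join(pieces)


# Placeholder fragments; in practice these would come from Styler.to_html()
tables = [
    "<table><caption>First</caption></table>",
    "<table><caption>Second</caption></table>",
]
figure = stack_html(tables)
```

Because str.join only places the separator between items, no spacer is added before the first table or after the last one.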
For anyone who wants to create similar data summary tables in the future, here is my code:
from IPython.display import display, HTML

# Caption styling shared by every table
styles = [{"selector": "caption", "props": [("font-size", "16px"), ("font-weight", "bold")]}]

# Overall shape of the data set.
# The outer parentheses let the method chain span several lines.
head = (
    pd.DataFrame({
        "Number of Rows": df.shape[0],
        "Number of Columns": df.shape[1]
    }, index=[""])
    .style
    .set_caption("Data Frame")
    .set_table_styles(styles)
    .to_html()
)

# Info obtained from categorical columns
data = []
for column in df:
    if df[column].dtype == "object":
        freq = df[column].value_counts(dropna=False, ascending=False)
        data.append({
            "Column Name": column,
            "Unique Values": len(df[column].unique()),
            "Missing Values": df[column].isna().sum(),
            "Most Frequently Occurring": freq.index[0],
            # .iloc[0] selects by position; freq[0] would look up the label 0
            "Occurrences": freq.iloc[0],
            "% of Total": freq.iloc[0] / freq.sum() * 100,
        })

cat = (
    pd.DataFrame(data)
    .style
    .set_caption("Categorical Columns")
    .set_table_styles(styles)
    .format(precision=1)
    .hide(axis="index")  # replaces hide_index(), which newer pandas no longer has
    .to_html()
)

# Info obtained from numeric columns
data = []
for column in df:
    # is_numeric_dtype avoids platform-dependent comparisons with "int"/"float"
    if pd.api.types.is_numeric_dtype(df[column]):
        data.append({
            "Column Name": column,
            "Unique Values": len(df[column].unique()),
            "Missing Values": df[column].isna().sum(),
            "Range": [df[column].min(), df[column].max()],
            "Mean Value": df[column].mean(),
            "Median Value": df[column].median()
        })

num = (
    pd.DataFrame(data)
    .style
    .set_caption("Numeric Columns")
    .set_table_styles(styles)
    .format(precision=1)
    .hide(axis="index")
    .to_html()
)

# Join the rendered tables with an empty spacer between each pair
padding = "<div style='padding: 20px;'></div>"
figure = padding.join([head, cat, num])
display(HTML(figure))
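The code above only covers the HTML display; it does not address the Excel export mentioned in the question. One possible approach, shown here as a rough sketch rather than a tested solution, is to keep each summary as a plain DataFrame and write them to one sheet at increasing row offsets using the startrow argument of DataFrame.to_excel. The helper names stacked_start_rows and write_stacked are hypothetical, an Excel engine such as openpyxl is assumed to be installed, and each caption is written as its own one-cell row on the assumption that Styler captions are not carried over to Excel.

```python
import pandas as pd


def stacked_start_rows(row_counts, gap=2):
    """Return the first sheet row for each table when stacked vertically.

    Each table takes 1 caption row + 1 header row + its data rows,
    followed by `gap` blank rows.
    """
    starts, row = [], 0
    for n_rows in row_counts:
        starts.append(row)
        row += n_rows + 2 + gap
    return starts


def write_stacked(tables, path, sheet_name="Summary", gap=2):
    """Write (caption, DataFrame) pairs one below another on a single sheet."""
    starts = stacked_start_rows([len(frame) for _, frame in tables], gap)
    with pd.ExcelWriter(path) as writer:
        for (caption, frame), start in zip(tables, starts):
            # Caption goes in a single cell directly above the table
            pd.DataFrame([caption]).to_excel(
                writer, sheet_name=sheet_name, startrow=start,
                header=False, index=False,
            )
            frame.to_excel(
                writer, sheet_name=sheet_name, startrow=start + 1, index=False,
            )
```

A call might look like write_stacked([("Categorical Columns", cat_df), ("Numeric Columns", num_df)], "summary.xlsx"), where cat_df and num_df stand for the unstyled summary DataFrames. List-valued cells such as the "Range" column would probably need converting to strings first, since Excel cells cannot hold Python lists.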