How to format IPython html display of Pandas dataframe?
Question:
How can I format the IPython HTML display of pandas dataframes so that
1. numbers are right justified
2. numbers have commas as thousands separator
3. large floats have no decimal places
I understand that numpy has the set_printoptions facility, where I can do:
int_frmt = lambda x: '{:,}'.format(x)
np.set_printoptions(formatter={'int_kind': int_frmt})
and similarly for other data types.
But IPython does not pick up these formatting options when displaying dataframes in HTML. I still want to keep
pd.set_option('display.notebook_repr_html', True)
but with points 1, 2 and 3 above applied.
Edit: Below is my solution for 2 & 3 ( not sure this is the best way ), but I still need to figure out how to make number columns right justified.
import numpy as np
from IPython.display import HTML
# One formatter per dtype: thousands separators for ints, no decimals for large floats
int_frmt = lambda x: '{:,}'.format(x)
float_frmt = lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x)
frmt_map = {np.dtype('int64'): int_frmt, np.dtype('float64'): float_frmt}
# Map each column whose dtype has a formatter to that formatter
frmt = {col: frmt_map[df.dtypes[col]] for col in df.columns if df.dtypes[col] in frmt_map}
HTML(df.to_html(formatters=frmt))
Answers:
HTML receives a custom string of HTML data. Nothing prevents you from passing in a style tag with custom CSS for the .dataframe class (which the to_html method adds to the table).
So the simplest solution is to add a style and concatenate it with the output of df.to_html:
style = '<style>.dataframe td { text-align: right; }</style>'
HTML( style + df.to_html( formatters=frmt ) )
But I would suggest defining a custom class for the DataFrame instead, since the .dataframe rule changes the style of all the tables in your notebook (the style is “global”).
style = '<style>.right_aligned_df td { text-align: right; }</style>'
HTML(style + df.to_html(formatters=frmt, classes='right_aligned_df'))
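For reference, the pieces above can be put together into one self-contained sketch. The sample data and column names here are made up purely for illustration; the formatters and the right_aligned_df class are the ones from the question and this answer. The HTML string is built without IPython so the snippet also runs outside a notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "count": np.array([1234567, 42], dtype="int64"),
    "price": np.array([1234.5678, 9.5], dtype="float64"),
})

def int_frmt(x):
    # Thousands separator for integers.
    return "{:,}".format(x)

def float_frmt(x):
    # No decimals for large floats, two decimals otherwise.
    return "{:,.0f}".format(x) if x > 1e3 else "{:,.2f}".format(x)

frmt_map = {np.dtype("int64"): int_frmt, np.dtype("float64"): float_frmt}
frmt = {col: frmt_map[dtype] for col, dtype in df.dtypes.items() if dtype in frmt_map}

# Scope the CSS rule to a custom class so other tables are unaffected.
style = "<style>.right_aligned_df td { text-align: right; }</style>"
html = style + df.to_html(formatters=frmt, classes="right_aligned_df")

# In a notebook, display it with:
#   from IPython.display import HTML
#   HTML(html)
```

Because to_html keeps the default dataframe class and appends the custom one, the table still picks up the notebook's standard styling in addition to the right alignment.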
You can also define the style in one of the previous cells, and then just set the classes parameter of the to_html method:
# Some cell at the beginning of the notebook
In [2]: HTML('''<style>
    .right_aligned_df td { text-align: right; }
    .left_aligned_df td { text-align: left; }
    .pink_df { background-color: pink; }
</style>''')
...
# Much later in your notebook
In [66]: HTML(df.to_html(classes='pink_df'))
On the OP’s point 2:
numbers have commas as thousands separator
pandas (as of 0.20.1) does not allow overriding the default integer format in an easy way. It is hard-coded in pandas.io.formats.format.IntArrayFormatter (the lambda function):
class IntArrayFormatter(GenericArrayFormatter):
    def _format_strings(self):
        formatter = self.formatter or (lambda x: '% d' % x)
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values
I’m assuming what you’re actually asking for is how to override the format for all integers: replace (“monkey patch”) IntArrayFormatter so that it prints integer values with comma-separated thousands, as follows:
import pandas
class _IntArrayFormatter(pandas.io.formats.format.GenericArrayFormatter):
    def _format_strings(self):
        # Same as the original, but with a thousands separator in the default format
        formatter = self.formatter or (lambda x: ' {:,}'.format(x))
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values
pandas.io.formats.format.IntArrayFormatter = _IntArrayFormatter
Note:
- before 0.20.0, the formatters were in pandas.formats.format.
- before 0.18.1, the formatters were in pandas.core.format.
Aside
For floats you do not need to jump through those hoops since there is a configuration option for it:
display.float_format: The callable should accept a floating point number and return a string with the desired format of the number. This is used in some places like SeriesFormatter. See core.format.EngFormatter for an example.
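As a minimal sketch of that option in use (the sample values are hypothetical), the callable can be set temporarily with pd.option_context so it does not leak into the rest of the session; pd.set_option('display.float_format', ...) would set it globally instead:

```python
import pandas as pd

def float_frmt(x):
    # No decimals for large floats, two decimals otherwise.
    return "{:,.0f}".format(x) if x > 1e3 else "{:,.2f}".format(x)

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"value": [1234567.891, 3.14159]})

# The option applies only inside the with-block.
with pd.option_context("display.float_format", float_frmt):
    text = df.to_string()

print(text)
```

Note that this option only affects float columns; integer columns keep their default format.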
This question was asked a long time ago. Back then, pandas didn’t yet include the Styler class (exposed as df.style). It was added in version 0.17.1.
Here’s how you would use this to achieve your desired goal and some more:
- center the header
- right-align any numeric columns
- left-align the other columns
- add a formatter for the numeric columns like you want
- make each column the same width
Here’s some example data:
In [1]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,3)*2000, columns=['A','B','C'])
df['D'] = np.random.randint(0,10000,size=10)
df['TextCol'] = np.random.choice(['a','b','c'], 10)
df.dtypes
Out[1]:
A float64
B float64
C float64
D int64
TextCol object
dtype: object
Let’s format this using df.style:
# Construct a mask of which columns are numeric
numeric_col_mask = df.dtypes.apply(lambda d: issubclass(np.dtype(d).type, np.number))
# Dict used to center the table headers
d = dict(selector="th",
         props=[('text-align', 'center')])
# Style (the outer parentheses let the method chain span several lines)
(df.style
   .set_properties(subset=df.columns[numeric_col_mask],   # right-align the numeric columns and set their width
                   **{'width': '10em', 'text-align': 'right'})
   .set_properties(subset=df.columns[~numeric_col_mask],  # left-align the non-numeric columns and set their width
                   **{'width': '10em', 'text-align': 'left'})
   .format(lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x),  # format the numeric values
           subset=pd.IndexSlice[:, df.columns[numeric_col_mask]])
   .set_table_styles([d]))                                # center the header
Note that instead of calling .format on the subset columns, you can very well set the global default pd.options.display.float_format instead:
pd.options.display.float_format = lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x)