Format certain floating dataframe columns into percentage in pandas
Question:
I am trying to write a paper in an IPython notebook, but I have run into some issues with the display format. Say I have the following dataframe df. Is there any way to format var1 and var2 as 2-digit decimals and var3 as percentages?
var1 var2 var3
id
0 1.458315 1.500092 -0.005709
1 1.576704 1.608445 -0.005122
2 1.629253 1.652577 -0.004754
3 1.669331 1.685456 -0.003525
4 1.705139 1.712096 -0.003134
5 1.740447 1.741961 -0.001223
6 1.775980 1.770801 -0.001723
7 1.812037 1.799327 -0.002013
8 1.853130 1.822982 -0.001396
9 1.943985 1.868401 0.005732
The numbers inside are not multiplied by 100, e.g. -0.0057=-0.57%.
Answers:
Replace the values using the round function, and format the string representation of the percentage numbers:
df['var2'] = pd.Series([round(val, 2) for val in df['var2']], index = df.index)
df['var3'] = pd.Series(["{0:.2f}%".format(val * 100) for val in df['var3']], index = df.index)
The round function rounds a floating point number to the number of decimal places provided as the second argument to the function.
String formatting allows you to represent the numbers as you wish. You can change the number of decimal places shown by changing the number before the f.
P.S. I was not sure if your ‘percentage’ numbers had already been multiplied by 100. If they have, then you will want to change the number of decimals displayed and remove the multiplication by 100.
You could also set the default display format for floats (note that this applies to every float column, not just var3):
pd.options.display.float_format = '{:.2%}'.format
Use ‘{:.2%}’ instead of ‘{:.2f}%’ – The former converts 0.41 to 41.00% (correctly), the latter to 0.41% (incorrectly)
The accepted answer suggests to modify the raw data for presentation purposes, something you generally do not want. Imagine you need to make further analyses with these columns and you need the precision you lost with rounding.
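To make that concern concrete, here is a minimal sketch using a two-row stand-in for df (the stand-in data and the name shown are assumptions for illustration). It formats a copy for display and checks that only the copy has lost its numeric dtype:

```python
import pandas as pd

# Two-row stand-in for the question's dataframe (illustrative only).
df = pd.DataFrame({'var3': [-0.005709, -0.005122]})

# Format a separate copy for display so the numeric column stays usable.
shown = df.copy()
shown['var3'] = shown['var3'].map('{:.2%}'.format)

print(shown)
# The original column is still numeric; the formatted copy holds strings.
print(pd.api.types.is_float_dtype(df['var3']))
print(pd.api.types.is_float_dtype(shown['var3']))
```

Once a column holds strings such as -0.57%, arithmetic on it no longer works, which is why the answers that follow format only the output.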
You can modify the formatting of individual columns in data frames, in your case:
output = df.to_string(formatters={
'var1': '{:,.2f}'.format,
'var2': '{:,.2f}'.format,
'var3': '{:,.2%}'.format
})
print(output)
For your information, '{:,.2%}'.format(0.214) yields 21.40%, so there is no need to multiply by 100.
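A small sketch of the difference between the % presentation type and a literal percent sign, using plain Python string formatting (the variable names are illustrative):

```python
value = 0.214

# The % presentation type multiplies by 100 and appends the percent sign.
as_percent = '{:,.2%}'.format(value)

# Equivalent result with manual scaling and a literal percent sign.
manual = '{:.2f}%'.format(value * 100)

# A literal percent sign without scaling only appends the sign.
unscaled = '{:.2f}%'.format(value)

print(as_percent)  # 21.40%
print(manual)      # 21.40%
print(unscaled)    # 0.21%
```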
You don’t have a nice HTML table anymore but a text representation. If you need to stay with HTML, use the to_html function instead.
from IPython.display import display, HTML
output = df.to_html(formatters={
'var1': '{:,.2f}'.format,
'var2': '{:,.2f}'.format,
'var3': '{:,.2%}'.format
})
display(HTML(output))
Update
As of pandas 0.17.1, life got easier and we can get a beautiful html table right away:
df.style.format({
'var1': '{:,.2f}'.format,
'var2': '{:,.2f}'.format,
'var3': '{:,.2%}'.format,
})
As suggested by @linqu, you should not change your data for presentation. Since pandas 0.17.1, (conditional) formatting has been easier. Quoting the documentation:
You can apply conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property. This is a property that returns a pandas.Styler object, which has useful methods for formatting and displaying DataFrames.
For your example, that would be (the usual table will show up in Jupyter):
df.style.format({
'var1': '{:,.2f}'.format,
'var2': '{:,.2f}'.format,
'var3': '{:,.2%}'.format,
})
As a similar approach to the accepted answer that might be considered a bit more readable, elegant, and general (YMMV), you can leverage the map method:
# OP example
df['var3'].map(lambda n: '{:,.2%}'.format(n))
# also works on a series
series_example.map(lambda n: '{:,.2%}'.format(n))
Performance-wise, this is pretty close to (marginally slower than) the OP solution.
As an aside, if you do choose to go the pd.options.display.float_format route, consider using a context manager to handle state, per this parallel numpy example.
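On the pandas side, one way to scope the option is pd.option_context. The sketch below assumes a two-row stand-in for df and shows the format applying only inside the with-block:

```python
import pandas as pd

# Two-row stand-in for the question's dataframe (illustrative only).
df = pd.DataFrame({'var3': [-0.005709, 0.005732]})

# The option is changed only inside the with-block and restored afterwards.
with pd.option_context('display.float_format', '{:.2%}'.format):
    inside = df.to_string()

outside = df.to_string()

print(inside)   # floats rendered as percentages
print(outside)  # default float rendering again
```

This avoids leaving a global percentage format switched on for every later dataframe in the notebook.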
Just another way of doing it, should you need to do it over a larger range of columns, using applymap:
df[['var1','var2']] = df[['var1','var2']].applymap("{0:.2f}".format)
df['var3'] = df['var3'].map(lambda x: "{0:.2f}%".format(x*100))
applymap is useful if you need to apply the function over multiple columns (a single column is a Series, which has map but no applymap); it’s essentially an abbreviation of the below for this specific example:
df[['var1','var2']].apply(lambda col: col.map('{:.2f}'.format))
In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.
Great explanation of apply, map and applymap:
Difference between map, applymap and apply methods in Pandas
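As a rough check of that equivalence, the sketch below formats the same two columns column-by-column with Series.map and element-wise with whichever DataFrame method the installed pandas provides. The two-row data and the names two_dp, by_column and elementwise are assumptions for illustration:

```python
import pandas as pd

# Two-row stand-in for the question's dataframe (illustrative only).
df = pd.DataFrame({'var1': [1.458315, 1.576704],
                   'var2': [1.500092, 1.608445]})
two_dp = '{:.2f}'.format
subset = df[['var1', 'var2']]

# Column by column: apply hands each column (a Series) to Series.map.
by_column = subset.apply(lambda col: col.map(two_dp))

# Element-wise: DataFrame.map where available, applymap on older pandas.
if hasattr(subset, 'map'):
    elementwise = subset.map(two_dp)
else:
    elementwise = subset.applymap(two_dp)

print(by_column)
print(elementwise)
```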
Oftentimes we want to keep full precision for calculations, but for visual aesthetics we may want to see only a few decimal places when we display the dataframe.
In jupyter-notebook, pandas can use HTML formatting through the style property.
To show only two decimal places in some columns, we can use this code snippet:
Given dataframe
import numpy as np
import pandas as pd
df = pd.DataFrame({'var1': [1.458315, 1.576704, 1.629253, 1.6693310000000001, 1.705139, 1.740447, 1.77598, 1.812037, 1.85313, 1.9439849999999999],
'var2': [1.500092, 1.6084450000000001, 1.652577, 1.685456, 1.7120959999999998, 1.741961, 1.7708009999999998, 1.7993270000000001, 1.8229819999999999, 1.8684009999999998],
'var3': [-0.0057090000000000005, -0.005122, -0.0047539999999999995, -0.003525, -0.003134, -0.0012230000000000001, -0.0017230000000000001, -0.002013, -0.001396, 0.005732]})
print(df)
var1 var2 var3
0 1.458315 1.500092 -0.005709
1 1.576704 1.608445 -0.005122
2 1.629253 1.652577 -0.004754
3 1.669331 1.685456 -0.003525
4 1.705139 1.712096 -0.003134
5 1.740447 1.741961 -0.001223
6 1.775980 1.770801 -0.001723
7 1.812037 1.799327 -0.002013
8 1.853130 1.822982 -0.001396
9 1.943985 1.868401 0.005732
Style to get required format
df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"})
Gives:
var1 var2 var3
id
0 1.46 1.50 -0.57%
1 1.58 1.61 -0.51%
2 1.63 1.65 -0.48%
3 1.67 1.69 -0.35%
4 1.71 1.71 -0.31%
5 1.74 1.74 -0.12%
6 1.78 1.77 -0.17%
7 1.81 1.80 -0.20%
8 1.85 1.82 -0.14%
9 1.94 1.87 0.57%
Update
If the display command is not found, try the following:
from IPython.display import display
df_style = df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"})
display(df_style)
Requirements
- To use the display command, you need to have IPython installed on your machine.
- The display command does not work in online Python interpreters that do not have IPython installed, such as https://repl.it/languages/python3
- The display command works in jupyter-notebook, jupyter-lab, Google-colab, kaggle-kernels, IBM-watson, Mode-Analytics and many other platforms out of the box; you do not even have to import display from IPython.display.
style.format is vectorized, so we can simply apply it to the entire df (or just its numerical columns):
df[num_cols].style.format('{:,.3f}')
The list comprehension gives a reliable result; I’m using it successfully.
I think you may use a Python list comprehension as follows:
df['var1'] = ["{:.2f}".format(i) for i in df['var1'] ]
df['var2'] = ["{:.2f}".format(i) for i in df['var2'] ]
df['var3'] = ["{:.2%}".format(i) for i in df['var3'] ]
Thanks
Following from this answer I used the apply function on the given series. In my case, I was interested in showing value_counts for my Series with percentage formatting.
I did:
df['my_col'].value_counts(normalize=True).apply(lambda x: "{0:.2f}%".format(x*100))
# Incident 88.16%
# StreetWorks 3.29%
# Accident 2.36%
# ...
Instead of just
df['my_col'].value_counts(normalize=True)
# Incident 0.881634
# StreetWorks 0.032856
# Accident 0.023589
# ...
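A self-contained variant of the same idea, using made-up sample values (the data and names below are hypothetical, not the answerer's dataset) and the % presentation type so no manual multiplication is needed:

```python
import pandas as pd

# Hypothetical sample values standing in for df['my_col'].
s = pd.Series(['Incident', 'Incident', 'Incident', 'Accident'])

# Relative frequencies as floats that sum to 1.
shares = s.value_counts(normalize=True)

# Format for display only; shares itself stays numeric.
formatted = shares.map('{:.2%}'.format)

print(formatted)
```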