Formatting Multiple Columns in a Pandas Dataframe

Question

I have a dataframe I’m working with that has a large number of columns, and I’m trying to format them as efficiently as possible. I have a bunch of columns that all end in .pct that need to be formatted as percentages, some that end in .cost that need to be formatted as currency, etc.

I know I can do something like this:

cost_calc.style.format({'c.somecolumn.cost'       : "${:,.2f}",
                        'c.somecolumn.cost'       : "${:,.2f}",
                        'e.somecolumn.cost'       : "${:,.2f}",
                        'e.somecolumn.cost'       : "${:,.2f}",...

and format each column individually, but I was hoping there was a way to do something similar to this:

cost_calc.style.format({'*.cost'       : "${:,.2f}",
                        '*.pct'        : "{:,.2%}",...

Any ideas? Thanks!

Asked By: chris

Answer 1

The first approach isn't bad if you build that dictionary automatically. You can collect every column that matches the *.cost pattern with something like:

costcols = [x for x in cost_calc.columns if x.endswith('.cost')]

then build your dict like:

formatdict = {costcol: "${:,.2f}" for costcol in costcols}

then as you suggested:

cost_calc.style.format(formatdict)

You can easily add the .pct cases similarly. Hope this helps!
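
To make the "add the .pct cases similarly" step concrete, here is a minimal sketch that handles several suffixes in one pass. The SUFFIX_FORMATS table, the build_format_dict helper, and the sample column names are all invented for illustration; a plain list stands in for cost_calc.columns so the snippet runs on its own.

```python
# Hypothetical suffix-to-format table; extend it with whatever suffixes you use.
SUFFIX_FORMATS = {
    ".cost": "${:,.2f}",  # currency
    ".pct": "{:,.2%}",    # percentage
}


def build_format_dict(columns, suffix_formats=SUFFIX_FORMATS):
    """Map each column name to the format of the first suffix it ends with."""
    formatdict = {}
    for col in columns:
        for suffix, fmt in suffix_formats.items():
            if str(col).endswith(suffix):
                formatdict[col] = fmt
                break  # stop at the first matching suffix
    return formatdict


# Stand-in for cost_calc.columns (names invented for illustration).
columns = ["c.labor.cost", "e.labor.cost", "c.margin.pct", "region"]
formatdict = build_format_dict(columns)
```

Columns that match no suffix (such as "region" here) are simply left out, so they keep their default formatting. With a real DataFrame, the result of build_format_dict(cost_calc.columns) can be passed straight to cost_calc.style.format.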

Answered By: n3utrino

Answer 2

I would use a regex with dict comprehensions:

import re

mylist = cost_calc.columns

# Columns ending in ".cost" are formatted as currency.
r = re.compile(r'.*\.cost$')
cost_cols = {key: "${:,.2f}" for key in mylist if r.match(key)}

# Columns ending in ".pct" are formatted as percentages.
r = re.compile(r'.*\.pct$')
pct_cols = {key: "{:,.2%}" for key in mylist if r.match(key)}

cost_calc.style.format({**cost_cols, **pct_cols})

Note: the {**a, **b} dictionary merge requires Python 3.5 or later.

Answered By: patricio

Answer 3

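import locale
import re

# Use the locale configured in the environment; the grouping and decimal
# separators in the output come from it.
locale.setlocale(locale.LC_ALL, '')

mylist = cost_calc.columns
r = re.compile(r'.*\.cost$')
cost_cols = {key: (lambda x: f'{locale.format_string("%.2f", x, True)} €') for key in mylist if r.match(key)}

r = re.compile(r'.*\.pct$')
pct_cols = {key: "{:.2%}" for key in mylist if r.match(key)}

cost_calc.style.format({**cost_cols, **pct_cols})

Note: this version formats the cost columns in euros.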
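
If the output should not depend on which locale the machine has configured, one alternative is to format the number with Python's built-in grouping and then swap the separators. This is only a sketch: format_euro and the sample column names are invented for illustration, and it assumes you want "." for thousands and "," for decimals.

```python
def format_euro(value):
    """Format a number like '1.234,56 €' without touching the system locale."""
    text = f"{value:,.2f}"  # e.g. "1,234.56"
    # Swap "," and "." in a single pass.
    text = text.translate(str.maketrans(",.", ".,"))
    return f"{text} €"


# Stand-in for cost_calc.columns (names invented for illustration).
columns = ["c.labor.cost", "c.margin.pct"]
cost_cols = {key: format_euro for key in columns if key.endswith(".cost")}
```

Since Styler.format accepts callables as well as format strings, cost_cols built this way can be merged with pct_cols and passed to cost_calc.style.format like the dictionaries above.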

Answered By: Alessandro Bonelli
