How To Create a "Total" Row for One Column in a Pandas Dataframe

Question:

So I’ve created a DF from file names I’ve pulled using the os module.

The file names include dollar amounts, and I would like to create a row that totals just the Amount column of the DF (index 3).

However, when I follow this code structure:

File_Name.loc['Total'] = File_Name.sum()

I get this:

                                                 Invoice  ...                                             Amount
30                                                  6515  ...                                             401.01
Total  0822OH082522KTR1987000084201987000084481987000...  ...  478.88550.0030.1032.3912.0432.521020.4729.1442...

I would love for it to look like this:

         Invoice         Vendor   Amount
30          6515        Expense   401.01
Total                          198556.79

Any help would be much appreciated!

Asked By: SHW

||

Answers:

The long number you get in Amount is probably the result of string concatenation:

'478.88' + '550.00' + '30.10' + '32.39'

outputs

478.88550.0030.1032.39

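So, the first step will be to cast the column Amount to floats. Note that astype returns a new Series rather than modifying the column in place, so assign the result back: File_Name['Amount'] = File_Name['Amount'].astype('float').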
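As a minimal sketch of the difference, the snippet below builds a small frame with made-up values (the rows are hypothetical, only the column names come from the question) and compares the sum before and after the cast:

```python
import pandas as pd

# Hypothetical sample data: amounts parsed from file names arrive as strings.
File_Name = pd.DataFrame({
    "Invoice": ["6515", "6516", "6517"],
    "Vendor": ["Expense", "Expense", "Travel"],
    "Amount": ["401.01", "550.00", "30.10"],
})

# Summing a column of strings concatenates them instead of adding them.
concatenated = File_Name["Amount"].sum()

# astype returns a new Series, so assign it back to the column.
File_Name["Amount"] = File_Name["Amount"].astype("float")
numeric_total = File_Name["Amount"].sum()

print(repr(concatenated))
print(numeric_total)
```

If some of the strings might not parse as numbers, pd.to_numeric(File_Name["Amount"], errors="coerce") is an alternative that turns unparseable entries into NaN instead of raising.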

Once Amount is numeric, you can add its sum and get the visual effect you are looking for with the following (df here stands for your DataFrame, File_Name in the question):

df.loc['Total', 'Amount'] = df['Amount'].sum()
df.loc['Total'] = df.loc['Total'].fillna('')
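For reference, here is the same approach as a self-contained sketch. The two data rows are invented for illustration, and Amount is assumed to be numeric already:

```python
import pandas as pd

# Hypothetical data; Amount is already numeric here.
df = pd.DataFrame(
    {
        "Invoice": ["6515", "6516"],
        "Vendor": ["Expense", "Travel"],
        "Amount": [401.01, 550.00],
    },
    index=[30, 31],
)

# Create the 'Total' row by setting only its Amount cell;
# the other cells of the new row start out as NaN.
df.loc["Total", "Amount"] = df["Amount"].sum()

# Replace the NaN cells of that row with empty strings for display.
df.loc["Total"] = df.loc["Total"].fillna("")

print(df)
```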

Nevertheless, I would strongly recommend not using pandas as if it were Excel. While the Excel style can be convenient when working in such a heavy interface, it’s problematic from a programmatic point of view: you’ll now have an extra data point with a large value in Amount and empty strings in the other columns, and any later calculation on the DataFrame will include that row.
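One way to follow that advice, sketched here with invented data and hypothetical variable names (invoices, report), is to keep the total as a separate value and only build a frame with a Total row when you need to display or export it:

```python
import pandas as pd

# Hypothetical data frame that stays free of summary rows.
invoices = pd.DataFrame(
    {
        "Invoice": ["6515", "6516"],
        "Vendor": ["Expense", "Travel"],
        "Amount": [401.01, 550.00],
    },
    index=[30, 31],
)

# Keep the total as a plain number.
total = invoices["Amount"].sum()

# Build a display-only copy with the extra row; invoices itself is untouched.
total_row = pd.DataFrame(
    {"Invoice": [""], "Vendor": [""], "Amount": [total]},
    index=["Total"],
)
report = pd.concat([invoices, total_row])

print(report)
print(invoices["Amount"].mean())  # still computed from the real rows only
```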

Answered By: Ignatius Reilly

Pandas v1.5.0 added a new Styler feature for doing just this. The Styler is used for displaying data, whereas a DataFrame is essentially an efficient memory map of data. The ability to combine and structure different DataFrames for display purposes is therefore useful. The Styler also lets you configure output formatting differently for different tables. For example, a column might hold integer values, but its arithmetic mean is usually a float with multiple decimals.

See the docs for Styler.concat, which discuss this use case: https://pandas.pydata.org/docs/dev/reference/api/pandas.io.formats.style.Styler.concat.html

Answered By: Attack68