How To Create a "Total" Row for One Column in a Pandas Dataframe
Question:
So I’ve created a DF from file names I’ve pulled using the os module.
The file names include dollar amounts, and I would like to create a row that totals just the Amount column of the DF (index 3).
However, when I follow this code structure:
File_Name.loc['Total'] = File_Name.sum()
I get this:
Invoice ... Amount
30 6515 ... 401.01
Total 0822OH082522KTR1987000084201987000084481987000... ... 478.88550.0030.1032.3912.0432.521020.4729.1442...
I would love for it to look like this:
       Invoice   Vendor     Amount
30        6515  Expense     401.01
Total                    198556.79
Any help would be much appreciated!
Answers:
The long number you get in Amount is probably the result of string concatenation:
'478.88' + '550.00' + '30.10' + '32.39'
outputs
478.88550.0030.1032.39
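So, the first step is to cast the Amount column to floats. astype returns a new Series rather than modifying the column in place, so assign the result back:
File_Name['Amount'] = File_Name['Amount'].astype('float')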
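To make the difference concrete, here is a minimal sketch of the string-versus-float behaviour. The sample rows are made up for illustration (only the column names come from the question), and the Amount column is forced to object dtype on the assumption that this is how the amounts parsed from the file names were stored.

```python
import pandas as pd

# Hypothetical sample data: amounts parsed out of file names arrive as strings.
# dtype=object is an assumption mirroring the string column in the question.
File_Name = pd.DataFrame({
    "Invoice": ["6515", "6516"],
    "Vendor": ["Expense", "Expense"],
    "Amount": pd.Series(["478.88", "550.00"], dtype=object),
})

# Summing strings joins them end to end instead of adding them.
concatenated = File_Name["Amount"].sum()

# astype returns a new Series, so assign it back to the column.
File_Name["Amount"] = File_Name["Amount"].astype("float")
numeric_total = File_Name["Amount"].sum()

print(repr(concatenated))
print(round(numeric_total, 2))
```

If some file names might yield values that are not valid numbers, astype will raise; pd.to_numeric with errors='coerce' is one alternative that turns those into NaN instead.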
You can then add the sum of Amount and get the visual effect you are looking for with the following (df here is your DataFrame, File_Name in the question):
df.loc['Total', 'Amount'] = df['Amount'].sum()
df.loc['Total'] = df.loc['Total'].fillna('')
Nevertheless, I would strongly recommend not using pandas as if it were Excel. A total row is convenient in a spreadsheet interface, but it is problematic from a programmatic point of view: the DataFrame now has an extra data point with a large value in Amount and empty strings in every other column, so any later aggregation over Amount will also count the total.
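One way to get the Excel-like view without polluting the data is to add the total row to a copy that is used only for display. This is a sketch under assumptions: the sample rows and the name report are hypothetical, Amount is already numeric, and the other columns hold strings so that filling them with '' does not change their type.

```python
import pandas as pd

# Hypothetical invoice data; Amount is already numeric.
df = pd.DataFrame(
    {
        "Invoice": ["6515", "6516"],
        "Vendor": ["Expense", "Expense"],
        "Amount": [401.01, 98.99],
    },
    index=[30, 31],
)

# Work on a copy so the original frame stays free of the summary row.
report = df.copy()

# Assigning to a new row label enlarges the frame; other columns become NaN.
report.loc["Total", "Amount"] = df["Amount"].sum()

# Blank out the NaN cells in the new row for a cleaner printout.
report.loc["Total"] = report.loc["Total"].fillna("")

print(report)
```

Further calculations can keep using df, while report is only printed or exported.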
pandas 1.5.0 added a new Styler feature for doing just this. The Styler is used to display data, whereas a DataFrame is essentially an efficient in-memory representation of data, so the ability to combine and structure different DataFrames for display purposes is useful. The Styler also allows formatting to be configured differently for each table. For example, a column might hold integer values while its arithmetic mean is usually a float with several decimals.
See the docs for Styler.concat, which discuss this use case: https://pandas.pydata.org/docs/dev/reference/api/pandas.io.formats.style.Styler.concat.html