Pandas, summing values in a row to form a "totals" column
Question:
So I’ve been trying to get the sum of my rows and have them add into a new column. My data is something along the lines of:
Animal num1 num2
0 22-14 36.6 213
1 39-14 42.44 141
2 40-14 39 157
I’ve tried things such as:
df['sum'] = df['num1'] + df['num2']
But that just combines the information together, it doesn’t sum it. Is there a way to do this?
Answers:
Your issue is that your columns are actually strings, not numbers. When you try to add the columns, the strings just get concatenated (e.g. 'cat' + 'dog' becoming 'catdog').
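A quick way to check this is to look at the dtypes. Here is a minimal sketch, assuming the numbers were loaded as strings; the small frame is made up to mirror the question, not taken from your actual data:

```python
import pandas as pd

# Hypothetical frame mirroring the question, with the numbers stored as strings.
df = pd.DataFrame({'num1': ['36.6', '42.44', '39'],
                   'num2': ['213', '141', '157']})

# Neither column has a numeric dtype (object, or a string dtype in newer pandas).
print(df.dtypes)

# "+" on string columns concatenates element-wise instead of adding.
combined = df['num1'] + df['num2']
print(combined.tolist())  # ['36.6213', '42.44141', '39157']
```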
You can convert them to numbers using pandas.to_numeric. (Older answers use pandas.DataFrame.convert_objects, but that method was deprecated and then removed in pandas 1.0.)
import pandas as pd
from io import StringIO
s = """ Animal num1 num2
0 22-14 36.6 213
1 39-14 42.44 141
2 40-14 39 157"""
df = pd.read_csv(StringIO(s), sep=r'\s+', dtype=str)
# You can ignore the code above this point, I was just re-creating your DataFrame.
# Convert only the numeric columns; 'Animal' stays a string.
df[['num1', 'num2']] = df[['num1', 'num2']].apply(pd.to_numeric)
After this, it should work exactly as you said above, so:
df['sum'] = df['num1'] + df['num2']
print(df)
# Animal num1 num2 sum
#0 22-14 36.60 213 249.60
#1 39-14 42.44 141 183.44
#2 40-14 39.00 157 196.00
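If some cells cannot be parsed as numbers, pd.to_numeric raises an error by default. A hedged sketch of one way around that, assuming you are happy for bad values to become NaN (the 'n/a' entry is invented for illustration):

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
raw = pd.Series(['36.6', 'n/a', '39'])

# errors='coerce' turns unparseable values into NaN instead of raising.
cleaned = pd.to_numeric(raw, errors='coerce')
print(cleaned)
```

Note that NaN values are skipped by sum() by default, but df['num1'] + df['num2'] gives NaN for any row where either side is NaN.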
DataFrame.sum takes an axis argument: axis=1 sums across the columns, giving one total per row, while axis=0 sums down the rows, giving one total per column.
df = pd.DataFrame({'a':[4,5,2],'b':[5,2,9]})
df['c'] = df.sum(axis=1)
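To make the difference between the two axes concrete, here is a small sketch using the same example frame (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 5, 2], 'b': [5, 2, 9]})

row_totals = df.sum(axis=1)     # one value per row: a + b
column_totals = df.sum(axis=0)  # one value per column

print(row_totals.tolist())      # [9, 7, 11]
print(column_totals.tolist())   # [11, 16]
```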
This method only works on numbers, and your first column is a string, so you will have to parse it first. For example, if you want to remove the hyphen from the values, you could use:
df['Animal'] = df['Animal'].apply(lambda x: int(x.replace('-','')))
Or, if you want to ignore that first column entirely, you can simply leave it out:
df['sum'] = df.iloc[:,1:].sum(axis = 1)
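Selecting by position breaks if the column order changes, so it may be safer to name the columns or select them by dtype. A minimal sketch, assuming num1 and num2 are already numeric (the frame below is re-created by hand from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['22-14', '39-14', '40-14'],
    'num1': [36.6, 42.44, 39.0],
    'num2': [213, 141, 157],
})

# Let pandas pick out the numeric columns by dtype.
# Computed before adding 'sum', so the new column is not counted itself.
total_by_dtype = df.select_dtypes(include='number').sum(axis=1)

# Or name the columns explicitly, so the string column never enters the sum.
df['sum'] = df[['num1', 'num2']].sum(axis=1)

print(df)
```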