Pandas: sum up multiple columns into one column without last column
Question:
If I have a dataframe similar to this one
Apples Bananas Grapes Kiwis
2 3 nan 1
1 3 7 nan
nan nan 2 3
I would like to add a column like this
Apples Bananas Grapes Kiwis Fruit Total
2 3 nan 1 6
1 3 7 nan 11
nan nan 2 3 5
I guess you could use df['Apples'] + df['Bananas']
and so on, but my actual dataframe is much larger than this. I was hoping a formula like df['Fruit Total']=df[-4:-1].sum
could do the trick in one line of code. That didn’t work however. Is there any way to do it without explicitly summing up all columns?
Answers:
Using df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
over your original df won’t add the last column (‘Kiwis’), you should use df.iloc[:, -4:]
instead to select all columns:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
df['Fruit Total']=df.iloc[:,-4:].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
It is possible to do it without knowing the number of columns and even without iloc:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]-1]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
I want to build on Ramon’s answer if you want to come up with the total without knowing the shape/size of the dataframe.
I will use his answer below but fix one item that didn’t include the last column for the total.
I have removed the -1 from the shape:
cols_to_sum = df.columns[ : df.shape[1]-1]
To this:
cols_to_sum = df.columns[ : df.shape[1]]
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
Which then gives you the correct total without skipping the last column.
This may be helpful for beginners, so for the sake of completeness, if you know the column names (e.g. they are in a list), you can use:
column_names = ['Apples', 'Bananas', 'Grapes', 'Kiwis']
df['Fruit Total']= df[column_names].sum(axis=1)
This gives you flexibility about which columns you use as you simply have to manipulate the list column_names
and you can do things like pick only columns with the letter ‘a’ in their name. Another benefit of this is that it’s easier for humans to understand what they are doing through column names. Combine this with list(df.columns)
to get the column names in a list format. Thus, if you want to drop the last column, all you have to do is:
column_names = list(df.columns)
df['Fruit Total']= df[column_names[:-1]].sum(axis=1)
This might be a much easier method to solve, and it will take care of the other datatypes too, which are not required:
df['Fruit Total'] = df.sum(axis=1, numeric_only= True)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
If I have a dataframe similar to this one
Apples Bananas Grapes Kiwis
2 3 nan 1
1 3 7 nan
nan nan 2 3
I would like to add a column like this
Apples Bananas Grapes Kiwis Fruit Total
2 3 nan 1 6
1 3 7 nan 11
nan nan 2 3 5
I guess you could use df['Apples'] + df['Bananas']
and so on, but my actual dataframe is much larger than this. I was hoping a formula like df['Fruit Total']=df[-4:-1].sum
could do the trick in one line of code. That didn’t work however. Is there any way to do it without explicitly summing up all columns?
Using df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
over your original df won’t add the last column (‘Kiwis’), you should use df.iloc[:, -4:]
instead to select all columns:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
df['Fruit Total']=df.iloc[:,-4:].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
It is possible to do it without knowing the number of columns and even without iloc:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]-1]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
I want to build on Ramon’s answer if you want to come up with the total without knowing the shape/size of the dataframe.
I will use his answer below but fix one item that didn’t include the last column for the total.
I have removed the -1 from the shape:
cols_to_sum = df.columns[ : df.shape[1]-1]
To this:
cols_to_sum = df.columns[ : df.shape[1]]
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
Which then gives you the correct total without skipping the last column.
This may be helpful for beginners, so for the sake of completeness, if you know the column names (e.g. they are in a list), you can use:
column_names = ['Apples', 'Bananas', 'Grapes', 'Kiwis']
df['Fruit Total']= df[column_names].sum(axis=1)
This gives you flexibility about which columns you use as you simply have to manipulate the list column_names
and you can do things like pick only columns with the letter ‘a’ in their name. Another benefit of this is that it’s easier for humans to understand what they are doing through column names. Combine this with list(df.columns)
to get the column names in a list format. Thus, if you want to drop the last column, all you have to do is:
column_names = list(df.columns)
df['Fruit Total']= df[column_names[:-1]].sum(axis=1)
This might be a much easier method to solve, and it will take care of the other datatypes too, which are not required:
df['Fruit Total'] = df.sum(axis=1, numeric_only= True)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 6.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0