Pandas – Sum across n columns based on value in another column
Question:
How can I sum across n columns, where n is a value in another column? In this example, ‘heldInventory’ should be the sum of the first n weekly sales columns (wk1Sales + wk2Sales + … + wkNSales), where n is the value in ‘weeksInventory’.
>>> df
     product  weeksInventory  wk1Sales  wk2Sales  wk3Sales  wk4Sales  wk5Sales  wk6Sales  heldInventory
0  Product 1               1        25        31        28        32        33        27             25
1  Product 2               2        18        16        12        14        19        17             34
2  Product 3               3         5         4         2         3         4         6             11
3  Product 4               4        38        42        48        41        45        44            169
I want something like the following:
df['heldInventory'] = df.iloc[:,2:(3 + df['weeksInventory'])].sum(axis=1)
but df['weeksInventory'] passes in the whole Series, not the value at that row.
I’ve achieved the result with nested for loops, but it is very slow, so I’m looking for a vectorized approach.
Answers:
You can use the apply method with axis=1 to compute the sum row by row.
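def func(row):
    # Number of weekly columns to include for this row
    n = int(row.weeksInventory)
    # Weekly sales start at position 2, so the first n weeks are 2:2+n
    return row.iloc[2:2 + n].sum()

df['heldInventory'] = df.apply(func, axis=1)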
Or, in one line:
df['heldInventory'] = df.apply(lambda row: row.iloc[2:2 + row.weeksInventory].sum(), axis=1)
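Note that apply with axis=1 still runs a Python function once per row, so it is not vectorized in the sense the question asks for. One possible vectorized alternative, offered as a sketch rather than the answerer's method, is to build a boolean mask with NumPy broadcasting and sum only the masked values. It assumes the weekly columns are the ones whose names start with "wk" and that they appear in week order, as in the question's example; the names week_cols, sales and mask are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "product": ["Product 1", "Product 2", "Product 3", "Product 4"],
    "weeksInventory": [1, 2, 3, 4],
    "wk1Sales": [25, 18, 5, 38],
    "wk2Sales": [31, 16, 4, 42],
    "wk3Sales": [28, 12, 2, 48],
    "wk4Sales": [32, 14, 3, 41],
    "wk5Sales": [33, 19, 4, 45],
    "wk6Sales": [27, 17, 6, 44],
})

# Assumption: weekly columns are named wk<N>Sales and are already in week order
week_cols = [c for c in df.columns if c.startswith("wk")]
sales = df[week_cols].to_numpy()

# mask[i, j] is True when column j (0-based) is among the first
# weeksInventory[i] weekly columns of row i
weeks = df["weeksInventory"].to_numpy()[:, None]
mask = np.arange(sales.shape[1]) < weeks

# Zero out the weeks beyond n for each row, then sum across columns
df["heldInventory"] = np.where(mask, sales, 0).sum(axis=1)

print(df[["product", "weeksInventory", "heldInventory"]])
```

On the sample data this gives 25, 34, 11 and 169, matching the heldInventory column in the question. The whole computation is done on arrays, so it avoids the per-row Python call that apply makes.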