How do I subtract values in a dataframe from the previous row sequentially?

Question:

Supposing I have a data frame that looks like:

col1 col2
0    10
1    23
2    21
3    15

I want to subtract the previous row's result from each value in col2, working down the column sequentially, so that each step subtracts the previously computed value rather than the original one, to get:

col1 col2
0    10 # left unchanged as index == 0
1    13 # 23 - 10
2    8  # 21 - 13
3    7  # 15 - 8

Other solutions that I have found all subtract the previous values as is, and not the new subtracted value. I would like to avoid using for loops as I have a very large dataset.

Asked By: baked goods


Answers:

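To understand the ‘previously subtracted’ part, write the original values as a1, a2, ... and the results as b1, b2, ..., then expand each step:

b1 = a1
b2 = a2 - b1 = a2 - a1
b3 = a3 - b2 = a3 - a2 + a1
b4 = a4 - b3 = a4 - a3 + a2 - a1
b5 = a5 - b4 = a5 - a4 + a3 - a2 + a1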

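So each result is an alternating-sign sum of all values up to that row, and we just do

import numpy as np

s = np.arange(len(df)) % 2
s = s + s - 1  # alternating signs: -1, 1, -1, 1, ...
df['new'] = np.tril(np.multiply.outer(s, s)).dot(df.col2)
df['new'].to_numpy()
Out[47]: array([10, 13,  8,  7])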
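
The lower-triangular matrix above has len(df) x len(df) entries, which may become a memory concern on a very large dataset. As a sketch of a lighter variant of the same alternating-sign idea, the expansion b_n = a_n - a_(n-1) + a_(n-2) - ... can be written as a signed cumulative sum. The sample frame and the name `sign` below are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed layout)
df = pd.DataFrame({"col1": [0, 1, 2, 3], "col2": [10, 23, 21, 15]})

# Alternating signs, one per row: +1, -1, +1, -1, ...
sign = np.where(np.arange(len(df)) % 2 == 0, 1, -1)

# Flip the sign of every other value, take the running total,
# then flip every other running total back.
df["new"] = sign * (sign * df["col2"]).cumsum()

print(df["new"].tolist())  # [10, 13, 8, 7]
```

This keeps only a few length-n arrays in memory instead of an n x n matrix.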
Answered By: BENY

Below is a simple pure Pandas approach (no need to import numpy). The concept is more straightforward and easy to understand from the code without additional explanation.

Let’s first define a function which will do the required work:

def ssf(val):
    # last_val holds the previous result and must be set before .apply() is called
    global last_val
    last_val = val - last_val
    return last_val

Using the function above, the code for creating the new column will be:

last_val = 0  # start at 0 so the first row is left unchanged
df['new'] = df.col2.apply(ssf)
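
Note that .apply() still calls the function once per row in Python, and the global has to be reset to 0 before every run. As a minimal sketch of the same recurrence without global state, the standard library's itertools.accumulate can carry the previous result along. The sample frame below is an assumption for illustration:

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question (assumed layout)
df = pd.DataFrame({"col1": [0, 1, 2, 3], "col2": [10, 23, 21, 15]})

# accumulate() yields the first value unchanged, then calls the function
# with (previous result, current value) for every following row.
df["new"] = list(accumulate(df["col2"], lambda prev, val: val - prev))

print(df["new"].tolist())  # [10, 13, 8, 7]
```

This is still a per-row loop under the hood, so it trades speed for readability in the same way the .apply() version does.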

Let’s compare the number of functions/methods used by the pure Pandas approach with the number used by the numpy approach in the other answer.

The Pandas approach uses 2 functions/methods: ssf() and .apply(), and 1 operation: simple subtraction.

The numpy approach uses 5 functions/methods: .arange(), len(), .tril(), .multiply.outer() and .dot(), and 3 operations: array addition, array subtraction and modulo.

Answered By: Claudio