How to vectorise a path-dependent function in pandas?

Question:

What is an efficient way to vectorize a path-dependent function in pandas (i.e. a function whose interim result depends on the previous result)? Storing the result in a matrix and indexing the previous result does not improve performance over a for loop.

The function maps columns a and b to f(xy) on the following dataframe:

pd.DataFrame({
    'a':   [1,3,5,7,7,7,4],
    'b':   [2,2,2,2,2,1,1],
    'f(xy)':[1,1,3,5,5,6,5],
})

   a  b  f(xy)
0  1  2      1
1  3  2      1
2  5  2      3
3  7  2      5
4  7  2      5
5  7  1      6
6  4  1      5

Where the function is (meta-language):

if t == 0:
    f(xy[t]) = a[t]

else:

    if f(xy[t-1]) < a[t]-b[t]:
        f(xy[t]) = a[t]-b[t]

    else if f(xy[t-1]) > a[t]+b[t]:
        f(xy[t]) = a[t]+b[t]

    else:
        f(xy[t]) = f(xy[t-1])

(t is the dataframe index)
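
For reference, the rule can be written as a plain Python loop, which is the baseline any vectorised version has to match. This is a minimal sketch that assumes the upper bound is a[t] + b[t], which is what the expected f(xy) column implies; `f_reference` is a hypothetical name used only for illustration.

```python
def f_reference(a, b):
    """Apply the path-dependent rule row by row and return the results as a list."""
    result = []
    for t in range(len(a)):
        if t == 0:
            value = a[t]                      # first row starts at a[0]
        elif result[t - 1] < a[t] - b[t]:
            value = a[t] - b[t]               # previous value fell below the band
        elif result[t - 1] > a[t] + b[t]:
            value = a[t] + b[t]               # previous value rose above the band
        else:
            value = result[t - 1]             # previous value is carried forward
        result.append(value)
    return result


a = [1, 3, 5, 7, 7, 7, 4]
b = [2, 2, 2, 2, 2, 1, 1]
print(f_reference(a, b))  # [1, 1, 3, 5, 5, 6, 5]
```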

Asked By: C. Claudio


Answers:

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 3, 5, 7, 7, 7, 4],
        "b": [2, 2, 2, 2, 2, 1, 1],
    }
)

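Here is one way to do it with NumPy where and pandas shift. Note that shift reads the column as it was initialised (all zeros), not values computed earlier in the same pass, so the rule is applied one step at a time; the last row is then recomputed from its neighbour. It reproduces the expected output for this sample: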

import numpy as np

# Setup: start from a column of zeros
df["f(xy)"] = 0

# General case: compare the previous row's stored value with the band [a - b, a + b]
df["f(xy)"] = np.where(
    (df["f(xy)"].shift(1) < df["a"] - df["b"]),
    df["a"] - df["b"],
    np.where(
        (df["f(xy)"].shift(1) > df["a"] + df["b"]),
        df["a"] + df["b"],
        df["f(xy)"].shift(1),
    ),
)

# First row
df.at[0, "f(xy)"] = df.loc[0, "a"]

# Last row
df.at[df.shape[0] - 1, "f(xy)"] = np.where(
    (
        df.loc[df.shape[0] - 2, "f(xy)"]
        < df.loc[df.shape[0] - 1, "a"] - df.loc[df.shape[0] - 1, "b"]
    ),
    df.loc[df.shape[0] - 1, "a"] - df.loc[df.shape[0] - 1, "b"],
    np.where(
        (
            df.loc[df.shape[0] - 2, "f(xy)"]
            > df.loc[df.shape[0] - 1, "a"] + df.loc[df.shape[0] - 1, "b"]
        ),
        df.loc[df.shape[0] - 1, "a"] + df.loc[df.shape[0] - 1, "b"],
        df.loc[df.shape[0] - 2, "f(xy)"],
    ),
)
print(df)
# Output
   a  b  f(xy)
0  1  2    1.0
1  3  2    1.0
2  5  2    3.0
3  7  2    5.0
4  7  2    5.0
5  7  1    6.0
6  4  1    5.0
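
For data where a previous result has to be carried across several rows, the recurrence needs a sequential pass. Each step only keeps the previous value inside the band [a - b, a + b], so it can be treated as a running clip. The sketch below precomputes both bounds in vectorised form and loops over plain NumPy arrays, which avoids per-row pandas indexing. It assumes b is non-negative (so the lower bound never exceeds the upper one), and `running_clip` is a hypothetical helper name, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd


def running_clip(a, b):
    """Carry f[t] = clip(f[t-1], a[t] - b[t], a[t] + b[t]) through every row."""
    a = np.asarray(a)
    b = np.asarray(b)
    lower = a - b  # bounds do not depend on earlier results, so they vectorise
    upper = a + b
    out = np.empty(len(a), dtype=np.result_type(a, b))
    if len(a) == 0:
        return out
    out[0] = a[0]
    for t in range(1, len(a)):
        # Only this scalar step is sequential; assumes lower[t] <= upper[t].
        out[t] = min(max(out[t - 1], lower[t]), upper[t])
    return out


df = pd.DataFrame({"a": [1, 3, 5, 7, 7, 7, 4], "b": [2, 2, 2, 2, 2, 1, 1]})
df["f(xy)"] = running_clip(df["a"].to_numpy(), df["b"].to_numpy())
print(df["f(xy)"].tolist())  # [1, 1, 3, 5, 5, 6, 5]
```

As a check on the carried state, running_clip([10, 3, 3], [1, 1, 1]) gives [10, 4, 4]: the last value is inherited from the row before it rather than recomputed from the bounds. If the Python loop is still too slow on large frames, the same function body is a plausible candidate for a JIT compiler, though that is not tested here.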
Answered By: Laurent