How to vectorise a path-dependent function in Pandas?
Question:
What is an efficient way to vectorize a path-dependent function in pandas (i.e. a function whose interim result depends on the previous result)? Storing the result in a matrix and indexing the previous result does not improve performance over a for loop.
For example, mapping a, b -> f(x,y) on the following dataframe:
pd.DataFrame({
    'a': [1,3,5,7,7,7,4],
    'b': [2,2,2,2,2,1,1],
    'f(xy)':[1,1,3,5,5,6,5],
})
   a  b  f(xy)
0  1  2      1
1  3  2      1
2  5  2      3
3  7  2      5
4  7  2      5
5  7  1      6
6  4  1      5
Where the function is (meta-language):
if t == 0:
    f(xy[t]) = a[t]
else:
    if f(xy[t-1]) < a[t]-b[t]:
        f(xy[t]) = a[t]-b[t]
    else if f(xy[t-1]) > a[t]+b[t]:
        f(xy[t]) = a[t]+b[t]
    else:
        f(xy[t]) = f(xy[t-1])
(t is the dataframe index)
Answers:
With the dataframe you provided:
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 3, 5, 7, 7, 7, 4],
        "b": [2, 2, 2, 2, 2, 1, 1],
    }
)
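Here is one way to reproduce the expected column with NumPy where and pandas shift. The vectorised step only ever sees the placeholder zeros, not the values computed for earlier rows, so it matches the recurrence on this sample but does not implement it in general: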
import numpy as np

# Setup: placeholder column of zeros so there is something to shift
df["f(xy)"] = 0

# General case: compare the previous row (shift(1)) with the band [a - b, a + b].
# The shifted column still holds the placeholder zeros at this point.
prev = df["f(xy)"].shift(1)
df["f(xy)"] = np.where(
    prev < df["a"] - df["b"],
    df["a"] - df["b"],
    np.where(prev > df["a"] + df["b"], df["a"] + df["b"], prev),
)

# First row
df.at[0, "f(xy)"] = df.loc[0, "a"]

# Last row: recompute from the value now stored in the row before it
last = df.shape[0] - 1
prev_value = df.loc[last - 1, "f(xy)"]
lower = df.loc[last, "a"] - df.loc[last, "b"]
upper = df.loc[last, "a"] + df.loc[last, "b"]
if prev_value < lower:
    df.at[last, "f(xy)"] = lower
elif prev_value > upper:
    df.at[last, "f(xy)"] = upper
else:
    df.at[last, "f(xy)"] = prev_value

print(df)
# Output
   a  b  f(xy)
0  1  2    1.0
1  3  2    1.0
2  5  2    3.0
3  7  2    5.0
4  7  2    5.0
5  7  1    6.0
6  4  1    5.0
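Because each value depends on the one computed just before it, the recurrence from the question can also be applied row by row. The following is a minimal sketch, not a definitive implementation: the helper name path_dependent is hypothetical, and it assumes the dataframe has the default 0..n-1 index. The bounds a - b and a + b are computed for all rows at once, and only the dependent step runs in a Python loop over NumPy arrays.

```python
import numpy as np
import pandas as pd


def path_dependent(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply the recurrence from the question row by row (hypothetical helper)."""
    lo = a - b  # lower bound a[t] - b[t] for every row
    hi = a + b  # upper bound a[t] + b[t] for every row
    out = np.empty(len(a), dtype=np.result_type(a, b))
    if len(a) == 0:
        return out
    out[0] = a[0]  # t == 0: start from a[0]
    for t in range(1, len(a)):
        prev = out[t - 1]  # result already computed for the previous row
        if prev < lo[t]:
            out[t] = lo[t]
        elif prev > hi[t]:
            out[t] = hi[t]
        else:
            out[t] = prev
    return out


df = pd.DataFrame(
    {
        "a": [1, 3, 5, 7, 7, 7, 4],
        "b": [2, 2, 2, 2, 2, 1, 1],
    }
)
df["f(xy)"] = path_dependent(df["a"].to_numpy(), df["b"].to_numpy())
print(df)
```

Looping over plain NumPy arrays avoids per-row pandas indexing, which is usually the expensive part of such a loop. If that is still too slow, the same function body is the kind of loop a JIT compiler such as Numba is designed for, though that is an untested suggestion here.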