How can I speed up the computation of a specific function?

Question:

I have a DataFrame and, for each row, need to count the length of the initial run of columns whose sign matches the sign of the first column, then multiply that count by the sign of the first column.

What I need to speed up is the calc_df function, which runs like this on my computer:

%timeit calc_df(df)
6.38 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The output of my code is:

        a_0       a_1       a_2       a_3       a_4       a_5       a_6       a_7       a_8       a_9
0  0.097627  0.430379  0.205527  0.089766 -0.152690  0.291788 -0.124826  0.783546  0.927326 -0.233117
1  0.583450  0.057790  0.136089  0.851193 -0.857928 -0.825741 -0.959563  0.665240  0.556314  0.740024
2  0.957237  0.598317 -0.077041  0.561058 -0.763451  0.279842 -0.713293  0.889338  0.043697 -0.170676
3 -0.470889  0.548467 -0.087699  0.136868 -0.962420  0.235271  0.224191  0.233868  0.887496  0.363641
4 -0.280984 -0.125936  0.395262 -0.879549  0.333533  0.341276 -0.579235 -0.742147 -0.369143 -0.272578

0    4.0
1    4.0
2    2.0
3   -1.0
4   -2.0
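
For example, in row 3 the first column is negative while the second is positive, so the matching streak has length 1 and the result is 1 × (-1) = -1.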

My code is as follows; the generate_data function creates demo data at the same scale as my actual data.

import numpy as np
import pandas as pd
from numba import njit

np.random.seed(0)

pd.set_option('display.max_columns', None)
pd.set_option('expand_frame_repr', False)


# This function generates demo data.
def generate_data():
    col = [f'a_{x}' for x in range(10)]
    df = pd.DataFrame(data=np.random.uniform(-1, 1, [280000, 10]), columns=col)
    return df


@njit
def calc_numba(s):
    # s is one row of signs; count the length of the initial run of
    # entries that match the first column's sign, then multiply the
    # run length by that sign.
    a = s[0]
    b = 1
    for sign in s[1:]:
        if sign == a:
            b += 1
        else:
            break
    return b * a


def calc_series(s):
    return calc_numba(s.to_numpy())


def calc_df(df):
    df1 = np.sign(df)  # work on the signs only
    df['count'] = df1.apply(calc_series, axis=1)  # row-wise apply is the bottleneck
    return df


def main():
    df = generate_data()
    print(df.head(5))
    df = calc_df(df)
    print(df['count'].head(5))
    return


if __name__ == '__main__':
    main()

Asked By: jaried


Answers:

You can use vectorized code here.

For example, with a mask:

df1 = np.sign(df)
m = df1.eq(df1.iloc[:,0], axis=0).cummin(1)
out = df1.where(m).sum(1)

Output (first 5 rows):

0    4.0
1    4.0
2    2.0
3   -1.0
4   -2.0
dtype: float64
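
To see why this works: eq marks the columns whose sign matches column 0, cummin(1) propagates the first False in each row through the rest of the row, and where(m) keeps only that leading streak before summing. A minimal sketch on a tiny made-up frame (the values are illustrative, not from the question's data):

import numpy as np
import pandas as pd

tiny = pd.DataFrame([[ 0.5,  0.2, -0.3,  0.1],
                     [-0.4, -0.9,  0.7, -0.2]], columns=list('abcd'))
s = np.sign(tiny)
m = s.eq(s.iloc[:, 0], axis=0).cummin(axis=1)
# m row 0: True True False False -> streak of 2, sign +1 ->  2.0
# m row 1: True True False False -> streak of 2, sign -1 -> -2.0
print(s.where(m).sum(axis=1))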

Time to run on the whole data:

269 ms ± 37.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Faster alternative (summing the boolean mask directly avoids materializing the masked frame):

df1 = np.sign(df)
m = df1.eq(df1.iloc[:,0], axis=0).cummin(1)
out = m.sum(1)*df1.iloc[:,0]

148 ms ± 27.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

And you can probably do even better with pure numpy (you have to write a cummin equivalent).
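
For reference, numpy's minimum.accumulate can serve as that cummin equivalent (the follow-up answer below uses exactly this); a minimal sketch, assuming the data is already a plain array:

import numpy as np

arr = np.random.uniform(-1, 1, (280000, 10))  # stand-in for df.to_numpy()
s = np.sign(arr)
# minimum.accumulate propagates the first False in each row, like cummin:
m = np.minimum.accumulate(s == s[:, [0]], axis=1)
out = m.sum(axis=1) * s[:, 0]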

Answered By: mozway

Someone answered me this way:

Using your solution, and then with pure numpy:

import numpy as np
import pandas as pd
from numba import njit

np.random.seed(0)

pd.set_option('display.max_columns', None)
pd.set_option('expand_frame_repr', False)


# This function generates demo data.
def generate_data() -> pd.DataFrame:
    col = [f'a_{x}' for x in range(10)]
    np.random.seed(0)
    df = pd.DataFrame(data=np.random.uniform(-1, 1, [280000, 10]), columns=col)
    return df

# This function generates the same demo data as a raw numpy array.
def generate_data_array() -> np.ndarray:
    np.random.seed(0)
    return np.random.uniform(-1, 1, [280000, 10])

%%timeit df = generate_data()
df1 = np.sign(df)
m = df1.eq(df1.iloc[:,0], axis=0).cummin(1)
out_df = m.sum(1)*df1.iloc[:,0]

76.6 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit array = generate_data_array()
array2 = np.sign(array)
array3 = np.minimum.accumulate(np.equal(array2, np.expand_dims(array2[:,0], axis=1)), 1)
out_array = array3.sum(axis=1) * array2[:,0]

44.3 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
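
As a quick sanity check (not part of the original timing runs, and assuming both generators are seeded with 0 as above), the two versions produce identical results:

df = generate_data()
arr = generate_data_array()

df1 = np.sign(df)
m = df1.eq(df1.iloc[:, 0], axis=0).cummin(1)
out_df = m.sum(1) * df1.iloc[:, 0]

array2 = np.sign(arr)
array3 = np.minimum.accumulate(np.equal(array2, np.expand_dims(array2[:, 0], axis=1)), 1)
out_array = array3.sum(axis=1) * array2[:, 0]

assert np.allclose(out_df.to_numpy(), out_array)
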
Answered By: jaried