How can a single column .apply() be made faster in Python Pandas?

Question:

I learned how to run a profiler on code that needs many iterations, in the hope of making the run times more sustainable. It turns out this line takes up 55-58% of the run time:

data['CDA_Factor_Avg'] = data.apply(lambda row : data['CDA_Factor'].loc[ starting_date : row.name ].mean(), axis=1)

This results in a Pandas DataFrame ‘data’ with columns ‘CDA_Factor’ and ‘CDA_Factor_Avg’ like:

CDA_Factor    CDA_Factor_Avg
1             1
4             2.5
9             4.66

The mean is only ever taken from the starting date up to the current cell, and the index is a datetime index. Does anyone see any better alternatives?
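For reference, here is a minimal self-contained reproduction of the setup; the dates and the value of starting_date below are illustrative placeholders, not the real data:

import pandas as pd

# Toy frame with a datetime index; dates and starting_date are placeholders.
data = pd.DataFrame(
    {"CDA_Factor": [1, 4, 9]},
    index=pd.date_range("2022-01-01", periods=3, freq="D"),
)
starting_date = "2022-01-01"

# Current approach: one .loc slice plus one .mean() per row, so the cost
# grows quadratically with the number of rows.
data["CDA_Factor_Avg"] = data.apply(
    lambda row: data["CDA_Factor"].loc[starting_date:row.name].mean(), axis=1
)
print(data)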

Thank you!

Asked By: Matthew Rozanoff


Answers:

You can use an expanding mean:

>>> df["CDA_Factor"].expanding().mean()
0    1.000000
1    2.500000
2    4.666667
Name: CDA_Factor, dtype: float64
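
If the starting_date lower bound from the question still matters, one possible sketch (assuming the question's data frame with a sorted datetime index) is to slice once from starting_date onward and then take a single expanding mean, which keeps the computation linear in the number of rows:

# Sketch: restrict to the window starting at starting_date once, then take one
# vectorized expanding mean instead of one .loc/.mean() per row. Assigning the
# Series back aligns on the index; any rows before starting_date would get NaN.
data["CDA_Factor_Avg"] = data["CDA_Factor"].loc[starting_date:].expanding().mean()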
Answered By: Chrysophylaxs

You might divide the cumulative sum (cumsum) by the number of elements, in the following way:

import pandas as pd

df = pd.DataFrame({"CDA_Factor": [1, 4, 9]})
# Running sum divided by the running count gives the running mean.
df["CDA_Factor_Avg"] = df["CDA_Factor"].cumsum() / range(1, 4)
print(df)

which gives the following output:

   CDA_Factor  CDA_Factor_Avg
0           1        1.000000
1           4        2.500000
2           9        4.666667
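
The same idea works for any length if the divisor is derived from the frame itself rather than hardcoded; a small sketch, reusing the toy df above:

import numpy as np

# Sketch: derive the divisor from the frame length instead of hardcoding
# range(1, 4), so the same line works for any number of rows.
df["CDA_Factor_Avg"] = df["CDA_Factor"].cumsum() / np.arange(1, len(df) + 1)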
Answered By: Daweo