How to compute expectancy in a dataframe across rows
Question:
I have a dataframe that contains day, symbol, strategy, and pnl. I want to analyze and compare pnl in a couple of ways.
I’d like to get the win-rate & expectancy when grouped by symbol and strategy. So I’ve done this:
def stats(s):
    winrate = s['isWinner']['count'] / (s['isWinner']['count'] + s['isLoser']['count'])
    expectancy = s['isWinner']['mean'] * winrate - s['isLoser']['mean'] * (1.0 - winrate)

df["isWinner"] = df['pnl'] >= 0
df["isLoser"] = df['pnl'] < 0
df2 = df.groupby(['day', 'symbol', 'strategy', 'isWinner']).agg({'pnl': ['count', 'mean', 'std', 'min', 'max']})
df2.groupby(['day', 'symbol', 'strategy']).agg(stats)
Apparently, I can’t do s['isWinner'] in the stats function. What am I doing wrong?
Once the stats function works, how do I add winrate and expectancy to df2?
Am I going about this the right way? Is it necessary to create df2 from df, or is there a better way?
Answers:
I’m sure there is a more pythonic way to do this, but it works.
import pandas as pd

# Flag each trade as a winner (pnl >= 0) or a loser (pnl < 0).
df["isWinner"] = df['pnl'] >= 0
df["isLoser"] = df['pnl'] < 0
# Group by a plain column name so that `name` is a scalar, not a 1-tuple.
grouped = df.groupby('strategy')
rows = []
for name, grp in grouped:
    wincount = grp['isWinner'].sum()
    loscount = grp['isLoser'].sum()
    winrate = wincount / (wincount + loscount)
    profits = grp.loc[grp['isWinner'], 'pnl'].sum()
    losses = grp.loc[grp['isLoser'], 'pnl'].sum()
    avgwin = profits / wincount
    avglos = losses / loscount  # negative, which is why it is added below
    expectancy = winrate * avgwin + (1.0 - winrate) * avglos
    rows.append({'name': name, 'wincount': wincount, 'losscount': loscount,
                 'winrate': winrate, 'profits': profits, 'losses': losses,
                 'avgwin': avgwin, 'avgloss': avglos, 'expectancy': expectancy})
# DataFrame.append was removed in pandas 2.0, so build the frame from the list of rows.
stats = pd.DataFrame(rows)
stats
Results:
index | name | wincount | losscount | winrate | profits | losses | avgwin | avgloss | expectancy |
---|---|---|---|---|---|---|---|---|---|
1 | Follow outside candles OPTIONS Filter 15 | 93.0 | 105.0 | 0.4696969696969697 | 36898.0 | -20096.0 | 396.752688172043 | -191.3904761904762 | 84.85858585858587 |
6 | Scalp outside candles OPTIONS | 11.0 | 20.0 | 0.3548387096774194 | 3409.0 | -2971.0 | 309.90909090909093 | -148.55 | 14.129032258064527 |
0 | Follow outside candles OPTIONS | 650.0 | 980.0 | 0.3987730061349693 | 200595.0 | -178813.0 | 308.60769230769233 | -182.46224489795918 | 13.36319018404906 |
10 | Scalp outside v5 | 535.0 | 886.0 | 0.3764954257565095 | 125250.0 | -108215.0 | 234.11214953271028 | -122.13882618510158 | 11.988036593947925 |
5 | Outside v4 | 151.0 | 163.0 | 0.48089171974522293 | 6257.0 | -5105.0 | 41.437086092715234 | -31.319018404907975 | 3.6687898089171966 |
4 | Outside v3 | 110.0 | 172.0 | 0.3900709219858156 | 7813.0 | -6852.0 | 71.02727272727273 | -39.83720930232558 | 3.4078014184397176 |
2 | Outside v1 | 113.0 | 151.0 | 0.42803030303030304 | 10498.0 | -9790.0 | 92.90265486725664 | -64.83443708609272 | 2.68181818181818 |
13 | Scalp outside v8 | 607.0 | 729.0 | 0.45434131736526945 | 79443.0 | -79559.0 | 130.87808896210873 | -109.13443072702331 | -0.08682634730538297 |
15 | Scalp v2 | 78.0 | 103.0 | 0.430939226519337 | 1702.0 | -1748.0 | 21.82051282051282 | -16.97087378640777 | -0.25414364640883846 |
3 | Outside v2 | 81.0 | 117.0 | 0.4090909090909091 | 7603.0 | -7773.0 | 93.8641975308642 | -66.43589743589743 | -0.8585858585858475 |
14 | Scalp v1 | 47.0 | 53.0 | 0.47 | 833.0 | -1402.0 | 17.72340425531915 | -26.452830188679247 | -5.690000000000001 |
7 | Scalp outside v2 | 87.0 | 124.0 | 0.41232227488151657 | 18476.0 | -20010.0 | 212.367816091954 | -161.3709677419355 | -7.270142180094808 |
8 | Scalp outside v3 | 66.0 | 90.0 | 0.4230769230769231 | 9255.0 | -10768.0 | 140.22727272727272 | -119.64444444444445 | -9.698717948717949 |
12 | Scalp outside v7 | 27.0 | 52.0 | 0.34177215189873417 | 4015.0 | -5784.0 | 148.7037037037037 | -111.23076923076923 | -22.392405063291136 |
11 | Scalp outside v6 | 29.0 | 60.0 | 0.3258426966292135 | 6816.0 | -8878.0 | 235.0344827586207 | -147.96666666666667 | -23.168539325842687 |
9 | Scalp outside v4 | 48.0 | 104.0 | 0.3157894736842105 | 8015.0 | -11622.0 | 166.97916666666666 | -111.75 | -23.730263157894747 |
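On the original question of why s['isWinner'] fails: after the first groupby, isWinner is a level of the row index rather than a column, and agg hands the function one column at a time as a Series, so there is nothing named 'isWinner' to look up. One way around it is to skip df2 and aggregate the raw trades directly. The following is only a sketch, assuming the column names from the question; the sample data and the helper name summarize are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "day": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "symbol": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "strategy": ["s1", "s1", "s1", "s1", "s2", "s2"],
    "pnl": [100.0, -50.0, 30.0, -20.0, 10.0, -40.0],
})


def summarize(frame, keys):
    """Per-group win rate and expectancy, computed from the raw trades."""
    wins = frame["pnl"].where(frame["pnl"] >= 0)   # NaN where the trade lost
    losses = frame["pnl"].where(frame["pnl"] < 0)  # NaN where the trade won
    out = (
        frame.assign(win=wins, loss=losses)
        .groupby(keys)
        .agg(
            wincount=("win", "count"),    # count skips NaN
            losscount=("loss", "count"),
            avgwin=("win", "mean"),       # mean skips NaN
            avgloss=("loss", "mean"),
        )
    )
    out["winrate"] = out["wincount"] / (out["wincount"] + out["losscount"])
    # A group with no winners (or no losers) has a NaN average; count it as 0.
    out["expectancy"] = (
        out["winrate"] * out["avgwin"].fillna(0.0)
        + (1.0 - out["winrate"]) * out["avgloss"].fillna(0.0)
    )
    return out.reset_index()


stats = summarize(df, ["symbol", "strategy"])
print(stats)
```

Passing ["day", "symbol", "strategy"] as keys gives the per-day breakdown instead. Note that with this definition expectancy works out to the mean pnl of the group, since winrate * avgwin is profits / n and (1 - winrate) * avgloss is losses / n. In the table above, Scalp v1 gives (833 - 1402) / 100 = -5.69. So df.groupby(keys)['pnl'].mean() should be enough if expectancy is the only number needed.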