DIAMONDS ON THE LINE: Profits through Investment Gaming

Clay Graham
DEPAUL UNIVERSITY
The boys (and girls) are back in town!
Let’s Look at Why We’re Here
Build an analytical model for investing in a
baseball game’s outcome resulting in:
• Picking a team(s)
• Quantifying level(s) of investment
In order to:
• Maximize expected value of profits
• Subject to: economic constraints and risk tolerance
“It’s Not Gambling!”
Jeff Ma
Sniff and Kick
(in academic parlance: research)
Input from Our Crack Research Team
Access to Ultra High Speed Computers
(in use at Wharton?)
Did you know?
Inequities Between Game and Line
[Bar chart, percentage scale 0% to 80%: share of games favored vs. share of games won]
Home (overrated): favored 70%, winning 54%
Road (undervalued): favored 30%, winning 46%
Source: http://oddswarehouse.com/
“It’s flat-out scary, BABY.”
Dick Vitale
Sources: (1) http://www.prnewswire.com/news-releases/sports-betting-tops-one-trillion-us-dollars-says-bookmaker-to-the-billionaires-273768381.html, accessed February 11, 2015
(2) ESPN the Magazine, February 16, 2015
Scoring Linked to K/BB
Growing Strike Zone?
(runs/game vs K/BB – time line)
r² = 0.80
Source: http://espn.go.com/mlb/stats/team/_/stat/ , accessed 2/8/2015
How it Works
Mapping Path to Profits
[Flow diagram. Boxes: Data Information Reservoir; LINE: What's in a Line, Implied P(W), Normalized IP(W); GAME: Production Function, Matchups (Batter vs Pitcher: μ; Road vs Home: σ), Park Factor, α and β parameters, Gamma Functions, EVRuns Road, EVRuns Home, P(W); EDGE; Filters / Governing Constraints; DECISION & Feedback. Key: road path, home path, joint path.]
Objective
Profit by Capitalizing on the
Market Inequities Between the
Game and the Line
What is the Money Line?
Money Line is the
Price of an Investment (bet)
Road -113: Favorite, risk 113 to win 100
Home +105: Underdog, risk 100 to win 105
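The two quotes above can be turned into a payout rule. The following is a minimal sketch, assuming the standard American money-line convention the slide describes; the function name `profit_if_win` is hypothetical and not from the presentation.

```python
def profit_if_win(line: int, stake: float) -> float:
    """Profit (not counting the returned stake) on a winning money-line bet."""
    if line < 0:
        # Favorite: risk |line| to win 100.
        return stake * 100 / abs(line)
    # Underdog: risk 100 to win `line`.
    return stake * line / 100


print(profit_if_win(-113, 113))  # 100.0
print(profit_if_win(105, 100))   # 105.0
```

The two calls reproduce the slide's examples: risking 113 on the road favorite wins 100, and risking 100 on the home underdog wins 105.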
Juice, “Vig”, and other Mysteries
Even bet defined: Home -110, Road -110
• Bet 110 to win 100
• Dime line
• House:
Receives 220 for bets (2 @ 110)
Pays out 210 to winner (original bet + winnings)
Keeps 10 as profit 4.5% (10/220) – this is the Vig
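The house arithmetic above can be checked with a few lines. This is a sketch under the slide's assumptions (one bettor on each side, both at the same negative line); the function names are hypothetical.

```python
def risk_to_win(line: int, to_win: float = 100.0) -> float:
    """Amount a bettor must risk to win `to_win` at a negative money line."""
    return to_win * abs(line) / 100


def vig_on_even_line(line: int = -110, to_win: float = 100.0) -> float:
    """House share of total money bet when both sides are priced at `line`."""
    stake = risk_to_win(line, to_win)  # 110 per bettor at -110
    handle = 2 * stake                 # house receives 220
    payout = stake + to_win            # winner gets stake back plus winnings: 210
    return (handle - payout) / handle  # 10 / 220


print(f"{vig_on_even_line():.1%}")  # 4.5%
```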
Winning Lines 2007-2014
(median: -115)
(mean: -105)
Underdog wins 40%
Favorite wins 60%
Source: http://oddswarehouse.com/
Implied Probability of Winning (line)
Normalized Implied P(W)
Given Lines: Road -113, Home 105
Implied probability of winning, calculation
• Road: 113 / (113+100) = 53.0%
• Home: 100 / (105 + 100) = 48.8%
• Total 101.8%
Normalize (sum of probabilities =100%)
• Road: 53.0% / 101.8% = 52.1%
• Home: 48.8% / 101.8% = 47.9%
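The calculation above, written out as a small sketch. The function names are hypothetical; the formulas are the ones on the slide (risk divided by risk plus winnings, then rescaling so the two sides sum to 100%).

```python
def implied_probability(line: int) -> float:
    """Implied P(W) from an American money line, before removing the vig."""
    if line < 0:
        return abs(line) / (abs(line) + 100)  # favorite: risk / (risk + win)
    return 100 / (line + 100)                 # underdog: risk / (win + risk)


def normalize(p_road: float, p_home: float) -> tuple[float, float]:
    """Rescale the two implied probabilities so they sum to 1."""
    total = p_road + p_home
    return p_road / total, p_home / total


road = implied_probability(-113)
home = implied_probability(105)
n_road, n_home = normalize(road, home)
print(f"{n_road:.1%} {n_home:.1%}")  # 52.1% 47.9%
```

The normalized values match the slide's 52.1% and 47.9%.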
In the Words of
the Great Western Philosopher
“It's tough to make predictions,
especially about the future.”
-Yogi Berra
Baseball’s Pythagorean Theorem
Probability of the Home Team Winning:
$P(W_{home}) = \dfrac{(Runs_{home})^2}{(Runs_{home})^2 + (Runs_{road})^2}$
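A minimal sketch of the formula above. The exponent of 2 is the one on the slide; the run totals in the example are hypothetical inputs chosen only to show the arithmetic.

```python
def pythagorean_home_win(runs_home: float, runs_road: float,
                         exponent: float = 2.0) -> float:
    """Baseball's Pythagorean estimate of the home team's win probability."""
    h = runs_home ** exponent
    r = runs_road ** exponent
    return h / (h + r)


# Hypothetical expected runs: home 5.0, road 4.0 -> 25 / (25 + 16)
print(round(pythagorean_home_win(5.0, 4.0), 3))  # 0.61
```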
Building the Production Function
Runs / Out
Fundamental Elements of Productivity
• Singles
• Doubles
• Triples
• Home runs
• Base on balls
runs/out = f(%1, %2, %3, %HR, %BB)
expected value runs / 9 innings = (27 * runs/out)
Production Function Runs / Out
(note: forced zero intercept)
Multiple Regression for Runs/Out

Summary
Multiple R            0.744
R-Square              0.553
Adjusted R-Square     0.553
StErr of Estimate     0.054

ANOVA Table
              Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio    p-Value
Explained     4                    26.640           6.660             2253.897   < 0.0001
Unexplained   7286                 21.529           0.003

Regression Table
            Coefficient   Standard Error   t-Value   p-Value    95% CI Lower   95% CI Upper
Constant    0             NA               NA        NA         NA             NA
%1          0.3928        0.0094           41.6966   < 0.0001   0.3743         0.4113
%2+3        0.9053        0.0221           40.9909   < 0.0001   0.8620         0.9486
%HR         1.7361        0.0307           56.5698   < 0.0001   1.6760         1.7963
%BB         0.1716        0.0155           11.0396   < 0.0001   0.1411         0.2021

source: www.retrosheet.org/gamelogs/, for years 2011, 2012 and 2013
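The fitted production function can be sketched from the regression coefficients (0.3928, 0.9053, 1.7361, 0.1716, zero intercept). The slides do not say what the percentages are measured against; this sketch assumes rates per plate appearance, and the input rates in the example are hypothetical, not taken from the data.

```python
# Coefficients from the regression table (intercept forced to zero).
COEF_SINGLE = 0.3928
COEF_DOUBLE_TRIPLE = 0.9053
COEF_HOME_RUN = 1.7361
COEF_WALK = 0.1716


def runs_per_out(pct_1b: float, pct_2b3b: float,
                 pct_hr: float, pct_bb: float) -> float:
    """Expected runs per out from event rates (assumed per plate appearance)."""
    return (COEF_SINGLE * pct_1b
            + COEF_DOUBLE_TRIPLE * pct_2b3b
            + COEF_HOME_RUN * pct_hr
            + COEF_WALK * pct_bb)


def expected_runs_per_9(pct_1b: float, pct_2b3b: float,
                        pct_hr: float, pct_bb: float) -> float:
    """Expected runs over nine innings: 27 outs times runs per out."""
    return 27 * runs_per_out(pct_1b, pct_2b3b, pct_hr, pct_bb)


# Hypothetical rates for illustration only.
print(round(expected_runs_per_9(0.155, 0.05, 0.025, 0.08), 2))
```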
Production Distribution: Runs/Out
source: www.retrosheet.org/gamelogs/, for years 2011,2012 and 2013
Runs/9 Inning Game Distributed
as a Gamma Function
source: www.retrosheet.org/gamelogs/, for years 2011,2012 and 2013
Gamma Function also has Mathematically
Desirable Characteristics
Gamma function’s two parameters
(domain lower boundary = 0)
α shape, β scale
1st moment: $\mu = \alpha \beta$
2nd moment: $\sigma^2 = \alpha \beta^2$
Solving for α and β in terms of μ and σ²:
$\alpha = \mu^2 / \sigma^2$
$\beta = \sigma^2 / \mu$
• Used to calculate: each team’s expected run production
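The moment-matching step above can be written as a short sketch. The function name and the example mean and variance are hypothetical; the two formulas are the ones on the slide.

```python
def gamma_params(mean: float, variance: float) -> tuple[float, float]:
    """Shape (alpha) and scale (beta) of a gamma distribution from its moments."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    alpha = mean ** 2 / variance  # shape
    beta = variance / mean        # scale
    return alpha, beta


# Hypothetical team: 4.5 expected runs per game with variance 9.0.
alpha, beta = gamma_params(4.5, 9.0)
print(alpha, beta)  # 2.25 2.0
```

As a check, alpha * beta returns the mean (4.5) and alpha * beta² returns the variance (9.0).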
Matchups
Dynamic Club Statistics:
Continuously Calculate key Variables
Time
Variance
Ranking
Calculating Game’s Matchup Variances
$\sigma^2_{\text{Road runs scored}} + \sigma^2_{\text{Home runs allowed}} = \sigma^2_{\text{Road vs Home scoring}}$
$\sigma^2_{\text{Road runs allowed}} + \sigma^2_{\text{Home runs scored}} = \sigma^2_{\text{Home vs Road scoring}}$
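A sketch of the variance matchup: the scoring variance for one side is the variance of that team's runs scored plus the variance of the opponent's runs allowed. The slides do not say whether population or sample variance is used; this sketch assumes population variance, and the game logs are made-up numbers for illustration.

```python
from statistics import pvariance


def matchup_variance(runs_scored: list[float], opp_runs_allowed: list[float]) -> float:
    """Sum of an offense's runs-scored variance and the opposing defense's
    runs-allowed variance (population variance is an assumption)."""
    return pvariance(runs_scored) + pvariance(opp_runs_allowed)


# Hypothetical recent game logs.
road_scored = [3, 5, 2, 6, 4]
home_allowed = [4, 1, 7, 3, 5]
print(matchup_variance(road_scored, home_allowed))  # road-vs-home scoring variance
```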
Scoring Prediction Tabulation
Inputs to Gamma Distribution
Matchups:
Batter-Pitcher (micro) → μ
Road-Home performance (macro) → σ²
For each team
Imposing Matchup Constraints (filters)
Time Separate the Wheat from the Chaff
Filters
Governing
Constraints
Variable
OB+S
K/BB
PA
Effective Outcomes
Filter          Road      Home
Net Rank        < -10     > 9
% Δ EVR         < -25%    > 15%
% Δ K/BB        < -50%    > 75%
% Δ OB+S        < 25%     > 10%
NIP(W)          > 47%     > 45%
Edge min        > 0%      > 0%
Edge max (1)    < 21%     < 17%
PArd            > 71      > 10
PAhm            > 11      > 52
Database age    ≈ 65 days
Notes: (1) Too good to be true
“Rankmeister” and Time Period Impact
Filters
Governing
Constraints
Rank Advantage Varies with Database Tabulation Period
64 days: Rank Advantage Home = 13
2 - 4 = -2
26 - 11 = 15
-2 + 15 = 13
108 days: Rank Advantage Road = -6
1 - 8 = -7
11 - 10 = 1
-7 + 1 = -6
Multiple Selection Criteria
From regression equation
Batter pitcher matchup calculations
Road Team Products Should be
Adjusted for Park Characteristics
Park Factor (1)
Source: http://espn.go.com/mlb/stats/parkfactor
notes: (1) Inverse Fibonacci weights
Monte Carlo Simulation
Winning Margin - Runs
Density function:
Home Team Winning Margin
$\Delta\,\text{runs} = \Gamma(\alpha_{hm}, \beta_{hm})_{home} - \Gamma(\alpha_{rd}, \beta_{rd})_{road}$
Monte Carlo –
Run Differential Between Teams
P(Whome) = 57%
Probability of Winning Through
Monte Carlo Simulation
Density functions
are incorporated rather than
point estimates (averages)
$P(\text{win})_{home} = \dfrac{\Gamma(\alpha_{hm}, \beta_{hm})_{home}^2}{\Gamma(\alpha_{hm}, \beta_{hm})_{home}^2 + \Gamma(\alpha_{rd}, \beta_{rd})_{road}^2}$
Monte Carlo –
Each Team’s Density Function
P(Whome) = 58%
Monte Carlo Simulation
Probability of Winning
Monte Carlo vs. Monolithic Results
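Monte Carlo: P(Whome) = f(run differential) = 56.5% (≈ 57%)
Monte Carlo: P(Whome) = f(density function) = 57.9% (≈ 58%)
Monolithic: $P(W_{home}) = \dfrac{\Gamma(\alpha, \beta)_{home}^2}{\Gamma(\alpha, \beta)_{home}^2 + \Gamma(\alpha, \beta)_{road}^2}$ = 63%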
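The two Monte Carlo estimates can be sketched with the standard library. This is an illustration under stated assumptions, not the presenter's implementation: each team's runs are drawn from a gamma distribution via `random.gammavariate(alpha, beta)` (shape, scale), and the shape and scale values, trial count, and seed are all hypothetical.

```python
import random


def simulate_home_win(alpha_hm: float, beta_hm: float,
                      alpha_rd: float, beta_rd: float,
                      trials: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (P(home) from run differential, P(home) from averaged Pythagorean)."""
    rng = random.Random(seed)
    wins = 0
    pyth_total = 0.0
    for _ in range(trials):
        home = rng.gammavariate(alpha_hm, beta_hm)  # simulated home runs
        road = rng.gammavariate(alpha_rd, beta_rd)  # simulated road runs
        if home - road > 0:                         # positive run differential
            wins += 1
        # Pythagorean formula applied to each draw, then averaged.
        pyth_total += home ** 2 / (home ** 2 + road ** 2)
    return wins / trials, pyth_total / trials


# Hypothetical parameters: home mean 4.5 runs, road mean 4.0 runs.
p_diff, p_density = simulate_home_win(2.25, 2.0, 2.0, 2.0)
print(f"run differential: {p_diff:.3f}, density function: {p_density:.3f}")
```

The first value counts how often the simulated home team outscores the road team; the second averages the Pythagorean formula over the simulated draws instead of applying it once to the two means.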
It all Leads to Gaining an EDGE
Just what is the EDGE?
Simply stated:
$\text{EDGE} = P(\text{Win})_{game} - \text{Normalized Implied } P(\text{Win})_{line}$
Basic Investment Function - %Bankroll
(expected winning percentage ≈ 55%-58%)
Generalized “S” Curve
(used to fit variable EDGE function)
$\%\,\text{Bankroll (staking)} = A_b + \dfrac{A_t - A_b}{1 + \exp\left(-(\text{EDGE} - X_0) / W\right)}$
Where:
Ab = minimum proportion of bankroll - base
At = maximum proportion of bankroll - top
W = transition slope - width
X0 = shifting factor
EDGE = P(W) – NIP(W)
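The staking curve can be sketched directly from the definitions above. The parameter values below (Ab, At, X0, W) and the two probabilities are hypothetical, chosen only to show the shape of the curve; the slides do not give the fitted values.

```python
import math


def edge(p_win_game: float, nip_win_line: float) -> float:
    """EDGE = model P(W) minus the line's normalized implied P(W)."""
    return p_win_game - nip_win_line


def stake_fraction(edge_value: float, a_b: float, a_t: float,
                   x0: float, w: float) -> float:
    """Generalized S curve: fraction of bankroll to invest for a given EDGE."""
    return a_b + (a_t - a_b) / (1 + math.exp(-(edge_value - x0) / w))


# Hypothetical parameters: 1% base, 5% top, midpoint at 8% EDGE, width 2%.
e = edge(0.57, 0.48)
print(f"EDGE {e:.2%} -> stake {stake_fraction(e, 0.01, 0.05, 0.08, 0.02):.2%}")
```

At EDGE = X0 the curve sits halfway between Ab and At; it approaches Ab for small edges and At for large ones.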
Dynamic Investment Function
Changes with Probability of Winning
Too
good
to be
true
Does it Work?
Bankroll More Than Doubled
in Just Two Months
Source: tabulated over 2014 season
Original Bankroll up over Tenfold
previous slide time period
Source: tabulated over 2014 season
Feeling So Good!
Post Season
Source: tabulated over 2014 season
Summary of Some Operational Results
Winning percentage 68%,
Average daily return on at risk capital 35%,
Overall return on original bankroll 1,425%,
Average bets per day 1.91,
Average bet size 3.5% of available bankroll,
Percent of games invested 23%,
EDGE-based investment results in a doubling of profits
Modeling Contributions
Runs/out enhances accuracy of modeling,
Game day batter-starting pitcher matchups effectively
feed the production function,
Road-Home variance matchup generates scoring σ²,
Dynamic time variable drives algorithm,
Monte Carlo used to more effectively determine
probability of winning,
Genetic programming and filters are a powerful profit
optimizer.
“Go where the numbers take you!”™
Only 37 Days until the Season Opener!
Our Year!
Time for Your
Questions!
Epilog
What Happened Since SSAC15?
Nate Silver challenge
Assault of the hedge funds