Lecture 3 Notes - Orincancercare.org

Transcription

Lecture 3 Notes - Orincancercare.org
BABS 502
Regression Based Forecasting
March 4, 2014
(c) Martin L. Puterman
1
Simple and Multiple Regression
• A widely used set of statistical tools that are useful for:
– forecasting
– data summary
– adjustment for uncontrolled factors
• Basic idea is to fit an equation of the following form relating a dependent
variable to one or more independent variables
y = 0 + 1x1 + 2x2 + 3x3 + …
• It’s power is that by choosing the y and xi’s in different ways a wide
range of different effects can be taken into account.
• The theoretical model assumes that each observation is subject to an
additive error which is normally distributed with mean zero and the
same variance for every observation so that one observes the signal
and noise components in aggregate.
• In forecasting the signal part provides the point forecast and the random
part provides an accuracy measure.
(c) Martin L. Puterman
2
Regression in forecasting - trend
extrapolation
• Fit a trend to historical data
– linear
– quadratic
– exponential
Yt = a + bt
Yt = a + bt + ct2
Yt = aebt or Log (Yt) = a + bt
• Assumption is that the same trend occurred
throughout the past and that it will persist into future
• Fit using lm or tslm in R
– Quadratic fit - tslm(y~poly(trend,2,raw=TRUE))
• Extensive regression theory available to guide use
(c) Martin L. Puterman
3
Trend Regression – Births Data
Linear Trend
Quadratic Trend
Cubic Trend
(c) Martin L. Puterman
4
Cubic regression forecast of barley yields in BC
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
2.094e+00 1.498e-01 13.983 < 2e-16 ***
poly(trend, 3, raw = TRUE)1 -3.348e-02 1.229e-02 -2.724 0.00762 **
poly(trend, 3, raw = TRUE)2 7.752e-04 2.713e-04 2.857 0.00520 **
poly(trend, 3, raw = TRUE)3 -4.273e-06 1.699e-06 -2.515 0.01350 *
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3681 on 100 degrees of freedom
Multiple R-squared: 0.2517, Adjusted R-squared: 0.2293
F-statistic: 11.21 on 3 and 100 DF, p-value: 2.096e-06
(c) Martin L. Puterman
5
Some R commands for regression
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
birthsts<-ts(births[,2],start=1946)
b<-births{,2]
t<-births[,1]
t2<-t^2
plot(b)
plot(t,b,type="l")
lines(lm(b~t)$fit,col=2,lwd=2)
lines(lm(b~t+t2)$fit,col=3,lwd=2)
#residuals
r<-residuals(lm(b~t))
plot(t,r)
acf(r)
print(acf(r))
dwtest(b~t)
summary(lm(b~poly(t,3,raw=TRUE)))
dwtest(b~poly(t,3,raw=TRUE))
•
•
•
•
#http://www.r-bloggers.com/polynomial-regressiontechniques/
plot(t,b,type="l")
fit3<-lm(b~poly(t,2,raw=TRUE))
lines(t,predict(fit2),col=2)
•
•
•
•
•
•
•
#Using ts regression commands to get fits and plots
tslm(birthsts~trend)
fitq<-tslm(birthsts~trend+I(trend^2))
fq<-forecast(fitq,h=5,level=95)
summary(fq)
plot(fq)
lines(fitted(fitq),col=2)
•
•
•
•
•
#fitting cubic trend with forecast intervals
fitc<-tslm(birthsts~poly(trend,3,raw=TRUE))
fc<-forecast(fitc,h=8,level=95)
plot(fc)
lines(fitted(fitc),col=3)
•
dwtest(b~poly(t,3,raw=TRUE))
(c) Martin L. Puterman
6
Dummy Variables
• Dummy Variables are independent variables in regression that
assume the values of either 0 or 1.
– A value 1 means a condition is present; a value 0 means it is not.
– When an observation in regression corresponds to a condition
being present, then the value of that observation is decreased or
increased by a constant amount equal to the value of the coefficient
of the dummy variable in the regression.
• If a condition has three possible values; say “high”, “medium”
or “low”. We encode its value with two dummy variables. The
first variable, High, equals 1 if the condition is “high” and zero
otherwise and the second variable, Medium, equals 1 if the
condition is “medium” and zero otherwise. When the condition
is “low” both values are zero. The Baseline condition “low” is
reflected in the constant in the regression equation.
• In time series regression, we use dummy variables for seasons,
and use S-1 dummy variables if there are S seasons. We are
free to chose the baseline season from which all others are
measured.
(c) Martin L. Puterman
7
Trend Regression with Seasonality
• My experience suggests that a quadratic trend
regression plus (additive) seasonality is useful for
forecasting
• Uses “dummy variables” for seasons
• Must be fit with regression software
• Equation with linear trend and additive monthly
seasonality
Yt = a + bt + dt2+ c2Febt + c3Mart + … + c12Dect
• Also enables multiple levels of seasonality such as
weekly and monthly.
(c) Martin L. Puterman
8
Trend Regression with Seasonality
• In previous Febt, Mart, … are dummy variables
– they equal 1 if observation Yt is from the indicated month and 0
otherwise
– Note that there is no dummy variable for January
• January is the baseline for comparison
Examples:
Yt = a + bt
Yt = a + bt + c2
Yt = a + bt + c3
Observation t in January
Observation t in February
Observation t in March
• In R, tslm automatically generates seasonal dummies
for a ts object
– tslm(y~trend+season)
(c) Martin L. Puterman
9
Trend Regression With Seasonality Example
Intercept
Trend
I(Feb)
I(Mar)
I(Apr)
I(May)
I(Jun)
I(Jul)
I(Aug)
I(Sep)
I(Oct)
I(Nov)
I(Dec)
189.88
0.36
-23.85
-19.96
-34.87
-25.17
-8.98
11.88
11.72
-17.19
-26.08
-28.46
-8.46
Some forecasts:
Jan: F156(1) = 189.88 + 0.36*157 = 246.40
Feb: F156(2) = 189.88 + 0.36*158 - 23.85
= 222.91
Mar: F156(3) = 189.88 + 0.36*159 - 19.96
= 227.16
(c) Martin L. Puterman
10
Regression Example: Forecast
Updating During Season
• Goal: Improve total sales forecasts using interim sales data
• Data; early forecast, interim sales and total sales data for a
wide range of products.
• Fitted Model:
Total Sales = 120 + .6 Interim Sales +.3 Early Forecast
• Example: Early Forecast of Total Sales = 3000; Interim
Sales =1400
• Revised Total Sales Forecast
Total Sales = 120 + .6*1400 + .3*3000 = 1860
• Forecast Standard Deviation is Regression RMSE
(c) Martin L. Puterman
11
Regression Example: Impact of
Advertising
• Goal: Take into account effect of advertising
expenditures on sales
• Data; Sales and advertising expenditures in previous
quarter
• Fitted Model:
Salest = 15 + 10 Quartert + .8 Salest-1 +.4 (Advertisingt-1) 1/2
• Example: Sales in last quarter = 2000 and Advertising in
previous quarter = 10,000
• Total Sales Forecast
Sales = 15 + .8*2000 + .4*100 =1655
• Forecast Standard Deviation is Regression RMSE
(c) Martin L. Puterman
12
Some special concerns when using regression
with time series data
• Often the usual regression assumption of uncorrelated
errors is violated
– This means that the residuals contain information.
Case A: This is usually due to model mis-specification; i.e.
omission of important variables
Case B: But sometimes we have what we think is a good model
and there is nothing obvious to add.
• Difficulty – Standard errors are underestimated so model
seems better than it really is.
– Concept: Since observations are not independent, there is less
information in the data than you would think
– Reject Ho: βj = 0 when we shouldn’t.
• Detection
– Some systematic pattern in residual plot vs. time
– Durbin-Watson Test (see next slide).
– (Best approach) ACF of residuals
(c) Martin L. Puterman
13
Durbin-Watson test; comments
• The Durbin-Watson test is a not so good alternative
to using the ACF of the residuals but it is widely used
probably because of historical reasons.
• It is based on the Durbin-Watson test statistic D.
• It tests only for first order autocorrelation in the
errors.
– Formally it tests H0:  =0 vs. Ha:  0
– The test is reject H0 and conclude that there is
autocorrelation in the residuals if D is well below 2 or well
above 2; I suggest being imprecise here. I would worry
about values less than 1.4 or greater than 2.6.
• In economic data, when  is not zero, it is usually
positive.
(c) Martin L. Puterman
14
Regression with (auto)-correlated residuals
Approaches for obtaining more reliable estimates;
– Add variables, such as trend squared, or use the lagged
dependent variable as an explanatory variable. (See sales and
advertising example on previous slide; Salest-1 is a lagged
variable.)
– Use time series regression models – which except for a special
case (AR1 errors) requires advanced software such as SAS or R.
• Package Orcutt in R obtains estimates for models with AR(1) errors
• See Hyndman text section 9.1 for more on this.
– Difference data if lag one autocorrelation is large (and positive)
and software such as that above is not available.
(c) Martin L. Puterman
15
Regression with auto-correlated errors
Model
yt = β0 + β1x1t + ••• + βmxmt + εt
where
εt = ρ εt-1 + νt
and
νt ~ N(0, σ2) and independent
The quantity ρ is called the first order auto- correlation or serial correlation
parameter and is between -1 and +1.
The Corchrane-Orcutt procedure, which is in the R package “Orcutt”, estimates the
regression coefficients and ρ for this model. The help manual describes the
algorithm in some detail.
Note that usually the regression coefficients will not change much from ordinary
regression but their standard errors will be larger.
(c) Martin L. Puterman
16
Example – BC Incorporations
Trend Regression
T-Value
to test
H0:B(i)=0
10.010
3.209
Prob
Level
0.0000
0.0059
Reject
H0 at
5%?
Yes
Yes
Power
of Test
at 5%
1.0000
0.8503
Serial Correlation of Residuals Section
Serial
Serial
Serial
Lag
Correlation Lag
Correlation Lag
Correlation
1
0.7473
9
-0.2205
17
0.0000
2
0.3632
10
-0.0183
18
0.0000
3
-0.0190
11
0.1487
19
0.0000
4
-0.2897
12
0.2349
20
0.0000
5
-0.4198
13
0.0000
21
0.0000
6
-0.4632
14
0.0000
22
0.0000
7
-0.4478
15
0.0000
23
0.0000
8
-0.3732
16
0.0000
24
0.0000
Above serial correlations significant if their absolute values are greater than 0.485071
BC vs Year
35000
28333
BC
Regression Equation Section
Regression Standard
Independent Coefficient Error
Variable
b(i)
Sb(i)
Intercept
18672.3162 1865.4549
trend
584.1152
182.0498
21667
15000
1990 1992 1994 1997 1999 2001 2003 2006 2008 2010
Durbin-Watson Test For Serial Correlation
Parameter Value
Durbin-Watson Value
Prob. Level: Positive Serial Correlation
Prob. Level: Negative Serial Correlation
Did the Test Reject
H0: Rho(1) = 0?
0.3573
0.0000
1.0000
Year
Yes
No
(c) Martin L. Puterman
17
Same data using serial
correlation routine
Run Summary Section
Parameter
Dependent Variable
Number Ind. Variables
Weight Variable
R2
Adj R2
Coefficient of Variation
Mean Square Error
Square Root of MSE
Ave Abs Pct Error
Parameter
Rows Processed
Rows Filtered Out
Rows with X's Missing
Rows with Weight Missing
Rows with Y Missing
Rows Used in Estimation
Sum of Weights
Completion Status
Autocorrelation (Rho)
Value
17
0
0
0
0
17
16.000
Normal Completion
0.8523
Regression Equation Section
Regression Standard
Independent Coefficient Error
Variable
b(i)
Sb(i)
Intercept
10850.3237 12603.3258
T-Value
to test
H0:B(i)=0
0.861
Prob
Level
0.4038
Reject
H0 at
5%?
No
trend
1.576
0.1374
No
1244.8410
Value
BC
1
None
0.1506
0.0899
0.4879
4628200
2151.325
17.034
790.0657
(c) Martin L. Puterman
18
Same data but adding extra variables
Regression Equation Section
Regression
Independent
Coefficient
Variable
b(i)
Intercept
22290.5091
dummy
-37509.5091
dummyXyear
3036.6909
year
-73.6909
Standard
Error
Sb(i)
1303.6382
7155.5803
518.8170
192.2110
T-Value
to test
H0:B(i)=0
17.099
-5.242
5.853
-0.383
Prob
Level
0.0000
0.0002
0.0001
0.7076
Reject
H0 at
5%?
Yes
Yes
Yes
No
Power
of Test
at 5%
1.0000
0.9979
0.9997
0.0646
• In above – dummy = 1 if year for year > 2001 and
dummmyXyear allows for a shift in trends.
• Note there is still some autocorrelation present, lag 1
serial autocorrelation equals .32 (which is insignificant)
and the Durbin-Watson Test is significant but much less so
than without extra variables.
• The purpose of this example was to show that
autocorrelation can result from the omission of
independent variables.
(c) Martin L. Puterman
19
What if seasonality is multiplicative and
we want to use regression?
•
•
Problem; Model on nominal scale assumes additive effect of seasonal dummy
variables.
Solution: Do regression on the logarithmic scale. This means that we transform
the dependent variable by taking logarithms (base 10 or base e) and then do
regression.
• Why does this work? Multiplicative seasonality is additive on the log scale!
•
Thus we can do forecasts using the model on the log scale and then transform
back to the original scale by exponentiating the forecast on the log scale.
– Example: If forecast on Log10 scale is 3.4, then forecast on the nominal (original) scale
is 103.4 = 2511.9 units.
•
•
•
But predictions based on these transformations are often biased.
Alternative ad hoc approach: Deseasonalize data; fit model to deseasonalized
data and then multiply back by seasonal factors to get forecasts. This is how time
series decomposition works.
Also trends and dummy’s on the log-scale have nice interpretations. Consider the
model for 4 seasons
log10(yt) = 2 + .014t + .19 Season2t -.13 Season3t + .04 Season4t
• Then the value of the series is increasing 1.4% per period.
• The value in Season2 is about 19% above the what the trend alone predicts for that
season.
(c) Martin L. Puterman
20