STAC67H3: Regression Analysis, Fall 2014
Instructor: Jabed Tomal
Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, ON, Canada
October 30, 2014
Multiple Regression I

First-Order Model with Two Predictor Variables

A first-order (linear in the predictor variables) regression model with two predictor variables is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $Y_i$ is the response in the $i$th trial, and $X_{i1}$ and $X_{i2}$ are the values of the two predictor variables in the $i$th trial. The parameters of the regression model are $\beta_0$, $\beta_1$, and $\beta_2$, and the error term is $\varepsilon_i$. Assuming $E\{\varepsilon_i\} = 0$, the regression function for the above model is

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2.$$

Meaning of Regression Coefficients. The parameter $\beta_1$ indicates the change in the mean response $E\{Y\}$ per unit increase in $X_1$ when $X_2$ is held constant. Likewise, the parameter $\beta_2$ indicates the change in the mean response $E\{Y\}$ per unit increase in $X_2$ when $X_1$ is held constant.

First-Order Model with More than Two Predictor Variables

The regression model with $p - 1$ predictor variables $X_1, X_2, \ldots, X_{p-1}$,

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

is called a first-order model with $p - 1$ predictor variables. This model can also be written as

$$Y_i = \beta_0 + \sum_{k=1}^{p-1} \beta_k X_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

or, if we let $X_{i0} \equiv 1$,

$$Y_i = \sum_{k=0}^{p-1} \beta_k X_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

Assuming that $E\{\varepsilon_i\} = 0$, the regression function is

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}.$$

Meaning of Regression Coefficients. The parameter $\beta_k$ indicates the change in the mean response $E\{Y\}$ with a unit increase in the predictor variable $X_k$, when all other predictor variables in the regression model are held constant.

General Linear Regression Model

The general linear regression model, with normal error terms, is defined as

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\beta_0, \beta_1, \ldots, \beta_{p-1}$ are parameters, $X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are known constants, and the $\varepsilon_i$ are independent $N(0, \sigma^2)$. Letting $X_{i0} \equiv 1$, the model is written as

$$Y_i = \sum_{k=0}^{p-1} \beta_k X_{ik} + \varepsilon_i.$$

Since $E\{\varepsilon_i\} = 0$, the regression function is

$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}.$$

The general linear regression model with normal error terms implies that the observations $Y_i$ are independent normal variables, with mean

$$E\{Y_i\} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1}$$

and constant variance $\sigma^2$.
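To make this distributional statement concrete, here is a minimal simulation sketch of the normal-error first-order model in Python. This is not from the slides; the parameter values, sample size, and numpy setup are illustrative assumptions.

```python
import numpy as np

# A minimal sketch: simulate from Y_i = b0 + b1*X_i1 + b2*X_i2 + eps_i,
# with eps_i ~ N(0, sigma^2). All numbers below are made up for illustration.
rng = np.random.default_rng(0)
n = 100
beta0, beta1, beta2, sigma = 10.0, 2.0, -1.5, 3.0

X1 = rng.uniform(0, 10, size=n)
X2 = rng.uniform(0, 5, size=n)
eps = rng.normal(0.0, sigma, size=n)     # independent normal errors

Y = beta0 + beta1 * X1 + beta2 * X2 + eps

# beta1 is the change in the mean response per unit increase in X1
# with X2 held constant:
def mean_response(x1, x2):
    return beta0 + beta1 * x1 + beta2 * x2

print(mean_response(4.0, 2.0) - mean_response(3.0, 2.0))  # prints 2.0 == beta1
```

The printed difference equals $\beta_1$ exactly, illustrating the "held constant" interpretation of the regression coefficients.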
The general linear regression model encompasses a vast variety of situations.

1. p - 1 Predictor Variables. When $X_1, X_2, \ldots, X_{p-1}$ represent $p - 1$ different predictor variables, the general linear regression model is a first-order model in which there are no interaction effects between the predictor variables.

2. Qualitative Predictor Variables. The general linear regression model includes not only quantitative predictor variables but also qualitative predictor variables. Consider the first-order regression model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i,$$

where $X_{i1}$ is the patient's age and $X_{i2} = 1$ if the patient is female, $0$ if the patient is male. The response function of the regression model is $E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$. For male patients, $X_2 = 0$ and the response function becomes

$$E\{Y\} = \beta_0 + \beta_1 X_1.$$

For female patients, $X_2 = 1$ and the response function becomes

$$E\{Y\} = (\beta_0 + \beta_2) + \beta_1 X_1.$$

The two response functions represent parallel straight lines with different intercepts. In general, we represent a qualitative variable with $c$ classes by means of $c - 1$ indicator variables.

3. Polynomial Regression. Polynomial regression models contain squared and higher-order terms of the predictor variable(s). An example of a polynomial regression model with one predictor is

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i.$$

4. Transformed Variables. Models with transformed variables are special cases of the general linear regression model. Consider the following model with a transformed $Y$ variable:

$$\log Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i.$$

Letting $Y_i' = \log Y_i$, we get

$$Y_i' = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i,$$

which is in the form of the general linear regression model.

5. Interaction Effects. The general linear regression model includes models with interaction effects (the effect of one predictor variable depends on the levels of other predictor variables). An example with two predictor variables $X_1$ and $X_2$ is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i.$$

Letting $X_{i3} = X_{i1} X_{i2}$, we get

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i,$$

which is in the form of the general linear regression model.

6. Combination of Cases. A general linear regression model may combine several of the elements we have just noted. Consider the regression model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i2}^2 + \beta_5 X_{i1} X_{i2} + \varepsilon_i.$$

Letting $Z_{i1} = X_{i1}$, $Z_{i2} = X_{i1}^2$, $Z_{i3} = X_{i2}$, $Z_{i4} = X_{i2}^2$, and $Z_{i5} = X_{i1} X_{i2}$, we get

$$Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \beta_3 Z_{i3} + \beta_4 Z_{i4} + \beta_5 Z_{i5} + \varepsilon_i,$$

which is in the form of the general linear regression model. A small design-matrix sketch covering cases 2, 3, and 5 follows below.
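The unifying idea in cases 2 through 6 is that each device just adds columns to the design matrix. The sketch below shows this for an indicator variable, a polynomial term, and an interaction term; the data values and the names age and female are hypothetical, echoing the patient example above.

```python
import numpy as np

# Sketch: qualitative, polynomial, and interaction terms are all just
# constructed columns of the design matrix X. Values are made up.
age    = np.array([34.0, 51.0, 27.0, 60.0])   # quantitative predictor X1
female = np.array([1.0, 0.0, 0.0, 1.0])       # indicator X2 (1 = female, 0 = male)

# Case 2: qualitative predictor via an indicator column.
X_qual = np.column_stack([np.ones(4), age, female])

# Case 3: polynomial terms are extra columns (X, X^2).
X_poly = np.column_stack([np.ones(4), age, age**2])

# Case 5: an interaction term is the elementwise product of two columns.
X_inter = np.column_stack([np.ones(4), age, female, age * female])

print(X_inter)
```

Once the columns are built, every one of these models is fit by exactly the same matrix computation, which is why they are all "general linear regression models."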
Meaning of Linear in the General Linear Regression Model. The term linear model refers to the fact that the general linear regression model is linear in the parameters; it does not refer to the shape of the response surface. A regression model is linear in the parameters when it can be written in the form

$$Y_i = \beta_0 c_{i0} + \beta_1 c_{i1} + \beta_2 c_{i2} + \cdots + \beta_{p-1} c_{i,p-1} + \varepsilon_i,$$

where the terms $c_{i0}, c_{i1}$, etc., are coefficients involving the predictor variables.

General Linear Regression Model in Matrix Terms

We write the general linear regression model in matrix terms as

$$\mathbf{Y}_{n \times 1} = \mathbf{X}_{n \times p} \, \boldsymbol{\beta}_{p \times 1} + \boldsymbol{\varepsilon}_{n \times 1},$$

where the vector of responses is

$$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T,$$

the matrix of constants is

$$\mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix},$$

the vector of parameters is

$$\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_{p-1})^T,$$

and the error vector is

$$\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T,$$

which contains independent normal random variables with expectation $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$ and variance-covariance matrix

$$\mathrm{Var}\{\boldsymbol{\varepsilon}\} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 \mathbf{I}_{n \times n}.$$

The random vector $\mathbf{Y}$ has expectation

$$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta},$$

and the variance-covariance matrix of $\mathbf{Y}$ is the same as that of $\boldsymbol{\varepsilon}$:

$$\mathrm{Var}\{\mathbf{Y}\} = \sigma^2 \mathbf{I}_{n \times n}.$$

Estimation of Regression Coefficients

The least squares criterion for the general linear regression model is

$$Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$$

The least squares estimators are those values of $\beta_0, \beta_1, \ldots, \beta_{p-1}$ that minimize $Q$. The least squares normal equations for the general linear regression model are

$$\mathbf{X}^T \mathbf{X} \mathbf{b} = \mathbf{X}^T \mathbf{Y},$$

and the least squares estimators are

$$\mathbf{b} = (b_0, b_1, b_2, \ldots, b_{p-1})^T = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}.$$

The maximum likelihood estimator of $\boldsymbol{\beta}$ can be obtained by maximizing the following likelihood function with respect to $\boldsymbol{\beta}$:

$$L(\boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \right\}.$$

The maximum likelihood estimator of $\boldsymbol{\beta}$ is the same as the least squares estimator:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}.$$
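A minimal numerical sketch of the least squares estimator follows, on simulated data (an assumption, not course data). It solves the normal equations with np.linalg.solve rather than forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly, which is the numerically preferred route.

```python
import numpy as np

# Sketch of b = (X'X)^{-1} X'Y via the normal equations X'X b = X'Y.
# X and Y are simulated purely for illustration.
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, p - 1))])
true_beta = np.array([5.0, 1.0, -2.0])
Y = X @ true_beta + rng.normal(0, 1.0, n)

# Solve X'X b = X'Y directly; avoids explicitly inverting X'X.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)   # close to (5, 1, -2)
```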
Fitted Values and Residuals

Let the vector of fitted values $\hat{Y}_i$ be denoted by $\hat{\mathbf{Y}} = (\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_n)^T$, and the vector of residual terms $e_i = Y_i - \hat{Y}_i$ by $\mathbf{e} = (e_1, e_2, \ldots, e_n)^T$. In matrix notation, we have

$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$$

and

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b}.$$

The vector of fitted values $\hat{\mathbf{Y}}$ can be expressed in terms of the hat matrix $\mathbf{H}$ as

$$\hat{\mathbf{Y}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \mathbf{Y},$$

or, equivalently, $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$, where

$$\mathbf{H}_{n \times n} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T.$$

The vector of residuals can be expressed as

$$\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}.$$

The variance-covariance matrix of the residuals is

$$\sigma^2\{\mathbf{e}\} = \sigma^2 (\mathbf{I} - \mathbf{H}),$$

which is estimated by

$$\mathbf{s}^2\{\mathbf{e}\} = MSE \, (\mathbf{I} - \mathbf{H}).$$

Analysis of Variance

The sums of squares for the analysis of variance in matrix terms are

$$SST = \mathbf{Y}^T\mathbf{Y} - \frac{1}{n}\mathbf{Y}^T\mathbf{J}\mathbf{Y} = \mathbf{Y}^T\left[\mathbf{I} - \frac{1}{n}\mathbf{J}\right]\mathbf{Y},$$

$$SSE = \mathbf{e}^T\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})^T(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}^T\mathbf{Y} - \mathbf{b}^T\mathbf{X}^T\mathbf{Y} = \mathbf{Y}^T[\mathbf{I} - \mathbf{H}]\mathbf{Y},$$

$$SSR = \mathbf{b}^T\mathbf{X}^T\mathbf{Y} - \frac{1}{n}\mathbf{Y}^T\mathbf{J}\mathbf{Y} = \mathbf{Y}^T\left[\mathbf{H} - \frac{1}{n}\mathbf{J}\right]\mathbf{Y},$$

where $\mathbf{J}$ is the $n \times n$ matrix whose entries are all 1.

ANOVA Table for the General Linear Regression Model:

    Source of Variation   SS                                                    df      MS
    Regression            SSR = b'X'Y - (1/n) Y'JY                              p - 1   MSR = SSR / (p - 1)
    Error                 SSE = Y'Y - b'X'Y                                     n - p   MSE = SSE / (n - p)
    Total                 SST = Y'Y - (1/n) Y'JY                                n - 1

F Test for Regression Relation

We set the following hypotheses to test whether there is a regression relation between the response variable $Y$ and the set of $X$ variables $X_1, X_2, \ldots, X_{p-1}$:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0 \quad \text{versus} \quad H_A: \text{not all } \beta_k \ (k = 1, 2, \ldots, p-1) \text{ equal zero}.$$

We use the test statistic

$$F^* = \frac{MSR}{MSE}.$$

Reject $H_0$ at the $\alpha$ level of significance if $F^* > F(1 - \alpha; p - 1, n - p)$.

Coefficient of Multiple Determination

The coefficient of multiple determination, denoted by $R^2$, is defined as

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},$$

which ranges from 0 to 1. $R^2$ assumes the value 0 when all $b_k = 0$ $(k = 1, 2, \ldots, p-1)$, and the value 1 when all $Y$ observations fall directly on the fitted regression surface, i.e., when $Y_i = \hat{Y}_i$ for all $i$. The adjusted coefficient of multiple determination, denoted by $R_a^2$, adjusts $R^2$ by dividing each sum of squares by its associated degrees of freedom:

$$R_a^2 = 1 - \frac{n - 1}{n - p} \cdot \frac{SSE}{SST}.$$
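The hat matrix, the ANOVA decomposition, $F^*$, and $R^2$ translate directly into matrix code. The sketch below uses simulated data (an assumption, not course data) and scipy.stats for the $F$ reference distribution.

```python
import numpy as np
from scipy import stats

# Sketch of the ANOVA decomposition in matrix terms; data are simulated.
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, p - 1))])
Y = X @ np.array([5.0, 1.0, -2.0]) + rng.normal(0, 1.0, n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix H = X (X'X)^{-1} X'
J = np.ones((n, n))                        # n x n matrix of ones

SST = Y @ (np.eye(n) - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y
SSR = Y @ (H - J / n) @ Y                  # note SST = SSR + SSE

MSR, MSE = SSR / (p - 1), SSE / (n - p)
F_star = MSR / MSE                         # compare to F(1 - alpha; p-1, n-p)
p_value = stats.f.sf(F_star, p - 1, n - p)

R2 = SSR / SST
R2a = 1 - (n - 1) / (n - p) * (SSE / SST)
print(F_star, p_value, R2, R2a)
```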
Inferences about Regression Parameters

The least squares and maximum likelihood estimators in $\mathbf{b}$ are unbiased:

$$E\{\mathbf{b}\} = \boldsymbol{\beta}.$$

The variance-covariance matrix of $\mathbf{b}$ is

$$\sigma^2\{\mathbf{b}\}_{p \times p} = \begin{bmatrix} \sigma^2\{b_0\} & \sigma\{b_0, b_1\} & \cdots & \sigma\{b_0, b_{p-1}\} \\ \sigma\{b_1, b_0\} & \sigma^2\{b_1\} & \cdots & \sigma\{b_1, b_{p-1}\} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma\{b_{p-1}, b_0\} & \sigma\{b_{p-1}, b_1\} & \cdots & \sigma^2\{b_{p-1}\} \end{bmatrix},$$

or, in short,

$$\sigma^2\{\mathbf{b}\} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}.$$

The estimated variance-covariance matrix of $\mathbf{b}$,

$$\mathbf{s}^2\{\mathbf{b}\}_{p \times p} = \begin{bmatrix} s^2\{b_0\} & s\{b_0, b_1\} & \cdots & s\{b_0, b_{p-1}\} \\ s\{b_1, b_0\} & s^2\{b_1\} & \cdots & s\{b_1, b_{p-1}\} \\ \vdots & \vdots & \ddots & \vdots \\ s\{b_{p-1}, b_0\} & s\{b_{p-1}, b_1\} & \cdots & s^2\{b_{p-1}\} \end{bmatrix},$$

is given by

$$\mathbf{s}^2\{\mathbf{b}\} = MSE \, (\mathbf{X}^T\mathbf{X})^{-1}.$$

Tests for $\beta_k$

The null and alternative hypotheses are $H_0: \beta_k = 0$ versus $H_A: \beta_k \neq 0$. The test statistic is

$$t^* = \frac{b_k}{s\{b_k\}}.$$

Decision rule: reject $H_0$ at the $\alpha$ level of significance if $|t^*| > t(1 - \alpha/2; n - p)$.

Interval Estimation of $\beta_k$

For the normal-error general linear regression model,

$$\frac{b_k - \beta_k}{s\{b_k\}} \sim t(n - p), \quad k = 0, 1, \ldots, p - 1.$$

Hence, the $100(1 - \alpha)\%$ confidence interval is

$$b_k \pm t(1 - \alpha/2; n - p) \, s\{b_k\}.$$

Joint Inferences

If $g$ parameters are to be estimated jointly (where $g \leq p$), the confidence limits with family confidence coefficient $1 - \alpha$ are

$$b_k \pm B \, s\{b_k\}, \quad \text{where } B = t(1 - \alpha/2g; n - p).$$

Interval Estimation of $E\{Y_h\}$

To estimate the mean response at $X_{h1}, X_{h2}, \ldots, X_{h,p-1}$, define the vector

$$\mathbf{X}_h = (1, X_{h1}, X_{h2}, \ldots, X_{h,p-1})^T.$$

The mean response to be estimated is $E\{Y_h\} = \mathbf{X}_h^T \boldsymbol{\beta}$, and the estimated mean response corresponding to $\mathbf{X}_h$ is

$$\hat{Y}_h = \mathbf{X}_h^T \mathbf{b}.$$

The estimator is unbiased,

$$E\{\hat{Y}_h\} = \mathbf{X}_h^T \boldsymbol{\beta} = E\{Y_h\},$$

and its variance is

$$\sigma^2\{\hat{Y}_h\} = \sigma^2 \, \mathbf{X}_h^T (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}_h,$$

which can be expressed as a function of $\sigma^2\{\mathbf{b}\}$:

$$\sigma^2\{\hat{Y}_h\} = \mathbf{X}_h^T \, \sigma^2\{\mathbf{b}\} \, \mathbf{X}_h.$$

The estimated variance of $\hat{Y}_h$ in matrix notation is

$$s^2\{\hat{Y}_h\} = MSE \, \mathbf{X}_h^T (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}_h = \mathbf{X}_h^T \, \mathbf{s}^2\{\mathbf{b}\} \, \mathbf{X}_h.$$

The $1 - \alpha$ confidence limits for $E\{Y_h\}$ are

$$\hat{Y}_h \pm t(1 - \alpha/2; n - p) \, s\{\hat{Y}_h\}.$$

Prediction for a New Observation $Y_{h(\text{new})}$

The $1 - \alpha$ prediction limits for a new observation $Y_{h(\text{new})}$ corresponding to $\mathbf{X}_h$, the specified values of the $X$ variables, are

$$\hat{Y}_h \pm t(1 - \alpha/2; n - p) \, s\{\text{pred}\},$$

where

$$s^2\{\text{pred}\} = MSE \left( 1 + \mathbf{X}_h^T (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}_h \right).$$

Prediction for $g$ New Observations

Simultaneous Scheffé prediction limits for $g$ new observations at $g$ different levels $\mathbf{X}_h$ with family confidence coefficient $1 - \alpha$ are given by

$$\hat{Y}_h \pm S \, s\{\text{pred}\}, \quad \text{where } S^2 = g \, F(1 - \alpha; g, n - p).$$

Alternatively, Bonferroni simultaneous prediction limits for $g$ new observations at $g$ different levels $\mathbf{X}_h$ with family confidence coefficient $1 - \alpha$ are given by

$$\hat{Y}_h \pm B \, s\{\text{pred}\}, \quad \text{where } B = t(1 - \alpha/2g; n - p).$$
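All of the interval formulas above reduce to a few matrix operations. The following sketch computes $\mathbf{s}^2\{\mathbf{b}\}$, the $t$ statistics, coefficient confidence intervals, and mean-response and prediction limits; the simulated data, the choice of $\mathbf{X}_h$, and the scipy.stats quantile calls are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Sketch of interval estimates for the normal-error model; data are simulated.
rng = np.random.default_rng(2)
n, p, alpha = 40, 3, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, p - 1))])
Y = X @ np.array([5.0, 1.0, -2.0]) + rng.normal(0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - p)

s2_b = MSE * XtX_inv                        # s^2{b} = MSE (X'X)^{-1}
s_b = np.sqrt(np.diag(s2_b))
t_star = b / s_b                            # t* = b_k / s{b_k}
t_crit = stats.t.ppf(1 - alpha / 2, n - p)  # t(1 - alpha/2; n - p)
ci = np.column_stack([b - t_crit * s_b, b + t_crit * s_b])

Xh = np.array([1.0, 5.0, 2.0])              # an arbitrary level of the X variables
Yh_hat = Xh @ b
s_mean = np.sqrt(MSE * Xh @ XtX_inv @ Xh)         # s{Yh_hat}, for E{Yh} limits
s_pred = np.sqrt(MSE * (1 + Xh @ XtX_inv @ Xh))   # s{pred}, for a new observation
print(ci)
print(Yh_hat - t_crit * s_mean, Yh_hat + t_crit * s_mean)   # CI for E{Yh}
print(Yh_hat - t_crit * s_pred, Yh_hat + t_crit * s_pred)   # prediction limits
```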
Diagnostics and Remedial Measures

Most of the diagnostic procedures for simple linear regression that we described carry over directly to multiple regression. Box plots, sequence plots, stem-and-leaf plots, and dot plots for each of the predictor variables and for the response variable can provide helpful preliminary univariate information about these variables.

Scatter Plot Matrix. Scatter plots of the response variable against each predictor variable can aid in determining the nature and strength of the bivariate relationships between each of the predictor variables and the response.

Residual Plots. A plot of the residuals against the fitted values is useful for assessing the appropriateness of the multiple regression function and the constancy of the variance of the error terms, as well as providing information about outliers. A plot of the residuals against time or against some other sequence can provide diagnostic information about possible correlation between the error terms in multiple regression. Box plots and normal probability plots of the residuals are useful for examining whether the error terms are reasonably normally distributed. Plots of residuals against each of the predictor variables can provide further information about the adequacy of the regression function with respect to that predictor variable.

Residuals should also be plotted against important predictor variables that were omitted from the model, to see whether the omitted variables have substantial additional effects on the response variable that have not yet been recognized in the regression model. Residuals should be plotted against interaction terms to check whether potential interaction effects have been left out of the regression model. A plot of the absolute residuals or the squared residuals against the fitted values is useful for examining the constancy of the variance of the error terms. If nonconstancy is detected, a plot of the absolute residuals or the squared residuals against each of the predictor variables may identify one or several of the predictor variables to which the magnitude of the error variability is related.

Breusch-Pagan Test for Constancy of Error Variance. The Breusch-Pagan test for constancy of the error variance in multiple regression is carried out exactly as for simple linear regression when the error variance increases or decreases with one of the predictor variables. If the error variance is assumed to be related to $q \geq 1$ predictor variables, the chi-squared test statistic has $q$ degrees of freedom.
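As a sketch of the Breusch-Pagan procedure just described: regress the squared residuals on the $q$ predictors thought to drive the variance, then form the statistic $X^2_{BP} = (SSR^*/2) / (SSE/n)^2$ with $q$ degrees of freedom (the form used in the course textbook). The heteroscedastic simulated data below are an assumed example, not course data.

```python
import numpy as np
from scipy import stats

# Sketch of the Breusch-Pagan statistic with q = 1 predictor.
rng = np.random.default_rng(3)
n = 60
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
Y = 2 + 3 * x + rng.normal(0, 0.5 * x)     # error variance grows with x

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
SSE = e @ e                                 # error SS from the original fit

# Auxiliary regression of e^2 on the same predictor.
g = np.linalg.solve(X.T @ X, X.T @ (e**2))
fitted_sq = X @ g
SSR_star = np.sum((fitted_sq - (e**2).mean())**2)   # regression SS of e^2 on x

X2_BP = (SSR_star / 2) / (SSE / n) ** 2
p_value = stats.chi2.sf(X2_BP, df=1)        # df = q = 1 here
print(X2_BP, p_value)                        # small p-value flags nonconstant variance
```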
Example

Exercise 6.5. Brand Preference. In a small-scale experimental study of the relation between degree of brand liking ($Y$) and moisture content ($X_1$) and sweetness ($X_2$) of the product, the following results were obtained from an experiment based on a completely randomized design (data are coded):

    i:        1    2    3   ···   14   15   16
    X_{i1}:   4    4    4   ···   10   10   10
    X_{i2}:   2    4    2   ···    4    2    4
    Y_i:     64   73   61   ···   95   94  100

1. Obtain the scatter plot matrix and the correlation matrix. What information do these diagnostic aids provide here?

2. Fit the regression model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ to the data. State the estimated regression function. How is $b_1$ interpreted here?
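A hedged sketch of the requested workflow follows. Only the six observations printed in the table are entered; the elided rows ($i = 4, \ldots, 13$) must be taken from the textbook, so the output here does not reproduce the exercise's answers.

```python
import numpy as np

# Partial data from the table above; rows i = 4..13 are elided in the notes
# and must be filled in from the textbook before interpreting results.
X1 = np.array([4, 4, 4, 10, 10, 10], dtype=float)      # moisture content
X2 = np.array([2, 4, 2, 4, 2, 4], dtype=float)         # sweetness
Y  = np.array([64, 73, 61, 95, 94, 100], dtype=float)  # brand liking

# (1) Correlation matrix of (X1, X2, Y); a scatter plot matrix would plot
# each of these pairs of variables against one another.
print(np.corrcoef(np.vstack([X1, X2, Y])))

# (2) Fit Y_i = b0 + b1*X_i1 + b2*X_i2 via the normal equations; b1 estimates
# the change in mean liking per unit increase in moisture content, with
# sweetness held constant.
X = np.column_stack([np.ones(len(Y)), X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)
```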