Supplement 13A: Partial F Test
Purpose of the Partial F Test

For a given regression model, could some of the predictors be eliminated without sacrificing too much in the way of fit? Conversely, would it be worthwhile to add a certain set of new predictors to a given regression model? The partial F test is designed to answer questions such as these by comparing two linear models for the same response variable. The extra sum of squares is used to measure the marginal increase in the error sum of squares when one or more predictors are deleted from a model. Conversely, the extra sum of squares measures the marginal reduction in the error sum of squares when one or more predictors are added to a model.

Eliminating Some Predictors

We will start by showing how to assess the effect of eliminating some predictors from a model that contains k predictors. The model containing all the predictors is called the full model:

(13A.1)  Y = β0 + β1X1 + β2X2 + … + βkXk

A model with fewer predictors is a reduced model. We estimate the linear regression for each of the two models, and then look at the error sum of squares (SSE) from the ANOVA table for each model. We can use the following notation, assuming that m predictors were eliminated in the reduced model:

Full model SSE:     SSEFull                   dfFull = n – k – 1
Reduced model SSE:  SSEReduced                dfReduced = n – k – 1 + m
Extra SSE:          SSEReduced – SSEFull      df = (n – k – 1 + m) – (n – k – 1) = m

The partial F test statistic is the ratio of two variances. The numerator is the difference in error sums of squares (the "extra sum of squares") between the two models, divided by the number of predictors eliminated. The denominator is the mean squared error for the full model (SSEFull divided by its degrees of freedom).

(13A.2)  Fcalc = [(SSEReduced – SSEFull)/m] / [SSEFull/(n – k – 1)]   if m predictors are eliminated

Degrees of freedom for this test will then be (m, n – k – 1). If only one predictor has been eliminated, then m = 1.
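Equation 13A.2 is simple enough to sketch in a few lines of Python. This is a minimal illustration (the function name `partial_f` is our own, not from the supplement); it takes the two error sums of squares directly from the ANOVA tables of the fitted models.

```python
def partial_f(sse_reduced, sse_full, m, n, k):
    """Partial F statistic (Equation 13A.2) for eliminating m of k predictors.

    sse_reduced, sse_full -- error sums of squares from the two ANOVA tables
    m -- number of predictors eliminated
    n -- number of observations; k -- predictors in the full model
    """
    extra_sse = sse_reduced - sse_full      # the "extra sum of squares"
    numerator = extra_sse / m               # averaged over the m dropped predictors
    denominator = sse_full / (n - k - 1)    # mean squared error of the full model
    return numerator / denominator

# Degrees of freedom for the test are (m, n - k - 1).
```

The p-value can then be obtained from any right-tail F probability function, e.g. Excel's =F.DIST.RT(Fcalc, m, n–k–1).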
We can calculate the p-value for the partial F test using =F.DIST.RT(Fcalc, m, n–k–1).

Illustration: Predicting Used Car Prices    CarPrice

Table 13A.1 shows a data set consisting of 40 observations on prices of used cars of a particular brand and model (hence controlling for an obviously important factor that would affect prices). The response variable is Y (SellPrice) = sale price of the vehicle (in thousands of dollars). We have observations on three potential predictors: X1 (Age) = age of car in years, X2 (Mileage) = miles on odometer (in thousands of miles), X3 (ManTran) = 1 if manual transmission, 0 otherwise. The three predictors are viewed as non-stochastic, independent variables (we can later investigate the latter assumption by looking at VIFs, if we wish).

TABLE 13A.1  Selling Price and Characteristics of 40 Used Cars    CarPrice

  X1 (Age)   X2 (Mileage)   X3 (ManTran)   Y (SellPrice)
     13        148.599           0             0.370
      2         17.367           0            29.810
     13        174.904           0             0.390
      …            …             …               …
     10        145.886           0            11.210
      8         93.220           0            12.270
      5         75.907           0            19.260

Note: Only the first and last three observations are shown here. The units for SellPrice and Mileage have been adjusted to thousands to improve data conditioning.

Eliminating a Single Predictor

Let us first test whether the single predictor ManTran could be eliminated to achieve a more parsimonious model than using all three predictors.
We are comparing two potential linear regression models:

Full model:     SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
Reduced model:  SellPrice = β0 + β1 Age + β2 Mileage

Here are the ANOVA tables from these two regressions:

Full Model ANOVA Table
Source        SS           df    MS
Regression    2,334.5984    3      778.1995
Error           199.1586   36        5.5322
Total         2,533.7570   39

Reduced Model ANOVA Table
Source        SS           df    MS
Regression    2,314.3730    2    1,157.1865
Error           219.3840   37        5.9293
Total         2,533.7570   39

The elimination of ManTran increases the sum of squared errors, as you would expect (you have already learned that extra predictors can never decrease R², even if they are not significant). Although the predictor ManTran is contributing something to the model's overall explanatory power (reduced SSE), the question remains whether ManTran is making a statistically significant extra contribution. The calculations are:

Full model:     SSEFull = 199.1586       dfFull = n – k – 1 = 40 – 3 – 1 = 36
Reduced model:  SSEReduced = 219.3840    dfReduced = n – k – 1 + m = 40 – 3 – 1 + 1 = 37
Extra SSE:      SSEReduced – SSEFull     df = (n – k – 1 + m) – (n – k – 1) = 1

Fcalc = [(SSEReduced – SSEFull)/m] / [SSEFull/(n – k – 1)]
      = [(219.3840 – 199.1586)/1] / [199.1586/36]
      = 20.2254/5.5322 = 3.6559

From Excel, we obtain the p-value =F.DIST.RT(3.6559,1,36) = .0639. Therefore, if we are using α = .05, we would say that the extra sum of squares is not significant (i.e., ManTran does not make a significant marginal contribution). Instead of using the p-value, we could compare Fcalc = 3.6559 with F.05(1,36) =F.INV.RT(0.05,1,36) = 4.114 to draw the same conclusion. In effect, the hypotheses we are testing are:

H0: β3 = 0
H1: β3 ≠ 0

The test statistic is not far enough from zero to reject the hypothesis H0: β3 = 0.
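As a quick arithmetic check, the single-predictor calculation can be reproduced in Python. This is a sketch only; the critical value 4.114 is taken from Excel's F.INV.RT as quoted above, not computed here.

```python
# Partial F test for eliminating ManTran (m = 1, n = 40, k = 3)
sse_full, sse_reduced = 199.1586, 219.3840
n, k, m = 40, 3, 1

extra_sse = sse_reduced - sse_full        # extra sum of squares: 20.2254
mse_full = sse_full / (n - k - 1)         # MSE of the full model: 5.5322
f_calc = (extra_sse / m) / mse_full       # partial F statistic: 3.6559

f_crit = 4.114  # = F.INV.RT(0.05, 1, 36) from Excel
print(f"Fcalc = {f_calc:.4f}, reject H0: {f_calc > f_crit}")
```

Since Fcalc falls below the critical value, we fail to reject H0: β3 = 0, matching the p-value conclusion.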
You may already have realized that if we are only considering the effect of one single predictor, we could reach the same conclusion from its t-statistic in the fitted regression of the full model:

Regression output
Variables    Coefficients   Std. Error   t (df=36)   p-value
Intercept       33.7261       0.9994       33.747    7.60E-29
Age             -1.6630       0.2938       -5.660    1.98E-06
Mileage         -0.0584       0.0224       -2.610    .0131
ManTran         -1.6538       0.8650       -1.912    .0639

In the single-predictor case, the partial F test statistic is equal to the square of the corresponding t test statistic in the full model. The t-test uses the same degrees of freedom as the denominator of the partial F test, so the p-values will be the same as long as we use a two-tailed t-test (to eliminate the sign so that rejection in either tail could occur):

Predictor ManTran:  tcalc² = (–1.912)² = 3.656
Excel's p-value:    =T.DIST.2T(1.912,36) = .0639

In effect, the hypotheses we are testing are:

H0: β3 = 0
H1: β3 ≠ 0

In the case of a single predictor, we could get by without using the partial F test. It is shown here because it illustrates the test in a simple way, and reveals the connection between the F and t distributions. An advantage of the t-test is that it could also be used to test a one-sided hypothesis (e.g., H1: β3 < 0), which might be relevant in this example (all our predictors seem to have an inverse relationship with a car's selling price).

Eliminating More Than One Predictor

We now turn to the more general case of using the partial F test to assess the effect of eliminating m predictors simultaneously (where m > 1). This can be especially useful when we have a large model with many predictors that we are thinking of eliminating because their effects seem to be weak in the full model.
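The F–t connection can be verified numerically. This small sketch uses the rounded values printed above, so agreement is only to about three decimal places.

```python
t_calc = -1.912        # t statistic for ManTran in the full model
f_from_t = t_calc ** 2 # squaring removes the sign, giving an F(1, 36) statistic
f_calc = 3.6559        # partial F statistic computed earlier

# The two test statistics agree (up to rounding of the printed values)
print(round(f_from_t, 3))   # 3.656
```

This is why the two-tailed t p-value (.0639) and the partial F p-value are identical for a single predictor.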
To test the effects of discarding m predictors at once, the hypotheses are:

H0: All the βj = 0 for a subset of m predictors in the full model
H1: Not all the βj = 0 (at least some of the m coefficients are non-zero)

For example, suppose we want to know whether we can eliminate both Mileage and ManTran at once. The hypotheses are:

H0: β2 = 0 and β3 = 0
H1: One or both coefficients are non-zero

The models to be compared are:

Full model:     SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
Reduced model:  SellPrice = β0 + β1 Age

Here are the ANOVA tables from these two regressions:

Full Model ANOVA Table
Source        SS           df    MS
Regression    2,334.5984    3      778.1995
Error           199.1586   36        5.5322
Total         2,533.7570   39

Reduced Model ANOVA Table
Source        SS           df    MS
Regression    2,269.8421    1    2,269.8421
Residual        263.9148   38        6.9451
Total         2,533.7570   39

The elimination of both Mileage and ManTran increases the sum of squared errors, as you would expect. The question is whether these two predictors are making a statistically significant extra contribution to reducing the sum of squared errors. The calculations are:

Full model:     SSEFull = 199.1586       dfFull = n – k – 1 = 40 – 3 – 1 = 36
Reduced model:  SSEReduced = 263.9148    dfReduced = n – k – 1 + m = 40 – 3 – 1 + 2 = 38
Extra SSE:      SSEReduced – SSEFull     df = (n – k – 1 + m) – (n – k – 1) = m = 2

Fcalc = [(263.9148 – 199.1586)/2] / [199.1586/36]
      = [64.7562/2]/5.5322 = 32.3781/5.5322 = 5.8527

From Excel, we obtain the p-value =F.DIST.RT(5.8527,2,36) = .0063. If we are using α = .05, we would say that the extra sum of squares is significant (i.e., these two predictors do make a significant marginal contribution). Alternatively, we can compare Fcalc = 5.8527 with F.05(2,36) =F.INV.RT(0.05,2,36) = 3.259 to draw the same conclusion.

Adding Predictors

We have been discussing eliminating predictors. The calculations for adding predictors to a linear model are similar if we define the "full" model as the "big" model (more predictors) and the "reduced" model as the "small" model (fewer predictors).
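Applying Equation 13A.2 with m = 2 can likewise be sketched in Python. When the numerator degrees of freedom equal 2, the right-tail F probability happens to have a closed form, (1 + 2F/ν)^(−ν/2), so no statistical library is needed; that identity is a standard distributional fact, added here for illustration rather than taken from the supplement.

```python
# Partial F test for dropping Mileage and ManTran together (m = 2)
sse_full, sse_reduced = 199.1586, 263.9148
n, k, m = 40, 3, 2
nu = n - k - 1                                  # denominator df = 36

# Equation 13A.2: extra SSE averaged over m, divided by the full-model MSE
f_calc = ((sse_reduced - sse_full) / m) / (sse_full / nu)

# Right-tail p-value for an F(2, nu) statistic (closed form valid only for 2 numerator df)
p_value = (1 + 2 * f_calc / nu) ** (-nu / 2)

print(f"Fcalc = {f_calc:.4f}, p = {p_value:.4f}")
```

Because the p-value is below α = .05, the pair of predictors makes a significant joint contribution and should be retained.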
The "extra sum of squares" is still the difference between the two sums of squares:

(13A.3)  Fcalc = [(SSE for small model – SSE for big model)/(number of extra predictors)] / [SSE for big model/(n – k – 1)]

More Complex Models

We can use variations on these partial F tests, based on error sums of squares, for other purposes. For example, we can test whether two coefficients in a model are the same (e.g., β2 = β3) or calculate the effect of any given predictor in the presence of other sets of predictors in the model (using the coefficient of partial determination). Such tests are ordinarily reserved for more advanced classes in statistics, and may entail using more specialized software.

Full Results for Car Data    CarPrice

To allow you to explore the car data on your own, full results are shown below for the full model based on the used car data. SellPrice is negatively affected by Age and Mileage (both highly significant) and marginally so by ManTran (p-value significant at α = .10 but not at α = .05). You can also look at the data file and do your own regressions.

Regression Analysis
R²            0.921      n          40
Adjusted R²   0.915      k           3
R             0.960      Dep. Var.  SellPrice
Std. Error    2.352

ANOVA table
Source        SS           df    MS          F        p-value
Regression    2,334.5984    3    778.1995    140.67   6.17E-20
Residual        199.1586   36      5.5322
Total         2,533.7570   39

Regression output                                            95% confidence interval
Variables    Coefficients  Std. Error  t (df=36)  p-value     lower      upper      VIF
Intercept       33.7261      0.9994      33.747   7.60E-29    31.6993    35.7530
Age             -1.6630      0.2938      -5.660   1.98E-06    -2.2589    -1.0671    6.384
Mileage         -0.0584      0.0224      -2.610   .0131       -0.1038    -0.0130    6.371
ManTran         -1.6538      0.8650      -1.912   .0639       -3.4081     0.1004    1.014

It appears that as a car ages, it loses about $1,663 in value per year (ceteris paribus). Similarly, for each additional 1,000 miles driven, a car loses on average about $58. Cars with manual transmission seem to sell for about $1,654 less than those with automatic transmission (remember, the brand and model are controlled already).
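The fitted coefficients above can be used to price a car. The helper below and its example inputs (a 5-year-old automatic with 60,000 miles) are hypothetical, added only to show how the units work: Mileage enters in thousands of miles and the prediction comes out in thousands of dollars.

```python
# Fitted full-model coefficients from the regression output above
b0, b_age, b_mileage, b_mantran = 33.7261, -1.6630, -0.0584, -1.6538

def predict_sellprice(age, mileage_thousands, manual):
    """Predicted selling price in thousands of dollars (hypothetical helper)."""
    return (b0 + b_age * age
               + b_mileage * mileage_thousands
               + b_mantran * manual)

# Hypothetical car: 5 years old, 60,000 miles, automatic transmission
price = predict_sellprice(5, 60, 0)
print(f"Predicted price: ${price * 1000:,.0f}")   # about $21,907
```

Note how the mileage coefficient (−0.0584) translates to roughly $58 per 1,000 miles, consistent with the interpretation above.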
There is evidence of multicollinearity between Age and Mileage, which would be expected (as cars get older, they accumulate more miles). This would require further consideration by the analyst.

Section Exercises

13A.1  Instructions: Use α = .05 in all tests.
(a) Perform a full linear regression to predict ColGrad% using all eight predictors in DATA SET E shown here. State the SSE and df for the full model.
(b) Fit a reduced linear regression model by eliminating the predictor Age. State the SSE and df for the reduced model.
(c) Calculate the partial F test statistic to see whether the predictor Age was significant.
(d) Calculate the p-value for the partial F test. What is your conclusion?
(e) Does your conclusion from the partial F test agree with the test using the t-statistic in the full model regression?
(f) Fit a reduced regression model by eliminating the two predictors Age and Seast simultaneously. State the SSE and df for the reduced model.
(g) Calculate the partial F test statistic to see whether the predictors Age and Seast can both be eliminated. State your conclusion.

References

Kutner, Michael H.; Christopher J. Nachtsheim; and John Neter. Applied Linear Regression Models. 4th ed. McGraw-Hill/Irwin, 2004, pp. 256–271.