Lab 5: Multiple regression and non-linear estimation (avoiding the polynomial)
10/9/2011

Overview
• Multiple Linear Regression
• Non-Linear Regression
  – Polynomials
  – Specific curves

Sample Commands

Command, followed by its description:

graph matrix x1 x2 x3 y, half
    Draws a scatterplot matrix.

regress y x1 x2 x3
    Performs ordinary least squares (OLS) regression of variable y on several independent variables x1, x2, and x3.

predict newvar, resid
    Generates a new variable (for example, e) equal to the residuals obtained from the most recent regression run in Stata.

predict newvar
    The predicted values of y are placed in the new variable newvar.

predict newvar, rstandard
    The standardized residuals are placed in the variable newvar.

predict newvar, rstudent
    The studentized residuals are placed in the variable newvar.

predict newvar, hat
    The leverage values are placed in the variable newvar.

predict newvar, cooksd
    Creates a new variable, newvar, containing the Cook's D influence measures (see below).

predict newvar, dfits
    The Dfits influence measures are placed in the variable newvar.

nl exp2: y x
    Uses iterative nonlinear least squares to fit a 2-parameter exponential growth model (built-in function exp2).

nl (weight = ({b0} * age)/({b1} + age)), initial(b0 50 b1 20)
    Uses iterative nonlinear least squares to fit a custom model (in this case a Michaelis-Menten equation) with specified initial parameter values.

rvfplot
    Draws a residual versus fitted (predicted values) plot, automatically based on the most recent regression.

rvpplot x1
    Graphs the residuals against the values of the predictor variable x1.

graph twoway scatter e yhat, yline(0)
    Draws a residual versus predicted values plot using the variables e and yhat. A horizontal line is drawn at y = 0.

hettest
    Performs Cook and Weisberg's test for heteroskedasticity.
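The commands in the table can be chained into a short do-file. The sketch below is only an illustration: it uses Stata's bundled auto data set (price, mpg, weight) as a stand-in for the lab data, the variable names yhat, e, rstu, lev, and dfit are arbitrary, and k is assumed to mean the number of predictors in the cutoffs quoted in the Evaluation questions. Check that definition against your course notes before relying on the flags.

```stata
* Stand-in data: Stata's bundled auto data set, not the lab's ant data
sysuse auto, clear

* Scatterplot matrix, then OLS of price on two predictors
graph matrix mpg weight price, half
regress price mpg weight

* Sample size and number of predictors from the fit
* (assumption: k in the lab's cutoffs is the number of predictors)
local n = e(N)
local k = e(df_m)

* Fitted values and diagnostics from the most recent regression
predict yhat
predict e, resid
predict rstu, rstudent
predict lev, hat
predict dfit, dfits

* Residual versus fitted plot with a reference line at zero
graph twoway scatter e yhat, yline(0)

* List observations beyond the cutoffs quoted in the Evaluation questions
list make rstu if abs(rstu) > 2 & !missing(rstu)
list make lev if lev > (2*`k' + 2)/`n' & !missing(lev)
list make dfit if abs(dfit) > 2*sqrt(`k'/`n') & !missing(dfit)
```

An observation that appears in more than one list is usually the first one worth inspecting; a flag on its own is a prompt to look at the point, not a reason to delete it.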
Multiple Regression
• graph matrix latitude elevation antdensity, half
• regress antdensity latitude elevation
• rvfplot, yline(0)

Assessing fit
• Residuals
• Potential outliers
• Leverage points
• Influential points
• Homogeneity of residual variance

Variance of residuals
• hettest

    Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
             Ho: Constant variance
             Variables: fitted values of antdensity

             chi2(1)      =     3.70
             Prob > chi2  =   0.0544

Multicollinearity
• vif

        Variable |       VIF       1/VIF
    -------------+----------------------
       elevation |      1.03    0.968050
        latitude |      1.03    0.968050
    -------------+----------------------
        Mean VIF |      1.03

Multicollinearity
• collin elevation latitude

    Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
     elevation      1.03    1.02    0.9681      0.0319
      latitude      1.03    1.02    0.9681      0.0319
    ----------------------------------------------------
      Mean VIF      1.03

                               Cond
            Eigenval          Index
    ---------------------------------
        1     2.7708          1.0000
        2     0.2289          3.4792
        3     0.0003         97.1188
    ---------------------------------
     Condition Number        97.1188
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.9681

Evaluation
• Write out the equation for the multiple regression that you have just fit.
• Are there influential points (i.e., ABS(DFFITS) > 2*sqrt(k/n))?
• Are there points with high leverage (points with leverage [H] greater than (2k+2)/n) that you should be concerned about?
• Do the studentized residuals identify any potential outliers (values beyond ±2)?
• Is the response variable (ANTDENSITY) linearly related to both independent variables?
• What do you conclude about the multiple regression?

Nonlinear Regression
• We are going to attempt to fit two curves to these data. The first, a quadratic (2nd-order polynomial), will have the form Y = a + b*X + c*X^2
• The second will be a Michaelis-Menten equation with an intercept: Y = a + (c*X)/(d+X)

Interpreting parameters

Fitting parameters

    Equation               Parameter   Initial Value
    Y = a + b*X + c*X^2    a             2
                           b             1
                           c            -0.1
    Y = a + (c*X)/(d+X)    a             5
                           c            50
                           d            35

Fitting the curve
• nl (weight = {a} + {b}*age + {c}*age^2)
• nl (weight = {b0} + {b1}*age + {b2}*age^2), initial(b0 2 b1 1 b2 -1)
• nl (weight = {a} + ({c} * age)/({d} + age)), initial(a 10 c 50 d 20)

'Automatic' commands don't work
• Note that you will have to use predict commands to create the residuals and predicted values for plotting, because commands such as rvpplot, rvfplot, and hettest do not work after the nl command.
• For example, to view a plot of residuals versus predicted values you could use the following commands:
  – predict residuals, resid
  – predict yhat
  – scatter residuals yhat

Evaluation
• In completing your assessment of the polynomial and Michaelis-Menten equations you should be able to answer the following questions:
• What is the equation for the polynomial that you fit to the data?
• What are your observations about the appropriateness of the polynomial model (fit, residuals, etc.)?
• Write out the Michaelis-Menten equation that you fit to the data.
• What are your observations about the appropriateness of the Michaelis-Menten model (fit, residuals, etc.)?
• Based on fit (estimates of R^2, although you can just compare the Regression SS because the TSS will be the same), which model is best? Is the statistically 'best' model the one that you would use (i.e., best biological interpretation)? Why or why not?
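The two curve fits and the hand-built residual plots can be combined into one do-file. The sketch below is a minimal illustration under stated assumptions: the data are simulated (made up purely so the example runs, not the lab's data set), the variable names weight and age follow the nl commands in this lab, the stored names (rss_quad, yhat_mm, and so on) are arbitrary, and the starting values are the ones listed under Fitting parameters.

```stata
* Simulated stand-in data (made up for illustration; not the lab's data set)
clear
set seed 2011
set obs 100
generate age = 1 + 99*runiform()
generate weight = 5 + (50*age)/(35 + age) + rnormal(0, 2)

* Quadratic: Y = a + b*X + c*X^2
nl (weight = {a} + {b}*age + {c}*age^2), initial(a 2 b 1 c -0.1)
scalar rss_quad = e(rss)
predict yhat_quad
predict res_quad, resid

* Michaelis-Menten with an intercept: Y = a + (c*X)/(d + X)
nl (weight = {a} + ({c}*age)/({d} + age)), initial(a 5 c 50 d 35)
scalar rss_mm = e(rss)
predict yhat_mm
predict res_mm, resid

* rvfplot does not work after nl, so plot residuals against fitted values by hand
scatter res_quad yhat_quad, yline(0) name(quad, replace)
scatter res_mm yhat_mm, yline(0) name(mm, replace)

* Same data, same total SS, so the smaller residual SS marks the closer fit
display "Residual SS, quadratic:        " rss_quad
display "Residual SS, Michaelis-Menten: " rss_mm
```

Because both models are fit to the same response, a smaller residual SS is equivalent to a larger Regression SS. The residual plots and the biological meaning of the parameters still have to be weighed before choosing a model.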