Regression Analysis
InVivoStat User Guides – Regression Analysis Version 1.0 May 2015
InVivoStat Regression Analysis Module Tipsheet

The Regression Analysis module in InVivoStat is available within the Additional Analyses sub-menu in the Statistics drop-down menu and is entitled ‘Regression Analysis’.

[Figure: the Regression Analysis module user interface]

The Regression Analysis module performs linear regression and multiple linear regression. The user can fit a model that includes continuous factors, multiple treatment (factorial) factors, other design (blocks) factors and a single covariate. All interactions involving the continuous and treatment factors are included in the statistical model, but none of the interactions involving the blocking factors are included. The user can also check the interactions involving the covariate by choosing the ‘Assess covariate interactions’ option in the Output Options window.

1 Setting up the model

Once the dataset has been opened, the user can select the variables for the analysis by dragging and dropping them from the ‘Available variables’ list into the ‘Response’, ‘Continuous factors’, ‘Treatments (factorial)’, ‘Other design (blocks)’ and ‘Covariate’ boxes.

Once the variables have been selected, the user has the option of applying a transformation to the response variable and the continuous factors: log10, loge, square root, arcsine or rank. If a covariate is selected, it is transformed using the same transformation unless the user specifies otherwise.

If a covariate is selected, then the user has the option of selecting the ‘Primary factor’. This factor is used to categorise the scatterplot (produced in the output). The Primary factor should be one of the factors of interest to the experimenter.
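The five transformations named above can be sketched in a few lines of Python. This is an illustration only, not InVivoStat's own code: the function names are hypothetical, the arcsine option is assumed to be the arcsine of the square root (the usual form for proportions between 0 and 1), and the rank option is assumed to give tied values the average of their ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def transform(values, method):
    """Apply one of the named transformations to a list of numbers."""
    if method == "log10":
        return [math.log10(v) for v in values]
    if method == "loge":
        return [math.log(v) for v in values]
    if method == "sqrt":
        return [math.sqrt(v) for v in values]
    if method == "arcsine":
        # Assumption: arcsine of the square root, for proportions in [0, 1].
        return [math.asin(math.sqrt(v)) for v in values]
    if method == "rank":
        return average_ranks(values)
    raise ValueError(f"unknown transformation: {method}")

print(transform([1, 10, 100], "log10"))
print(transform([3, 1, 3, 2], "rank"))
```

The log and square root options are only defined for positive (or non-negative) responses, so data containing zeros or negative values would need checking before one of these is chosen.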
2 Selecting the analysis options

Several results from the regression analysis are available to the user. These are selected before running the analysis and include:

1) ANOVA table
   Produces overall tests of the effect of the terms in the statistical model.
2) Coefficients
   If only one continuous factor is selected, the model coefficients (slope and intercept) are automatically generated. These are calculated separately for the combinations of any treatment factors. If more than one continuous factor is selected, then the coefficients of the model parameters can be generated by selecting this option.
3) Adjusted R-squared
   Allows the user to check how successful the model fit is in explaining the variation in the data. Both the R-squared and Adjusted R-squared statistics are given.
4) Significance level
   The default is 5%, although this can be changed.

Diagnostic plots:

5) Residuals vs. predicted plot
   Allows the user to check the variance assumption of the parametric analysis.
6) Normal probability plot
   Allows the user to check the normality assumption of the parametric analysis.
7) Cook’s distance plot
   Allows the user to check for outliers in the dataset.
8) Leverage plot
   Allows the user to check the effect of the individual observations on the model.

3 Output Results

Response and covariate
InVivoStat identifies the response being analysed and also the covariate (if one is selected). This section also describes any transformations that have been applied.

Scatterplots of the raw data, including best-fit regression lines
InVivoStat produces scatterplots of the raw data. These should be used to identify possible outliers. On each plot the X-axis corresponds to the levels of a continuous factor and the Y-axis corresponds to the response.
The plots are categorised by the combinations of the treatment factors. Note that the best-fit lines are not related to the statistical model fitted in this module; they are simply the best-fit lines through the data as shown on the plot, i.e. they are not adjusted for any covariate, blocks or unequal sample sizes.

Estimates of the coefficients of the best-fit regression lines
If a single continuous factor is selected, then a table of the coefficients (slope and intercept) of the best-fit regression lines is given. This table is categorised by the levels of the treatment factors where appropriate.

Categorised scatterplot of the raw data (ANCOVA only)
When fitting a covariate in a statistical analysis, certain assumptions are made. This plot allows the user to assess these assumptions. Underneath the plot is a list of the assumptions and also advice on how the plot can be used to assess them.

ANOVA/ANCOVA table
The ANOVA/ANCOVA table gives tests of the overall effect of the model terms. InVivoStat presents the Type I model fit within this module. Below the table any statistically significant effects are listed.

Table of model coefficients
If requested, this table contains the model coefficients. By adding together these coefficients, where appropriate, the user can identify the regression equations.

R-squared and Adjusted R-squared statistics
If requested, the R-squared and Adjusted R-squared statistics are given.

Diagnostic plots
If requested, InVivoStat produces the residuals vs. predicted plot, the normal probability plot, the Cook’s distance plot and the Leverage plot. The residuals plotted on the residuals vs. predicted plot are the standardized residuals, as these can provide a test for outliers. Any observation with a standardized residual greater than 3 (or less than -3) could be considered an outlier.

Analysis description and references
A description of the analysis performed is given. Finally, a list of references for the methods applied in the analysis is given.
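The diagnostic quantities described above can be illustrated for the simplest case, a regression on one continuous factor. The sketch below is not InVivoStat's implementation: the function name and the data are hypothetical, the standardized residual is assumed to be the residual divided by its estimated standard deviation (allowing for leverage), and the Cook's distance cut-off of 4/n is the one quoted in the sample output of this guide.

```python
import math

def regression_diagnostics(x, y):
    """Fit y = a + b*x by least squares and return per-observation diagnostics."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual mean square

    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of this observation
        r = e / math.sqrt(s2 * (1 - h))       # standardized residual
        d = r ** 2 * h / (p * (1 - h))        # Cook's distance
        rows.append({"std_resid": r, "leverage": h, "cooks_d": d})
    return rows

# Hypothetical data: a perfect straight line with one observation shifted upwards.
x = list(range(20))
y = [float(v) for v in x]
y[10] += 10.0

diag = regression_diagnostics(x, y)
outliers = [i for i, row in enumerate(diag) if abs(row["std_resid"]) > 3]
influential = [i for i, row in enumerate(diag) if row["cooks_d"] > 4 / len(x)]
print("possible outliers:", outliers)
print("above Cook's distance cut-off:", influential)
```

With only two fitted parameters, a standardized residual of this kind cannot exceed the square root of n - 2, so the "greater than 3" rule can only flag observations when the dataset has more than 11 observations.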
4 Sample output

Options:

[Figure: the options selected for the sample analysis]

InVivoStat Linear Regression Analysis

Response and covariate
The Observation response is currently being analysed by the Linear Regression Analysis module, with Baseline observation fitted as a covariate.

Scatterplots of the raw data, including best-fit regression lines

[Figure: scatterplots of the raw data with best-fit regression lines]

Note: The best-fit regression lines included on the plot are not adjusted for the covariate.

Estimates of the coefficients of the best-fit regression lines

Categorisation factor level combinations   Intercept estimate   Slope estimate
F Control                                             10.6405          -0.4881
F Treatment                                          -11.3019           0.5872
M Control                                              6.7366          -0.2998
M Treatment                                          -10.3445           0.5310

Note: The estimates of the regression coefficients are not adjusted for the covariate.

Covariate plot of the raw data (ignoring continuous factor)

[Figure: covariate plot of the raw data]

Tip: Is it worth fitting the covariate? You should consider the following:
a) Is there a relationship between the response and the covariate? It is only worth fitting the covariate if there is a strong positive (or negative) relationship between them. The lines on the plot should not be horizontal.
b) Is the relationship similar for all treatments? The lines on the plot should be approximately parallel.
c) Is the covariate influenced by the treatment? We assume the covariate is not influenced by the treatment, so there should be no separation of the treatment groups along the x-axis on the plot.
These issues are discussed in more detail in Morris (1999).
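Both the table of best-fit coefficients and tip (b) rest on fitting a separate straight line to each group and comparing the slopes. A minimal sketch of that idea follows; the function names, group labels and data are hypothetical and are not taken from the sample dataset, and the lines are ordinary unadjusted least-squares fits, as the notes above describe.

```python
from collections import defaultdict

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_by_group(rows):
    """Fit one line per group; rows are (group label, x value, response)."""
    groups = defaultdict(lambda: ([], []))
    for label, xi, yi in rows:
        groups[label][0].append(xi)
        groups[label][1].append(yi)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}

# Hypothetical data for two groups, A and B.
rows = [
    ("A", 1.0, 3.0), ("A", 2.0, 5.0), ("A", 3.0, 7.0),
    ("B", 1.0, 4.0), ("B", 2.0, 3.0), ("B", 3.0, 2.0),
]
coefficients = fit_by_group(rows)
for label, (intercept, slope) in coefficients.items():
    print(f"{label}: intercept = {intercept:.4f}, slope = {slope:.4f}")
```

Slopes of opposite sign, as in this made-up example, are the kind of clearly non-parallel pattern that tip (b) asks the user to look for on the covariate plot.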
Analysis of Covariance (ANCOVA) table

Effect                                  Sums of squares   Degrees of freedom   Mean square   F-value   p-value
Baseline observation                               0.05                    1         0.049      0.48    0.4935
Bodyweight                                         0.01                    1         0.015      0.15    0.7056
Gender                                             0.03                    1         0.025      0.25    0.6213
Treatment group                                    0.05                    1         0.051      0.50    0.4873
Bodyweight * Gender                                0.09                    1         0.090      0.89    0.3555
Bodyweight * Treatment group                       0.53                    1         0.530      5.23    0.0318
Gender * Treatment group                           0.01                    1         0.008      0.08    0.7843
Bodyweight * Gender * Treatment group              0.00                    1         0.000      0.00    0.9767
Residuals                                          2.33                   23         0.101

Comment: ANCOVA table calculated using a Type I model fit, see Armitage et al. (2001).
Conclusion: There is a statistically significant effect of Bodyweight * Treatment group.

Table of model coefficients

                                          Estimate   Lower 95% CI   Upper 95% CI
(Intercept)                                 11.652         -7.503         30.807
Baseline observation                         0.220         -0.230          0.670
Bodyweight                                  -0.543         -1.476          0.390
GenderM                                     -3.428        -28.259         21.402
Treatment group                            -20.657        -53.472         12.158
Bodyweight * GenderM                         0.165         -1.046          1.376
Bodyweight * Treatment group                 1.010         -0.601          2.622
GenderM * Treatment group                    0.502        -39.658         40.662
Bodyweight * GenderM * Treatment group      -0.028         -2.001          1.945

Note: These model coefficients can be added together to obtain the model-based estimates of the relationships between the factors and the response, see Chambers and Hastie (1992).

R-squared and Adjusted R-squared statistics

           R-squared   Adjusted R-sq
Estimate      0.2477         -0.0140

The R-squared is the fraction of the variance explained by the model. A value close to 1 implies the statistical model fits the data well. Unfortunately adding additional variables to the statistical model will always increase R-sq, regardless of their importance. The Adjusted R-sq adjusts for the number of terms in the model and may decrease if over-fitting has occurred. If there is a large difference between R-sq and Adjusted R-sq, then non-significant terms may have been included in the statistical model.
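The relationship between the two statistics can be checked against the sample output using the standard adjustment formula, Adjusted R-sq = 1 - (1 - R-sq)(n - 1)/(n - p - 1). In the sketch below the function name is hypothetical, p = 8 is the number of single-degree-of-freedom terms in the ANCOVA table, and n = 32 is an inference from the 23 residual degrees of freedom plus those 8 terms plus the intercept; the guide does not state the sample size directly.

```python
def adjusted_r_squared(r_squared, n_obs, n_terms):
    """Standard adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_terms - 1)

r_squared = 0.2477           # from the sample output
n_terms = 8                  # model terms in the ANCOVA table (1 df each)
residual_df = 23             # from the ANCOVA table
n_obs = residual_df + n_terms + 1   # inferred sample size: 32

adj = adjusted_r_squared(r_squared, n_obs, n_terms)
print(f"Adjusted R-sq = {adj:.4f}")
```

The result agrees with the -0.0140 reported above. A negative value means the model explains less variation than would be expected from fitting eight arbitrary terms, which is consistent with most effects in the ANCOVA table being non-significant.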
Diagnostic plots

[Figure: residuals vs. predicted plot]

Tip: On this plot look to see if the spread of the points increases as the predicted values increase. If so the response may need transforming.
Tip: Any observation with a residual less than -3 or greater than 3 (SD) should be investigated as a possible outlier.

[Figure: normal probability plot]

Tip: Check that the points lie along the dotted line. If not then the data may be non-normally distributed.

Cook's distance plot

[Figure: Cook's distance plot]

This plot should be used to assess whether there are any potential outliers in the dataset. Observations where the Cook's distance is above the cut-off line should be investigated further. Note the cut-off line has been calculated using the 4/n approach, where n is the number of observations in the dataset.

Leverage plot

[Figure: leverage plot]

This plot indicates the relative influence of the observations. Observations with a high leverage may be unduly influencing the statistical model.

Analysis description
The data were analysed using an ANCOVA approach, with Bodyweight as the continuous factor, Gender and Treatment group as the treatment factors, and Baseline observation as the covariate.
For more information on the theoretical approaches that are implemented within this module, see Bate and Clark (2014).

Statistical references
Bate ST and Clark RA. (2014). The Design and Statistical Analysis of Animal Experiments. Cambridge University Press.
Armitage P, Matthews JNS and Berry G. (2001). Statistical Methods in Medical Research. 4th edition. John Wiley & Sons, New York.
Chambers JM and Hastie TJ. (1992). Statistical Models in S. Wadsworth and Brooks/Cole Advanced Books and Software.
Morris TR. (1999). Experimental Design and Analysis in Animal Sciences. CABI Publishing, Wallingford, Oxon (UK).

R references
R Development Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
Barret Schloerke, Jason Crowley, Di Cook, Heike Hofmann, Hadley Wickham, Francois Briatte, Moritz Marbach and Edwin Thoen (2014). GGally: Extension to ggplot2. R package version 0.4.5. http://CRAN.R-project.org/package=GGally
Erich Neuwirth (2011). RColorBrewer: ColorBrewer palettes. R package version 1.05. http://CRAN.R-project.org/package=RColorBrewer
H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
H. Wickham. Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 2007.
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.
Hadley Wickham (2012). scales: Scale functions for graphics. R package version 0.2.3. http://CRAN.R-project.org/package=scales
John Fox and Sanford Weisberg (2011). An {R} Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. URL: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Lecoutre, Eric (2003). The R2HTML Package. R News, Vol 3. N. 3, Vienna, Austria.
Louis Kates and Thomas Petzoldt (2012). proto: Prototype object-based programming. R package version 0.3-10. http://CRAN.R-project.org/package=proto