axis major reduced regression rma
Scaling in Biology
Lecture 2, 1/24/07
- Statistical tool kit
- Regression models
- Calculating regression parameters
- Critical assumptions
- Importance of error
- Alternative regression models

What is scaling?
How attributes of a system change with changes in dimension - a functional relationship!
A change in the basic physical quantities (Mass, Length, Time, Temperature): MLT (T)
• Changes in organismal mass, length, volume
• Changes in spatial extent (area)
• Changes in time (temporal dynamics)
• Variation in temperature
Physical measurements (energy, density, power, velocity, etc.) can be broken down to units of MLT: dimensional analysis.

Scaling Analysis - Building a tool kit
The scaling approach asks: "If I vary X, how does Y change?"
We are interested in the functional relationship between X and Y.
How do we assess the functional relationship? Many ways. The simplest model is a linear or power function.
A central tool in scaling studies is the regression model.

One example: How does a change in organismal size influence trait Y?
- Measure mass or length (X), measure the response variable (Y)
- Characterize the functional relationship with a regression model
- Plot the data. Is the relationship non-linear?
- Transform the data: log transform (log10). Is the relationship now a straight line?

Why log transform? Statistical justification
A power function Y = aX^b becomes the straight line log Y = log a + b log X on log-transformed axes.
Standard parametric statistics assume residual variation is normally distributed. After log transformation, the residuals of allometric data tend to be normally distributed, so we can use parametric statistics based on Gaussian distributions.
Fit a regression model to characterize the functional relationship.

Deeper issues: reveals fundamental aspects of biology
Some may say that log-log plots "hide important variation". Not a correct statement.
On log-transformed axes, a constancy of residual variation means that for a given value of X, the variation in Y not explained by X is proportionally constant (!)
This is an important insight!
Only by log transformation can one clearly document this.
Remember: power relationships and constant proportions of variance point to the importance of multiplicative (not additive) processes in biology.
Important - many scaling attributes are non-linear!
A log scale allows one to view the multiplicative nature of biological phenomena.
Biology is often multiplicative and rarely (?) additive.
A 1 g change in the mass of a mouse is large, but it is minuscule for an elephant!

Regression Tutorial
Allometric study of variation in plant xylem tracheid dimensions
Xi = stem diameter (cm), Yi = tracheid diameter (µm)

Stem diam (cm)   Tracheid diam (µm)   Log10 stem diam   Log10 tracheid diam
5                22                   0.69897           1.34242268
1.2              14                   0.07918125        1.14612804
2                17                   0.30103           1.23044892
3                19                   0.47712125        1.2787536
7.5              27                   0.87506126        1.43136376
10.2             30                   1.00860017        1.47712125
30               38                   1.47712125        1.5797836
48               44                   1.68124124        1.64345268
57               45                   1.75587486        1.65321251
17               32                   1.23044892        1.50514998
13               29                   1.11394335        1.462398
9                26.8                 0.95424251        1.42813479
4.5              21                   0.65321251        1.32221929
2.8              16                   0.44715803        1.20411998
4                18                   0.60205999        1.25527251
3.8              19                   0.5797836         1.2787536
4                20                   0.60205999        1.30103
5                23                   0.69897           1.36172784
7                25                   0.84509804        1.39794001
7.3              28                   0.86332286        1.44715803
14               30                   1.14612804        1.47712125
15               29.5                 1.17609126        1.46982202
16.7             30                   1.22271647        1.47712125
34.8             36.8                 1.54157924        1.56584782

[Figure: tracheid diameter (µm) vs. stem diameter (cm), linear axes]
[Figure: log-log plot, log10 tracheid diameter vs. log10 stem diameter]

Statistical Fit of a Regression Model
There are several types of linear regression models:
- Model I (least squares)
- Model II (major axis, reduced major axis)
- OLS bisector technique
- Principal components, independent contrasts . . .
Each differs in its assumptions about where the 'error' resides between the two variables of interest.
This is an important issue for characterizing the scaling slope and constant, and any functional relationship in biology.

Model I (or least squares, i.e. LS) regression
Most regression packages perform Model I (ordinary least squares, OLS) regression; OLS has historically been used.
- Fits a best-fit line through the data which passes through the mean values of Y and X.
- OLS regression minimizes the sum of squares of the deviations of observed Y values from the fitted line.
- In OLS regression the deviations of observed values are parallel to the Y-axis of the bivariate plot.

Overview of OLS Regression
- OLS regression of Y on X minimizes (Y')^2 and (Y'')^2 (deviations parallel to the Y-axis)
- OLS regression of X on Y minimizes (X')^2 and (X'')^2 (deviations parallel to the X-axis)

Beware of stats packages!!!
Good: SAS, S+/R, JMP, SPSS
Beware: Microsoft Excel, canned graphing programs

Critical variables for calculating regression parameters
Sample variance (n = number of X values, x_i = a given x value, \bar{x} = average of all x values):
$$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$
Standard deviation:
$$S_X = \sqrt{s_X^2}$$
Sample covariance:
$$S_{XY} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$

Calculate the Pearson product-moment correlation coefficient (sample covariance divided by the standard deviations of X and Y):
$$r = \frac{S_{XY}}{S_X S_Y}$$
If the fit is 100% then r = 1 (no residual variation).

Fitted linear regression model (intercept \alpha_{LS}, slope \beta_{LS}):
$$Y_i = \alpha_{LS} + \beta_{LS} X_i$$
Empirical measurements have error:
$$Y_i = \alpha_{LS} + \beta_{LS} X_i + \varepsilon_i$$
The LS line 'chooses' the values of \alpha_{LS} and \beta_{LS} that minimize
$$\sum_{i=1}^{n}\{Y_i - (\alpha_{LS} + \beta_{LS} X_i)\}^2$$
Slope:
$$\beta_{LS} = \frac{\sum_i (Y_i-\bar{Y})(X_i-\bar{X})}{\sum_i (X_i-\bar{X})^2} = r\,\frac{S_Y}{S_X}$$
Intercept:
$$\alpha_{LS} = \bar{Y} - \beta_{LS}\bar{X}$$
Remember - the intercept is log transformed!
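The by-hand OLS calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code (the class exercise uses R/S+). The function names (`sample_variance`, `sample_covariance`, `ols_fit`) are made up for this sketch, and the data are the xylem values from the regression tutorial, read as one column of stem diameters followed by one column of tracheid diameters. If that reading is right, the result should come out close to the fit quoted on the slides (y = 1.123 + 0.308x, r = 0.980).

```python
import math

# Xylem dataset from the regression tutorial (raw, untransformed values).
STEM_DIAM_CM = [5, 1.2, 2, 3, 7.5, 10.2, 30, 48, 57, 17, 13, 9,
                4.5, 2.8, 4, 3.8, 4, 5, 7, 7.3, 14, 15, 16.7, 34.8]
TRACHEID_DIAM_UM = [22, 14, 17, 19, 27, 30, 38, 44, 45, 32, 29, 26.8,
                    21, 16, 18, 19, 20, 23, 25, 28, 30, 29.5, 30, 36.8]


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """s^2 = sum((v - mean)^2) / (n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(xs, ys):
    """S_XY = sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols_fit(xs, ys):
    """Return (intercept, slope, r) for the Model I regression of ys on xs."""
    s_x = math.sqrt(sample_variance(xs))
    s_y = math.sqrt(sample_variance(ys))
    r = sample_covariance(xs, ys) / (s_x * s_y)   # Pearson correlation
    slope = r * s_y / s_x                         # beta_LS = r * S_Y / S_X
    intercept = mean(ys) - slope * mean(xs)       # alpha_LS = Ybar - beta_LS * Xbar
    return intercept, slope, r


# Log10-transform both variables before fitting, as in the lecture.
log_x = [math.log10(v) for v in STEM_DIAM_CM]
log_y = [math.log10(v) for v in TRACHEID_DIAM_UM]

alpha_ls, beta_ls, r = ols_fit(log_x, log_y)
print(f"y = {alpha_ls:.3f} + {beta_ls:.3f}x, r = {r:.3f}")
```

Because the fit is done on log10 values, `alpha_ls` is the log10 of the scaling constant; taking `10 ** alpha_ls` would give the constant on the original scale.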
[Figure: log10 tracheid diameter (µm) vs. log10 stem diameter with fitted OLS line, y = 1.123 + 0.308x, r = 0.980]

Calculate the confidence intervals for slope and intercept (95% CI when alpha = 0.05)
Calculate the standard error (SE):
$$SE = \frac{\text{standard deviation}}{\sqrt{n}}, \qquad S_X = \sqrt{s_X^2}$$
Degrees of freedom (df): for a regression model, df = n - 2.

Calculate confidence intervals, OLS regression:
$$CI = [\,P - t_{\alpha(df)}\,SE_P,\; P + t_{\alpha(df)}\,SE_P\,]$$
(95% confidence limits use t_{0.05(df)})
P denotes the parameter of interest (\beta_{LS}, \alpha_{LS}); SE_P is the standard error of that regression parameter.
Test for \beta_{LS} = 0, or \beta_{LS} = the value predicted by an a priori model.
http://shazam.econ.ubc.ca/intro/critval.htm will give you values for critical t.

Assumptions of OLS Regression
- Values of X do not vary randomly (X is measured without error!)
- The expected relationship between Y and X is linear (Y = \alpha + \beta X).
- Values of Y for any specified value of X are independently and normally distributed: Y_i = \alpha + \beta X_i + \varepsilon_i, where \varepsilon_i is the random error term, which is assumed to be normally distributed with a mean of zero (residual deviations from the fitted LS line are normally distributed).
- Samples of Y along the regression line have a common variance (\sigma^2), which is the variance of \varepsilon_i (the variance is independent of the magnitude of Y or X).
- Assumes each X value is unique (X varies independently).
- For each value of X_i, however, values of Y_i are not fixed by the investigator but instead vary randomly, with a normal distribution at that X_i.
- i.e. X_i is known exactly, so there is no measurement error in X_i. All measurement error is in Y_i.

There are times when OLS assumptions may not hold
There may be error in the measurements of X_i. What to do? Alternative regression models.

Model II regression (or principal axis regressions)
(Major Axis, Reduced Major Axis - focus on RMA)
- As in OLS regression, the criterion for establishing the major axis through X and Y is to minimize a sum of squares.
- Fit the major axis through Y and X
- In OLS regression the deviations of observed values are parallel to the Y-axis of the bivariate plot (no error in X_i).
- In Model II regression the deviations are perpendicular to the regression line established by the major axis regression (measurement error is likely in values of both X_i and Y_i).

Summary of Regression Models
Two major types of Model II regression:
- Major axis regression (MA): minimizes the sum of the squared distances (z')^2 and (z'')^2
- Reduced major axis regression (RMA; also SMA, standardized major axis regression): minimizes the sum of the products (x'y') and (x''y'')
[Figure from Warton et al. 2006: deviations minimized by each method; panels labeled (or OLS), (or RMA), (or major axis regression)]

Summary
- Model I (OLS regression): deviations of observed values of Y are parallel to the Y-axis (all error on the Y axis).
- Model II (RMA regression etc.): deviations are perpendicular to the regression line (error in both Y and X).
Why important? Implications for assessing the functional relationship between X and Y.

Model II Regression (RMA)
Easy to calculate by hand.
Slope (ratio of the standard deviations of Y and X, taking the sign of r):
$$\beta_{RMA} = \frac{S_Y}{S_X}$$
Intercept:
$$\alpha_{RMA} = \bar{Y} - \beta_{RMA}\bar{X}$$

Confidence intervals
Some differing thoughts on this, but: use the same confidence intervals from OLS and adjust to the new values for slope and intercept (see Sokal and Rohlf).

The numerical value of the RMA slope will always be at least as large in magnitude as the LS scaling exponent, because |r| <= 1 and
$$\beta_{RMA} = \beta_{OLS}/r_{OLS}$$
Remember to use r and not r^2!
If the r^2 value is high then the difference between OLS (Model I) and Model II regression will be small to negligible.

Worked example (xylem data, \beta_{LS} = 0.308, r = 0.980):
$$\beta_{RMA} = \beta_{LS}/r = 0.308/0.980 = 0.314$$
[Figure: log10 tracheid diameter vs. log10 stem diameter with fitted OLS line, y = 1.123 + 0.308x]

OLS Bisector Method
Isobe et al.
Relation between RMA and the first principal component . . .
OLS bisector performs better than Model II

Which regression model to use?
Model I vs. Model II
- If error is suspected in X (average size, max size etc.) then Model II is preferable.
- If the range in X is large (more than two orders of magnitude) then OLS regression seems to be fine. However, if it is less than two orders of magnitude, then at least report both OLS and RMA; RMA is likely preferable. Why? Measurement error in X may be relatively larger than residual variation in Y.
- If you have no a priori expectation of which variable is dependent and which is independent, then RMA is preferable.
- If the units of X and Y are the same (i.e. mass vs. mass) then RMA is likely best.
- NOTE! If the value of r is high the choice really does not matter. (Body size is usually measured with little error.)

Which RMA model to use? Read Warton et al. 2006.
What about the OLS bisector? Stay tuned . . . Hot statisticians are on it.
For now . . . when in doubt, RMA.

Class Regression Exercise in R/S+
http://eeb37.biosci.arizona.edu/~brian/teaching.html
Calculate regression models for two datasets:
(1) Using the plant xylem dataset, calculate slope and intercept for OLS and RMA by hand. Print out a spreadsheet of your work. Can you calculate the 95% CI?
(2) Using the U.S. record tree size dataset and R/S+, calculate OLS, RMA, and bisector regression models.
http://eeb37.biosci.arizona.edu/~brian/splus.html

Where to learn more - Alternative regression models
Software and important links
http://web.maths.unsw.edu.au/~dwarton/programs.html
http://www.bio.sdsu.edu/pub/andy/rma.html
http://cran.r-project.org/src/contrib/Descriptions/smatr.html
Ricker, W.E. 1973. Linear regressions in fishery research. Journal of the Fisheries Research Board of Canada 30:409-434.
Warton, D.I., Wright, I.J., Falster, D.S. & Westoby, M. 2006. Bivariate line-fitting methods for allometry. Biological Reviews 81:259-291.
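For part (1) of the class exercise, the RMA slope and intercept and a 95% CI can be sketched in Python as below. This is an illustration under stated assumptions, not the course's R/S+ solution. The names `fit_ols_and_rma` and `confidence_interval` are made up for this sketch. The standard error of the OLS slope uses the usual textbook formula, sqrt(residual mean square / sum of squared X deviations), which the slides do not spell out. The critical value 2.074 is the two-tailed t for alpha = 0.05 and df = 22 from a standard t table. Centering the OLS interval width on the RMA slope is one reading of the lecture's "use the same confidence intervals from OLS and adjust" advice.

```python
import math

# Xylem dataset from the regression tutorial (raw, untransformed values).
STEM_DIAM_CM = [5, 1.2, 2, 3, 7.5, 10.2, 30, 48, 57, 17, 13, 9,
                4.5, 2.8, 4, 3.8, 4, 5, 7, 7.3, 14, 15, 16.7, 34.8]
TRACHEID_DIAM_UM = [22, 14, 17, 19, 27, 30, 38, 44, 45, 32, 29, 26.8,
                    21, 16, 18, 19, 20, 23, 25, 28, 30, 29.5, 30, 36.8]

T_CRIT_DF22 = 2.074  # two-tailed critical t, alpha = 0.05, df = n - 2 = 22


def fit_ols_and_rma(xs, ys):
    """Return OLS and RMA parameters plus the standard error of the OLS slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)

    beta_ols = sxy / sxx
    alpha_ols = my - beta_ols * mx

    # RMA slope is S_Y / S_X, given the sign of r; the (n - 1) terms cancel.
    beta_rma = math.copysign(math.sqrt(syy / sxx), r)
    alpha_rma = my - beta_rma * mx

    # Assumed textbook formula: SE(slope) = sqrt(residual mean square / Sxx).
    ss_resid = sum((y - (alpha_ols + beta_ols * x)) ** 2 for x, y in zip(xs, ys))
    se_beta = math.sqrt(ss_resid / (n - 2) / sxx)

    return {"r": r, "alpha_ols": alpha_ols, "beta_ols": beta_ols,
            "alpha_rma": alpha_rma, "beta_rma": beta_rma, "se_beta": se_beta}


def confidence_interval(estimate, se, t_crit):
    """CI = [P - t * SE_P, P + t * SE_P]."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


log_x = [math.log10(v) for v in STEM_DIAM_CM]
log_y = [math.log10(v) for v in TRACHEID_DIAM_UM]
fit = fit_ols_and_rma(log_x, log_y)

ci_ols = confidence_interval(fit["beta_ols"], fit["se_beta"], T_CRIT_DF22)
ci_rma = confidence_interval(fit["beta_rma"], fit["se_beta"], T_CRIT_DF22)

print(f"OLS slope {fit['beta_ols']:.3f}, 95% CI {ci_ols[0]:.3f} to {ci_ols[1]:.3f}")
print(f"RMA slope {fit['beta_rma']:.3f}, 95% CI {ci_rma[0]:.3f} to {ci_rma[1]:.3f}")
```

To test an a priori model, check whether the predicted exponent falls inside the interval. For a different dataset the critical t must be looked up again for df = n - 2.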