Regression Analysis
Beasley    BST 622 Assignment 2 – 200 points    Spring 2015

1. Based on Graveley and Littlefield (1992), a researcher conducted a study to determine the cost of J = 3 prenatal clinical staffing models: (j = 1) Physician-based; (j = 2) Mixed (M.D., R.N.) staffing; and (j = 3) Clinical Nurse Specialist with physician available for consultation. The subjects were women, 18 years of age or older, who obtained prenatal care at one of the three facilities and who had delivered within 48 hours of the interview. The cost was defined as the amount of money billed over and above the amount covered by the patient’s health insurance (AMNT). These values were also converted to ranks (RAMNT). A set of dummy-coded indicator variables (Xj) represents group membership. The data are in the file BST622-ASSN2-AMNT.xls.

Use SPSS (Analyze-Regression-Linear), SAS (PROC REG), or JMP (Fit Model) to compute the regression solution (Y = b0 + b1X1 + b2X2) for regressing AMNT and RAMNT (Dependent, Response, Y variable) onto X (Independent, Factor, X variable).

(A). What is the Model R2? (2 points)
     AMNT   R2 = _______
     RAMNT  R2 = _______

(B). What are the test results of the Regression Models? (4 points)
     AMNT   F-ratio _______   p-value _______
     RAMNT  F-ratio _______   p-value _______

(C). For AMNT, what are the values of
     b0? ________ (1 point)   b1? ________ (1 point)   b2? ________ (1 point)

(D). For RAMNT, what are the values of
     b0? ________ (1 point)   b1? ________ (1 point)   b2? ________ (1 point)

(E). What is the predicted value for X1 = 1? AMNT _______ RAMNT ______ (2 points)

(F). What is the predicted value for X2 = 1? AMNT _______ RAMNT ______ (2 points)

(G). For AMNT, how would you interpret the regression intercept (b0)? (2 points)

(H). For AMNT, how would you interpret the regression slopes (b1 and b2)? (3 points)

(I). For AMNT, use symbolic notation to express the OMNIBUS NULL HYPOTHESIS of the REGRESSION MODEL in terms of SLOPES. (2 points)

(J). What does the null hypothesis in (1I) above mean in words?
(2 points)

2. Perform a standard J = 3 one-way Analysis of Variance (ANOVA) with AMNT as the dependent variable.

SPSS: Use Analyze-General Linear Model-Univariate and select AMNT as the Dependent Variable and Group as the Fixed Factor.
      Select Options: Descriptive statistics, Estimates of Effect Size, Observed Power, Parameter Estimates, Residual Plot.
      Select Post-Hoc, move Group to the Post Hoc Tests for box, and select LSD, Tukey, Sidak, and Bonferroni.

JMP:  Change the X variable to be Nominal, then use Analyze-Fit Y by X.
      Under the Oneway Analysis banner, select the Means/Anova/Pooled t and Means and Std Dev options.
      Select the Compare Means – All Pairs, Tukey HSD option.

SAS:
      proc glm data=amnt;
        class group;
        model amnt = group;
        means group / lsd;
        means group / tukey cldiff;
        means group / sidak;
        means group / bon;
      run;

(A). Complete the ANOVA Source Table with SS, df, MS, F, and p-value for AMNT. (4 points)

     Source     SS         df     MS        F       p-value
     Between    _______    ___    _____     ____    _____
     Within     _______    ___    _____
     ------------------------------------------------------
     Total      _______    ___

(B). For the analysis reported in 2A, express the OMNIBUS NULL HYPOTHESIS of the ONE-WAY ANOVA in terms of MEANS. (2 points)

(C). Explain how the H0 in (2B) and the H0 in (1I) are equivalent. (3 points)

(D). What is the Model R2 for the one-way ANOVA reported in 2A? R2 = _______ (2 points)

(E). Explain how the R2 value in 2D relates to the R2 value for AMNT in 1A. (3 points)

(F). Report the Mean Difference and Tukey HSD Simultaneous Confidence Interval for each pairwise comparison. (5 points)

                Mean Diff    Lower Bound    Upper Bound
     1 vs 2     _________    ___________    ___________
     1 vs 3     _________    ___________    ___________
     2 vs 3     _________    ___________    ___________

3. Answer the following Power Analysis Questions. Use SAS PROC POWER code.

For the omnibus F test:
      proc power;
        onewayanova test=overall
          alpha = 0.05
          groupmeans = 760.100937 | 832 | 763
          stddev = 209.2012
          ntotal = .
          power = .80;
      run;

For contrasts:
      proc power;
        onewayanova test=contrast
          alpha = 0.05
          contrast = (1 -1 0)
          groupmeans = 760.100937 | 832 | 763
          stddev = 209.2012
          ntotal = .
          power = .80;
      run;

(A). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future Total Sample Size (N) need to be for the omnibus F-test to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05? (3 points)

(B). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future Total Sample Size (N) need to be for the MAXIMUM PAIRWISE DIFFERENCE to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05 for Fisher’s LSD (no adjustment for multiple testing)? (3 points)

(C). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future Total Sample Size (N) need to be for the MAXIMUM PAIRWISE DIFFERENCE to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05 after Bonferroni adjustment for multiple testing? (3 points)
     Bonferroni adjusted α = __________

(D). Holding AMNT results (Means and SDs) constant and assuming equal sample sizes, what would a future Total Sample Size (N) need to be for the MINIMUM PAIRWISE DIFFERENCE to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05 after Bonferroni adjustment for multiple testing? (3 points)
     Bonferroni adjusted α = __________

4. Perform a standard J = 3 one-way Analysis of Variance (ANOVA) with RAMNT as the dependent variable.

(A). What is the Model R2? R2 = _______ (2 points)

(C). Report the Mean Difference and Tukey HSD Simultaneous Confidence Interval for each pairwise comparison. (5 points)

                Mean Diff    Lower Bound    Upper Bound
     1 vs 2     _________    ___________    ___________
     1 vs 3     _________    ___________    ___________
     2 vs 3     _________    ___________    ___________

5. Nonparametric Alternative

JMP:  Change the X variable to be Nominal, then use Analyze-Fit Y by X.
      Under the Oneway Analysis banner, select the Nonparametric Wilcoxon Test.

SPSS: Use Analyze-Nonparametric Tests and K Independent Samples.
      Place AMNT and RAMNT in the Test Variables List and Group as the Grouping variable, and define the range from 1 to 3.
      Select the Descriptives option.

SAS:
      proc npar1way;
        class Group;
        var AMNT RAMNT;
      run;

(A). What are the results of the Kruskal-Wallis Test? (2 points)
     AMNT   χ2 = _______   p-value _______
     RAMNT  χ2 = _______   p-value _______

(B). Divide the χ2 value by (N-1): χ2/(N-1) = ____________ (2 points)

(C). How does the value from 5B relate to the R2 values from 4A and 1A for RAMNT? (3 points)

6. Do you think there is a causal relationship between these variables? Explain. (2 points)

7. Write a brief Results section that explains which Staffing Models are significantly different and whether any should be preferred over another. The Results section should usually report inferential tests and effect magnitudes in the text, while descriptive statistics should be reported in a table. (7 points)

8. In a replication of Schwartz & Bronikowski (2013), a researcher randomizes two strains of mice to two laboratory conditions. One strain typically lives at lower-elevation lakeshore sites and is denoted as fast-living (L-fast); these mice have earlier maturation at larger body size, a higher reproductive rate, and decreased longevity relative to the slow-living strain (M-slow), which lives in higher-elevation mountain meadows. On the day of the experiment, Control animals were moved to a 27°C incubator (CONT27), and treatment animals were put under a “Heat Stress” condition in an incubator set at 37°C, where they were maintained for 2 hours (HEAT37). Subsequently, the liver gene expression of GPX1 was measured for all mice. GPX1 codes for glutathione peroxidase, one of the most important antioxidant enzymes in humans. The data are in BST622-Assn2-Mice2x2.xls.

(A). Perform a 2 x 2 (Heat by Strain) two-factor ANOVA.
JMP:  Use Analyze-Fit Model.
      Select GPX1 as the Y variable.
      Select HTCond and Strain as Model Effects by clicking Add.
      To create an Interaction term, select HTCond in the Select Columns variable list, then select Strain in the Model Effects list, then click Cross.

SAS:
      proc glm data=mice2x2;
        class Strain HTCond;
        model GPX1 = HTCond Strain HTCond*Strain;
        means HTCond*Strain;
      run;

Complete the Source Table. (8 points)

     Source           df     SS      F       p-value
     Heat             ___    ____    ____    ______
     Strain           ___    ____    ____    ______
     Heat x Strain    ___    ____    ____    ______
     Within           ___    ____
     Total            ___    ____

(B). Perform a one-way ANOVA on the J = 4 groups. Report the following Source Table. (4 points)
Note: The SAS code below can be used to answer Questions 8B, 8C, and 9A-9C.

JMP:  Use Analyze-Fit Model.
      Select GPX1 as the Y variable.
      Select Group or Grp as a Model Effect by clicking Add.
      After clicking Run Model, under the Group banner, select the LS Means Tukey HSD option.
      Select LS Means Contrast and create the 3 contrasts listed in the SAS code below. Use the Plus and Minus keys to create each contrast.

SAS:
      proc glm data=mice2x2;
        class GRP;
        model GPX1 = GRP;
        means GRP / tukey cldiff;
        contrast 'Heat Main'   GRP 1 -1  1 -1;
        contrast 'Strain Main' GRP 1  1 -1 -1;
        contrast 'Interaction' GRP 1 -1 -1  1;
      run;

     Source     df     SS      F       p-value
     Between    ___    ____    ____    ______
     Within     ___    ____
     Total      ___    ____

(C). Perform all pairwise comparisons using Tukey’s HSD Simultaneous Confidence Intervals. (5 points)

                            Mean Diff    Lower Bound    Upper Bound
     Lfast-C vs Lfast-H     _________    ___________    ___________
     Lfast-C vs Mslow-C     _________    ___________    ___________
     Lfast-C vs Mslow-H     _________    ___________    ___________
     Lfast-H vs Mslow-C     _________    ___________    ___________
     Lfast-H vs Mslow-H     _________    ___________    ___________
     Mslow-C vs Mslow-H     _________    ___________    ___________

(D). How does the interpretation of the results of the one-way ANOVA with Pairwise Comparisons differ from the results of the two-way ANOVA with analyses of main effects and interaction? (3 points)

9. Using Contrast statements

(A). What is the Null Hypothesis for the main effect of Heat? What is the result?
     H0: ___ LfastC ___ LfastH ___ MslowC ___ MslowH = 0
     t or F = ______ p = _____ (3 points)

(B). What is the Null Hypothesis for the main effect of Strain? What is the result?
     H0: ___ LfastC ___ LfastH ___ MslowC ___ MslowH = 0
     t or F = ______ p = _____ (3 points)

(C). What is the Null Hypothesis for the Heat x Strain interaction effect? What is the result? (Hint: Interactions are multiplicative effects of the main effects.)
     H0: ___ LfastC ___ LfastH ___ MslowC ___ MslowH = 0
     t or F = ______ p = _____ (3 points)

10. Use PROC POWER onewayanova code with contrasts to answer the following questions.

Example SAS code:
      proc power;
        onewayanova test=contrast
          alpha = 0.05
          contrast = (1 1 -1 -1)
          groupmeans = 4.7365 | 14.3382 | 7.2235 | 6.0655
          stddev = 6.352795
          ntotal = .
          power = .80;
      run;

(A). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be for the test of the Heat Main Effect to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05? (3 points)

(B). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be for the test of the Strain Main Effect to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05? (3 points)

(C). Holding these results (Means and SDs) constant, what would a future Total Sample Size (N) need to be for the test of the Heat by Strain Interaction Effect to have 80% Power (1 – β = 0.80) at a two-tailed α = 0.05? (3 points)

11. Write an interpretation of the results of the two-way ANOVA in the form of an “expanded” Results Section. (7 points)

12. This analysis can also be completed by regressing GPX1 onto contrast variables that represent the effects. Use SPSS (Analyze-Regression-Linear), SAS (PROC REG), or JMP (Fit Model) to compute the regression solution (E(Y) = b0 + b1CH + b2CS + b3CI) for regressing GPX1 (Dependent, Response, Y variable) onto the Contrast Variables (Independent, Factor, X variable).

(A). What are the results of the Regression Model?
     R2 = _______   F = _______   p = _______ (3 points)

(B). What are the values of
     b0? ________ (1 point)
     b1? ________ (1 point)
     b2? ________ (1 point)
     b3? ________ (1 point)

(C). What are the results?
     b1   t-test _______ (1 point)   p-value _______ (1 point)
     b2   t-test _______ (1 point)   p-value _______ (1 point)
     b3   t-test _______ (1 point)   p-value _______ (1 point)

(D). How do the results in 12A-C compare to the previous analyses in 9A-C? (3 points)

13. A researcher bases a pilot study on Caslake et al. (2008, Am. J. of Clin. Nutrition, 88(3): 618-629) to investigate the effects of genotype on cardiovascular biomarker response to fish oils. Eighty African-American adults, aged 30–45 years, were prospectively recruited according to age, sex, and APOE genotype. Half of the participants were randomly assigned to ingest three 700 mg EPA+DHA/d (700FO) capsules per day for an 8-week intervention period. The other subjects consumed control oil capsules on the same regimen. The hypotheses of main interest were whether changes in HDL levels (HDL_DIFF) were affected by Treatment (700FO vs Control) and APOE genotype, and whether the Treatment effect differed across APOE genotypes (Treatment x Genotype interaction). The data are in the file BST622-Assn2-FOHDL2012.xls.

(A). Perform a 2 x 3 (Treatment by Genotype) two-factor ANOVA. (8 points)

     Source          SS      df      F       p-value
     Treatment       ___     ____    ____    _____
     APOE            ___     ____    ____    _____
     Treat x APOE    ___     ____    ____    _____
     Within          ___     ____
     Total           ___     ____

(B). Conduct Follow-up Analyses. (7 points)
     i).  If only Main effects are statistically significant, conduct the appropriate follow-up tests.
     ii). If the interaction is statistically significant, perform Simple Main Effects analysis as a follow-up.

      proc glm;
        class treat APOE;
        model HDL_DIFF = treat APOE treat*APOE;
        means treat*APOE;
        lsmeans APOE / adjust=tukey cl pdiff tdiff;
        lsmeans treat*APOE / slice=APOE;
      run;

(C). Write a brief interpretation of these results in the form of an “expanded” Results Section. (7 points)

14. Perform a one-way ANOVA on the J = 6 groups, where j = 1 (Control-E2); j = 2 (Control-E3); j = 3 (Control-E4); j = 4 (Treatment-E2); j = 5 (Treatment-E3); and j = 6 (Treatment-E4).

      proc glm;
        class group;
        model HDL_DIFF = group;
        contrast 'contrast 1' group  1  1  1 -1 -1 -1;
        contrast 'contrast 2' group  1  0 -1  1  0 -1,
                              group -1  2 -1 -1  2 -1;
        contrast 'contrast 3' group  1 -1  0 -1  1  0,
                              group  0 -1  1  0  1 -1;
        contrast 'contrast 4' group  1  0  0 -1  0  0;
      run;

(A). Report the following Source Table for the One-Way ANOVA. (4 points)

     Source           df     SS      F       p-value
     Between Group    ___    ____    ____    ______
     Within Group     ___    ____
     Total            ___    ____

(B). What are the results for Contrast 1? F-ratio (2 points) p-value (2 points).

(C). Explain how the result in 14B relates to the results in 13A.

(D). What are the results for Contrast 2? F-ratio (2 points) p-value (2 points).

(E). Explain how the result in 14D relates to the results in 13A.

(F). What are the results for Contrast 3? F-ratio (2 points) p-value (2 points).

(G). Explain how the result in 14F relates to the results in 13A.

(H). What are the results for Contrast 4? F-ratio (2 points) p-value (2 points).

(I). Explain how the result in 14H relates to the results in 13B. (2 points)

(J). What is the null hypothesis for the contrast for the Simple Main Effect of Treatment (CON vs 700FO) at APOE = E4? (2 points)
     H0: ___ C.E2 ___ C.E3 ___ C.E4 ___ T.E2 ___ T.E3 ___ T.E4 = 0

15. In the data set BST622-Assn2-FOHDL2012.xls, Tx is an effect coding scheme: the Control group (CON) is coded -1 and the Treatment group (700FO) is coded +1. AP_add is a linear polynomial code that captures the additive genetic effect of the APOE marker: E2 is coded -1; E3 is coded 0; E4 is coded +1. AP_dom is a quadratic polynomial code that captures the non-additive (dominant) genetic effect of the APOE marker.
E2 is coded -0.5; E3 is coded 1; E4 is coded +0.5. GxTAdd is a cross-product of Tx and AP_add that represents the interaction of Treatment with the additive genetic effect. GxTDom is a cross-product of Tx and AP_dom that represents the interaction of Treatment with the non-additive genetic effect. Run the following SAS PROC REG code and note the similarity.

      proc reg data=bsthdl2;
        model HDL_DIFF = Tx AP_add AP_dom GxTAdd GxTDom / scorr2 tol;
        TRT:  test Tx = 0;
        GENE: test AP_add = AP_dom = 0;
        TxG:  test GxTAdd = GxTDom = 0;
      run;

(A). Report the following ANOVA Source Table. (4 points)

     Source                df     SS      F       p-value
     Model (Regression)    ___    ____    ____    ______
     Error (Residual)      ___    ____
     Total                 ___    ____

(B). What are the results for the TRT test statement? F-ratio (2 points) p-value (2 points).

(C). Explain how the result in 15B relates to the results in 14B and 13A.

(D). What are the results for the GENE test statement? F-ratio (2 points) p-value (2 points).

(E). Explain how the result in 15D relates to the results in 14D and 13A.

(F). What are the results for the TxG test statement? F-ratio (2 points) p-value (2 points).

(G). Explain how the result in 15F relates to the results in 14F and 13A. (2 points)

Extra Credit 1 (10 points). In the data set BST622-Assn2-FOHDL2012.xls, Tdum is a dummy code: the Control group (CON) is coded 0 and the Treatment group (700FO) is coded +1. GxDAdd is a cross-product of Tdum and AP_add that represents the interaction of Treatment with the additive genetic effect. GxDDom is a cross-product of Tdum and AP_dom that represents the interaction of Treatment with the non-additive genetic effect.

      proc reg data=bsthdl2;
        model HDL_DIFF = Tdum AP_add AP_dom GxDAdd GxDDom / scorr2 tol;
        TRT:  test Tdum = 0;
        GENE: test AP_add = AP_dom = 0;
        TxG:  test GxDAdd = GxDDom = 0;
      run;

Note any differences between these results and other linear model approaches to analyzing a factorial ANOVA design, and offer possible explanations for these differences.
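As a rough illustration of how the choice of treatment code changes what the other coefficients mean, the sketch below fits a saturated model to a toy 2 x 2 layout. Everything in it is an assumption made for illustration: the cell means are invented (they are not the assignment data), the genotype factor is reduced to two levels (E2, E4) coded -1/+1, and the helper names `solve` and `fit_cell_means` are hypothetical. It shows point estimates only, not the F tests produced by PROC REG.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


def fit_cell_means(cell_means, treat_code):
    """Return (b0, b_treat, b_geno, b_interaction) for a saturated 2 x 2 model.

    cell_means maps (treatment, genotype) labels to a cell mean.
    treat_code maps treatment labels to numeric codes (effect or dummy).
    Genotype is always effect coded here: E2 = -1, E4 = +1 (an assumption).
    """
    geno_code = {"E2": -1, "E4": 1}
    rows, y = [], []
    for (treat, geno), mean in cell_means.items():
        t, g = treat_code[treat], geno_code[geno]
        rows.append([1, t, g, t * g])  # intercept, treatment, genotype, cross-product
        y.append(mean)
    return solve(rows, y)


# Invented cell means, for illustration only.
cells = {("CON", "E2"): 2, ("CON", "E4"): 4, ("700FO", "E2"): 6, ("700FO", "E4"): 12}

effect = fit_cell_means(cells, {"CON": -1, "700FO": 1})  # Tx-style effect coding
dummy = fit_cell_means(cells, {"CON": 0, "700FO": 1})    # Tdum-style dummy coding

print([float(b) for b in effect])  # [6.0, 3.0, 2.0, 1.0]
print([float(b) for b in dummy])   # [3.0, 6.0, 1.0, 2.0]
```

In this toy case the genotype coefficient is 2 under effect coding (half the E4 minus E2 difference averaged over both treatment arms) but 1 under dummy coding (half the E4 minus E2 difference within the group coded 0 only), even though both models reproduce the same four cell means.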
Multiple Group Comparisons (ANOVA Models)
EXTRA CREDIT (up to 20 points)

EC2. Determine the F-ratio which results from the given one-way ANOVA data. (4 points)

     Source     df      SS       MS       F
     Between     4      30.5     _____    ______
     Within     ____    _____    _____
     ------------------------------------------------
     Total      99      165.0

EC3. For the data in question EC2, the estimated percent variance in the dependent variable accounted for by the independent variable is:
     η2 = R2 = _______________ (2 points)

In a one-factor ANOVA with J = 4 groups and nj = 5 subjects per group:

     Ȳ1 = 22     Ȳ2 = 24     Ȳ3 = 20     Ȳ4 = 26
     s1 = 2.0    s2 = 2.2    s3 = 2.1    s4 = 2.3

EC4. What is the value for SS Between? _________________ (4 points)

EC5. What is the value for SS Within? _________________ (4 points)

EC6. Reconstruct the ANOVA Source Table. (4 points)

     Source     df      SS       MS        F
     Between    ____    _____    ______    _____
     Within     ____    _____    ______
     ------------------------------------------------
     Total      ____    _____

EC7. For the data in question EC6, the estimated percent variance in the dependent variable accounted for by the independent variable is:
     η2 = R2 = _______________ (2 points)
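
The source-table reconstruction above rests on two standard identities for equal group sizes: SS Between = n times the sum of squared deviations of the group means from the grand mean, and SS Within = (n - 1) times the sum of the group variances. The sketch below applies them to invented summary statistics (three groups of four, not the values in EC4-EC6); the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(means, sds, n_per_group):
    """Build a one-way ANOVA source table from group means and SDs.

    Assumes every group has the same size n_per_group, so the grand mean
    is the simple average of the group means.
    """
    j = len(means)
    grand_mean = sum(means) / j
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n_per_group - 1) * sum(s ** 2 for s in sds)
    df_between = j - 1
    df_within = j * (n_per_group - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "r2": ss_between / (ss_between + ss_within),  # eta-squared
    }


# Invented summary statistics, for illustration only.
table = anova_from_summary(means=[10, 12, 14], sds=[2, 2, 2], n_per_group=4)
print(table["ss_between"], table["ss_within"], table["F"])  # 32.0 36 4.0
```

With these made-up numbers, R2 = 32 / 68, or roughly 0.47; the same steps apply to any equal-n design where only means and standard deviations are reported.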