Small-Sample C.I.s for one-sample, two-sample and (matched) paired data
Chapter 7: Estimation and Statistical Intervals
2/20/12, Lecture 15, Xuanyao He

Review: In-Class Exercise
1. What z critical value should be used in the large-sample two-sided CI for µ to obtain each of the following confidence levels?
   a. 97%   b. 80%   c. 75%
2. Given these two-sided C.I.s for µ: (6.71, 13.29) and (4.85, 15.15)
   a. What is the value of the sample mean?
   b. One of them has a 90% confidence level and the other 95%. Which one is the 90% interval, and why?
3. Suppose you want 90% of the area under the sampling distribution of $\bar{X}$ to lie within ±1 unit of the population mean µ, and the population standard deviation is σ = 3.
   a) Find the minimum sample size n that satisfies this requirement.
      $n = (1.645 \cdot 3 / 1)^2 \approx 24.35$, so round up to n = 25.
   b) What if the standard deviation is not given, and we only know that the largest observation is 20 and the smallest is 10?
      Use range/4 to estimate the standard deviation: (20 − 10)/4 = 2.5. Then $n = (1.645 \cdot 2.5 / 1)^2 \approx 16.91$, so round up to n = 17.

7.4 Small-Sample Intervals Based on Normal Population Distributions
• Originally we started with
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
• Replacing σ with s introduces some extra variability, because s itself varies from sample to sample (and s is a biased point estimator of σ).
• However, when n is large, the sampling distribution is still approximately normal.
  – Hence $Z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$ is still justified, as is the corresponding confidence interval.
• For small n (n < 25), however, this is no longer true!

Effect of small samples using s
• With small samples from a normal population distribution, the standardized statistic computed with s is much more variable than Z.
• We can still standardize $\bar{X}$ using s; we just need its "new" sampling distribution.
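A minimal Python sketch (standard library only) for checking the review answers and for seeing the extra variability that comes from using s. The names `z_critical`, `min_sample_size` and `tail_fraction` are hypothetical helpers, not course material. `min_sample_size` rounds up with `math.ceil`, since rounding down would miss the ±1 requirement; this reproduces n = 25 and n = 17. `tail_fraction` assumes normal data with µ = 0 and σ = 1 and an arbitrary random seed, and counts how often the standardized mean exceeds 1.96 in absolute value when σ is used versus when s is used.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided z critical value: the upper (1 - confidence)/2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def min_sample_size(z: float, sigma: float, bound: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. (z * sigma / bound)^2 rounded up."""
    return math.ceil((z * sigma / bound) ** 2)


def tail_fraction(n: int, reps: int = 20_000, cutoff: float = 1.96, seed: int = 1):
    """Simulate samples of size n from N(0, 1) (assumed population).

    Returns (fraction with |(xbar - 0) / (sigma / sqrt(n))| > cutoff,
             fraction with |(xbar - 0) / (s / sqrt(n))| > cutoff).
    """
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample SD, divides by n - 1
        if abs(xbar / (1.0 / math.sqrt(n))) > cutoff:  # known sigma = 1
            z_hits += 1
        if abs(xbar / (s / math.sqrt(n))) > cutoff:  # sigma replaced by s
            t_hits += 1
    return z_hits / reps, t_hits / reps


if __name__ == "__main__":
    for level in (0.97, 0.80, 0.75):
        print(f"{level:.0%}: z = {z_critical(level):.3f}")  # 2.170, 1.282, 1.150
    print(min_sample_size(1.645, 3.0, 1.0))  # 25
    print(min_sample_size(1.645, 2.5, 1.0))  # 17
    z_frac, t_frac = tail_fraction(n=5)
    print(f"n = 5: using sigma {z_frac:.3f}, using s {t_frac:.3f}")
```

With n = 5, the σ-based fraction should land near the nominal 5%, while the s-based fraction comes out noticeably higher (roughly 12%). Those heavier tails are what the t distribution accounts for.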
• The standardized value has a new distribution, called the t (or Student's t) distribution:
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
has a t distribution with n − 1 degrees of freedom.

t distribution
• There is a different t distribution for each number of degrees of freedom (df); since df = n − 1 here, each sample size gives its own t distribution.
• The degrees of freedom of the t statistic "come" from the sample standard deviation s.
  – Recall: we divided by n − 1 when calculating s.
• Good news: the density curve of every t distribution is
  – symmetric,
  – bell-shaped,
  – centered at 0.

t distribution
• The higher the degrees of freedom, the narrower the spread of the t distribution.
[Figure: t density curves for df = n1 and df = n2, where n1 < n2, both centered at 0]
• As the df increase, the t density curve approaches the N(0, 1) curve more and more closely.
  – As df → ∞, t → z (standard normal).
• In general the t distribution is more spread out than the standard normal, especially when the df are small.

t distribution – table
• See page 566 of the textbook.
• For a two-sided CI, locate the central area (the confidence level) and find the corresponding t critical value in the row for the calculated df.
• For a one-sided CI, locate the relevant cumulative area and find the corresponding t critical value in the row for the calculated df.

One-sample t confidence interval
• The only change to the confidence interval is that the z critical value is replaced by a t critical value.
• One-sample t confidence interval:
$$\bar{X} \pm (t\ \text{critical value}) \cdot \frac{s}{\sqrt{n}}$$
• The t critical value comes from a t distribution with df = n − 1.
  – If that df does not appear in the table on page 566, use the closest df in the table.
  – If the df falls between two df values in the table, use the smaller one to be conservative; e.g., for df = 35, use df = 30 from the table.
• (As before, we can also construct upper and lower confidence bounds.)

Example 5
• From a running production of corn soy blend, we take a sample to measure the vitamin C content. The results are:
  26  31  23  22  11  22  14  31
• Find a 95% confidence interval for the mean vitamin C content in this production.
• Note that df = 8 − 1 = 7 here.
$$\bar{X} \pm (t\ \text{critical value}) \cdot \frac{s}{\sqrt{n}} = 22.5 \pm 2.365 \times \frac{7.191}{\sqrt{8}} \approx (16.49,\ 28.51)$$

7.5 Two-sample t confidence intervals
$$\bar{X}_1 - \bar{X}_2 \pm (t\ \text{crit}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• The only difficulty here is the degrees of freedom: we no longer have a simple n − 1. Instead,
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
• If the df is not an integer, round down to be conservative (e.g., for df = 9.86, use 9).

Example 6
• Metabolism rates of 12 randomly selected women and 7 randomly selected men were measured:
  Women: n1 = 12, x̄1 = 1235.1, s1 = 188.3
  Men:   n2 = 7,  x̄2 = 1600,   s2 = 189.2
• Find a 95% confidence interval for the difference in mean metabolism rate between women and men (µ1 − µ2, women minus men).
• Remember to interpret the interval!
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = 12.6357 \approx 12,$$
so the t critical value (for 95%) is 2.179, and the 95% C.I. is
$$\bar{X}_1 - \bar{X}_2 \pm (t\ \text{crit}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -364.9 \pm 2.179 \times 89.825 = [-560.629,\ -169.171]$$

(Matched) Paired Data
• Often, data are collected in pairs, which creates the illusion of two samples, although in reality there is only one sample.
• Example: exam scores for STAT 350

  Obs   Exam 1   Exam 2
  1     74       82
  2     89       86
  3     83       79
  …

• Why is this considered only one sample? Both scores come from the same group of individuals.
• Other examples: pre- and post-treatment results, husbands vs. wives, measurements on twins.

Paired Data: treat like one sample
• "Trick": first take the difference within each pair, then study the differences as a single sample.

  Obs   Exam 1   Exam 2   Difference
  1     74       82       8
  2     89       86       -3
  3     83       79       -4
  …

• Find the mean of the differences.
• Find the standard deviation of the differences.
• What is the relevant confidence interval formula? What are the df?

t C.I. for Paired Data
$$\bar{d} \pm (t\ \text{critical value}) \cdot \frac{s_d}{\sqrt{n}}, \qquad df = n - 1,$$
where $\bar{d}$ and $s_d$ are the mean and standard deviation of the n differences.

Example 7
• Suppose a sample of n students were given a diagnostic test before and after completing a module. Here we have n = 10 students, i.e. 10 pairs of (pre, post) scores; the data are as follows:
  [Data table: pre- and post-module scores for the 10 students]
• Find a 95% confidence interval for the mean difference.
• Calculate the 90% C.I. for the mean difference between post and pre scores.

After Class…
• Review sections 7.4 (through p. 316) and 7.5.
• Read sections 8.1 and 8.2.
• Prepare for Exam 1:
  – Office hours: 3:30–6 pm today
  – Bring a handwritten cheat sheet, an SAT-approved calculator, and your student ID
• HW #5 is due this Wednesday by 5 pm.
• Lab #3 is this Wednesday; it is due at the beginning of Friday's class.
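The three interval recipes in this lecture (one-sample t, two-sample t with its df formula, and paired t) can be checked numerically. Below is a minimal Python sketch using only the standard library. The function names are hypothetical. Following the lecture's table-lookup approach, each function takes the t critical value as an argument rather than computing it. The paired call uses only the three exam rows shown above (df = 2, t = 4.303 for 95%). Three pairs are far too few for a meaningful interval, so that call only demonstrates the mechanics. Example 7's data are not reproduced here.

```python
import math
import statistics


def one_sample_t_ci(data, t_crit):
    """xbar ± t_crit * s / sqrt(n); t_crit is read from a t table with df = n - 1."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample SD, divides by n - 1
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t interval (section 7.5 formula)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """(xbar1 - xbar2) ± t_crit * sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


def paired_t_ci(before, after, t_crit):
    """Paired data: take differences (after - before), then use the one-sample t interval."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t_ci(diffs, t_crit)


if __name__ == "__main__":
    # Example 5: vitamin C, df = 7, 95% t critical value 2.365 from the table
    vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
    lo, hi = one_sample_t_ci(vitamin_c, t_crit=2.365)
    print(f"Example 5: ({lo:.2f}, {hi:.2f})")  # (16.49, 28.51)

    # Example 6: women (group 1) minus men (group 2)
    df = welch_df(188.3, 12, 189.2, 7)
    print(f"Example 6 df: {df:.4f} -> use {math.floor(df)}")  # 12.6357 -> use 12
    lo, hi = two_sample_t_ci(1235.1, 188.3, 12, 1600, 189.2, 7, t_crit=2.179)
    print(f"Example 6: ({lo:.3f}, {hi:.3f})")  # (-560.629, -169.171)

    # Paired mechanics on the three exam rows only (Exam 2 - Exam 1 = 8, -3, -4)
    lo, hi = paired_t_ci([74, 89, 83], [82, 86, 79], t_crit=4.303)
    print(f"Paired, 3 rows only: ({lo:.2f}, {hi:.2f})")
```

Passing `t_crit` in keeps the sketch dependency-free and mirrors how the lecture reads critical values from the table on page 566; a statistics library could compute them instead.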