Inference Based on a Single Sample
• We discuss point estimation, confidence interval estimation, and testing of statistical hypotheses.
• We also discuss reliability of estimation and sample size determination.
• We distinguish two cases depending on the sample size: large sample or small sample.

Copyright © by the Dept. of Statistics, University of Connecticut

• The population mean $\mu$ and the population proportion $p$ are unknown parameters. We make inferences about $\mu$ and $p$ using sample data.
• We estimate the unknown parameter $\mu$ based on the sample mean $\bar{X}$ obtained from a large sample ($n \ge 30$) or a small sample ($n < 30$).
• The sample proportion $\hat{p}$ estimates $p$.
• We also test hypotheses about $\mu$ and $p$.
• The population standard deviation $\sigma$ is usually an unknown parameter as well. It is estimated by the sample standard deviation $s$.
• For simplicity, we sometimes assume $\sigma$ is known when we make inferences about $\mu$.

Def. 1: A point estimator of an unknown population parameter is a rule or formula that enables us to calculate a single number, based on sample data, which estimates the parameter.
• A point estimator of the population mean $\mu$ is the sample mean $\bar{X}$.
• A point estimator of the population proportion $p$ is the sample proportion $\hat{p}$.
• A point estimator of the population standard deviation $\sigma$ is the sample standard deviation $s$.

We wish to estimate a population parameter such as $\mu$, the population mean, or $p$, the population proportion. Until now, we have only considered point estimates; recall that $\bar{X}$ estimates $\mu$. A point estimate gives only a single value to represent the corresponding parameter. Usually, we wish to have an interval of plausible values.

Confidence Intervals

Def. 2: An interval estimator (or confidence interval) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

In general, a confidence interval for a parameter has the form

    point estimate ± (critical value)(standard error of point estimate)

• Interpretation: If the assumptions are met and we repeatedly calculate confidence intervals for the parameter using different samples, then $100(1-\alpha)\%$ of those intervals will capture the unknown population parameter.
• Associated with an interval estimate is a confidence coefficient that measures the confidence we place in the estimator.
• 95% confident ⇒ 0.95 confidence coefficient. 99% confident ⇒ 0.99 confidence coefficient.

Def. 3: The confidence coefficient is the probability that an interval estimator captures the population parameter if this estimator is used repeatedly a very large number of times.
• The confidence level is the confidence coefficient expressed as a percentage.
• Let $\theta$ denote the population parameter. Confidence coefficient = 0.95 ⇒ in the long run, 95% of the interval estimates will contain $\theta$.
• Denote the confidence coefficient by $1-\alpha$.
• Common confidence levels are 99%, 95% and 90%.
• $1-\alpha = 0.95 \Rightarrow \alpha = 0.05$.

Confidence Intervals for $\mu$

Case I. Large Sample Size ($n \ge 30$)
• We construct a large-sample $100(1-\alpha)\%$ confidence interval (C.I.) for $\mu$.
• The sampling distribution of $\bar{X}$ is Normal with mean $\mu$ and std. dev. $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
• $\bar{X}$ has an exact Normal distribution if the population distribution is itself $N(\mu, \sigma^2)$.
• If the population distribution is NOT normal, then since $n$ is large, the CLT allows us to treat $\bar{X}$ as approximately Normal.
• When $\sigma$ is unknown, we may replace it with the sample std. dev. $s$ without much penalty.
• Given $n$, $\bar{X}$, $\alpha$, and $s$ or $\sigma$, we compute the $100(1-\alpha)\%$ confidence interval as
\[ \bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \quad \text{i.e.,} \quad \left( \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right). \]

Note: If the population distribution is Normal and $\sigma$ is known, $n$ may be less than 30, since the sampling distribution of $\bar{X}$ will be Normal (by Theorem 1).

• Suppose $\sigma$ is not known and $n \ge 30$. Replace $\sigma$ by the sample standard deviation $s$. Then the $100(1-\alpha)\%$ C.I. for $\mu$ is
\[ \bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}. \]
Recall
\[ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n-1}}, \]
and the shortcut formula
\[ s = \sqrt{\frac{\sum x_i^2 - n\,\bar{x}^2}{n-1}}. \]

$z_{\alpha/2}$ Values for Commonly Used Confidence Levels

    $100(1-\alpha)\%$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
    90%                  .10         .05           1.645
    95%                  .05         .025          1.96
    99%                  .01         .005          2.575

Example 1: $n = 80$ observations are taken from a $N(\mu, \sigma^2)$ population. Both $\mu$ and $\sigma^2$ are unknown. We are given $\bar{X} = 14.1$, $s = 2.6$.

(a) Find a 95% confidence interval for $\mu$.
Here, $1-\alpha = 0.95 \Rightarrow \alpha = 1 - .95 = 0.05$. The 95% large-sample C.I. for $\mu$ is
\[ \bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}, \]
where, from Table IV, when $\alpha/2 = .025$, $z_{\alpha/2} = z_{.025} = 1.96$. Substitute $\bar{X} = 14.1$, $s = 2.6$ into the C.I. formula:
\[ 14.1 \pm (1.96)\,\frac{2.6}{\sqrt{80}} = 14.1 \pm (1.96)(.2907) = 14.1 \pm .5698 = (14.1 - .5698,\ 14.1 + .5698) = (13.5302,\ 14.6698). \]
Interpretation: If the assumptions have been met and we repeatedly calculate the C.I. using different random samples, 95% of those confidence intervals will capture $\mu$.

(b) Find a 99% confidence interval for $\mu$.
The C.I. is $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$. We must find $z_{\alpha/2}$ when $1-\alpha = 0.99 \Rightarrow \alpha = 1 - .99 = .01$, i.e., $\alpha/2 = 0.005$. From Table IV, $z_{.005} = 2.58$. The large-sample 99% C.I. for $\mu$ is
\[ \bar{X} \pm (2.58)\,\frac{s}{\sqrt{n}} = 14.1 \pm (2.58)\,\frac{2.6}{\sqrt{80}} = 14.1 \pm 0.750 = (13.350,\ 14.850). \]

(c) Compare the intervals in (a) and (b).
95% C.I.: $(13.5302,\ 14.6698)$
99% C.I.: $(13.350,\ 14.850)$
The 99% C.I. is wider than the 95% C.I.
• If $1-\alpha$ increases, the confidence interval becomes wider when $s$ and $n$ remain fixed.

(d) Do the confidence intervals in (a) and (b) depend on population normality?
NO. Since $n = 80 > 30$, by the Central Limit Theorem, $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$. So the large-sample C.I. is valid even if the population is not normally distributed. When $n$ is large ($n \ge 30$), the CLT guarantees that the sampling distribution of $\bar{X}$ is approximately normal.

Example 2: Suppose instead of drawing $n = 80$ observations from a population which is $N(\mu, \sigma^2)$, we have $n = 32$. Suppose $\bar{X} = 14.1$, $s = 2.6$.

(a) Find a 95% confidence interval for $\mu$.
Recall that $z_{\alpha/2}$ when $1-\alpha = 0.95$ was $z_{.025} = 1.96$. The large-sample 95% C.I. for $\mu$ is
\[ 14.1 \pm (1.96)\,\frac{2.6}{\sqrt{32}} = 14.1 \pm .9009 = (13.1991,\ 15.0009). \]

(b) Compare the 95% C.I. for $\mu$ when $n = 80$ and $n = 32$.
C.I. when $n = 80$: $(13.5302,\ 14.6698)$
C.I. when $n = 32$: $(13.1991,\ 15.0009)$
• For fixed $1-\alpha$ and $s$:
  reduce the sample size ⇒ C.I. becomes wider;
  increase the sample size ⇒ C.I. becomes narrower.

Example 3: Find the confidence level corresponding to the following confidence intervals for $\mu$.

(a) $\bar{X} \pm (1.96)\,\sigma/\sqrt{n}$.
We know $1.96 = z_{\alpha/2}$. Hence, we must find $\alpha/2$ and then $100(1-\alpha)\%$.
\[ P(Z > 1.96) = 0.5 - P(0 < Z < 1.96) = 0.5 - 0.475 = 0.025. \]
So $\alpha/2 = .025 \Rightarrow \alpha = 2(.025) = 0.05 \Rightarrow 1-\alpha = 1 - 0.05 = 0.95$. Therefore the confidence level is $100(1-\alpha)\% = 95\%$.

(b) Given $\bar{X} \pm (1.645)\,\sigma/\sqrt{n}$, find the confidence level.
\[ P(Z > 1.645) = 0.5 - P(0 < Z < 1.645) = 0.5 - 0.45 = 0.05. \]
So $\alpha/2 = 0.05 \Rightarrow \alpha = 2(.05) = 0.10 \Rightarrow 1-\alpha = 1 - 0.10 = 0.90$. The confidence coefficient is 0.90. The confidence level is 90%.

Example 4: A random sample of 110 lightning flashes in a certain region resulted in an average radar echo duration of .81 sec. and a sample standard deviation of .34 sec. Compute a 99% confidence interval for the true average echo duration $\mu$.

Example 5: A random sample of 36 smoking members of a labor union revealed that the mean number of cigarettes smoked per person during a 5-day period was 75. The standard deviation was 12 cigarettes. Construct a 99% confidence interval for the true mean number of cigarettes smoked in the population.

Sample Size Determination

We show how to determine $n$ in two situations.

Situation 1:
• Suppose we want to estimate $\mu$ to within some bound $B$, with a confidence of $1-\alpha$.
• Suppose we know the population standard deviation $\sigma$.
• We must determine what sample size $n$ is required to achieve this bound $B$.
Recall that
\[ B = z_{\alpha/2}\,\sigma_{\bar{X}} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \quad \text{where } \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}. \]
Then, solving for $n$, we get
\[ n = \frac{(z_{\alpha/2})^2\,\sigma^2}{B^2}. \]

Situation 2:
• Suppose we want a confidence interval of width $w$ when estimating $\mu$ with a $100(1-\alpha)\%$ confidence level.
• Suppose that $\sigma$ is known.
• We must determine $n$ to achieve the width $w$.
We are given that
\[ z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = \frac{w}{2}. \]
Solving for $n$, we get
\[ n = \frac{4\,(z_{\alpha/2})^2\,\sigma^2}{w^2}. \]

Example 6: We wish to estimate $\mu$ to within a bound $B = 0.2$, with confidence level $1-\alpha = 0.95$, when $\sigma = 6.1$.
Find the sample size $n$.
We saw under Situation 1 that
\[ n = \frac{(z_{\alpha/2})^2\,\sigma^2}{B^2}. \]
$1-\alpha = .95 \Rightarrow \alpha = .05 \Rightarrow \alpha/2 = .025$. From Table IV, $z_{\alpha/2} = z_{.025} = 1.96$. Substituting into the formula,
\[ n = \frac{(1.96)^2 (6.1)^2}{(0.2)^2} = 3573.65, \]
i.e., $n = 3574$ (after rounding up).

Example 7: Suppose it costs you $10 to draw a sample of size $n = 1$. You have a budget of $1,200.

(a) Suppose you want to estimate $\mu$ with 95% confidence, with $w = 4$, given $\sigma = 12$. Find $n$.
We saw that
\[ n = \frac{4\,(z_{\alpha/2})^2\,\sigma^2}{w^2}. \]
$1-\alpha = .95 \Rightarrow \alpha/2 = .025 \Rightarrow$ from Table IV, $z_{\alpha/2} = 1.96$. Substituting into the formula,
\[ n = \frac{4\,(1.96)^2 (12)^2}{(4)^2} = 138.30. \]
We need $n = 139$ observations (after rounding up) at a cost of $(139)(10) = \$1{,}390$. The budget, however, will not cover this.

(b) What happens in (a) if $1-\alpha = .90$?
Now $\alpha = .10 \Rightarrow \alpha/2 = .05$, and from Table IV, $z_{\alpha/2} = 1.645$. Then
\[ n = \frac{4\,(1.645)^2 (12)^2}{(4)^2} = 97.42. \]
So you need $n = 98$ observations (after rounding up). Cost $= (98)(10) = \$980$. The budget covers this.

Case II. Small Sample Size ($n < 30$)

Since $n$ is small, we have two problems:
1. The CLT does not apply.
2. The sample std. dev. $s$ is a poor estimate of the population std. dev. $\sigma$.

To proceed with inference, we must
1. assume that the data come from a Normal distribution, and
2. use the Student's $t$-table.

Given $n$, $\bar{X}$, $s$, and $\alpha$, we compute the $100(1-\alpha)\%$ confidence interval as
\[ \bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, \]
where $t_{\alpha/2,\,n-1}$ is the $100(1-\alpha/2)$th percentile of a Student's $t$-distribution with $n-1$ degrees of freedom.

Notes about the Student's $t$-distribution
• $t$-table values are larger than the normal table values for the same $\alpha$.
• Small-sample confidence intervals are therefore wider than large-sample intervals for the same $\alpha$ and the same $s$.
• As the degrees of freedom increase, the $t$-distribution and the normal distribution get closer.
• Percentiles in the $t$-table for the row corresponding to infinite ($\infty$) degrees of freedom match the percentiles of the normal distribution.
• Note that the degrees of freedom we use are related to the denominator in the formula for the sample standard deviation.

Example 8: Chronic exposure to asbestos fiber is a well-known health hazard. Construction workers who had been exposed to asbestos over a prolonged period were examined. Among the data is pulmonary compliance (cm³/cm H₂O) for each of 16 subjects 8 months after the exposure period. (Pulmonary compliance is a measure of lung elasticity, or how effectively the lungs are able to inhale and exhale.)

    167.9  180.8  184.8  189.8  207.2  208.4  226.3  194.8
    200.2  201.9  227.7  228.5  206.9  232.4  239.8  258.6

Construct a 95% confidence interval for the mean pulmonary compliance of construction workers exposed to asbestos.

Example 9: A random sample of 16 complaints filed by customers of a large department store revealed a mean processing time of 30 minutes and a standard deviation of 8 minutes. Construct a 95% confidence interval for the true mean time required to process customer complaints. Assume that the population of processing times is approximately normally distributed.

Confidence Interval for $p$

For example,
a. What proportion of college graduates in the U.S.A. read mystery novels avidly?
Out of $n = 225$ students sampled, $X = 120$ are avid mystery readers. Then
\[ \hat{p} = \frac{X}{n} = \frac{120}{225} = 0.5333. \]
b. What proportion of the country supports abortion? A survey of $n = 1025$ people shows that $X = 500$ support abortion. Then
\[ \hat{p} = \frac{X}{n} = \frac{500}{1025} = 0.4878. \]

• The aim is to make inferences about $p$. This is a binomial parameter problem.
• Sample proportion: $\hat{p} = \dfrac{\text{number of successes}}{\text{number of trials}}$. $\hat{p}$ is a random variable.
• The sampling distribution of $\hat{p}$ is the probability distribution of the r.v. $\hat{p}$.
• The mean of the sampling distribution of $\hat{p}$ is $p$.
• The std. dev. of the sampling distribution of $\hat{p}$ is
\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}. \]
• For large samples, the sampling distribution of $\hat{p}$ is approximately Normal, provided $n > 30$, $np \ge 5$ and $np(1-p) \ge 5$.
• A check is to see whether the interval $(\hat{p} - 3\sigma_{\hat{p}},\ \hat{p} + 3\sigma_{\hat{p}})$ contains 0 or 1. If this interval does not include 0 or 1, then the large-sample approximation may be used.

The $100(1-\alpha)\%$ confidence interval for $p$ is
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}, \quad \text{where } \hat{q} = 1 - \hat{p}. \]

Example 10: Construct a 95% confidence interval for $p$ based on a random sample with $n = 400$, $\hat{p} = .42$.
We first check whether the condition for the approximate normal distribution of $\hat{p}$ is met.
\[ \sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(.42)(.58)}{400}} = .0247 \]
\[ (\hat{p} - 3\sigma_{\hat{p}},\ \hat{p} + 3\sigma_{\hat{p}}) = (.42 - 3(.0247),\ .42 + 3(.0247)) = (.42 - .0741,\ .42 + .0741) = (.3459,\ .4941) \]
This interval does not include 0 or 1, so the large-sample approximation may be used.
The 95% C.I. for $p$ is $\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$. From Table IV, $z_{\alpha/2} = 1.96$. Therefore the 95% C.I. for $p$ is
\[ .42 \pm (1.96)(.0247) = .42 \pm .0484 = (.3716,\ .4684). \]

Example 11: A pizza parlor is considering replacing its oven with a new one.
The new oven is particularly suited for baking large (16-in) pizzas. Let $p$ denote the proportion of all pizzas ordered that are large. A random sample of $n = 150$ pizza orders yielded 120 pizzas that were large. Construct a 97% confidence interval for the true proportion of large pizzas at this restaurant.

Example 12: In a random sample of 200 college students, 10 stated that they own a motorcycle. Construct a 99% confidence interval for the true proportion owning motorcycles.

Example 13: A hatchery examined a random sample of 574 eggs from a supplier and found 10 to be infertile. Construct a 93% confidence interval for the true proportion of infertile eggs.

Sample Size Determination for Inference about $p$

The $100(1-\alpha)\%$ C.I. for $p$ is
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}. \]
The width and bound of this C.I. are
\[ \text{Width } (W) = 2\,z_{\alpha/2}\,\sigma_{\hat{p}} = 2\,z_{\alpha/2}\sqrt{\frac{pq}{n}}, \qquad \text{Bound } (B) = z_{\alpha/2}\sqrt{\frac{pq}{n}}. \]
To determine a $100(1-\alpha)\%$ C.I. for $p$ within a bound $B$, we need a sample size of
\[ n = \frac{(z_{\alpha/2})^2\,(pq)}{B^2}. \]
We do not know $p$ and $q$. Replace them by their estimates. Then
\[ n = \frac{(z_{\alpha/2})^2\,(\hat{p}\hat{q})}{B^2}. \]

Example 14: We wish to estimate the proportion $p$ of TV viewers who watch a prime-time comedy show on May 24. We expect that $\hat{p} \approx .30$. Find $n$ so that the width of the confidence interval is $\le .01$.
$W = .01 \Rightarrow B = W/2 = .005$, so that, using $z_{\alpha/2} = 1.96$,
\[ n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{B^2} = \frac{(1.96)^2 (.3)(.7)}{(.005)^2} = 32269.44, \]
which rounds up to $n = 32270$.
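The large-sample formulas used throughout this section can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names (`z_critical`, `mean_ci`, `proportion_ci`, `sample_size_for_mean`, `sample_size_for_proportion`) are hypothetical helpers introduced here for illustration, and only the standard library is assumed. The critical value $z$ is passed in explicitly so that the table values (1.96, 1.645) can be used as in the worked examples.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Upper alpha/2 critical value of the standard normal (about 1.96 for 0.95)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_ci(xbar, s, n, z):
    """Large-sample interval xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_ci(p_hat, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size_for_mean(sigma, bound, z):
    """Smallest integer n with z * sigma / sqrt(n) <= bound."""
    return math.ceil((z * sigma / bound) ** 2)


def sample_size_for_proportion(p_guess, bound, z):
    """Smallest integer n with z * sqrt(p * q / n) <= bound."""
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2)


# Table values of z are passed in directly so results line up with the examples.
print(mean_ci(14.1, 2.6, 80, 1.96))                   # Example 1(a)
print(proportion_ci(0.42, 400, 1.96))                 # Example 10
print(sample_size_for_mean(12, 4 / 2, 1.96))          # Example 7(a): width 4, bound 2
print(sample_size_for_proportion(0.30, 0.005, 1.96))  # Example 14
print(z_critical(0.95))                               # close to the table value 1.96
```

Passing the exact value from `z_critical(0.95)` instead of the rounded table value 1.96 can change a computed sample size by one unit when the result sits near an integer boundary, which is why the sketch keeps $z$ as an explicit argument. The small-sample $t$ interval is left out because the standard library has no Student's $t$ quantile function.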