3.3 Statistical Inference with one sample from a population
3.3.1 Introduction

A hypothesis is a theory that has been neither proven nor disproven. A statistical test never proves or disproves a hypothesis with 100% certainty.

Statistical Hypotheses

A statistical hypothesis is one that can be tested using a random sample or samples. It relates to a parameter of the population, commonly the population mean or proportion, e.g.
a) 30% of the adult population smoke.
b) On average, Americans are heavier than Japanese.

In a statistical test we test between two opposing hypotheses, the null hypothesis $H_0$ and the alternative $H_A$ (sometimes referred to as $H_1$). The null hypothesis always contains an equality, e.g. in case a) we test $H_0: p = 0.3$.

In case b), the hypothesis in words states that two population means (say $\mu_X$: mean mass of Americans and $\mu_Y$: mean mass of Japanese) are unequal. This cannot be the null hypothesis. The null hypothesis in this case is $H_0: \mu_X = \mu_Y$, i.e. on average Americans and Japanese weigh the same.

One-tailed and two-tailed tests

The alternative $H_A$ can be either directional (one-tailed) or non-directional (two-tailed). Two-tailed alternatives simply state that there is a difference. One-tailed alternatives state what type of difference there is.

For example, in case a) $H_A: p \neq 0.3$ would be the appropriate two-tailed alternative hypothesis, i.e. the proportion of smokers in the population simply differs from 30%. In case b) the appropriate one-tailed alternative hypothesis would be $H_A: \mu_X > \mu_Y$, i.e. on average Americans weigh more than Japanese.

We use two-tailed alternatives when i) the initial hypothesis (written in words) does not state what sort of difference is expected and ii) the source of the hypothesis is unknown or can be assumed to be unbiased. This is true in case a), hence we use a two-tailed alternative.
This is not true in case b): the hypothesis given in words states that on average Americans weigh more than Japanese. Hence, the alternative in this case is one-tailed, i.e. $H_A: \mu_X > \mu_Y$.

Suppose the hypothesis is given by a source that is likely to be biased, for example a producer's hypothesis regarding his own product. The alternative should state that the product is worse than the producer states, i.e. if a car producer states that his car consumes $\mu_0$ litres/100km, then the alternative should be that the car consumes more petrol, i.e. $H_A: \mu > \mu_0$.

General procedure of statistical testing

Before the test is carried out, it is assumed that the null hypothesis ($H_0$) is correct. The test is based on a sample of observations.

If the sample is close to what we would expect under $H_0$, then we DO NOT REJECT $H_0$. This does not indicate that $H_0$ is true, only that it is a reasonable hypothesis given our data.

If the sample is far away from what we would expect under $H_0$, then we accept that $H_A$ is correct, i.e. WE REJECT $H_0$. In this case it is likely that $H_0$ is false, but we can never be 100% sure of this.

3.3.2 Types of Error

A type I error is committed when a true null hypothesis is rejected. A type II error is committed when a false null hypothesis is not rejected.

Suppose a drug company is testing whether a new drug is more effective than the presently used drug. The null hypothesis here is that the two drugs are equally effective. The alternative is that the new drug is better than the old drug.

A type I error occurs when the test indicates that the new drug is better, but in fact it is not ($H_0$ is incorrectly rejected). This leads to costs, since such a conclusion implies the introduction of a new drug which is no better (and possibly worse) than the old drug.
A type II error occurs when the test indicates that the new drug is not better, but in fact it is ($H_0$ should be rejected, but is not). This leads to costs, since such a conclusion implies that an improved drug is not introduced to the market.

The probability of a type I error is the significance level of a test and is denoted by $\alpha$. The significance level is chosen by the person carrying out the test. Clearly, the significance level of a test should be small (commonly $\leq 5\%$).

The stronger the evidence required to reject $H_0$, the lower the significance level (e.g. when the costs of rejecting $H_0$ are high, we wish to avoid wrongly rejecting $H_0$ and so we reduce $\alpha$). However, the more we try to avoid type I errors (i.e. the more the significance level is decreased), the more likely type II errors become.

The probability of a type II error is denoted by $\beta$. The power of a test is defined to be $1 - \beta$. This is the probability of correctly rejecting a false null hypothesis. The power of a test cannot be measured in practice, since it depends on the (unknown) value of the parameter which is the subject of the test.

The power of a test

For example, consider the test of the hypothesis that 30% of the adult population smoke, $H_0: p = 0.3$ versus $H_A: p \neq 0.3$. If in reality $p = 0.5$, the power of this test will be greater than if $p = 0.4$ (the further $H_0$ is from reality, the more likely it is that we reject it).

For a fixed significance level, the power of a test increases as the sample size increases. Ideally, we would like a test to have a low significance level and high power (i.e. the likelihood of either type of error is small). This can only be achieved if we have a large sample.

3.3.3 The process of hypothesis testing

The procedure is as follows:
1. State $H_0$ and $H_A$.
2. Choose the appropriate test statistic, $T$. This can be thought of as a measure of distance from $H_0$. If, e.g.,
$H_0: p = 0.3$ is true, then we expect that close to 30% of our sample will smoke (there will be some random variation around this population proportion).
3. Calculate the realisation of the test statistic, $t$, based on the sample.
4. Either
   a) calculate the p-value of the test (this is a measure of the "credibility" of the null hypothesis; statistical packages give this value). If the p-value of the test is less than the significance level, then we reject $H_0$; or
   b) determine the appropriate critical value (this is a critical "distance" from $H_0$). If the realisation of the test statistic exceeds this critical value, then we reject $H_0$.
5. Based on the p-value of the test (or the critical value and the realisation of the test statistic), state your conclusion in words.

3.3.4 Testing hypotheses for a population mean (σ known or n > 30)

The null hypothesis is $H_0: \mu = \mu_0$ ($\mu_0$ given). The test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$

Given that the null hypothesis is true, this statistic has approximately a standard normal distribution (for large samples, this holds independently of the distribution that the observations come from).

Note that when the sample mean is close to $\mu_0$, the realisation of the test statistic is close to 0. In this case, we do not reject $H_0$. Realisations of the test statistic far from 0 correspond to the sample mean being "significantly different" from $\mu_0$. In this case, we should (in general) reject $H_0$.

If the population variance $\sigma^2$ (or standard deviation $\sigma$) is known, we use

$$\mathrm{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$

When $\sigma$ is unknown, we use the sample standard deviation $s$:

$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}}.$$

Calculation of the p-value for two-sided tests

Suppose the alternative is non-directional, i.e. $H_A: \mu \neq \mu_0$. The p-value of the test is given by

$$p = P(|Z| > |t|) = 2P(Z > |t|),$$

where $t$ is the realisation of the test statistic.
The p-value is the probability, given that $H_0$ is true, that a randomly chosen sample favours the alternative at least as strongly as the sample observed. Low p-values indicate that $H_0$ should be rejected.

Interpretation of the p-value

$p > 0.05$ indicates that there is no significant evidence against $H_0$ (do not reject at the 5% level).
$0.01 < p < 0.05$ indicates that there is evidence against $H_0$ (reject at the 5% level but not at the 1% level).
$0.001 < p < 0.01$ indicates that there is strong evidence against $H_0$ (reject at the 1% level but not at the 0.1% level).
$p < 0.001$ indicates that there is very strong evidence against $H_0$ (reject at the 0.1% level).

The critical value of such a test

The critical value of such a test at a significance level of $\alpha$ is $Z_{\alpha/2} = t_{\infty,\alpha/2}$. We reject $H_0$ if and only if $|t| > Z_{\alpha/2} = t_{\infty,\alpha/2}$. It should be noted that if $|t| > Z_{\alpha/2} = t_{\infty,\alpha/2}$, then the p-value is less than $\alpha$.

Example 3.3.1

The average weight of a sample of 100 students is 72kg with a standard deviation of 12kg. Test the hypothesis that on average students weigh 75kg at a significance level of 1%.

i) First, we state the hypotheses $H_0: \mu = 75$; $H_A: \mu \neq 75$.

ii) Second, we choose the appropriate test statistic. For a hypothesis regarding the population mean with one large sample, we use

$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$

We use the approximation

$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2.$$

iii) We calculate the realisation of the test statistic

$$t = \frac{72 - 75}{1.2} = -2.5.$$

iv) We can calculate the p-value of the test

$$p = 2P(Z > |t|) = 2P(Z > 2.5) = 2 \times 0.00621 = 0.01242.$$

v) Based on this, we can state our conclusion. Since $p > \alpha = 0.01$, we do not reject $H_0$ at a significance level of 1%. Hence, we do not reject the hypothesis that the average weight of students is 75kg.
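The steps of Example 3.3.1 can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the normal table; the function name `two_sided_z_test` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sd, n, alpha):
    """Large-sample two-sided z-test of H0: mu = mu0 (hypothetical helper)."""
    se = sd / sqrt(n)                    # standard error of the sample mean
    t = (sample_mean - mu0) / se         # realisation of the test statistic
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    p_value = 2 * (1 - std_normal.cdf(abs(t)))      # p = 2 P(Z > |t|)
    critical = std_normal.inv_cdf(1 - alpha / 2)    # Z_{alpha/2}
    return t, p_value, critical, p_value < alpha


# Data of Example 3.3.1: n = 100, sample mean 72, s = 12, H0: mu = 75, alpha = 1%.
t, p_value, critical, reject = two_sided_z_test(72, 75, 12, 100, alpha=0.01)
print(f"t = {t:.2f}, p = {p_value:.5f}, critical value = {critical:.3f}")
print("reject H0" if reject else "do not reject H0")
```

The function returns both the p-value and the critical value, so either route to the conclusion (comparing $p$ with $\alpha$, or $|t|$ with $Z_{\alpha/2}$) can be read off the same call.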
Instead of calculating the p-value, we can base our conclusion on the appropriate critical value.

iv) The critical value for a non-directional test is $Z_{\alpha/2} = t_{\infty,\alpha/2} = t_{\infty,0.005} = 2.576$.

v) Based on this, we make our conclusion. Since $|t| = 2.5 < 2.576$, we do not reject $H_0$. Hence, we do not reject the hypothesis that the average weight of students is 75kg.

Duality between confidence intervals and two-sided tests

Result: Suppose we are testing $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$. We should reject $H_0$ at a significance level of $100\alpha\%$ if and only if $\mu_0$ does not belong to the $100(1 - \alpha)\%$ confidence interval for the population mean.

For example, we reject $H_0$ at a significance level of 5% if and only if $\mu_0$ does not belong to the 95% confidence interval for the population mean. "Confidence level + Significance level" = 100%.

Intuition: The values in the confidence interval are credible values of the population mean at the appropriate significance level. Hence, we can carry out the test of $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$ by calculating the appropriate confidence interval and basing our conclusion on it.

Example 3.3.2

The average weight of a sample of 100 students was 72kg with a standard deviation of 12kg. Test the hypothesis that on average students weigh 75kg at a significance level of 1%.

Since the significance level is 1%, we calculate a 99% confidence interval for the population mean. Since we have a large sample, the confidence interval is given by

$$\bar{X} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\bar{X}) \approx \bar{X} \pm \frac{s\,t_{\infty,\alpha/2}}{\sqrt{n}}.$$

We have $\alpha = 0.01$, $t_{\infty,\alpha/2} = t_{\infty,0.005} = 2.576$. The 99% confidence interval is given by

$$72 \pm \frac{12 \times 2.576}{\sqrt{100}} = 72 \pm 3.1 = [68.9, 75.1].$$

Since 75 belongs to this confidence interval, we do not reject $H_0$ (the hypothesis that the average weight of all students is 75kg).
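The duality used in Example 3.3.2 can be sketched in a few lines. This is an illustrative sketch only: `mean_confidence_interval` is a hypothetical helper, the normal quantile comes from the standard-library `statistics.NormalDist` rather than a table, and the hypothesised value 68 is an extra value added here purely to show a rejection.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, alpha):
    """Large-sample 100(1 - alpha)% confidence interval for the mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{alpha/2} = t_{inf, alpha/2}
    half_width = z * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Data of Example 3.3.2: n = 100, sample mean 72, s = 12, significance level 1%.
lower, upper = mean_confidence_interval(72, 12, 100, alpha=0.01)
print(f"99% confidence interval: [{lower:.1f}, {upper:.1f}]")

# Duality: reject H0: mu = mu0 at level alpha iff mu0 lies outside the interval.
decisions = {mu0: not (lower <= mu0 <= upper) for mu0 in (75, 70, 68)}
for mu0, reject in decisions.items():
    print(f"mu0 = {mu0}: {'reject' if reject else 'do not reject'} H0 at the 1% level")
```

One interval serves every hypothesised mean at the same significance level, which is why the loop needs no further computation per value of `mu0`.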
Use of the duality theorem

Note that using duality we can test a number of different hypotheses at a given significance level, e.g. in this case we would not reject the null hypothesis that the average weight of students is 70kg at a significance level of 1%, since 70 also belongs to the 99% confidence interval.

By calculating the p-value (or the realisation of the test statistic), we can test a particular null hypothesis at various significance levels, i.e. we can give more precise information on the weight of evidence against a null hypothesis.

Right-sided tests

We consider two types of one-sided tests. Right-sided tests are of the form $H_0: \mu = \mu_0$; $H_A: \mu > \mu_0$. In this case, we reject $H_0$ if the sample mean is "significantly greater" than $\mu_0$. The p-value is given by

$$p = P(Z > t).$$

Note that large positive realisations of the test statistic (associated with small p-values) occur when the sample mean is significantly greater than the hypothetical population mean $\mu_0$.

As before, the null hypothesis is rejected if $p < \alpha$. The critical value is given by $Z_\alpha = t_{\infty,\alpha}$. We reject $H_0$ if $t > Z_\alpha = t_{\infty,\alpha}$.

Left-sided tests

In this case we test between $H_0: \mu = \mu_0$; $H_A: \mu < \mu_0$. We reject $H_0$ if the sample mean is "significantly lower" than $\mu_0$. The p-value is given by

$$p = P(Z < t).$$

Note that large negative realisations of the test statistic (associated with small p-values) occur when the sample mean is significantly lower than the hypothetical population mean $\mu_0$.

As before, the null hypothesis is rejected if $p < \alpha$. The critical value is given by $-Z_\alpha = -t_{\infty,\alpha}$. We reject $H_0$ if $t < -Z_\alpha = -t_{\infty,\alpha}$.

Example 3.3.3

A manufacturer states that his light bulbs function on average for 1000hrs. The mean working life of a sample of 81 bulbs was measured to be 920hrs with a standard deviation of 360hrs.
Is the manufacturer's claim reasonable at a significance level of 5%?

i) We state our hypotheses $H_0: \mu = 1000$; $H_A: \mu < 1000$. Note that this alternative states that the bulbs are worse than the producer states, i.e. this is a left-sided test.

ii) The appropriate test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$

iii) We calculate the realisation of the test statistic

$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}} = \frac{360}{\sqrt{81}} = 40, \qquad t = \frac{920 - 1000}{40} = -2.$$

iv) The p-value for this test is

$$p = P(Z < t) = P(Z < -2) = P(Z > 2) = 0.02275.$$

v) Conclusion. Since $p < 0.05 = \alpha$, we reject $H_0$. We have evidence that the statement of the producer is unfounded.

We can also base our conclusion on the appropriate critical value.

iv) For a left-sided test, the appropriate critical value is $-Z_\alpha = -t_{\infty,\alpha} = -t_{\infty,0.05} = -1.645$.

v) Since $t = -2 < -Z_\alpha$, we reject $H_0$ at the 5% level. We have evidence that the statement of the producer is unfounded.

3.3.5 Testing hypotheses for a population mean (with a small sample, n < 30)

In this case we use the test statistic

$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}, \qquad \text{where } \mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}}.$$

Given $H_0$ is true, if the observations come from a normal distribution, then this statistic has a Student t-distribution with $n - 1$ degrees of freedom. Note: if the observations come from a distribution which is not normal, then this will not be true.

We cannot calculate p-values by hand using tables. Hence, inference is based on the appropriate critical value read from the table for the Student t-distribution (Table 7). Again, the test statistic is a measure of how far the data are from $H_0$.

Two-sided tests

We reject the null hypothesis if and only if $|t| > t_{n-1,\alpha/2}$, where $t_{n-1,p}$ satisfies $P(T > t_{n-1,p}) = p$ when $T$ has a Student t-distribution with $n - 1$ degrees of freedom.
Example 3.3.4

The average weight of a sample of 25 students was 72kg with a standard deviation of 12kg. Test the hypothesis that on average students weigh 75kg at a significance level of 5%.

i) We state the hypotheses $H_0: \mu = 75$; $H_A: \mu \neq 75$.

ii) We choose the appropriate test statistic

$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$

Given $H_0$ is true and the data come from a normal distribution, this statistic has a Student t-distribution with $n - 1$ degrees of freedom.

iii) We calculate the realisation of the test statistic

$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{25}} = 2.4, \qquad t = \frac{72 - 75}{2.4} = -1.25.$$

iv) We read the appropriate critical value from the table for the Student t-distribution. Since this is a two-tailed test, the significance level is $\alpha = 0.05$ and the sample size is small, the critical value is $t_{n-1,\alpha/2} = t_{24,0.025} = 2.064$.

v) We state our conclusion. Since $|t| = 1.25 < t_{24,0.025} = 2.064$, we do not reject $H_0$ (the hypothesis that the average weight of students is 75kg).

It should be noted that weight does not have a normal distribution. However, its distribution is not highly asymmetrical and the number of observations is not very low. Hence, the distribution of the test statistic will be reasonably close to the Student t-distribution. Also, the realisation of the test statistic is not particularly close to the critical value. Hence, our conclusion seems reasonable.

Use of duality for two-sided tests

We can also use the duality between confidence intervals and two-sided tests. In this case, since the significance level is 5%, the appropriate confidence level is 95%. The appropriate confidence interval for the population mean ($n < 30$) is given by

$$\bar{X} \pm t_{n-1,\alpha/2}\,\mathrm{S.E.}(\bar{X}) = \bar{X} \pm \frac{s\,t_{n-1,\alpha/2}}{\sqrt{n}} = 72 \pm \frac{12 \times 2.064}{\sqrt{25}} = 72 \pm 4.95 = [67.05, 76.95].$$

Since 75 belongs to this confidence interval, we do not reject $H_0$.

Right-sided tests

We consider two types of one-sided tests.
The first are right-sided tests. These are tests of the form $H_0: \mu = \mu_0$; $H_A: \mu > \mu_0$. We reject $H_0$ only if the sample mean is significantly greater than $\mu_0$. This corresponds to realisations of the test statistic significantly greater than 0. Precisely, we reject the null hypothesis if and only if $t > t_{n-1,\alpha}$.

Left-sided tests

The second type of tests are left-sided tests. These are tests of the form $H_0: \mu = \mu_0$; $H_A: \mu < \mu_0$. We reject $H_0$ only if the sample mean is significantly smaller than $\mu_0$. This corresponds to realisations of the test statistic significantly smaller than 0. Precisely, we reject the null hypothesis if and only if $t < -t_{n-1,\alpha}$.

Example 3.3.5

A car producer states that one of his cars burns 6.2 litres of petrol per 100km. 10 magazines tested the car. The average of their results was 6.5 litres/100km with a standard deviation of 0.3 litres/100km. Is the statement of the producer reasonable at a 5% significance level?

i) In this case the hypothesis $H_0: \mu = 6.2$ is from a producer. The alternative states that the product is worse than the producer states (i.e. consumes more petrol). Hence, $H_A: \mu > 6.2$.

ii) The test statistic is

$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}, \qquad \text{where } \mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}}.$$

If the observations come from a normal distribution, then this has a Student t-distribution with $n - 1$ degrees of freedom.

iii) We calculate the realisation of the test statistic.

$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{0.3}{\sqrt{10}} \approx 0.0949, \qquad t = \frac{6.5 - 6.2}{0.0949} \approx 3.16.$$

iv) We read the appropriate critical value. Since this is a right-sided test, the critical value is given by $t_{n-1,\alpha} = t_{9,0.05} = 1.833$.

v) We state our conclusion. Since $t = 3.16 > t_{n-1,\alpha} = t_{9,0.05} = 1.833$, we reject $H_0$ at a significance level of 5%. Hence, there is evidence that the producer's statement is unfounded.
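The arithmetic of Example 3.3.5 can be reproduced with a short script. This is a sketch under stated assumptions: the Python standard library has no Student t quantile function, so the critical value $t_{9,0.05} = 1.833$ is hard-coded from the table value quoted in the example, and `t_statistic` is a hypothetical helper name.

```python
from math import sqrt

# Critical value t_{9, 0.05}, read from a Student t table (value quoted in the example).
T_CRIT_9_005 = 1.833


def t_statistic(sample_mean, mu0, sd, n):
    """Return (t, S.E.) for T = (mean - mu0) / (s / sqrt(n)) (hypothetical helper)."""
    se = sd / sqrt(n)
    return (sample_mean - mu0) / se, se


# Data of Example 3.3.5: n = 10, sample mean 6.5, s = 0.3, H0: mu = 6.2.
t, se = t_statistic(6.5, 6.2, 0.3, 10)

# Right-sided test: reject H0 if and only if t > t_{n-1, alpha}.
reject = t > T_CRIT_9_005
print(f"S.E. = {se:.4f}, t = {t:.2f}")
print("reject H0" if reject else "do not reject H0")
```

For a left-sided test, the comparison would instead be `t < -critical_value`, with the critical value again read from the table for the relevant degrees of freedom.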
3.3.6 Tests for a population proportion

We only consider such tests with large samples ($n > 30$). The null hypothesis is $H_0: p = p_0$. Under the null hypothesis, the standard error of the sample proportion, $\hat{p}$, is

$$\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1 - p_0)}{n}}.$$

The test statistic,

$$Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})},$$

has approximately a standard normal distribution. Note that this statistic is analogous to the statistic for large-sample tests for a population mean.

The test statistic is a measure of the distance between the sample proportion and the hypothesised population proportion. We reject $H_0$ if this difference is significantly large. The p-values and critical values for such tests can be calculated in the same way as for tests for the population mean with a large sample.

Example 3.3.6

100 of 300 people stated that they wanted to vote for Fine Gael at the next election. Test the hypothesis that 30% of the population wish to vote for Fine Gael at a significance level of 5%.

i) We state our hypotheses $H_0: p = 0.3$; $H_A: p \neq 0.3$. Since we do not know where this hypothesis is from, we use a two-sided test.

ii) The test statistic is

$$Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})}.$$

iii) We calculate the realisation of the test statistic

$$\hat{p} = \frac{100}{300} = \frac{1}{3}, \qquad \mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{0.3 \times 0.7}{300}} = \sqrt{0.0007} \approx 0.02646.$$

Hence,

$$t = \frac{1/3 - 0.3}{0.02646} \approx 1.26.$$

iv) We can calculate the p-value of the test. For a two-sided test

$$p = 2P(Z > |t|) = 2P(Z > 1.26) = 2 \times 0.1038 = 0.2076.$$

v) Since $p > \alpha = 0.05$, there is no evidence that this proportion deviates from 30% (we do not reject $H_0$).

Note that this conclusion can also be based on the appropriate critical value. For a two-sided test this is $Z_{\alpha/2} = t_{\infty,\alpha/2} = t_{\infty,0.025} = 1.96$. Since $|t| = 1.26 < t_{\infty,0.025} = 1.96$, we do not reject $H_0$ at a significance level of 5%.
There is no evidence that the population proportion deviates from 30%.

The duality between confidence intervals and two-sided tests also works for tests for the population proportion. However, when we calculate a confidence interval for a proportion, the estimate of the standard error is based on the sample proportion and not (as in the hypothesis test) on the supposed population proportion. Hence, the duality in this case is only approximate. For example, if we base the conclusion of a test on a 99% confidence interval for a population proportion, then the significance level is approximately 1%.

Example 3.3.7

100 of 300 people stated that they wanted to vote for Fine Gael at the next election. Calculate a 95% confidence interval for the proportion of the population wishing to vote for Fine Gael. On the basis of this confidence interval, test the hypothesis that 30% of the population wish to vote for Fine Gael.

The 95% confidence interval for the population proportion is $\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p})$, where

$$\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{1/3 \times 2/3}{300}} = \sqrt{0.000741} \approx 0.02722$$

and $t_{\infty,\alpha/2} = t_{\infty,0.025} = 1.96$. The confidence interval is given by

$$\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p}) = \frac{1}{3} \pm 1.96 \times 0.02722 = 0.333 \pm 0.053 = [0.280, 0.386].$$

Since 0.3 belongs to this interval, we do not reject the null hypothesis that 30% of the population wish to vote for Fine Gael. The significance level of this test is approximately 5%.
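The difference between the two standard errors in Examples 3.3.6 and 3.3.7 is easy to see side by side. The following is a minimal sketch, assuming the standard-library `statistics.NormalDist` in place of the normal table; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()          # standard normal: mean 0, sd 1
n, successes, p0, alpha = 300, 100, 0.3, 0.05
p_hat = successes / n              # sample proportion

# Hypothesis test: the standard error is based on the hypothesised proportion p0.
se_test = sqrt(p0 * (1 - p0) / n)
t = (p_hat - p0) / se_test
p_value = 2 * (1 - std_normal.cdf(abs(t)))      # two-sided p-value

# Confidence interval: the standard error is based on the sample proportion p_hat.
se_ci = sqrt(p_hat * (1 - p_hat) / n)
z = std_normal.inv_cdf(1 - alpha / 2)           # Z_{alpha/2}
lower, upper = p_hat - z * se_ci, p_hat + z * se_ci

print(f"test: S.E. = {se_test:.5f}, t = {t:.2f}, p = {p_value:.4f}")
print(f"interval: S.E. = {se_ci:.5f}, [{lower:.3f}, {upper:.3f}]")
print("p0 inside interval:", lower <= p0 <= upper)
```

Because `se_test` and `se_ci` differ, the interval-based decision and the test-based decision can disagree for values of `p0` close to an end of the interval, which is the sense in which the duality is only approximate here.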