Document 6540364
Hypothesis testing
Hypothesis testing introduction + Hypothesis testing with one measurement variable + Hypothesis testing with one categorical variable (2 options)

Stages of data analysis: Data Preparation, Exploration, Practical Significance, Inference (Statistical Significance), Written Report.

Inference (Statistical Significance)
Hypothesis testing and/or confidence intervals.
Is at the heart of hypothesis testing (in the narrow sense).
Does not replace exploration and searching for practical significance.
Statistical significance: Can we generalize findings from the sample data to the population at large? If the p-value is low enough, then yes.
Making inferences from a sample to a population.

Hypothesis testing?
In data analysis: a way of testing a hypothesis by making inferences from a sample to a population, with a calculation of statistical significance as the key. Hypothesis testing is but one part of statistical analysis.
Wider sense: a way of framing the goal of the research as a statement to be tested, e.g. On average men are taller than women.

4 steps of hypothesis testing (see readings pg. 16)
1. Set up a null hypothesis H0
2. a. Calculate the p-value using a hypothesis test
   b. Calculate the confidence interval if possible
3. Make a decision about rejecting the H0
4. Write a nice final statement that summarizes your decision & contributes to the final report

What is a...? Null hypothesis.
It is a 'fake' hypothesis set up for testing (e.g. There is no difference between males and females). If we reject the null hypothesis then it looks like we have some evidence for the original (or alternate) hypothesis (On average males are taller than females).

What is a...? P-value.
It is the probability of getting a sample statistic at least as extreme as the one we calculated (sample mean, sample proportion p, etc.), assuming that it comes from a population in which the null hypothesis is true.

Making the decision
In this course it will be easy. If the p-value is < 0.05 then reject the null hypothesis (i.e. you have enough evidence that the null hypothesis is not true).
If the p-value is very low then either your sample statistic is in the weird zone of the distribution of sample means (i.e. it is very unlikely), or the null hypothesis is not true and we reject it.
Don't forget to explore and look at practical significance.

What is a...? Final statement.
This depends on the research question and the null hypothesis that you are testing. Just writing 'I reject the null hypothesis' is often not enough.

Hypothesis testing one measurement variable (single mean)

Do 500ml water bottles really have 500ml of water in them as claimed?
What does a hypothesis test of a single mean look like?
Example: Do 500ml water bottles really have 500ml of water in them as claimed? A method was needed to test whether the reported, estimated, or previously calculated value of the mean or proportion was true without opening and measuring the volume of every bottle of water.
Wei Zhou, an HIM student, conducted a mini research study in which he used hypothesis testing to find this out.
Open up water wei zhou on www.stataras.com and the in-class exercise on page 6 of the exercise booklet.
We will replicate the analysis of his data set by following the 4 step process (readings p. 16), but first we will explore the data and see what it tells us 'practically' (parts A and B).

What the data says: exploration and practical significance
Based on a sample of 30 bottles:

volume
N Valid               30
N Missing             0
Mean                  525.20
Std. Error of Mean    .416
Median                525.00
Mode                  523(a)
Std. Deviation        2.280
Skewness              .428
Kurtosis              -.561
Range                 8

[Histogram of volume: x-axis volume 522 to 530, y-axis Frequency 0 to 6; Mean = 525.2, Std. Dev. = 2.28, N = 30]
[Second plot of volume, axis 522 to 530]
What do you see?

Dark room moment, step 1:
H0: μ = 500 ml. I want to test whether μ = 500 ml.
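The exploration numbers can be checked by hand against H0: μ = 500 ml. What follows is only a minimal Python sketch, not part of the SPSS workflow used in class. It takes the summary statistics reported above (n = 30, mean = 525.20, std. deviation = 2.28) and computes the standard error, the t statistic and a 95% confidence interval. The function names are made up for illustration, and the critical value 2.045 (two tail, 29 d.f.) is assumed to be read from a t table rather than computed.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, mu_0):
    """Return (standard error, t statistic) for a test of H0: mu = mu_0."""
    standard_error = sample_sd / math.sqrt(n)
    t_calculated = (sample_mean - mu_0) / standard_error
    return standard_error, t_calculated


def confidence_interval(sample_mean, standard_error, t_critical):
    """Return (lower, upper) bounds of the interval around the sample mean."""
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Summary statistics from the exploration of the 30 bottles
se, t = one_sample_t(sample_mean=525.20, sample_sd=2.28, n=30, mu_0=500.0)

# 2.045 is the two-tail critical t for alpha = 0.05 and 29 d.f. (from a table)
lower, upper = confidence_interval(525.20, se, t_critical=2.045)

print(f"standard error = {se:.3f}")
print(f"t calculated   = {t:.2f}")
print(f"95% C.I.       = {lower:.2f} to {upper:.2f}")
```

The standard error here should agree with the Std. Error of Mean in the table above, which is a useful sanity check before trusting the rest of the output.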
HA: μ < 500 ml or μ > 500 ml. You can go either way with this, depending on whether you trust the company.

Not only is the sample mean > 500 ml; every single bottle had over 500 ml of water in it. The sample is not normally distributed, but that is of little consequence here.
Does that mean we don't need a hypothesis test?

The hypothesis test
Practical significance: If I find that the mean volume is 'practically larger' than 500 ml then I can answer the research question as 'yes', water bottles have more than 500 ml in them, but only for the sample bottles.
Statistical significance: If I reject the null hypothesis (H0), then my findings of practical difference in the sample can be generalized to the population of all water bottles. If I reject the null hypothesis then I have a 'statistically significant' result to report.
If I fail to reject the null hypothesis (H0), then any findings of practical difference may be due to (random) sampling error (and chance), and there is nothing 'statistically significant' to report.

Step 2a: output of SPSS test

Test Value = 500 ml
          t        df   Sig. (2-tailed)   Mean Difference   95% C.I. of the Difference
                                                            Lower     Upper
volume    60.528   29   .000              25.200            24.35     26.05

Step 2: estimating p-value by hand
Set level of significance to α = 0.05; d.f. = n - 1 = 29; look up t critical (two tail) = 2.045.
$t_{calculated} = \frac{525.2 - 500}{2.28 / \sqrt{30}} = 60.54$

Step 2b. The 95% confidence interval using the explore function

volume                                Statistic   Std. Error
Mean                                  525.20      .416
95% C.I. for Mean     Lower Bound     524.35
                      Upper Bound     526.05
5% Trimmed Mean                       525.11
Median                                525.00
Variance                              5.200
Std. Deviation                        2.280
Minimum                               522
Maximum                               530
Range                                 8
Interquartile Range                   4
Skewness                              .428        .427
Kurtosis                              -.561       .833

Step 2 cont'd: p-value estimate by hand
[Figure: t distribution with a central 'Fail to Reject H0' region between -1.98 and +1.98 and a 'Reject H0' tail on each side, each with Area = 0.025; t = 60.54 lies far out in the right tail]
Each 'weird zone', titled Reject H0 here, is equal to 0.025 (or 2.5%) since we are using the two tail method. Together they make 0.05.
All we can say is that the p-value < 0.05 (two tail), since all we know is that we are in the weird zone.

Step 3. Make a decision
P-value < 0.05 (sig = 0.000).
We know that we are in the 'weird zone' and thus can reject H0 with 95% confidence. Our sample mean (525.2 ml) > 500 ml; this fits with HA: μ > 500 ml. Now we can argue (with 95% confidence) that the actual mean μ is > 500 ml.
Why not just use the confidence interval?

Step 4. Thoughtful concluding statement
The bottled water company's claim is not accurate. I reject the claim that μ = 500 ml (SPSS reports p = 0.000, i.e. p < 0.001) and can say with 95% confidence that they are giving the buyer between 524.35 ml and 526.05 ml in the average bottle.

Hypothesis testing one categorical variable (2 options)
Searching for weirdness in proportions.

Is the rate of RSI in Canada 7%?
What does a hypothesis test of a single proportion look like?
Example: The RSI industry reports that the rate of repetitive strain injury in Canada is 7%. We wish to test the veracity of that claim. We will test whether the reported, estimated, or previously calculated value of the proportion is true without contacting or examining every single Canadian.
Luckily the Canadian Community Health survey asked the very same question: Did you sustain an RSI in the past 12 months?
Again, we start with exploration and 'practical' analysis. Open the 'repstrain' data set on www.stataras.com and page 7 in the exercise booklet.

Exploration and practical significance
n = 2000, p = 10.7%

Rep. strain injury - past 12 mo.
                   Frequency   Percent
Valid     YES      214         10.7
          NO       1777        88.9
          Total    1991        99.6
Missing            9           .4
Total              2000        100.0

[Bar chart: Frequency (0 to 2,000) of YES and NO for Rep. strain injury - past 12 mo.]
Concerns? Only 9 missing, which is very low (i.e. no concerns).
Note: I removed a few of the columns from the frequency table output.

Choice of method for hypothesis testing
Option 1: test of one proportion, π. Good for 2 categories only; similar in approach to the test of a single mean.
Option 2: χ2 (chi-square) goodness of fit test. Good for two or more categories.

Step 1: the null hypothesis
Option 1: test of single proportion
H0: π_yes = 0.07
HA: π_yes > 0.07 (or π_yes < 0.07)
Option 2: chi-square goodness of fit
H0:
RSI?   Expected
Yes    7%
No     93%

Step 2a: hypothesis test on SPSS, option 1
Be careful in entering the test proportion. H0: π_yes = 0.07, but I needed to enter the test proportion as 0.93. Let trial and error be your guide.
Open up the pain data set on www.stataras.com

Step 2: output, option 1

Binomial Test
                                   Category   N      Observed Prop.   Test Prop.   Asymp. Sig. (1-tailed)
Rep. strain injury - past 12 mo.   NO         1777   .892516          .93          .000(a)
                                   YES        214    .107484
                                   Total      1991   1.000000

I know that entering 0.93 is correct because in the column 'test proportion' the value sits in the row 'no', not in the row 'yes' where I would have expected it. This is one of those weird things about SPSS.

Step 2: hypothesis test, option 2 (chi-square)
Again, you will need to be careful in setting up the hypothesis test. In this case, indicate all of the proportions you wish to test.
Expected values are based on the hypothesized proportions times the total, e.g. Expected Yes = 0.07 * 1991 = 139.4.
You can calculate the chi-square value by using the formula on pg 16 of your handouts. Try it.

Step 3: visualizing option 1
All we can say is that the p-value < 0.05 (two tail), since all we know is that we are in the weird zone.
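To see how far into the weird zone the RSI sample falls, both options can be worked by hand. The Python sketch below is an illustration under stated assumptions, not a reproduction of what SPSS does internally. It uses the counts from the frequency table (214 yes, 1777 no, missing responses excluded), a normal-approximation z statistic for option 1, and the standard goodness of fit formula, sum of (observed - expected)^2 / expected, for option 2, which is assumed to be the formula in the handouts. All variable names are invented for the example.

```python
import math

# Counts from the frequency table (the 9 missing responses are excluded)
observed = {"yes": 214, "no": 1777}
n = sum(observed.values())      # valid responses
pi_0 = 0.07                     # RSI rate claimed under H0

# Option 1: test of one proportion, using the normal approximation
p_hat = observed["yes"] / n
se_under_h0 = math.sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / se_under_h0

# Option 2: chi-square goodness of fit
# Expected count = hypothesized proportion * total
hypothesized = {"yes": pi_0, "no": 1 - pi_0}
expected = {category: prop * n for category, prop in hypothesized.items()}
chi_square = sum(
    (observed[category] - expected[category]) ** 2 / expected[category]
    for category in observed
)

print(f"sample proportion p = {p_hat:.6f}")
print(f"z calculated        = {z:.2f}")
print(f"expected yes        = {expected['yes']:.1f}")
print(f"chi-square          = {chi_square:.2f}")
```

With only two categories the chi-square value equals z squared, so the two options should always lead to the same decision.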
[Figure: normal curve centred on π_test = 0.07, with a central 'Fail to Reject H0' region (95% of the area under the curve) between -1.96 and +1.96 and a 'Reject H0' tail on each side, each with Area = 0.025; z_calculated ≈ 6.56 lies far out in the right tail]
Each 'weird zone', titled Reject H0 here, is equal to 0.025 (or 2.5%) since we are using the two tail method. Together they make 0.05.

C.I. for proportion
$\pi_{yes} = p \pm z \sqrt{\frac{p(1-p)}{n}}$
π_yes = 0.107 ± 1.96(0.0069)
π_yes = 0.107 ± 0.014
0.093 ≤ π_yes ≤ 0.121
9.3% ≤ π_yes ≤ 12.1%, which is nowhere near the 7% claim.

Step 3: option 1 & 2
Since the p-value is < 0.05, I can reject the null hypothesis with 95% confidence. I can be 95% confident that π_yes ≠ 0.07.

Step 4: final statement
I am 95% confident that π_yes ≠ 0.07. The observed proportion of RSI in the sample of Canadians is p_yes = 10.7%. Based on this sample I am 95% confident that the rate of RSI is higher than 7%. The 95% C.I. is 9.3% ≤ π_yes ≤ 12.1%.

Hypothesis testing one categorical variable (>2 categories)
Searching for weirdness in proportions.

χ2 goodness of fit test
Example 2: testing when the expectation is a uniform distribution, i.e. all categories with equal proportion.
One pack of smarties was randomly chosen for purchase.
Open up the 'smarties' data set on www.stataras.com and start with an exploration, following page 8 in the exercise booklet.

Smarties exploration (practical significance)

colour of smartie
           Frequency   Percent
blue       8           14.5
green      7           12.7
orange     10          18.2
purple     8           14.5
red        13          23.6
yellow     9           16.4
Total      55          100.0

[Bar chart: Frequency (0 to 12.5) by colour of smartie: blue, green, orange, purple, red, yellow]

Step 1 of the hypothesis test
Research question: Are there an equal number of smarties of each colour produced?
H0: We expect that there will be an equal number of smarties of each colour.
This cannot be solved with a straightforward binomial test. We have to use option 2.
It seems that there are more red smarties than other colours. Is 23.6% vs 18.2% a large enough difference, practically speaking?

Equality of distribution of smarties, step 2: the test

colour of smartie
           Observed N   Expected N   Residual
blue       8            9.2          -1.2
green      7            9.2          -2.2
orange     10           9.2          .8
purple     8            9.2          -1.2
red        13           9.2          3.8
yellow     9            9.2          -.2
Total      55

                colour of smartie
Chi-Square(a)   2.491
df              5
Asymp. Sig.     .778   (P-value)

Both of these tables are produced with the chi-sq function.

Smarties distribution, step 3 and 4: decision and statement
Decision: Since the p-value is > 0.05 (it is not even close), we don't have enough evidence to reject the null hypothesis.
Final statement: Even though in the sample we saw what looked like a difference, we do not have enough evidence to generalize that claim to all smarties boxes, and thus cannot question the claim that there are an equal number of each colour of smartie in the smartie population.

More practice with hypothesis testing
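For practice, the smarties chi-square value can be reproduced by hand. This is a minimal Python sketch rather than the SPSS procedure: it uses the observed counts from the table above, assumes the standard goodness of fit formula, sum of (observed - expected)^2 / expected, and compares the result with 11.070, the usual table value for the chi-square critical point at α = 0.05 with 5 d.f. That critical value is taken from a standard table and does not appear in the slides; the variable names are illustrative only.

```python
# Observed counts from the one pack of smarties
observed = {
    "blue": 8,
    "green": 7,
    "orange": 10,
    "purple": 8,
    "red": 13,
    "yellow": 9,
}

total = sum(observed.values())

# Under H0 every colour is equally likely, so each expected count is total / 6
expected_each = total / len(observed)

chi_square = sum(
    (count - expected_each) ** 2 / expected_each
    for count in observed.values()
)
df = len(observed) - 1

# Critical value for alpha = 0.05 and 5 d.f., read from a chi-square table
CRITICAL_VALUE = 11.070
reject_h0 = chi_square > CRITICAL_VALUE

print(f"expected per colour = {expected_each:.1f}")
print(f"chi-square          = {chi_square:.3f}")
print(f"df                  = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Comparing against the critical value gives the same decision as the p-value rule used in class: a chi-square value below the critical value corresponds to a p-value above 0.05.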