Wilcoxon Mann-Whitney test
Supplement 16B: Small Sample Wilcoxon Rank Sum Test

Hypothesis Testing Steps

When the samples have fewer than 10 observations, it may not be appropriate to use the large sample Wilcoxon Rank Sum (Mann-Whitney) test. However, there are tables of critical values that allow us to safely perform this test for small samples. As in the large sample case, the hypotheses are:

H0: Populations are the same
H1: Populations are not the same

If the analyst is willing to assume that the populations differ only in location (i.e., the center of the distributions is shifted) and are otherwise the same, we can view this as a test of two medians. For a two-sided test the hypotheses would then be:

H0: M1 = M2 (medians are the same)
H1: M1 ≠ M2 (medians are not the same)

The hypothesis testing procedure is similar to the large-sample case until we get to Step 4.

Step 1: Combine the two samples.

Step 2: Calculate the ranks for the combined samples, sorting from smallest to largest. Be careful to average the ranks if there are tied data values.

Warning: If you are using Excel, use the 2010 Excel function =RANK.AVG(X,Array,1) to rank from smallest to largest. Be sure to specify the third argument “1” because the default is to rank from largest to smallest. That is, if you were to use the function =RANK.AVG(X,Array,0) or if the third argument is omitted as in =RANK.AVG(X,Array,) your data will be ranked from largest to smallest (the opposite of the test format shown here). Also, beware of the old 2007 Excel function =RANK(X,Array) and the new 2010 function =RANK.EQ(X,Array), which do not average the ranks of tied data values.

Step 3: Separate the ranks into the original groups and sum the ranks for each group. Denote the rank sums T1 and T2.

Step 4: The test statistic is the sum of the ranks from the smaller sample (the sample with fewer observations). To avoid confusion, it is best to list the smaller sample first, so that the test statistic can be denoted T1.
Table 16.B1 shows the critical values for a two-tailed test at α = .05 (upper and lower 2.5% critical values). Reject H0 if T1 ≤ WLower or if T1 ≥ WUpper.

Illustration: Computer Repair Claims Warranty

Baldr Electronics Emporium is a medium-sized electronics retailer that offers a one-year parts and labor warranty on laptop computers that it sells. During the month of October, there were 15 claims for warranty repairs for its top two brands of laptops (6 claims for brand A and 9 claims for brand B). The store noted the number of days the laptops had been owned prior to coming in for repair. Is there a difference in the days owned prior to repairs? There is doubt about whether the data are normally distributed, so we will perform the Wilcoxon rank sum test to compare the medians. The data are:

Brand A   Brand B
  225        83
   79        52
  225       113
   52        67
   29       165
   98       132
             48
            230
            255

Step 1: Combine the two samples.

Step 2: Calculate the ranks for the combined samples, sorting from smallest to largest. Be careful to average the ranks if there are tied data values. For example, here the value 52 occurs twice, as does the value 225. Color coding helps you keep track of data in the combined samples.

Combined and Sorted      Brand A             Brand B
Value    Rank            Value    Rank       Value    Rank
  29      1                29      1           48      2
  48      2                52      3.5         52      3.5
  52      3.5              79      6           67      5
  52      3.5              98      8           83      7
  67      5               225     12.5        113      9
  79      6               225     12.5        132     10
  83      7                                   165     11
  98      8                                   230     14
 113      9                                   255     15
 132     10
 165     11
 225     12.5
 225     12.5
 230     14
 255     15

                         Sum T1 = 43.5       Sum T2 = 76.5
                         n1 = 6              n2 = 9
                         Median 1 = 88.5     Median 2 = 113.0

Step 3: Separate the ranks into the original groups and sum them for each group. The test statistic is the sum of the ranks from the smaller sample (the sample with fewer observations). If you wish, you can check your sums by adding; the ranks must sum to n(n+1)/2 where n = n1 + n2. In this case, n = n1 + n2 = 6 + 9 = 15, so the ranks must sum to n(n+1)/2 = 15(15+1)/2 = 120. Checking our rank sums, we get T1 + T2 = 43.5 + 76.5 = 120, which confirms our rank calculations.
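Steps 1 through 3 can also be sketched in a few lines of Python. This is only a minimal illustration using the laptop data above; the helper name average_ranks is hypothetical and not part of any package, and the tie handling simply averages the positions that tied values occupy in the sorted list.

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties.

    Hypothetical helper written for this illustration only.
    """
    # Indices of the values in ascending order of value
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the 1-based positions pos+1 .. end+1
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


brand_a = [225, 79, 225, 52, 29, 98]                   # smaller sample, listed first
brand_b = [83, 52, 113, 67, 165, 132, 48, 230, 255]

combined = brand_a + brand_b                           # Step 1: combine
ranks = average_ranks(combined)                        # Step 2: rank with ties averaged
t1 = sum(ranks[:len(brand_a)])                         # Step 3: rank sum for Brand A
t2 = sum(ranks[len(brand_a):])                         #         rank sum for Brand B

n = len(combined)
print("T1 =", t1, "T2 =", t2, "check =", n * (n + 1) / 2)
```

The printed check value should equal T1 + T2, mirroring the n(n+1)/2 check described in Step 3.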
Step 4: Table 16.B1 shows the critical values for a two-tailed test at α = .05 (upper and lower 2.5% critical values). We would reject H0 if T1 ≤ WLower or if T1 ≥ WUpper. For our data, n1 = 6 and n2 = 9, so the decision rule is:

Reject H0 if T1 ≤ 31 or if T1 ≥ 65

Because our test statistic is T1 = 43.5, we cannot reject H0. Although there is a difference in the sample medians, it is not great enough to conclude unequal population medians.

TABLE 16.B1 Lower 2.5% and Upper 2.5% Critical Values for Wilcoxon Rank Sum Test

                                           n1
n2       4        5        6        7        8        9        10       11       12
 4     10,26
 5     11,29    17,38
 6     12,32    18,42    26,52
 7     13,35    20,45    27,57    36,69
 8     14,38    21,49    29,61    38,74    49,87
 9     14,42    22,53    31,65    40,79    51,93    62,109
10     15,45    23,57    32,70    42,84    53,99    65,115   78,132
11     16,48    24,61    34,74    44,89    55,105   68,121   81,139   96,157
12     17,51    26,64    35,79    46,94    58,110   71,127   84,146   99,165   115,185

Decision Rule: Reject the null hypothesis if T1 ≤ WLower or if T1 ≥ WUpper where T1 is the rank sum from the smaller sample.

Source: F. Wilcoxon and R.A. Wilcox, Some Rapid Approximate Statistical Procedures, Lederle Laboratories, 1964. Used with permission of the American Cyanamid Company.

Step 5: No action is required. However, the retailer may wish to continue accumulating data on the length of time before each warranty claim for these two top-selling brands. It is possible that in a larger sample, significant differences might be detected.

Computer Software

There are many reasons to prefer using a computer for this type of test. First, the calculations are easier. Second, you don’t need tables. Third, tables become awkwardly large for this test when sample sizes become larger. Table 16.B1, for example, is abbreviated. If you have sample sizes between 13 and 20, you would need a larger table. Figure 16.B1 shows the output from Minitab, which confirms our calculations and our decision not to reject H0 at α = .05.
Note that Minitab also provides a confidence interval for the difference of medians as well as a p-value (0.6367, adjusted for ties), which indicates that the observed difference in medians is well within the realm of chance.

FIGURE 16.B1 Minitab Results for Wilcoxon Rank Sum/Mann-Whitney Test

Mann-Whitney Test and CI: Brand A, Brand B

           N   Median
Brand A    6     88.5
Brand B    9    113.0

Point estimate for ETA1-ETA2 is -15.0
96.1 Percent CI for ETA1-ETA2 is (-113.0,112.0)
W = 43.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.6374
The test is significant at 0.6367 (adjusted for ties)

Section Exercises

Note: *Indicates optional exercises based on large sample z-test or using software that may not be available to students.

16B.1 A trucking company wants to compare the number of miles driven by two delivery truck drivers in one week on different days (n1 = 5 days, n2 = 7 days). Do not assume that distances driven are normally distributed. (a) Use Table 16.B1 to test the hypothesis of equal medians at α = .05. Show the steps in your analysis. (b*) If possible, check your work using Minitab or another computer package. (c*) Perform a large-sample test using the z-test. Is your conclusion the same?

Delivery
Driver 1:  128  102   78   40   76
Driver 2:   97  158  112  112  216  316  112

16B.2 Below are data for two different regions, showing the number of days that kidney transplant patients had to wait before a donor was found (n1 = 6 patients, n2 = 8 patients). Do not assume a normal distribution of waiting times. (a) Use Table 16.B1 to test the hypothesis of equal medians at α = .05. Show the steps in your analysis. (b*) If possible, check your work using Minitab or another computer package. (c*) Perform a large-sample test using the z-test. Is your conclusion the same?

Kidneys
East Region   West Region
    109           137
    248            93
     85            52
    107           191
     28           236
     67           205
                   92
                  133
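For the optional parts (b*) and (c*), any package will do. As one possibility, the sketch below applies a normal approximation to the rank sum from the laptop illustration (not to the exercise data). It rests on several assumptions: the function name rank_sum_normal_approx is hypothetical, the 0.5 continuity correction is an assumption, no tie adjustment is made to the variance, and the formula is the common textbook form based on the mean n1(n+1)/2 and variance n1·n2(n+1)/12 of T1, which may be written differently in the large-sample section of this chapter.

```python
import math


def rank_sum_normal_approx(t1, n1, n2, continuity=0.5):
    """Two-tailed normal approximation for the Wilcoxon rank sum T1.

    Hypothetical helper for illustration; no adjustment for ties.
    """
    n = n1 + n2
    mean_t1 = n1 * (n + 1) / 2                  # expected rank sum under H0
    sd_t1 = math.sqrt(n1 * n2 * (n + 1) / 12)   # standard deviation under H0
    diff = t1 - mean_t1
    # Continuity correction (assumed): move the difference toward zero
    if diff > 0:
        diff = max(diff - continuity, 0.0)
    elif diff < 0:
        diff = min(diff + continuity, 0.0)
    z = diff / sd_t1
    # Two-tailed p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Laptop illustration: T1 = 43.5 with n1 = 6 and n2 = 9
z, p_value = rank_sum_normal_approx(43.5, 6, 9)
print("z =", round(z, 3), "p =", round(p_value, 4))
```

Under these assumptions the p-value comes out near 0.637, in line with the unadjusted value in Figure 16.B1. With samples this small, the table-based decision rule remains the primary test and the approximation is only a cross-check.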