Constructing Confidence Intervals from a Ranked Set Sample Technical Report 783
Christopher J. Sroka, Elizabeth A. Stasny, and Douglas A. Wolfe
Department of Statistics
The Ohio State University
August 5, 2006

Abstract

Ranked set sampling (RSS) is a sampling method whereby auxiliary information is used to rank the units in the population. A unit is included in the sample based on its rank amongst the other units. The method was first proposed over fifty years ago as a way to increase the precision of estimates of pasture yields without increasing the sample size. The estimator of the population mean based on balanced RSS has been shown to be unbiased and to have variance no larger than that of the estimator based on the same number of measured observations from simple random sampling (SRS). In some cases, RSS can result in more precise estimates than those obtained through stratified simple random sampling. RSS holds the potential to significantly improve the efficiency of survey sampling. Despite previous research demonstrating the advantages of RSS, the literature has not addressed how to calculate a confidence interval for the population mean using RSS. The critical component in computing such a confidence interval is estimating the variances of the judgment order statistics. In this paper, we develop estimators for these variances and demonstrate some of their desirable properties. Using these properties, we provide a formula for calculating a large sample confidence interval based on a normal approximation. This method is extended to the case where RSS is used in place of SRS in stratified sampling. We evaluate the performance of our confidence interval method using simulated samples from the Medical Expenditure Panel Survey (MEPS). We find that confidence intervals based on RSS can be over 40% shorter than competing confidence intervals based on a SRS of equivalent sample size.
The data used for the simulation are heavily skewed, so the true confidence level is usually less than the nominal confidence level. RSS and SRS perform similarly with respect to how quickly the levels of their respective confidence intervals converge to the nominal level.

Key words: Medical Expenditure Panel Survey, ratio estimation, stratified sampling, variance estimation

1 Introduction

Ranked set sampling (RSS) is a method of data collection whereby the sampler's judgment about the relative sizes of the population units determines which units are selected for measurement. The method was first devised by McIntyre (1952) as a way to improve estimates of pasture yields. McIntyre wanted to utilize selective sampling to increase the precision of his estimates without introducing bias. Since this original paper, RSS (in its basic form) has been shown to be at least as precise as SRS for estimating the population mean (Dell and Clutter 1972). Kaur, Patil, Shirk, and Taillie (1996) demonstrated that in some cases, RSS can result in more precise estimation than stratified simple random sampling (SSRS). Equivalently, fewer quantified measurements are needed under RSS than under SRS or SSRS to obtain a specified level of precision. Thus, RSS provides a more cost-effective alternative to other sampling methods when it is relatively inexpensive to rank units in the population but costly to quantify the variable of interest. Kaur et al. (1996), Nahhas, Wolfe, and Chen (2002), and Wang, Chen, and Liu (2004) developed cost models to determine when RSS is more efficient than SRS and SSRS.

The most basic form of RSS is balanced RSS. The sampler uses SRS, with replacement, to select $m^2$ units from the population. These $m^2$ units are divided into $m$ sets of size $m$. The units within each set are ranked in increasing order according to the variable of interest prior to actually measuring this variable.
The ranking is done using some sort of auxiliary information, such as a concomitant variable or visual inspection. The auxiliary information need not be perfectly correlated with the variable of interest.

The next step is to select units from the ranked sets for measurement. A set is randomly chosen without replacement, and from it the smallest ranked unit is chosen. Another set is selected without replacement, and from it the second smallest unit is chosen. This process continues until $m$ units have been selected, each representing a different rank from a different set. The variable of interest is quantified for this subsample of $m$ units. The units in the sets not included in the subsample are disregarded, and analysis is carried out using only the measurements on the variable of interest.

This process yields $m$ measurements on the quantified variable. The process can be repeated $k$ times, called cycles, resulting in a total sample size of $mk$. Each measurement is called a judgment order statistic. We let $X_{[r]i}$ denote the judgment order statistic representing the $r$th rank in the set taken during the $i$th cycle, $r = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, k$.

The ranking based on the auxiliary information might not result in the true ranking that would result if the units had been ranked based on complete knowledge of the variable of interest. In the case where the two sets of rankings are the same, we say we have perfect rankings. The term imperfect rankings is used to describe situations where the rankings based on the auxiliary information differ from the rankings based on knowledge of the quantified variable. Rankings can be so imperfect as to result in an ordering that is reversed from the true ranking. If such an imperfect ranking is applied consistently, it will be no different than taking a RSS under perfect rankings. The worst-case scenario, in terms of the sample obtained, would be if the ranks were assigned randomly to the units in the sets.
In this case, the judgment order statistics have no more structure to them than a SRS of size $mk$.

An estimator of the population mean under RSS, $\hat{\mu}$, is the simple average of the $mk$ quantified observations, namely,

\[
\hat{\mu} = \frac{1}{mk} \sum_{i=1}^{k} \sum_{r=1}^{m} X_{[r]i}. \tag{1}
\]

Even if the rankings are not perfect, Dell and Clutter (1972) show that this estimator is unbiased if the ranking errors are uncorrelated with the ranking procedure.

RSS has been shown to fit easily into survey sampling settings. Stokes (1977) showed that concomitant variables, which are frequently available in stratified survey designs, can be used for purposes of ranking. Husby, Stasny, and Wolfe (2005) demonstrated that estimates of corn production in Ohio can be significantly more precise under RSS than under SRS. The gains in precision are greater as the rankings become closer to perfect. Chen, Stasny, and Wolfe (2005) developed a method for estimating a population proportion via RSS. This method uses logistic regression on a set of concomitant variables to arrive at the rankings. Sroka, Stasny, and Wolfe (2005) showed that RSS can be used in place of SRS within strata in stratified sampling to improve precision.

The RSS literature does not discuss how one would go about computing a confidence interval for the population mean once a RSS has been obtained. Previous studies demonstrating the precision of RSS rely on simulations to estimate the large-sample variance of $\hat{\mu}$. Dell and Clutter (1972) show that the variance of $\hat{\mu}$ is a function of the variances of the $m$ judgment order statistics. There still remain the dual issues of how to estimate these variances and how to determine distributional quantiles for a confidence interval. In Section 2, we develop an unbiased, consistent estimator for the variance of the $r$th judgment order statistic. We use this estimator to derive a large sample confidence interval based on a balanced RSS.
We extend these results in Section 3 to the case where RSS is used in a stratified framework. Section 4 examines the use of RSS in place of SRS in ratio estimation. In Section 5, we discuss the results of a simulation study used to evaluate our confidence intervals. We present our conclusions in Section 6.

2 Confidence Intervals for Balanced RSS

Suppose we collect a balanced RSS from the population of interest using $k$ cycles and set size $m$. We do not need to assume that the rankings are perfect. Our first step in developing a confidence interval for the mean $\mu$ is to determine the variance of the estimator $\hat{\mu}$. Since only one of our measured quantities is selected from a set, and the sets are independent simple random samples, all of our observations are independent. They are not, however, identically distributed, since $X_{[r]i}$ and $X_{[s]i}$, $r \ne s$, represent different ranks assigned to the units in the sets. Let $f_{[r]}$ denote the distribution of the $r$th judgment order statistic. Furthermore, let $\mu_{[r]}$ and $\sigma^2_{[r]}$ denote the mean and variance, respectively, of this distribution. For a fixed $r$, $X_{[r]1}, X_{[r]2}, \ldots, X_{[r]k}$ represent independent and identically distributed measurements drawn from the distribution $f_{[r]}$. Thus,

\[
\operatorname{Var}(\hat{\mu})
= \left(\frac{1}{mk}\right)^{2} \sum_{i=1}^{k} \sum_{r=1}^{m} \operatorname{Var}(X_{[r]i})
= \left(\frac{1}{mk}\right)^{2} \sum_{i=1}^{k} \sum_{r=1}^{m} \sigma^2_{[r]}
= \frac{1}{m^{2} k} \sum_{r=1}^{m} \sigma^2_{[r]}.
\]

Let $\bar{X}_{[r]\cdot} = \sum_{i=1}^{k} X_{[r]i}/k$. It is easy to see that this is an unbiased estimator of $\mu_{[r]}$. We propose the following estimator of $\sigma^2_{[r]}$:

\[
S^2_{[r]} = \frac{1}{k-1} \sum_{i=1}^{k} \left(X_{[r]i} - \bar{X}_{[r]\cdot}\right)^{2}. \tag{2}
\]

Note that this is just the usual variance estimator for a SRS from the distribution $f_{[r]}$. This is a logical choice given the fact that the judgment order statistics are independent and their distribution is identical for fixed $r$.

Since $X_{[r]1}, X_{[r]2}, \ldots, X_{[r]k}$ is a random sample from $f_{[r]}$, it follows immediately that the properties of $S^2_{[r]}$ are identical to those that hold for $S^2$ in the SRS case.
Thus, $E(S^2_{[r]}) = \sigma^2_{[r]}$ and $S^2_{[r]} \xrightarrow{p} \sigma^2_{[r]}$ as $k \to \infty$, provided the fourth moment of $f_{[r]}$ is finite. The following result allows us to make statements about the convergence of $S^2_{[r]}$ based solely on the distribution of the population whence our RSS came.

Theorem 1. Let $X_{[1]1}, X_{[1]2}, \ldots, X_{[m]k}$ be a RSS from a population with distribution $f(\cdot)$. A sufficient condition for the fourth moment of $f_{[r]}$ to be finite is that the fourth moment of $f(\cdot)$ is finite.

Proof. Dell and Clutter (1972) noted that

\[
f(x) = \frac{1}{m} \sum_{r=1}^{m} f_{[r]}(x). \tag{3}
\]

The result follows from the Law of Total Probability. The ranking procedure partitions the population into $m$ classes, each with probability of selection $1/m$. The value $f_{[r]}(x)$ is the probability of selecting $x$ given partition $r$ was chosen. Using this result, we have

\[
E\left(X_{[r]i}^{4}\right)
\le \sum_{r=1}^{m} E\left(X_{[r]i}^{4}\right)
= \sum_{r=1}^{m} \int_{\mathcal{X}} x^{4} f_{[r]}(x)\,dx
= \int_{\mathcal{X}} x^{4} \sum_{r=1}^{m} f_{[r]}(x)\,dx
= \int_{\mathcal{X}} x^{4}\, m f(x)\,dx
= m\,E(X^{4}),
\]

where $X$ follows the population distribution $f(x)$.

The last line of the proof holds because the support for the judgment order statistics is the same as the support for the distribution of the overall population. This is an important fact that distinguishes judgment order statistics from ordinary order statistics. Given a ranked set, only one unit is selected from which we obtain the measured judgment order statistic. Since the sets are independent, there are no restrictions placed on the values of $X_{[1]i}, X_{[2]i}, \ldots, X_{[m]i}$ other than those determined by the sample space of the population. Conversely, ordinary order statistics are the ranked observations from a single random sample. Their distribution carries the constraint $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$, where $Y_{j:n}$ is the $j$th order statistic from a random sample of size $n$.

We cannot calculate an exact confidence interval for $\mu$ without knowledge of $f_{[r]}$. We can, however, develop a large sample approximate confidence interval based on the standard normal distribution.
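As a minimal sketch of how the estimators in Equations 1 and 2 might be computed, the Python code below draws a balanced RSS from a finite population and evaluates $\hat{\mu}$, the $S^2_{[r]}$, and a normal-approximation interval of the form $\hat{\mu} \pm z_{\alpha/2}\sqrt{\sum_{r} S^2_{[r]}}/(m\sqrt{k})$, obtained by substituting $S^2_{[r]}$ for $\sigma^2_{[r]}$ in the expression for $\operatorname{Var}(\hat{\mu})$. The function names, the representation of units as (ranking value, quantified value) pairs, and the synthetic population are illustrative assumptions, not part of the original study.

```python
import random
from math import sqrt
from statistics import NormalDist, variance


def draw_balanced_rss(units, m, k, rng):
    """Draw a balanced RSS from `units`, a list of (aux, y) pairs.

    `aux` is a hypothetical ranking variable and `y` the variable of interest.
    Returns `sample`, where sample[r][i] is the measurement for the unit judged
    to have rank r + 1 in cycle i + 1 (X_[r]i in the text, zero-based here).
    """
    sample = [[] for _ in range(m)]
    for _ in range(k):                       # one cycle
        for r in range(m):                   # one set of m units per rank
            chosen = rng.choices(units, k=m)         # SRS with replacement
            chosen.sort(key=lambda unit: unit[0])    # rank on aux only
            sample[r].append(chosen[r][1])           # measure the unit ranked r + 1
    return sample


def rss_mean_and_interval(sample, alpha=0.05):
    """Return mu_hat (Eq. 1), the per-rank variances (Eq. 2), and the interval."""
    m, k = len(sample), len(sample[0])       # requires k >= 2
    mu_hat = sum(sum(row) for row in sample) / (m * k)
    s2 = [variance(row) for row in sample]   # divisor k - 1, one value per rank
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(sum(s2)) / (m * sqrt(k))
    return mu_hat, s2, (mu_hat - half_width, mu_hat + half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic skewed population with an imperfect ranking variable (assumed).
    population = []
    for _ in range(10_000):
        y = rng.expovariate(1.0)
        population.append((y + rng.gauss(0.0, 0.5), y))
    rss = draw_balanced_rss(population, m=5, k=30, rng=rng)
    mu_hat, s2, interval = rss_mean_and_interval(rss)
    print(mu_hat, interval)
```

Drawing a fresh set of $m$ units for every (cycle, rank) pair is equivalent to splitting $m^2$ units into $m$ sets and taking a different rank from each set. Only the auxiliary value is used for sorting, so the ranking may be imperfect.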
The large-sample approximation assumes that the number of cycles, $k$, goes to infinity while the set size, $m$, is fixed.

Theorem 2. Let $X_{[1]1}, X_{[2]1}, \ldots, X_{[m]k}$ be a balanced ranked set sample based on set size $m$ and conducted for $k$ cycles. Let $\mu$ denote the mean of the underlying distribution from which the sample was obtained. As $k \to \infty$, for fixed $m$,

\[
m\sqrt{k}\, \frac{\hat{\mu} - \mu}{\sqrt{\sum_{r=1}^{m} S^2_{[r]}}} \xrightarrow{d} N(0, 1), \tag{4}
\]

where $\hat{\mu}$ and $S^2_{[r]}$ are defined by Equations 1 and 2, respectively.

Proof. Let $\bar{X}_{[\cdot]i} = \frac{1}{m}\sum_{r=1}^{m} X_{[r]i}$, $i = 1, 2, \ldots, k$. Applying Equation 3, Dell and Clutter (1972) showed that $\mu = \frac{1}{m}\sum_{r=1}^{m} \mu_{[r]}$. Therefore,

\[
E(\bar{X}_{[\cdot]i}) = \frac{1}{m}\sum_{r=1}^{m} E(X_{[r]i}) = \frac{1}{m}\sum_{r=1}^{m} \mu_{[r]} = \mu \tag{5}
\]

and

\[
\operatorname{Var}(\bar{X}_{[\cdot]i}) = \frac{1}{m^{2}}\sum_{r=1}^{m} \operatorname{Var}(X_{[r]i}) = \frac{1}{m^{2}}\sum_{r=1}^{m} \sigma^2_{[r]}. \tag{6}
\]

For fixed $m$, the $\bar{X}_{[\cdot]i}$, $i = 1, 2, \ldots, k$, are independent and identically distributed with mean and variance specified above. Applying the standard Central Limit Theorem,

\[
\sqrt{k}\, \frac{\frac{1}{k}\sum_{i=1}^{k} \bar{X}_{[\cdot]i} - \mu}{\sqrt{\frac{1}{m^{2}}\sum_{r=1}^{m} \sigma^2_{[r]}}} \xrightarrow{d} N(0, 1) \tag{7}
\]

as $k \to \infty$. It follows from the consistency of $S^2_{[r]}$ that

\[
\sum_{r=1}^{m} S^2_{[r]} \xrightarrow{p} \sum_{r=1}^{m} \sigma^2_{[r]}
\quad\Longrightarrow\quad
\sqrt{\frac{\sum_{r=1}^{m} S^2_{[r]}}{\sum_{r=1}^{m} \sigma^2_{[r]}}} \xrightarrow{p} 1.
\]

We apply Slutsky's Theorem so that

\[
\sqrt{k}\, \frac{\frac{1}{k}\sum_{i=1}^{k} \bar{X}_{[\cdot]i} - \mu}{\sqrt{\frac{1}{m^{2}}\sum_{r=1}^{m} \sigma^2_{[r]}}}
\left(\frac{\sqrt{\sum_{r=1}^{m} S^2_{[r]}}}{\sqrt{\sum_{r=1}^{m} \sigma^2_{[r]}}}\right)^{-1} \xrightarrow{d} N(0, 1). \tag{8}
\]

After simplification, we achieve our result.

It follows from Theorem 2 that an asymptotic $100(1 - \alpha)\%$ confidence interval for $\mu$ based on a RSS is

\[
\hat{\mu} \pm z_{\alpha/2}\, \frac{\sqrt{\sum_{r=1}^{m} S^2_{[r]}}}{m\sqrt{k}} \tag{9}
\]

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.

3 Extension to Stratified RSS

Sroka et al. (2005) introduced the concept of stratified ranked set sampling (SRSS). Under this sampling design, the SRS stage of stratified simple random sampling is replaced by RSS. This sampling method can result in significant gains in precision of estimators over stratified simple random sampling (SSRS).

We modify our notation from Section 2 to account for the stratification. Suppose within each stratum $h = 1, 2, \ldots, H$, we obtain a RSS of set size $m_h$ over $k$ cycles, where $k$ is the same across strata. The sample size from each stratum equals $m_h k$; the optimal allocation of sample sizes to strata can be achieved by varying $m_h$. Let

$X_{[r]ih}$ = the $r$th judgment order statistic from cycle $i$ in stratum $h$
$\sigma^2_{[r]h}$ = the variance of the $r$th judgment order statistic in stratum $h$
$\mu_h$ = the mean of stratum $h$
$N_h$ = the population size of stratum $h$
$N = \sum_{h=1}^{H} N_h$ = the total population size
$\bar{X}_{[\cdot]i\cdot} = \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} X_{[r]ih}$ = the RSS mean for cycle $i$
$\bar{X}_{[r]\cdot h} = \frac{1}{k}\sum_{i=1}^{k} X_{[r]ih}$ = the mean of the $r$th judgment order statistic for stratum $h$
$\hat{\mu}_h = \frac{1}{m_h k}\sum_{r=1}^{m_h}\sum_{i=1}^{k} X_{[r]ih}$ = the RSS mean for stratum $h$
$\hat{\mu}_{SRSS} = \frac{1}{k}\sum_{i=1}^{k} \bar{X}_{[\cdot]i\cdot} = \sum_{h=1}^{H} \frac{N_h}{N}\hat{\mu}_h$ = the SRSS mean
$S^2_{[r]h} = \frac{1}{k-1}\sum_{i=1}^{k} \left(X_{[r]ih} - \bar{X}_{[r]\cdot h}\right)^{2}$ = the sample variance for the $r$th judgment order statistic for stratum $h$.

We can develop a large sample confidence interval for $\mu$ under SRSS using a method similar to that used for the ordinary RSS case. For SRSS, we allow $k$, the number of cycles of RSS performed in each stratum, to get large, while the number of strata and the set sizes in each stratum remain fixed. The quantities $\bar{X}_{[\cdot]i\cdot}$, $i = 1, 2, \ldots, k$, are independent and identically distributed random variables such that

\[
E\left(\bar{X}_{[\cdot]i\cdot}\right)
= \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} E(X_{[r]ih})
= \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} \mu_{[r]h}
= \sum_{h=1}^{H} \frac{N_h}{N} \mu_h
= \mu
\]

and

\[
\operatorname{Var}\left(\bar{X}_{[\cdot]i\cdot}\right)
= \sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \operatorname{Var}(X_{[r]ih})
= \sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma^2_{[r]h}.
\]

By the Central Limit Theorem,

\[
\sqrt{k}\, \frac{\frac{1}{k}\sum_{i=1}^{k} \bar{X}_{[\cdot]i\cdot} - \mu}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma^2_{[r]h}}} \xrightarrow{d} N(0, 1) \tag{10}
\]

as $k \to \infty$. By the consistency of $S^2_{[r]h}$,

\[
\frac{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S^2_{[r]h}}}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma^2_{[r]h}}} \xrightarrow{p} 1 \tag{11}
\]

as $k \to \infty$.
After dividing the expression in Equation 10 by the expression in Equation 11, applying Slutsky's Theorem, and simplifying, we get the pivotal quantity

\[
\sqrt{k}\, \frac{\hat{\mu}_{SRSS} - \mu}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S^2_{[r]h}}} \tag{12}
\]

which has approximately a standard normal distribution for large $k$. Thus, from a SRSS, the approximate $100(1 - \alpha)\%$ confidence interval for the population mean is given by the following formula:

\[
\hat{\mu}_{SRSS} \pm z_{\alpha/2} \sqrt{\frac{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S^2_{[r]h}}{k}} \tag{13}
\]

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.

4 Ratio Estimation

In this section, we discuss a ratio estimator for the population mean when the data were obtained using RSS. We compare the accuracy of this estimator to the corresponding estimator for SRS. Since ratio estimators are known to be biased, we compare the RSS estimator to the SRS estimator using the mean squared error (MSE).

Let $X$ and $Y$ denote two correlated random variables with respective means $\mu_X$ and $\mu_Y$. We assume $\mu_X > 0$ and $\mu_Y > 0$, as is typical for most variables of interest in a survey sample. If we know the value of $\mu_X$, we can use it to estimate $\mu_Y$ (Lohr 1999). Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, denote measurements of $X$ and $Y$ from a SRS of $n$ units from the population. The ratio estimator of $\mu_Y$, denoted $\tilde{\mu}_{Y,SRS}$, is

\[
\tilde{\mu}_{Y,SRS} = \frac{\bar{Y}}{\bar{X}}\, \mu_X \tag{14}
\]

where $\bar{Y}$ and $\bar{X}$ are the sample averages of the $Y_i$ and $X_i$ values, respectively. The MSE of $\tilde{\mu}_{Y,SRS}$ is approximately equal to the following quantity:

\[
\mathrm{MSE}(\tilde{\mu}_{Y,SRS}) \approx \frac{\mu_Y^{2}}{n \mu_X^{2}} \left( \frac{\sigma_Y^{2}}{\mu_Y^{2}} + \frac{\sigma_X^{2}}{\mu_X^{2}} - \frac{2\operatorname{Cov}(X, Y)}{\mu_X \mu_Y} \right) \tag{15}
\]

where $\sigma_X^{2}$ and $\sigma_Y^{2}$ are the variances of $X$ and $Y$, respectively. Note that the MSE decreases as the covariance between $X$ and $Y$ becomes large and positive.

We consider the case of using ratio estimation when the $(X, Y)$ pairs are obtained via RSS. The sampling procedure is similar to that of the univariate case. The sampler identifies $m$ sets of $m$ units from the population.
For now, we assume the units within each set are ranked based on the sampler's perceived value of the $Y$ variable; we will see later, however, that one could instead rank based on the perceived $X$ value of the units. For the unit selected from a set for measurement, both the $Y$ variable and the $X$ variable are measured. The process is repeated $k$ times. Thus, our data from this process are the $mk$ pairs $(X_{[r]i}, Y_{[r]i})$, $r = 1, 2, \ldots, m$, and $i = 1, 2, \ldots, k$.

Although the ranking is done based on perceived $Y$ values, the realizations of the $X$ variable are also considered judgment order statistics. The observations on $X$ were indirectly ranked by virtue of their correlation (if any) with the ranking procedure. The ranking procedure produces a de facto ranking of $X$ based on the ranker's perception of $Y$ for each of the $m^2$ units in the initial sample. The rankings of the $X$ values will become more accurate as the correlation between $X$ and $Y$ increases and the ranking of $Y$ becomes more accurate.

Both of the judgment order statistics, $X_{[r]i}$ and $Y_{[r]i}$, follow particular distributions based on the ranking procedure. The mean and variance of the $r$th judgment order statistic related to $X$ are denoted $\mu_{X[r]}$ and $\sigma^2_{X[r]}$, respectively. The corresponding quantities related to $Y$ are denoted $\mu_{Y[r]}$ and $\sigma^2_{Y[r]}$. Assume $\mu_X$, the population mean of $X$, is known. Then a ratio estimator of $\mu_Y$ based on RSS is

\[
\tilde{\mu}_{Y,RSS} = \frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}}\, \mu_X \tag{16}
\]

where $\bar{Y}_{RSS}$ is the sample average of the $mk$ $Y_{[r]i}$ values and $\bar{X}_{RSS}$ is the sample average of the $mk$ $X_{[r]i}$ values.

The MSE of $\tilde{\mu}_{Y,RSS}$ depends on the covariance between $\bar{Y}_{RSS}$ and $\bar{X}_{RSS}$. This covariance can be expressed in terms of the covariance between $X$ and $Y$ for the entire population.

Theorem 3. Suppose we obtain observations $(X_{[r]i}, Y_{[r]i})$, $r = 1, 2, \ldots, m$, $i = 1, 2, \ldots, k$, from a population using RSS. Then

\[
\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS}) = \frac{1}{mk}\operatorname{Cov}(X, Y) + \frac{1}{mk}\mu_X \mu_Y - \frac{1}{m^{2} k}\sum_{r=1}^{m} \mu_{X[r]}\mu_{Y[r]}. \tag{17}
\]

Proof.
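\[
\begin{aligned}
\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS})
&= \operatorname{Cov}\left( \frac{1}{mk}\sum_{r=1}^{m}\sum_{i=1}^{k} X_{[r]i},\; \frac{1}{mk}\sum_{r=1}^{m}\sum_{i=1}^{k} Y_{[r]i} \right) \\
&= \left(\frac{1}{mk}\right)^{2} \left[ \sum_{r=1}^{m}\sum_{i=1}^{k} \operatorname{Cov}(X_{[r]i}, Y_{[r]i}) + \sum_{(r,i) \ne (s,j)} \operatorname{Cov}(X_{[r]i}, Y_{[s]j}) \right] \\
&= \left(\frac{1}{mk}\right)^{2} \sum_{r=1}^{m}\sum_{i=1}^{k} \operatorname{Cov}(X_{[r]i}, Y_{[r]i}) \\
&= \left(\frac{1}{mk}\right)^{2} k \sum_{r=1}^{m} \operatorname{Cov}(X_{[r]1}, Y_{[r]1})
\end{aligned} \tag{18}
\]

The penultimate line holds because the judgment order statistics are obtained from independent samples. Consequently, $X_{[r]i}$ and $Y_{[s]j}$ are independent whenever $r \ne s$ or $i \ne j$. The last line follows from the fact that $(X_{[r]i}, Y_{[r]i})$ and $(X_{[r]j}, Y_{[r]j})$ are identically distributed for all $1 \le i, j \le k$, $i \ne j$.

The ranking procedure partitions the joint distribution of $X$ and $Y$, $f(x, y)$, into $m$ classes. Thus, by the Law of Total Probability,

\[
f(x, y) = \frac{1}{m}\sum_{r=1}^{m} f_{[r,r]}(x, y) \tag{19}
\]

where $f_{[r,r]}$ is the joint distribution of $X_{[r]1}$ and $Y_{[r]1}$. It follows that

\[
\begin{aligned}
\sum_{r=1}^{m} \operatorname{Cov}(X_{[r]1}, Y_{[r]1})
&= \sum_{r=1}^{m} E(X_{[r]1} Y_{[r]1}) - \sum_{r=1}^{m} \mu_{X[r]}\mu_{Y[r]} \\
&= \sum_{r=1}^{m} \int_{\mathcal{X}} \int_{\mathcal{Y}} xy\, f_{[r,r]}(x, y)\,dy\,dx - \sum_{r=1}^{m} \mu_{X[r]}\mu_{Y[r]} \\
&= m\,E(XY) - \sum_{r=1}^{m} \mu_{X[r]}\mu_{Y[r]}
\end{aligned} \tag{20}
\]

Our final result is obtained by substituting the expression in Equation 20 into the expression in Equation 18 and replacing $E(XY)$ by $\operatorname{Cov}(X, Y) + \mu_X \mu_Y$.

Using the above result, we provide a simplified expression for the MSE of the ratio estimator under RSS. The MSE for the RSS estimator is a function of the MSE for the SRS estimator, so the relative magnitudes of the two quantities can be compared easily.

Theorem 4. The ratio estimator under RSS (Equation 16) has the following approximate mean squared error:

\[
\mathrm{MSE}(\tilde{\mu}_{Y,RSS}) \approx \mathrm{MSE}(\tilde{\mu}_{Y,SRS}) - \frac{1}{m^{2} k}\left(\frac{\mu_Y}{\mu_X}\right)^{2} \sum_{r=1}^{m} \left( \frac{\mu_{Y[r]}}{\mu_Y} - \frac{\mu_{X[r]}}{\mu_X} \right)^{2}. \tag{21}
\]

Proof. The ratio of $\bar{Y}_{RSS}$ to $\bar{X}_{RSS}$ can be approximated using a Taylor series expansion.

\[
\frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}} \approx \frac{\mu_Y}{\mu_X} + \frac{1}{\mu_X}(\bar{Y}_{RSS} - \mu_Y) - \frac{\mu_Y}{\mu_X^{2}}(\bar{X}_{RSS} - \mu_X) \tag{22}
\]

From this approximation, we get

\[
\begin{aligned}
\mathrm{MSE}(\tilde{\mu}_{Y,RSS})
&= E\left[ \left( \frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}} - \frac{\mu_Y}{\mu_X} \right)^{2} \right] \\
&\approx E\left[ \left( \frac{1}{\mu_X}(\bar{Y}_{RSS} - \mu_Y) - \frac{\mu_Y}{\mu_X^{2}}(\bar{X}_{RSS} - \mu_X) \right)^{2} \right] \\
&= \left(\frac{\mu_Y}{\mu_X}\right)^{2} E\left[ \frac{(\bar{Y}_{RSS} - \mu_Y)^{2}}{\mu_Y^{2}} + \frac{(\bar{X}_{RSS} - \mu_X)^{2}}{\mu_X^{2}} - \frac{2(\bar{X}_{RSS} - \mu_X)(\bar{Y}_{RSS} - \mu_Y)}{\mu_X \mu_Y} \right] \\
&= \left(\frac{\mu_Y}{\mu_X}\right)^{2} \left[ \frac{\operatorname{Var}(\bar{Y}_{RSS})}{\mu_Y^{2}} + \frac{\operatorname{Var}(\bar{X}_{RSS})}{\mu_X^{2}} - \frac{2\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS})}{\mu_X \mu_Y} \right].
\end{aligned} \tag{23}
\]

The covariance of $\bar{X}_{RSS}$ and $\bar{Y}_{RSS}$ was demonstrated in Theorem 3. From Dell and Clutter (1972), we have

\[
\operatorname{Var}(\bar{X}_{RSS}) = \frac{1}{mk}\sigma_X^{2} - \frac{1}{m^{2} k}\sum_{r=1}^{m} (\mu_{X[r]} - \mu_X)^{2} \tag{24}
\]

\[
\operatorname{Var}(\bar{Y}_{RSS}) = \frac{1}{mk}\sigma_Y^{2} - \frac{1}{m^{2} k}\sum_{r=1}^{m} (\mu_{Y[r]} - \mu_Y)^{2}. \tag{25}
\]

Using the expressions in Equations 17, 24, and 25 in Equation 23 and setting aside the terms that contain the subscript $r$, we obtain

\[
\mathrm{MSE}(\tilde{\mu}_{Y,RSS}) \approx \mathrm{MSE}(\tilde{\mu}_{Y,SRS}) + \left(\frac{\mu_Y}{\mu_X}\right)^{2} \left[ -\frac{\sum_{r=1}^{m} (\mu_{Y[r]} - \mu_Y)^{2}}{m^{2} k \mu_Y^{2}} - \frac{\sum_{r=1}^{m} (\mu_{X[r]} - \mu_X)^{2}}{m^{2} k \mu_X^{2}} - 2\left( \frac{1}{mk} - \frac{\sum_{r=1}^{m} \mu_{X[r]}\mu_{Y[r]}}{m^{2} k \mu_X \mu_Y} \right) \right],
\]

where $\mathrm{MSE}(\tilde{\mu}_{Y,SRS})$ is evaluated at $n = mk$. After expanding the squared terms, simplifying with $\sum_{r=1}^{m} \mu_{X[r]} = m\mu_X$ and $\sum_{r=1}^{m} \mu_{Y[r]} = m\mu_Y$, and factoring into a single squared term, we obtain the desired result.

Theorem 4 provides important insight into the accuracy of the ratio estimator under RSS relative to the accuracy of the estimator under SRS. First, the RSS estimator has MSE that is no greater than the MSE for the SRS estimator. The MSE under RSS will be smallest relative to the MSE under SRS when the quantity $\mu_{Y[r]}/\mu_Y$ is considerably different from the quantity $\mu_{X[r]}/\mu_X$ for all $r = 1, 2, \ldots, m$. This situation would arise, for instance, when the ranking procedure produces a perfect ordering of the $Y$ variable, but it produces an ordering of the $X$ variable that is exactly opposite the true ordering.

Furthermore, the MSE for the RSS estimator is invariant with respect to which variable formed the basis for the judgment rankings in RSS. We assumed in our previous discussion that the sampler would rank the sets based on perceived values of $Y$. We see from Equation 21 that the MSE is not altered if the sampler ranked the sets based on judgments about $X$.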
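A small numerical sketch may help make Theorem 4 concrete. The Python functions below evaluate the approximate MSE expressions as they are written in Equations 15 and 21, and also evaluate Equation 23 directly from Equations 17, 24, and 25 so that the simplification can be checked numerically. The function names and the rank means used in the example are hypothetical; the population means are recovered from the rank means through $\mu = \frac{1}{m}\sum_{r} \mu_{[r]}$.

```python
def mse_ratio_srs(mu_x, mu_y, var_x, var_y, cov_xy, n):
    """Approximate MSE under SRS, as written in Equation 15."""
    return (mu_y ** 2 / (n * mu_x ** 2)) * (
        var_y / mu_y ** 2 + var_x / mu_x ** 2 - 2 * cov_xy / (mu_x * mu_y)
    )


def mse_ratio_rss(mu_x_r, mu_y_r, var_x, var_y, cov_xy, k):
    """Approximate MSE under RSS, as written in Equation 21.

    mu_x_r, mu_y_r: means of the judgment order statistics for ranks 1..m.
    """
    m = len(mu_x_r)
    mu_x = sum(mu_x_r) / m
    mu_y = sum(mu_y_r) / m
    reduction = (mu_y / mu_x) ** 2 / (m ** 2 * k) * sum(
        (my / mu_y - mx / mu_x) ** 2 for mx, my in zip(mu_x_r, mu_y_r)
    )
    return mse_ratio_srs(mu_x, mu_y, var_x, var_y, cov_xy, m * k) - reduction


def mse_ratio_rss_unsimplified(mu_x_r, mu_y_r, var_x, var_y, cov_xy, k):
    """Equation 23 evaluated with Equations 17, 24, and 25, without simplifying."""
    m = len(mu_x_r)
    mu_x = sum(mu_x_r) / m
    mu_y = sum(mu_y_r) / m
    var_xbar = var_x / (m * k) - sum((mx - mu_x) ** 2 for mx in mu_x_r) / (m ** 2 * k)
    var_ybar = var_y / (m * k) - sum((my - mu_y) ** 2 for my in mu_y_r) / (m ** 2 * k)
    cov_bar = (cov_xy + mu_x * mu_y) / (m * k) - sum(
        mx * my for mx, my in zip(mu_x_r, mu_y_r)
    ) / (m ** 2 * k)
    return (mu_y / mu_x) ** 2 * (
        var_ybar / mu_y ** 2 + var_xbar / mu_x ** 2 - 2 * cov_bar / (mu_x * mu_y)
    )


if __name__ == "__main__":
    # Hypothetical rank means: the ranking orders Y and X in opposite directions.
    args = ([2.0, 4.0, 6.0], [15.0, 10.0, 5.0], 9.0, 40.0, -10.0, 4)
    print(mse_ratio_rss(*args), mse_ratio_rss_unsimplified(*args))
```

When the rank means of $X$ and $Y$ are proportional, so that $\mu_{Y[r]}/\mu_Y = \mu_{X[r]}/\mu_X$ for every $r$, the reduction term is zero and the two designs have the same approximate MSE.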
For RSS to achieve a worthwhile reduction in MSE over SRS, some care must be used when choosing a ranking variable. The above result implies that the ranking variable should not be highly correlated with both $X$ and $Y$ in the same direction. Ideally, the ranking variable would be highly correlated with $Y$ (either positively or negatively) and have an opposite and strong correlation with $X$. Ratio estimation exploits the information contained in the relationship between $X$ and $Y$. For RSS to be advantageous, the ranking must utilize information about $Y$ that is not already accounted for in $X$.

One can compute an approximate confidence interval for the ratio $\mu_Y/\mu_X$ under RSS using our approach in Section 2 and techniques similar to those used under SRS (Lohr 1999).

5 5.1 Simulation Study

We conducted a simulation study to evaluate how often our confidence intervals for RSS and SRSS contain the population mean. We used data from the Medical Expenditure Panel Survey (MEPS). MEPS is a rotating panel survey of households and medical care providers conducted by the U.S. Agency for Healthcare Research and Quality. The data collected from the survey are used to estimate how much health care Americans use and the amounts they pay for it. Our simulation utilizes the 2002 consolidated Household Component. This dataset consists of responses provided by all households in the various panels for the entire year. Data are collected for every individual in the household.

We collapsed the MEPS Household Component to give us totals by household. We also removed records for which a negative income was reported. The resulting dataset consists of 14,686 records. We treat the dataset as though it were an actual population from which to sample. Therefore, the exact survey design and survey weights corresponding to the MEPS records are irrelevant for the simulation.
The 14,686 households are the entire population, and the mean of these households on each variable of interest is the true population mean we wish to estimate. The quantified variables for our simulation are total prescription drug expenditures, the logarithm of the quantity (total prescription drug expenditures + 1), total health expenditures paid by insurance on behalf of the household, and total household health expenditures.

The simulation study consisted of two parts. In the first part, we compared the RSS confidence interval with the SRS confidence interval. We took 10,000 samples from the population using each sampling design. All sampling was done with replacement. The appropriate formula was applied to each sample to compute a 95% confidence interval for the population mean. We calculated the average length of the 10,000 intervals and the percentage of intervals that contained the true mean. For RSS, the simulation was conducted using set sizes of 2, 5, and 10 and enough cycles to yield sample sizes of 20, 50, 150, 500, and 2,500 for each set size. The SRS simulation utilized the same sample sizes.

The second part of the simulation study compared SRSS to SSRS. The population was divided into three strata of equal sizes based on the values of the ranking variable. For the SRSS design, we sampled from each stratum using RSS; the set size per stratum, $m_h$, was fixed at 5 for all strata. We conducted the simulation when the number of cycles of RSS used in each stratum was 5, 10, 50, 100, and 250. This resulted in total sample sizes of 75, 150, 750, 1,500, and 3,750, respectively. For the SSRS simulation, we sampled equal numbers of observations from each stratum via SRS such that the total sample sizes would be the same as those for the SRSS simulation. As in the first part of the study, we obtained 10,000 samples and computed a 95% confidence interval for the population mean from each one.
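The first part of this design can be sketched in a few lines of Python. The code below is only an illustration under stated assumptions: the population passed to `compare_designs` is any list of (ranking value, quantified value) pairs (the MEPS records are not reproduced here), the function names are hypothetical, and far fewer replicates than 10,000 would typically be used when trying it out. The helper `srss_interval` evaluates the stratified interval of Equation 13 from per-stratum ranked set samples and stratum weights $N_h/N$.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

Z95 = NormalDist().inv_cdf(0.975)  # normal quantile for a 95% interval


def draw_rss(units, m, k, rng):
    """Balanced RSS from a list of (ranking value, quantified value) pairs.
    sample[r][i] is the rank r + 1 measurement from cycle i + 1."""
    sample = [[] for _ in range(m)]
    for _ in range(k):
        for r in range(m):
            ranked = sorted(rng.choices(units, k=m), key=lambda u: u[0])
            sample[r].append(ranked[r][1])
    return sample


def rss_interval(sample):
    """Interval of Equation 9 at the 95% level."""
    m, k = len(sample), len(sample[0])
    est = sum(map(sum, sample)) / (m * k)
    half = Z95 * sqrt(sum(variance(row) for row in sample)) / (m * sqrt(k))
    return est - half, est + half


def srs_interval(values):
    """Usual large-sample interval for a SRS."""
    half = Z95 * sqrt(variance(values) / len(values))
    return mean(values) - half, mean(values) + half


def srss_interval(samples, weights):
    """Interval of Equation 13; samples[h] is the RSS from stratum h
    (same k in every stratum) and weights[h] = N_h / N."""
    k = len(samples[0][0])
    est = sum(w * sum(map(sum, s)) / (len(s) * k) for w, s in zip(weights, samples))
    var = sum(
        (w / len(s)) ** 2 * sum(variance(row) for row in s)
        for w, s in zip(weights, samples)
    ) / k
    half = Z95 * sqrt(var)
    return est - half, est + half


def compare_designs(units, m, k, reps, seed=0):
    """Average interval length and coverage for RSS and SRS with n = m * k."""
    rng = random.Random(seed)
    y_all = [y for _, y in units]
    true_mean = mean(y_all)
    designs = {
        "RSS": lambda: rss_interval(draw_rss(units, m, k, rng)),
        "SRS": lambda: srs_interval(rng.choices(y_all, k=m * k)),
    }
    summary = {}
    for name, one_interval in designs.items():
        intervals = [one_interval() for _ in range(reps)]
        avg_length = mean(hi - lo for lo, hi in intervals)
        coverage = sum(lo <= true_mean <= hi for lo, hi in intervals) / reps
        summary[name] = (avg_length, coverage)
    return summary
```

A call such as `compare_designs(units, m=5, k=10, reps=1000)` returns, for each design, the average interval length and the fraction of intervals that contain the population mean. With a single stratum and weight 1, `srss_interval` reduces to `rss_interval`, which mirrors the relationship between Equations 13 and 9.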
For each design, we then calculated the average length of the intervals and the percentage of intervals containing the true mean.

Both parts of the simulation study were conducted using various combinations of ranking variables and variables of interest. The accuracy of the rankings in RSS can be quantified using the Spearman rank correlation coefficient between the quantified and ranking variables. A higher value of the coefficient corresponds to more accurate rankings. Table 1 indicates the variables used in the simulations and the rank correlations between them.

All of the quantified variables used in the simulation measured some type of expenditure. The distributions of these variables were skewed heavily to the right, and a significant number of observations had values of zero. To simulate a case with a more symmetric distribution, we added 1 to the total drug expenditure variable and took the natural logarithm.

5.2 Results

The results of our simulation study are presented in Table 2 through Table 9. In almost all of our simulations comparing RSS to SRS, the RSS interval was, on average, shorter than the SRS interval from the same size sample. The differences in interval length were largest for the case where the log of the quantified variable and the ranking variable were highly correlated (see Table 2). In this case, when the number of sets used in RSS was ten, the decrease in the length of the interval, as a percent of the interval length under SRS, ranged from 42% to 44%, depending on the total sample size. For our simulations comparing SRSS to SSRS, the SRSS intervals were, on average, always shorter.

In most of our simulations, the percent of intervals containing the population mean did not come close to 95% for either RSS or SRS (see Tables 3, 4, 5, 7, 8, and 9). This result is not surprising given the skewed distributions of the quantified variables.
For the simulation where we used the natural logarithm (see Table 2), the RSS resulted in close to 95% coverage when the number of cycles was as low as 15. It is important to note that the percent of intervals containing the mean under the RSS simulations did not differ substantially from the percent containing the mean under the SRS simulations of equivalent sample size. Thus, the asymmetrical nature of the distributions of the variables is affecting the RSS procedure to the same degree that it affects the SRS procedure.

5.3 Discussion and Extensions

In one simulation, the RSS intervals were, on average, longer than the SRS intervals. This occurred when the quantified variable and the ranking variable had a low rank correlation coefficient (see Table 5). In fact, 10 out of the 15 average interval lengths for RSS provided in Table 5 are longer than the corresponding average lengths for SRS.

We conducted a follow-up simulation study for the case where the rank correlation coefficient between the variables was low. In the follow-up study, we repeated the study as described above except that 100,000 iterations were used. The results of the follow-up study are shown in Table 10. Only four of the 15 average interval lengths for RSS were longer than the corresponding average interval lengths from SRS. Moreover, the differences were negligible for three of these reported lengths; the percent difference was less than 0.05%. The largest difference (0.173%) occurred when the set size was 2 and the total sample size was 20. For small set sizes, and low correlation between the variables, we would not expect RSS to perform much better than SRS. We conclude that variability in the simulation process led to the average lengths under RSS being longer than the average lengths for SRS.

We note that it may be inappropriate to compare the asymptotic behavior of the RSS approximate confidence interval to that of the SRS approximate interval at identical sample sizes.
The behavior of the RSS interval relies on the number of cycles, $k$, increasing to infinity. Conversely, the approximate interval for SRS depends on the total sample size, $n = mk$, being large. Thus, at a fixed sample size, the asymptotic behavior of the SRS interval is more established than the asymptotic behavior of the RSS interval.

A simulation study involving ratio estimation is not presented here, but our methods could easily be modified to accommodate one. A possible variable of interest is the amount of health expenditures paid by private insurance. Total household health expenditures could be used as the $X$ variable in the ratio estimation, since it is moderately correlated with the variable of interest (Spearman rank correlation is 0.522). As a ranking variable, one could use household income; it is moderately correlated with the variable of interest (Spearman rank correlation is 0.433) but is weakly correlated with the total health expenditures (Spearman rank correlation is 0.116). The conditions for RSS to improve upon ratio estimation, discussed in Section 4, would be met. The approximate confidence interval for the ratio estimator under RSS can be calculated by applying the methods from Section 2 in conjunction with approximations provided by Lohr (1999).

6 Conclusions

In this paper, we introduced an estimator of the variance of the $r$th judgment order statistic. We demonstrated desirable properties of this estimator that allow us to compute approximate confidence intervals for the mean of a population using a RSS. These results were extended to the case where RSS is conducted within a stratified sampling framework.

When the ranking procedure provides reasonable structure to the observations, the confidence interval based on RSS is shorter than the corresponding interval based on SRS. In our simulation study, we obtained intervals based on RSS that were sometimes over 40% shorter than their SRS counterparts.
The largest gains in precision were achieved when the number of sets used for RSS was relatively high. If the ranking procedure produces a result similar to a random ordering of the observations, however, the RSS confidence interval may be longer than the SRS interval. In our simulations, the difference between the average RSS interval length and the average SRS interval length in that case was negligible; the difference is likely due to variability in the simulation process.

As with SRS, asymmetry in the distribution of the variable of interest can slow convergence to the nominal confidence level. In one severe case, only about 91% of our simulated 95% confidence intervals contained the true mean, even though the number of cycles of RSS surpassed 1,200. Simulated confidence intervals using SRS fared no better, however. Thus, we conclude that RSS neither facilitates nor hinders convergence to the nominal coverage probability.

This paper raises an important question about how one should allocate a fixed sample size, mk, under RSS. The largest gains in precision over SRS are achieved by using a large number of sets, m. However, one also wants a large number of cycles, k, to ensure that the attained confidence level is close to the nominal level. A topic for future research is whether a method can be developed to determine the optimal number of cycles and number of sets given this tradeoff.

References

[1] Agency for Healthcare Research and Quality (2002), “Medical Expenditure Panel Survey,” www.meps.ahrq.gov.
[2] Chen, H., Stasny, E. A., and Wolfe, D. A. (2005), “Ranked Set Sampling for Efficient Estimation of a Population Proportion,” Statistics in Medicine, 24, 3319-3329.
[3] Dell, T. R. and Clutter, J. L. (1972), “Ranked Set Sampling Theory With Order Statistics Background,” Biometrics, 28, 545-555.
[4] Husby, C. E., Stasny, E. A., and Wolfe, D. A. (2005), “An Application of Ranked Set Sampling for Mean and Median Estimation Using USDA Crop Production Data,” Journal of Agricultural, Biological, and Environmental Statistics, 10, 354-373.
[5] Kaur, A., Patil, G. P., Shirk, S. J., and Taillie, C. (1996), “Environmental Sampling With a Concomitant Variable: A Comparison Between Ranked Set Sampling and Stratified Simple Random Sampling,” Journal of Applied Statistics, 23, 231-255.
[6] Lohr, S. L. (1999), Sampling: Design and Analysis, Pacific Grove, CA: Duxbury Press.
[7] McIntyre, G. A. (1952), “A Method for Unbiased Selective Sampling, Using Ranked Sets,” Australian Journal of Agricultural Research, 3, 385-390.
[8] Nahhas, R., Wolfe, D. A., and Chen, H. (2002), “Ranked Set Sampling: Cost and Optimal Set Size,” Biometrics, 58, 964-971.
[9] Sroka, C. J., Stasny, E. A., and Wolfe, D. A. (2005), “Ranked Set Sampling: Where Are the Samplers?” Technical Report 752, The Ohio State University, Department of Statistics.
[10] Stokes, S. L. (1977), “Ranked Set Sampling With Concomitant Variables,” Communications in Statistics - Theory and Methods, A6, 1207-1211.
[11] Wang, Y., Chen, Z., and Liu, J. (2004), “General Ranked Set Sampling With Cost Considerations,” Biometrics, 60, 556-561.

Table 1: Sets of Variables Used in Simulation

Quantified variable                     Ranking variable                        Spearman rank correlation
Total drug expenditures                 Drug expenditures paid by household     0.907
Log(total drug expenditures + 1)        Drug expenditures paid by household     0.907
Health expenditures paid by insurance   Health expenditures paid by household   0.436
Total health expenditures               Household income                        0.116

Table 2: Comparison of RSS and SRS. Variable of interest is log(drug expenditures + 1), ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     92.3%     94.0%     95.2%     95.1%     94.9%
RSS            5     92.2%     94.1%     94.6%     95.0%     94.7%
RSS           10     90.4%     93.7%     94.8%     95.0%     94.6%
SRS            -     93.1%     93.9%     94.8%     94.9%     95.2%

Average length of interval (µ = 4.920)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     2.089     1.332     0.772     0.423     0.189
RSS            5     1.643     1.054     0.611     0.335     0.150
RSS           10     1.366     0.888     0.515     0.283     0.127
SRS            -     2.446     1.553     0.899     0.493     0.220

Table 3: Comparison of RSS and SRS. Variable of interest is drug expenditures, ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     84.2%     88.2%     91.0%     92.5%     91.3%
RSS            5     83.6%     87.9%     90.2%     91.0%     90.1%
RSS           10     80.9%     87.4%     89.6%     90.8%     89.5%
SRS            -     84.9%     89.0%     91.8%     92.2%     91.7%

Average length of interval (µ = 1021.24)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2   1391.60    936.44    593.37    356.27    184.12
RSS            5   1265.68    875.18    547.46    331.02    175.54
RSS           10   1130.12    820.17    529.00    324.09    172.22
SRS            -   1430.43   1007.03    625.60    374.60    192.43

Table 4: Comparison of RSS and SRS. Variable of interest is amount of expenditures paid by insurance, ranking variable is amount of expenditures paid by household.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     75.2%     81.9%     88.1%     91.7%     94.4%
RSS            5     75.1%     81.8%     87.9%     91.9%     94.3%
RSS           10     74.4%     81.9%     87.8%     92.5%     93.8%
SRS            -     75.0%     81.9%     87.4%     91.9%     93.7%

Average length of interval (µ = 1741.93)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2   3142.91   2296.97   1460.20    839.67    383.94
RSS            5   3190.13   2290.01   1446.90    831.62    379.08
RSS           10   3086.05   2257.41   1439.66    827.55    377.22
SRS            -   3194.25   2309.67   1463.49    848.87    388.34

Table 5: Comparison of RSS and SRS. Variable of interest is total health expenditures, ranking variable is household income.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     79.1%     85.1%     89.7%     92.8%     94.3%
RSS            5     78.3%     85.4%     90.0%     93.2%     94.7%
RSS           10     78.9%     85.6%     90.4%     93.2%     94.6%
SRS            -     79.0%     85.5%     90.0%     92.7%     94.3%

Average length of interval (µ = 5086.90)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2   7411.01   5169.87   3235.09   1847.87    844.81
RSS            5   7346.86   5190.91   3234.70   1852.73    845.53
RSS           10   7437.71   5206.06   3236.60   1850.97    845.12
SRS            -   7396.06   5207.70   3222.80   1846.51    844.87

Table 6: Comparison of SRSS and SSRS. Variable of interest is log(drug expenditures + 1), ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5     94.0%     94.5%     94.7%     94.8%     94.9%
SSRS           -     94.2%     94.8%     95.2%     94.8%     95.1%

Average length of interval (µ = 4.920)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5     0.549     0.390     0.175     0.124     0.078
SSRS           -     0.653     0.464     0.208     0.147     0.093

Table 7: Comparison of SRSS and SSRS. Variable of interest is drug expenditures, ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5     87.0%     88.9%     90.3%     89.8%     88.1%
SSRS           -     87.7%     89.8%     91.2%     91.0%     89.3%

Average length of interval (µ = 1021.24)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5    647.71    496.33    262.94    201.89    140.66
SSRS           -    708.70    546.45    279.73    213.69    147.30

Table 8: Comparison of SRSS and SSRS. Variable of interest is amount of expenditures paid by insurance, ranking variable is amount of expenditures paid by household.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5     83.7%     87.5%     92.9%     94.1%     94.3%
SSRS           -     84.1%     87.3%     93.0%     93.9%     95.0%

Average length of interval (µ = 1741.93)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5   1899.82   1416.60    677.05    482.46    307.26
SSRS           -   1917.86   1422.50    679.04    484.59    307.95

Table 9: Comparison of SRSS and SSRS. Variable of interest is total health expenditures, ranking variable is household income.

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5     88.0%     90.1%     93.2%     93.7%     94.5%
SSRS           -     87.7%     90.4%     93.8%     94.1%     94.7%

Average length of interval (µ = 5086.90)

          No. of                 Sample size (n)
Method  sets (m)        60       150       750      1500      3750
SRSS           5   4395.30   3240.78   1520.00   1082.93    690.88
SSRS           -   4420.99   3257.15   1524.18   1087.92    690.75

Table 10: Comparison of RSS and SRS. Variable of interest is total health expenditures, ranking variable is household income (100,000 iterations).

Percent of intervals containing mean (95% nominal confidence level)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2     79.2%     85.4%     90.2%     92.8%     94.4%
RSS            5     79.0%     85.4%     90.2%     93.0%     94.3%
RSS           10     78.0%     85.2%     90.2%     92.9%     94.4%
SRS            -     79.3%     85.4%     90.3%     92.8%     94.5%

Average length of interval (µ = 5086.90)

          No. of                 Sample size (n)
Method  sets (m)        20        50       150       500      2500
RSS            2   7427.21   5197.21   3235.41   1846.90    844.97
RSS            5   7414.47   5211.73   3235.33   1848.53    844.63
RSS           10   7332.53   5188.80   3233.52   1848.54    844.34
SRS            -   7414.35   5216.20   3236.10   1847.95    844.98
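
The length comparisons quoted in Sections 5.3 and 6 can be re-derived from the tabulated averages. The short Python sketch below transcribes the average interval lengths from Table 2 and Table 10 and computes the percent by which each RSS length differs from the SRS length at the same sample size. The variable and function names are illustrative only; the numbers are those reported in the tables.

```python
# Average interval lengths transcribed from Table 2 and Table 10.
SAMPLE_SIZES = [20, 50, 150, 500, 2500]

TABLE_2 = {
    "rss": {
        2: [2.089, 1.332, 0.772, 0.423, 0.189],
        5: [1.643, 1.054, 0.611, 0.335, 0.150],
        10: [1.366, 0.888, 0.515, 0.283, 0.127],
    },
    "srs": [2.446, 1.553, 0.899, 0.493, 0.220],
}

TABLE_10 = {
    "rss": {
        2: [7427.21, 5197.21, 3235.41, 1846.90, 844.97],
        5: [7414.47, 5211.73, 3235.33, 1848.53, 844.63],
        10: [7332.53, 5188.80, 3233.52, 1848.54, 844.34],
    },
    "srs": [7414.35, 5216.20, 3236.10, 1847.95, 844.98],
}


def percent_change(table):
    """Percent difference of each RSS length from the SRS length.

    Keys are (m, n) pairs; negative values mean the RSS interval is shorter.
    """
    return {
        (m, n): 100.0 * (rss - srs) / srs
        for m, lengths in table["rss"].items()
        for n, rss, srs in zip(SAMPLE_SIZES, lengths, table["srs"])
    }


change_2 = percent_change(TABLE_2)
change_10 = percent_change(TABLE_10)

best_cell = min(change_2, key=change_2.get)          # largest reduction
longer_cells = sorted(cell for cell, pct in change_10.items() if pct > 0)
worst_cell = max(change_10, key=change_10.get)       # largest excess

print(f"Table 2: largest reduction {-change_2[best_cell]:.1f}% at (m, n) = {best_cell}")
print(f"Table 10: RSS longer than SRS in {len(longer_cells)} of {len(change_10)} cells")
print(f"Table 10: largest excess {change_10[worst_cell]:.3f}% at (m, n) = {worst_cell}")
```

Run on these values, the sketch reports a largest reduction of 44.2% in Table 2 (m = 10, n = 20), four Table 10 cells in which the RSS average is longer, and a largest excess of 0.173% at m = 2, n = 20, in agreement with the figures quoted in the text.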