Author manuscript, published in "Statistics & Probability Letters 78, 8... 10.1016/j.spl.2007.09.057
Accepted Manuscript

Stratified two-stage sampling in domains: Sample allocation between domains, strata, and sampling stages

Marcin Kozak, Andrzej Zieliński, Sarjinder Singh

peer-00616535, version 1 - 23 Aug 2011

PII: S0167-7152(07)00349-5
DOI: 10.1016/j.spl.2007.09.057
Reference: STAPRO 4795

To appear in: Statistics and Probability Letters

Received date: 18 April 2006
Revised date: 7 August 2007
Accepted date: 25 September 2007

Please cite this article as: Kozak, M., Zieliński, A., Singh, S., Stratified two-stage sampling in domains: Sample allocation between domains, strata, and sampling stages. Statistics and Probability Letters (2007), doi:10.1016/j.spl.2007.09.057

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Marcin Kozak (corresponding author; email address: m.kozak@omega.sggw.waw.pl)
Andrzej Zieliński
Sarjinder Singh

Department of Biometry, Faculty of Agriculture and Biology, Warsaw Agricultural University, Nowoursynowska 159, 02-776 Warsaw, Poland
Department of Biometry, Warsaw Agricultural University, Nowoursynowska 159, 02-776 Warsaw, Poland
Department of Statistics, St. Cloud State University, St. Cloud, MN, 56301, USA

Preprint submitted to Elsevier Science, 3 October 2007

Abstract

In this paper, we present formulae for the optimum allocation of the sample between domains, strata within the domains, and sampling stages in stratified two-stage sampling in domains, when the sample size of secondary sampling units (SSUs) drawn from primary sampling units (PSUs) is fixed.

Key words: domain-orientated approach, optimum sample allocation, survey cost.

1 Introduction

Kozak (2005) presented the basic concepts of stratified two-stage sampling, in which a population of primary sampling units (PSUs) is subdivided into strata. He gave formulas for optimum sample allocation between strata and sampling stages under two schemes of the design: (i) with a fixed sample size of secondary sampling units (SSUs) drawn from PSUs, and (ii) with a self-weighting design within strata. Kozak and Zieliński (2005), on the other hand, presented the basic concepts of sample allocation between domains and strata when the domains are subdivided into strata. They considered (i) a so-called domain-orientated approach to sample allocation, which requires precise estimation in all the domains, and (ii) sample allocation aimed at minimizing the total survey cost subject to a fixed level of precision of estimation in the domains.

In this paper, we introduce a hybrid of these two designs, namely stratified two-stage sampling in domains. Such a design is of practical use when a population is subdivided into domains, each domain comprising a number of strata that consist of PSUs. A simple random sample of SSUs is taken without replacement from each sampled PSU. We consider the situation in which the sample size of SSUs from PSUs is fixed; in that case one obtains a sample of fixed size (and fixed total cost). We give formulas for sample allocation between domains, strata and sampling stages (i) under the domain-orientated approach, and (ii) aimed at minimizing the total survey cost.

2 Estimation for domains under stratified two-stage sampling in the domains: basic ideas

Basic concepts of stratified sampling and two-stage sampling, on which the design introduced in this section is based, may be found, e.g., in Särndal et al. (1992) or Singh (2003). Consider a population $U$ comprising $N$ elements. The population is subdivided into $D$ domains $U_d$ ($d=1,\dots,D$); each domain is subdivided into $H_d$ non-overlapping strata $U_{dh}$; finally, each stratum $U_{dh}$ is subdivided into $M_{dh}$ separate PSUs $U_{dhg}$. This division can be written as

$$U=\bigcup_{d=1}^{D}\bigcup_{h=1}^{H_d}\bigcup_{g=1}^{M_{dh}}U_{dhg},$$

where
$U_d\cap U_{d'}=\emptyset$ for $d,d'=1,\dots,D$, $d\neq d'$;
$U_{dh}\cap U_{dh'}=\emptyset$ for $d=1,\dots,D$, $h,h'=1,\dots,H_d$, $h\neq h'$; and
$U_{dhg}\cap U_{dhg'}=\emptyset$ for $d=1,\dots,D$, $h=1,\dots,H_d$, $g,g'=1,\dots,M_{dh}$, $g\neq g'$.

The $g$th PSU from the $h$th stratum in the $d$th domain comprises $N_{dhg}$ SSUs, which are the population elements. Let $N_d$ denote the number of SSUs in the $d$th domain and $N_{dh}$ the number of SSUs in the $h$th stratum of the $d$th domain.

Let the population parameter of interest be the population total of $Y$, where $Y$ is the characteristic studied. For the $d$th domain its estimator is

$$\hat Y_d=\sum_{h=1}^{H_d}\hat Y_{dh}=\sum_{h=1}^{H_d}\frac{M_{dh}}{m_{dh}}\sum_{g=1}^{m_{dh}}\frac{N_{dhg}}{n_{dhg}}\sum_{i=1}^{n_{dhg}}y_{dhgi}, \qquad (1)$$

where $\hat Y_d$ is the estimator of $Y_d$, the population total of $Y$ restricted to the $d$th domain; $\hat Y_{dh}$ is the estimator of $Y_{dh}$, the population total of $Y$ restricted to the $h$th stratum of the $d$th domain; $m_{dh}$ is the sample size of PSUs from the $h$th stratum of the $d$th domain; $n_{dhg}$ is the sample size of SSUs from the $g$th PSU of the $h$th stratum of the $d$th domain; and $y_{dhgi}$ is the value of $Y$ for the $i$th SSU (population element) of the $g$th PSU from the $h$th stratum in the $d$th domain. At both stages (sampling PSUs from strata and sampling SSUs from PSUs), simple random samples are taken without replacement.
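Estimator (1) is a nested expansion: within each sampled PSU the sum of the sampled $y$ values is scaled by $N_{dhg}/n_{dhg}$, and the sum of these estimated PSU totals is scaled by $M_{dh}/m_{dh}$. A minimal Python sketch, assuming the sample of one domain is stored per stratum in hypothetical `StratumSample` and `SampledPSU` records (names are ours, not the authors'):

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SampledPSU:
    N: int               # N_dhg: number of SSUs in this PSU
    y: Sequence[float]   # y values of the n_dhg sampled SSUs


@dataclass
class StratumSample:
    M: int                   # M_dh: number of PSUs in the stratum
    psus: List[SampledPSU]   # the m_dh sampled PSUs


def domain_total_estimate(strata: List[StratumSample]) -> float:
    """Estimator (1) of the domain total Y_d under SRSWOR at both stages."""
    total = 0.0
    for s in strata:
        m = len(s.psus)
        # Sum of estimated PSU totals (N_dhg / n_dhg) * sum of sampled y
        psu_sum = sum(p.N / len(p.y) * sum(p.y) for p in s.psus)
        total += s.M / m * psu_sum   # expand to the stratum total Y_dh
    return total


# Two strata: M=2 with one sampled PSU (N=4, y=[1, 3]); M=1 with its single PSU (N=2, y=[5, 5]).
example = [
    StratumSample(M=2, psus=[SampledPSU(N=4, y=[1.0, 3.0])]),
    StratumSample(M=1, psus=[SampledPSU(N=2, y=[5.0, 5.0])]),
]
print(domain_total_estimate(example))  # 2 * (4/2 * 4) + 1 * (2/2 * 10) = 26.0
```

When every PSU and every SSU is observed, each expansion factor equals one and the sketch returns the exact domain total.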
Let us consider a two-stage sampling scheme with a fixed sample size of SSUs. In our design, the same number $n_{dh}$ of SSUs ($n_{dhg}=n_{dh}$ for each combination of $d=1,\dots,D$ and $h=1,\dots,H_d$) is sampled from the sampled PSUs in each section (domain $d$ × stratum $h$). Under such a design, the variance of the estimator (1) is given by (Kozak, 2005)

$$V(\hat Y_d)=\sum_{h=1}^{H_d}\frac{M_{dh}}{m_{dh}}\left[(M_{dh}-m_{dh})S_{1dh}^2+\frac{1}{n_{dh}}\sum_{g=1}^{M_{dh}}N_{dhg}(N_{dhg}-n_{dh})S_{2dhg}^2\right], \qquad (2)$$

where $S_{1dh}^2=(M_{dh}-1)^{-1}\sum_{g=1}^{M_{dh}}(Y_{dhg}-\bar Y_{dh})^2$, $Y_{dhg}=\sum_{j=1}^{N_{dhg}}y_{dhgj}$, $\bar Y_{dh}=M_{dh}^{-1}\sum_{g=1}^{M_{dh}}Y_{dhg}$, $S_{2dhg}^2=(N_{dhg}-1)^{-1}\sum_{j=1}^{N_{dhg}}(y_{dhgj}-\bar Y_{dhg})^2$, and $\bar Y_{dhg}=N_{dhg}^{-1}\sum_{j=1}^{N_{dhg}}y_{dhgj}$.

Note that the sample sizes $n_{dh}$ in the variance (2) stand for the sample sizes $n_{dhg}$, which are assumed to be the same for all $g=1,\dots,M_{dh}$ in a given section (domain $d$ × stratum $h$). For convenience we write $n_{dh}$ instead of $n_{dhg}$, keeping in mind that $n_{dh}$ is the sample size of SSUs from every sampled PSU of the $h$th stratum in the $d$th domain. An ordinary unbiased estimator of the variance (2) is obtained by replacing the population quantities $S_{1dh}^2$ and $S_{2dhg}^2$ with their sample estimators; the summation over $g$ in (2) is then taken over the sampled PSUs in each stratum of each domain.

The coefficient of variation of the estimator $\hat Y_d$, denoted $\delta(\hat Y_d)$, is

$$\delta(\hat Y_d)=\frac{\sqrt{V(\hat Y_d)}}{Y_d},\qquad d=1,\dots,D.$$

In this paper, a design is regarded as optimum when some function of $\delta(\hat Y_d)$, $d=1,\dots,D$, is minimized. Let the overall survey cost $C$ be

$$C=C_0+\sum_{d=1}^{D}\sum_{h=1}^{H_d}m_{dh}\left(c_{1dh}+n_{dh}c_{2dh}\right), \qquad (3)$$

where $C_0$ is the fixed survey cost, $c_{1dh}$ is the cost of selecting one PSU from the $h$th stratum of the $d$th domain, and $c_{2dh}$ is the cost of obtaining the value of $Y$ for one SSU from the $h$th stratum of the $d$th domain.
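To make (2) and (3) concrete, the sketch below evaluates them on a fully known toy population. It assumes each stratum is given as a list of PSUs, each PSU being the list of its population $y$ values (the alias `StratumPop` and all function names are hypothetical), and that every stratum has at least two PSUs and every PSU at least two SSUs, since `statistics.variance` needs two data points.

```python
import math
from statistics import variance   # sample variance with the (k - 1) divisor
from typing import List, Sequence

StratumPop = Sequence[Sequence[float]]  # population y values, one inner list per PSU


def stratum_variance(psus: StratumPop, m: int, n: int) -> float:
    """Stratum term of (2): m PSUs sampled, n SSUs from each sampled PSU."""
    M = len(psus)
    s1 = variance([sum(p) for p in psus])   # S^2_1dh: spread of the PSU totals Y_dhg
    within = sum(len(p) * (len(p) - n) * variance(p) for p in psus)
    return M / m * ((M - m) * s1 + within / n)


def domain_variance(strata: List[StratumPop], m: List[int], n: List[int]) -> float:
    """V(Y_d hat) from (2): sum of the stratum terms."""
    return sum(stratum_variance(p, mh, nh) for p, mh, nh in zip(strata, m, n))


def domain_cv(strata: List[StratumPop], m: List[int], n: List[int]) -> float:
    """delta(Y_d hat) = sqrt(V(Y_d hat)) / Y_d."""
    Y_d = sum(sum(p) for psus in strata for p in psus)
    return math.sqrt(domain_variance(strata, m, n)) / Y_d


def total_cost(C0: float, m, n, c1, c2) -> float:
    """Cost (3); m, n, c1, c2 are nested lists indexed [d][h]."""
    return C0 + sum(
        mh * (c1h + nh * c2h)
        for md, nd, c1d, c2d in zip(m, n, c1, c2)
        for mh, nh, c1h, c2h in zip(md, nd, c1d, c2d)
    )
```

On a small population one can check `stratum_variance` against a brute-force enumeration of all equally likely two-stage samples; with $m_{dh}=M_{dh}$ and $n_{dh}=N_{dhg}$ (a census) the variance is zero, as expected.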
3 Optimizing a design under domain-orientated approach

Here we apply a domain-orientated approach to the design presented in the previous section. It aims at precise estimation for each domain $U_d$ of the population $U$ (Kozak and Zieliński, 2005). Let $\mathbf g=(g_1,\dots,g_D)^T$ be a vector of importance weights of the domains. Following Kozak and Zieliński (2005), the optimum design is the one under which the smallest common value $\varphi$ of $g_d^{-1}\delta(\hat Y_d)$, $d=1,\dots,D$, is obtained. Thus, we require the coefficient of variation of the estimator $\hat Y_d$ of the population total in the $d$th domain to satisfy the condition (Kozak and Zieliński, 2005)

$$\delta(\hat Y_d)=g_d\varphi,\qquad d=1,\dots,D. \qquad (4)$$

Our aim is then to find the optimum values of $n_{dh}$ and $m_{dh}$ ($d=1,\dots,D$, $h=1,\dots,H_d$) when the overall survey cost (3) is fixed at $C$ (with $c_{1dh}$ and $c_{2dh}$ given), such that condition (4) is satisfied and the common value $\varphi$ is minimum. We optimize the design under the assumption that the variable used to allocate the sample is the survey variable itself. In practice this does not hold: instead of the population values, quantities from recent censuses or from previous or pilot surveys are used.

Theorem 1. When a population $U$ is subdivided into $D$ domains and stratified two-stage sampling with a fixed sample size of secondary sampling units from primary sampling units is applied within the domains, then, under the cost function (3) with given survey costs $C$, $C_0$, $c_{1dh}$ and $c_{2dh}$, the smallest common value $\varphi$ of $g_d^{-1}\delta(\hat Y_d)$, $d=1,\dots,D$, is obtained when, for $d=1,\dots,D$ and $h=1,\dots,H_d$,

$$n_{dh}=\sqrt{\frac{c_{1dh}}{c_{2dh}}\cdot\frac{\sum_{g=1}^{M_{dh}}N_{dhg}^2S_{2dhg}^2}{M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2}},$$

$$m_{dh}=\frac{(C-C_0)\,v_d\sqrt{M_{dh}\left(M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2\right)}}{Y_d\sqrt{c_{1dh}}\,\sum_{e=1}^{D}v_eY_e^{-1}\sum_{i=1}^{H_e}\sqrt{M_{ei}}\,Z_{ei}},$$

where

$$Z_{ei}=\sqrt{c_{1ei}\left(M_{ei}S_{1ei}^2-\sum_{k=1}^{M_{ei}}N_{eik}S_{2eik}^2\right)}+\sqrt{c_{2ei}\sum_{k=1}^{M_{ei}}N_{eik}^2S_{2eik}^2},$$

and $\mathbf v=(v_1,\dots,v_D)^T$ is the eigenvector associated with the largest eigenvalue of the matrix

$$\mathbf F=(C-C_0)^{-1}\left\{\mathbf A\mathbf B^T-\operatorname{diag}(\mathbf E)\right\},$$

where $\mathbf A=(A_1,\dots,A_D)^T$, $\mathbf B=(B_1,\dots,B_D)^T$ and $\mathbf E=(E_1,\dots,E_D)^T$, provided that

$$M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2>0 \qquad (5)$$

for each $d=1,\dots,D$ and $h=1,\dots,H_d$.

Proof. To prove Theorem 1, the procedure developed by Niemiro and Wesołowski (2001) may be used; it was recently applied to sample allocation between domains and strata by Kozak and Zieliński (2005). Consider the Lagrange function

$$L=\varphi-\sum_{d=1}^{D}\lambda_d\left[\frac{1}{Y_d^2}\sum_{h=1}^{H_d}\frac{1}{m_{dh}}\left(u_{dh}+\frac{w_{dh}}{n_{dh}}-x_{dh}\right)-\frac{1}{Y_d^2}\sum_{h=1}^{H_d}M_{dh}S_{1dh}^2-g_d^2\varphi^2\right]-\alpha\left[\sum_{d=1}^{D}\sum_{h=1}^{H_d}m_{dh}\left(c_{1dh}+n_{dh}c_{2dh}\right)-(C-C_0)\right], \qquad (6)$$

where $\lambda_d$ and $\alpha$ are the Lagrange multipliers, $w_{dh}=M_{dh}\sum_{g=1}^{M_{dh}}N_{dhg}^2S_{2dhg}^2$, $u_{dh}=M_{dh}^2S_{1dh}^2$, $x_{dh}=M_{dh}\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2$, and $Y_d$ is the population total of $Y$ in the $d$th domain. Differentiating (6) with respect to $m_{dh}$, $n_{dh}$, $\lambda_d$ and $\alpha$ and solving the resulting equations yields the results presented in Theorem 1. A detailed proof may be obtained from the authors upon request.
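Theorem 1 expresses $m_{dh}$ through the eigenvector $\mathbf v$, but the components of $\mathbf A$, $\mathbf B$ and $\mathbf E$ are not spelled out here, so the sketch below takes a different route (our own algebra, not the authors' procedure). Writing $a_{dh}$ for the left-hand side of (5) and substituting $n_{dh}$ from Theorem 1 and $m_{dh}=t_d\sqrt{M_{dh}a_{dh}/c_{1dh}}$ into (2) and (3) gives $V(\hat Y_d)=P_d/t_d-Q_d$ and a variable cost $\sum_d t_dP_d$, with $P_d=\sum_h\sqrt{M_{dh}}Z_{dh}$ and $Q_d=\sum_h M_{dh}S_{1dh}^2$. Condition (4) then fixes $t_d=P_d/(Q_d+g_d^2\varphi^2Y_d^2)$, and the budget $C-C_0$ leaves a single decreasing function of $\varphi$, solved here by bisection. The `Stratum` container and its field names are hypothetical, population summaries are assumed known, and constraints (7) are not enforced.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stratum:
    """Population summaries for stratum h of domain d (hypothetical container)."""
    M: int          # M_dh, number of PSUs
    S1sq: float     # S^2_{1dh}
    sumNS2: float   # sum_g N_dhg * S^2_{2dhg}
    sumN2S2: float  # sum_g N_dhg^2 * S^2_{2dhg}
    c1: float       # c_1dh, cost per PSU
    c2: float       # c_2dh, cost per SSU

    @property
    def a(self) -> float:
        # Left-hand side of condition (5); must be positive
        return self.M * self.S1sq - self.sumNS2

    def n_opt(self) -> float:
        # n_dh from Theorem 1
        return math.sqrt(self.c1 * self.sumN2S2 / (self.c2 * self.a))

    def Z(self) -> float:
        # Z_dh from Theorem 1
        return math.sqrt(self.c1 * self.a) + math.sqrt(self.c2 * self.sumN2S2)


def theorem1_allocation(domains: List[List[Stratum]], Y: Sequence[float],
                        g: Sequence[float], budget: float, iters: int = 200):
    """Return (phi, [[(n_dh, m_dh), ...] per domain]) for budget = C - C0."""
    if any(s.a <= 0 for strata in domains for s in strata):
        raise ValueError("condition (5) fails; see Remark 1")
    P = [sum(math.sqrt(s.M) * s.Z() for s in strata) for strata in domains]
    Q = [sum(s.M * s.S1sq for s in strata) for strata in domains]

    def spend(phi: float) -> float:
        # Variable cost of the Theorem 1 form when every domain meets (4) at this phi
        return sum(p * p / (q + (gd * phi * yd) ** 2)
                   for p, q, gd, yd in zip(P, Q, g, Y))

    if spend(0.0) <= budget:
        raise ValueError("budget reaches phi = 0; no interior solution")
    lo, hi = 0.0, 1.0
    while spend(hi) > budget:   # spend() decreases in phi: bracket the root
        hi *= 2.0
    for _ in range(iters):      # bisection on phi
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if spend(mid) > budget else (lo, mid)
    phi = 0.5 * (lo + hi)

    alloc = []
    for strata, p, q, gd, yd in zip(domains, P, Q, g, Y):
        t = p / (q + (gd * phi * yd) ** 2)   # domain scale factor t_d
        alloc.append([(s.n_opt(), t * math.sqrt(s.M * s.a / s.c1)) for s in strata])
    return phi, alloc
```

Because $\mathbf v$ is determined only up to a positive factor, taking $v_d=Y_dt_d$ in the formula for $m_{dh}$ in Theorem 1 reproduces the same $m_{dh}$ under this algebra, which offers one way to cross-check the sketch against the theorem.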
Remark 1. If any of the conditions (5) is not fulfilled, the values of $n_{dh}$ and $m_{dh}$ from Theorem 1 are not real numbers; if any of the conditions

$$2\le n_{dh}\le N_{dh};\qquad 2\le m_{dh}\le M_{dh}\qquad \text{for } d=1,\dots,D,\ h=1,\dots,H_d, \qquad (7)$$

is not fulfilled, they are not admissible sample sizes. In either case they are not optimum, and the optimum $n_{dh}$ and $m_{dh}$ are the solution of the following numerical problem:

minimize $f\{(\mathbf n_1,\mathbf m_1),\dots,(\mathbf n_D,\mathbf m_D);\varphi\}=\varphi$,

where $\mathbf n_d=(n_{d1},\dots,n_{dH_d})^T$ and $\mathbf m_d=(m_{d1},\dots,m_{dH_d})^T$ for $d=1,\dots,D$, subject to:

$$\frac{1}{Y_d^2}\sum_{h=1}^{H_d}\frac{M_{dh}}{m_{dh}}\left[(M_{dh}-m_{dh})S_{1dh}^2+\frac{1}{n_{dh}}\sum_{g=1}^{M_{dh}}N_{dhg}(N_{dhg}-n_{dh})S_{2dhg}^2\right]=g_d^2\varphi^2,\qquad d=1,\dots,D,$$

$$\sum_{d=1}^{D}\sum_{h=1}^{H_d}m_{dh}\left(c_{1dh}+n_{dh}c_{2dh}\right)=C-C_0,$$

$$M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2>0\quad\text{for each } d=1,\dots,D \text{ and } h=1,\dots,H_d,$$

$$2\le n_{dh}\le N_{dh};\qquad 2\le m_{dh}\le M_{dh}\qquad\text{for } d=1,\dots,D,\ h=1,\dots,H_d.$$

4 Optimizing a design subject to constraints connected with domain precisions

Here we consider the problem dual to that of the previous section. We aim at minimizing the total survey cost $C$ given in (3) subject to

$$\frac{1}{Y_d^2}\sum_{h=1}^{H_d}\frac{M_{dh}}{m_{dh}}\left[(M_{dh}-m_{dh})S_{1dh}^2+\frac{1}{n_{dh}}\sum_{g=1}^{M_{dh}}N_{dhg}(N_{dhg}-n_{dh})S_{2dhg}^2\right]=\delta_d^2,\qquad d=1,\dots,D, \qquad (8)$$

where $\delta_d$ is the fixed value of the coefficient of variation of $\hat Y_d$. Thus, this time we look for the optimum values of $n_{dh}$ and $m_{dh}$ for which the constraint (8) is fulfilled and the cost (3) is minimum.

Theorem 2. When a population $U$ is subdivided into $D$ domains and stratified two-stage sampling with a fixed sample size of secondary sampling units from primary sampling units is applied within the domains, then, under the cost function (3) with given survey costs $C_0$, $c_{1dh}$ and $c_{2dh}$, and under the condition (8) (for fixed $\delta_d$), the minimum total survey cost $C$ is obtained when, for $d=1,\dots,D$ and $h=1,\dots,H_d$,

$$n_{dh}=\sqrt{\frac{c_{1dh}}{c_{2dh}}\cdot\frac{\sum_{g=1}^{M_{dh}}N_{dhg}^2S_{2dhg}^2}{M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2}},$$

$$m_{dh}=\frac{\sqrt{M_{dh}\left(M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2\right)}\;\sum_{i=1}^{H_d}D_i}{Y_d^2\sqrt{c_{1dh}}\left(\delta_d^2+Y_d^{-2}\sum_{i=1}^{H_d}M_{di}S_{1di}^2\right)},$$

where

$$D_i=\sqrt{c_{1di}M_{di}\left(M_{di}S_{1di}^2-\sum_{k=1}^{M_{di}}N_{dik}S_{2dik}^2\right)}+\sqrt{c_{2di}M_{di}\sum_{k=1}^{M_{di}}N_{dik}^2S_{2dik}^2},$$

provided that

$$M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2>0 \qquad (9)$$

for each $d=1,\dots,D$ and $h=1,\dots,H_d$.
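Unlike Theorem 1, Theorem 2 is fully explicit, and since both the constraint (8) and the cost (3) separate across domains, each domain can be allocated on its own. A minimal sketch for one domain, assuming the population summaries are known and stored in a hypothetical `Stratum` record (field names are ours); the bounds (7) are not enforced:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stratum:
    """Population summaries for stratum h of domain d (hypothetical container)."""
    M: int          # M_dh
    S1sq: float     # S^2_{1dh}
    sumNS2: float   # sum_g N_dhg * S^2_{2dhg}
    sumN2S2: float  # sum_g N_dhg^2 * S^2_{2dhg}
    c1: float       # c_1dh
    c2: float       # c_2dh


def theorem2_allocation(strata: List[Stratum], Y_d: float,
                        delta_d: float) -> List[Tuple[float, float]]:
    """(n_dh, m_dh) for one domain so that the CV of Y_d hat equals delta_d."""
    a = [s.M * s.S1sq - s.sumNS2 for s in strata]   # condition (9): all must be > 0
    if min(a) <= 0:
        raise ValueError("condition (9) fails; see Remark 2")
    # D_i = sqrt(c1 * M * a) + sqrt(c2 * M * sum_k N^2 S2^2)
    D = [math.sqrt(s.c1 * s.M * ai) + math.sqrt(s.c2 * s.M * s.sumN2S2)
         for s, ai in zip(strata, a)]
    Q = sum(s.M * s.S1sq for s in strata)
    scale = sum(D) / (Y_d ** 2 * (delta_d ** 2 + Q / Y_d ** 2))
    return [(math.sqrt(s.c1 * s.sumN2S2 / (s.c2 * ai)),      # n_dh
             math.sqrt(s.M * ai) / math.sqrt(s.c1) * scale)    # m_dh
            for s, ai in zip(strata, a)]
```

The minimum total cost is then $C_0$ plus the sum of $m_{dh}(c_{1dh}+n_{dh}c_{2dh})$ over all domains and strata. As a sanity check, any other choice of $n_{dh}$, with the $m_{dh}$ rescaled so that (8) still holds, should cost more.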
Proof. Consider the Lagrange function

$$L=C_0+\sum_{d=1}^{D}\sum_{h=1}^{H_d}m_{dh}\left(c_{1dh}+n_{dh}c_{2dh}\right)+\sum_{d=1}^{D}\lambda_d\left[\frac{1}{Y_d^2}\sum_{h=1}^{H_d}\frac{1}{m_{dh}}\left(u_{dh}+\frac{w_{dh}}{n_{dh}}-x_{dh}\right)-\frac{1}{Y_d^2}\sum_{h=1}^{H_d}M_{dh}S_{1dh}^2-\delta_d^2\right], \qquad (10)$$

where $\lambda_d$ are the Lagrange multipliers and $u_{dh}$, $w_{dh}$ and $x_{dh}$ are as defined in the previous section. Differentiating (10) with respect to $m_{dh}$, $n_{dh}$ and $\lambda_d$ and solving the resulting equations leads to the results presented in Theorem 2. A detailed proof may be obtained from the authors upon request.

Remark 2. If any of the conditions (9) is not fulfilled, the values of $n_{dh}$ and $m_{dh}$ from Theorem 2 are not real numbers; if any of the conditions (7) is not fulfilled, they are not admissible sample sizes. In either case they are not optimum, and the optimum $n_{dh}$ and $m_{dh}$ are the solution of the following numerical problem:

minimize $f\{(\mathbf n_1,\mathbf m_1),\dots,(\mathbf n_D,\mathbf m_D)\}=C_0+\sum_{d=1}^{D}\sum_{h=1}^{H_d}m_{dh}\left(c_{1dh}+n_{dh}c_{2dh}\right)$,

where $\mathbf n_d=(n_{d1},\dots,n_{dH_d})^T$ and $\mathbf m_d=(m_{d1},\dots,m_{dH_d})^T$ for $d=1,\dots,D$, subject to:

$$\frac{1}{Y_d^2}\sum_{h=1}^{H_d}\frac{M_{dh}}{m_{dh}}\left[(M_{dh}-m_{dh})S_{1dh}^2+\frac{1}{n_{dh}}\sum_{g=1}^{M_{dh}}N_{dhg}(N_{dhg}-n_{dh})S_{2dhg}^2\right]=\delta_d^2,\qquad d=1,\dots,D,$$

$$M_{dh}S_{1dh}^2-\sum_{g=1}^{M_{dh}}N_{dhg}S_{2dhg}^2>0\quad\text{for each } d=1,\dots,D \text{ and } h=1,\dots,H_d,$$

$$2\le n_{dh}\le N_{dh};\qquad 2\le m_{dh}\le M_{dh}\qquad\text{for } d=1,\dots,D,\ h=1,\dots,H_d.$$

References

Kozak, M. (2005), On stratified two-stage sampling: Optimum stratification and sample allocation between strata and sampling stages, Model Assisted Statistics and Applications 1(1), 23-29.
Kozak, M., Zieliński, A. (2005), Sample allocation between domains and strata, International Journal of Applied Mathematics and Statistics 3, 19-40.
Niemiro, W., Wesołowski, J. (2001), Fixed precision optimal allocation in two-stage sampling, Applicationes Mathematicae 23, 73-82.
Särndal, C. E., Swensson, B., Wretman, J. (1992), Model Assisted Survey Sampling (Springer-Verlag, New York).
Singh, S. (2003), Advanced Sampling Theory with Applications: How Michael "Selected" Amy (Kluwer Academic Publishers, The Netherlands).