Author manuscript, published in "Statistics & Probability Letters 78, 8 (2010) 970"
DOI : 10.1016/j.spl.2007.09.057
Accepted Manuscript
Stratified two-stage sampling in domains: Sample allocation between
domains, strata, and sampling stages
peer-00616535, version 1 - 23 Aug 2011
Marcin Kozak, Andrzej Zieliński, Sarjinder Singh
PII: S0167-7152(07)00349-5
DOI: 10.1016/j.spl.2007.09.057
Reference: STAPRO 4795
To appear in:
Statistics and Probability Letters
Received date: 18 April 2006
Revised date: 7 August 2007
Accepted date: 25 September 2007
Please cite this article as: Kozak, M., Zieliński, A., Singh, S., Stratified two-stage sampling in
domains: Sample allocation between domains, strata, and sampling stages. Statistics and
Probability Letters (2007), doi:10.1016/j.spl.2007.09.057
This is a PDF file of an unedited manuscript that has been accepted for publication. As a
service to our customers we are providing this early version of the manuscript. The manuscript
will undergo copyediting, typesetting, and review of the resulting proof before it is published in
its final form. Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Stratified two-stage sampling in domains:
sample allocation between domains, strata,
and sampling stages
Marcin Kozak ∗, Andrzej Zieliński, Sarjinder Singh

Department of Biometry, Faculty of Agriculture and Biology, Warsaw Agricultural University, Nowoursynowska 159, 02-776 Warsaw, Poland

Department of Biometry, Warsaw Agricultural University, Nowoursynowska 159, 02-776 Warsaw, Poland

Department of Statistics, St. Cloud State University, St. Cloud, MN, 56301, USA

∗ Corresponding author. Email address: m.kozak@omega.sggw.waw.pl (Marcin Kozak).

Preprint submitted to Elsevier Science, 3 October 2007

Abstract

In the paper, formulae for optimum sample allocation between domains, strata in the domains, and sampling stages are presented for stratified two-stage sampling in domains under fixed sample size of SSUs from PSUs.

Key words: domain-orientated approach, optimum sample allocation, survey cost.

1 Introduction

Kozak (2005) presented the basic concepts of the stratified two-stage sampling design, in which a population of primary sampling units is subdivided into strata. He provided formulas for optimum sample allocation between strata and sampling stages under two schemes of the design: (i) one in which the sample size of secondary sampling units (SSUs) from primary sampling units (PSUs) is fixed, and (ii) one with a self-weighting design in strata. Kozak and Zieliński (2005), on the other hand, presented the basic concepts of the problem of sample allocation between domains and strata in the case when domains are subdivided into strata. They considered (i) a so-called domain-orientated approach to the sample allocation, in which one requires precise estimation in all the domains, and (ii) sample allocation orientated towards minimizing the total survey cost subject to a fixed level of precision of estimation in the domains. In this paper, we introduce a hybrid of these two designs, namely stratified two-stage sampling in domains. Such a design can be of practical use when a population is subdivided into domains, each domain comprising some number of strata consisting of PSUs. A simple random sample of SSUs is to be taken without replacement from PSUs. We consider a situation in which the sample size of SSUs from PSUs is fixed; in such a case, one obtains a sample of fixed size (and fixed total cost). We give formulas for sample allocation between domains, strata and sampling stages (i) under the domain-orientated approach, and (ii) orientated towards minimizing the total survey cost.

2 Estimation for domains under stratified two-stage sampling in the domains: basic ideas

Basic concepts of stratified sampling and two-stage sampling, which lay the basis for the design introduced in this section, may be found, e.g., in Särndal et al. (1992) or Singh (2003). Consider a population $U$ comprising $N$ elements. The population is subdivided into $D$ domains $U_d$ ($d = 1, \ldots, D$); each domain is subdivided into $H_d$ non-overlapping strata $U_{dh}$; finally, each stratum $U_{dh}$ is subdivided into $M_{dh}$ separate PSUs $U_{dhg}$. This division can be presented as

$$U = \bigcup_{d=1}^{D} \bigcup_{h=1}^{H_d} \bigcup_{g=1}^{M_{dh}} U_{dhg},$$

where $U_d \cap U_{d'} = \emptyset$ for $d, d' = 1, \ldots, D$, $d \neq d'$; $U_{dh} \cap U_{dh'} = \emptyset$ for $d = 1, \ldots, D$, $h, h' = 1, \ldots, H_d$, $h \neq h'$; and $U_{dhg} \cap U_{dhg'} = \emptyset$ for $d = 1, \ldots, D$, $h = 1, \ldots, H_d$, $g, g' = 1, \ldots, M_{dh}$, $g \neq g'$.

The $g$th PSU from the $h$th stratum in the $d$th domain comprises $N_{dhg}$ SSUs, which are the population elements. Let $N_d$ indicate the number of SSUs in the $d$th domain and $N_{dh}$ indicate the number of SSUs in the $h$th stratum of the $d$th domain.
Let the population parameter under investigation be the population total of $Y$, where $Y$ is the characteristic studied. For the $d$th domain, its estimator is given by

$$\hat{Y}_d = \sum_{h=1}^{H_d} \hat{Y}_{dh} = \sum_{h=1}^{H_d} \frac{M_{dh}}{m_{dh}} \sum_{g=1}^{m_{dh}} \frac{N_{dhg}}{n_{dhg}} \sum_{i=1}^{n_{dhg}} y_{dhgi} \tag{1}$$

where $\hat{Y}_d$ is the estimator of $Y_d$, $Y_d$ being the population total of the variable $Y$ restricted to the $d$th domain; $\hat{Y}_{dh}$ is the estimator of $Y_{dh}$, $Y_{dh}$ being the population total of the variable $Y$ restricted to the $h$th stratum of the $d$th domain; $m_{dh}$ is the sample size of PSUs from the $h$th stratum of the $d$th domain; $n_{dhg}$ is the sample size of SSUs from the $g$th PSU of the $h$th stratum of the $d$th domain; and $y_{dhgi}$ is the $Y$ value in the $i$th SSU (population element) of the $g$th PSU from the $h$th stratum in the $d$th domain. In both cases (i.e., when sampling PSUs from strata and when sampling SSUs from PSUs), a simple random sample is taken without replacement.
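As an illustration of estimator (1), the following minimal Python sketch expands the sampled SSU values to a domain total. The function name `estimate_domain_total` and the dictionary layout (`"M"` for $M_{dh}$, `"psus"` for the sampled PSUs) are hypothetical choices made for this example only; they do not come from the paper.

```python
def estimate_domain_total(strata):
    """Estimate the total of Y in one domain via formula (1).

    strata: list of dicts, one per stratum of the domain (hypothetical layout):
      "M"    -> number of PSUs in the stratum (M_dh)
      "psus" -> list of (N_dhg, sampled_values) for each sampled PSU,
                so m_dh = len(psus) and n_dhg = len(sampled_values)
    """
    total = 0.0
    for stratum in strata:
        M = stratum["M"]
        sampled = stratum["psus"]
        m = len(sampled)
        psu_sum = 0.0
        for N_g, ys in sampled:
            # expand the SSU sample total to the PSU level: (N_dhg / n_dhg) * sum(y)
            psu_sum += N_g / len(ys) * sum(ys)
        # expand the PSU-level estimates to the stratum level: M_dh / m_dh
        total += M / m * psu_sum
    return total
```

Each stratum contributes its own expanded total, so a domain with several strata is handled by passing several dictionaries in the list.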
Let us consider a two-stage sampling scheme in which the sample size of SSUs is fixed. In our design, this means sampling the same number $n_{dh}$ of SSUs from every PSU in each section domain $d$ × stratum $h$ (i.e., $n_{dhg} = n_{dh}$ for each combination of $d = 1, \ldots, D$ and $h = 1, \ldots, H_d$). Under such a design, the variance of the estimator (1) is given by (Kozak, 2005)

$$V(\hat{Y}_d) = \sum_{h=1}^{H_d} \frac{M_{dh}}{m_{dh}} \left[ (M_{dh} - m_{dh}) S_{1dh}^2 + \frac{1}{n_{dh}} \sum_{g=1}^{M_{dh}} N_{dhg} (N_{dhg} - n_{dh}) S_{2dhg}^2 \right] \tag{2}$$

where $S_{1dh}^2 = (M_{dh} - 1)^{-1} \sum_{g=1}^{M_{dh}} (Y_{dhg} - \bar{Y}_{dh})^2$, $Y_{dhg} = \sum_{j=1}^{N_{dhg}} y_{dhgj}$, $\bar{Y}_{dh} = N_{dh}^{-1} \sum_{g=1}^{M_{dh}} \sum_{j=1}^{N_{dhg}} y_{dhgj}$, $S_{2dhg}^2 = (N_{dhg} - 1)^{-1} \sum_{j=1}^{N_{dhg}} (y_{dhgj} - \bar{Y}_{dhg})^2$, and $\bar{Y}_{dhg} = N_{dhg}^{-1} \sum_{j=1}^{N_{dhg}} y_{dhgj}$.

Note that the sample sizes $n_{dh}$ in the variance (2) refer to the sample sizes $n_{dhg}$, which are assumed to be the same for all $g = 1, \ldots, M_{dh}$ in a particular section domain $d$ × stratum $h$. Hence, for the sake of convenience, we write $n_{dh}$ instead of $n_{dhg}$, keeping in mind that $n_{dh}$ is the sample size of SSUs from every $g$th PSU of the $h$th stratum in the $d$th domain. An ordinary unbiased estimator of the variance (2) is obtained by replacing the population quantities $S_{1dh}^2$ and $S_{2dhg}^2$ with their sample estimators; the summation over PSUs in (2) is then taken over the sampled PSUs in the $h$th stratum of the $d$th domain.

The coefficient of variation of the estimator $\hat{Y}_d$, say $\delta(\hat{Y}_d)$, is

$$\delta(\hat{Y}_d) = \frac{\sqrt{V(\hat{Y}_d)}}{Y_d}, \quad d = 1, \ldots, D.$$
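The variance (2) and the coefficient of variation of $\hat{Y}_d$ can be evaluated numerically as in the sketch below. It assumes that $S_{1dh}^2$ and $S_{2dhg}^2$ are already available (for example from a census or pilot survey) and takes them as inputs; the function names and the dictionary keys are hypothetical and serve only this illustration.

```python
import math

def variance_domain_total(strata):
    """Variance (2) of the estimated domain total under fixed n_dh.

    strata: list of dicts, one per stratum (hypothetical layout):
      "M", "m"   -> number of PSUs in the stratum and in the sample
      "n"        -> number of SSUs sampled from every PSU (n_dh)
      "S1_sq"    -> between-PSU variance S1dh^2
      "psus"     -> list of (N_dhg, S2dhg^2) for ALL M PSUs of the stratum
    """
    v = 0.0
    for s in strata:
        # second-stage term: (1 / n_dh) * sum_g N_dhg (N_dhg - n_dh) S2dhg^2
        within = sum(N * (N - s["n"]) * S2 for N, S2 in s["psus"]) / s["n"]
        # first-stage term: (M_dh - m_dh) S1dh^2
        between = (s["M"] - s["m"]) * s["S1_sq"]
        v += s["M"] / s["m"] * (between + within)
    return v

def coefficient_of_variation(strata, Y_d):
    """delta(Y_d hat) = sqrt(V(Y_d hat)) / Y_d."""
    return math.sqrt(variance_domain_total(strata)) / Y_d
```

Keeping the two variance components separate makes it easy to see how much of the variance is due to each sampling stage.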
In this paper, we understand the optimum conditions of a design as those under which some function of $\delta(\hat{Y}_d)$ is minimized. Let the overall survey cost $C$ be

$$C = C_0 + \sum_{d=1}^{D} \sum_{h=1}^{H_d} m_{dh} \left( c_{1dh} + n_{dh} c_{2dh} \right) \tag{3}$$

where $C_0$ is the fixed survey cost, $c_{1dh}$ is the cost of selecting one PSU from the $h$th stratum of the $d$th domain, and $c_{2dh}$ is the cost of obtaining the information on the $Y$ value in one SSU from the $h$th stratum of the $d$th domain.

3 Optimizing a design under domain-orientated approach
Here we apply a domain-orientated approach to the design presented in the previous section. It aims at precise estimation for each domain $U_d$ of the population $U$ (Kozak and Zieliński, 2005). Let $g = (g_1, \ldots, g_D)^T$ be a vector of importance weights of the domains. Following Kozak and Zieliński (2005), the optimum design is the one under which the smallest common value $\varphi$ of $g_d^{-1} \delta(\hat{Y}_d)$, $d = 1, \ldots, D$, is obtained. Thus, we require the coefficient of variation of the estimator $\hat{Y}_d$ of the population total in the $d$th domain to satisfy the condition (Kozak and Zieliński, 2005)

$$\delta(\hat{Y}_d) = g_d \varphi, \quad d = 1, \ldots, D. \tag{4}$$

Then, our aim is to find optimum values of $n_{dh}$ and $m_{dh}$ ($d = 1, \ldots, D$, $h = 1, \ldots, H_d$) under a fixed overall survey cost (3) equal to $C$ (given $c_{1dh}$ and $c_{2dh}$) so that the condition (4) is satisfied and the common value $\varphi$ is minimum. We will optimize the design based on the assumption that the survey variable is the same as the auxiliary variable used to allocate the survey cost. Of course, this assumption does not hold in practice; instead of the population values, quantities originating from recent censuses or previous/pilot surveys are used.

Theorem 1. When a population $U$ is subdivided into $D$ domains and stratified two-stage sampling with fixed sample size of secondary sampling units from primary sampling units is to be applied within the domains, under a cost function (3), given survey costs $C$, $C_0$, $c_{1dh}$ and $c_{2dh}$, the smallest common value $\varphi$ of $g_d^{-1} \delta(\hat{Y}_d)$, $d = 1, \ldots, D$, is obtained when for $d = 1, \ldots, D$, $h = 1, \ldots, H_d$,

$$n_{dh} = \sqrt{\frac{c_{1dh}}{c_{2dh}}} \sqrt{\frac{\sum_{g=1}^{M_{dh}} N_{dhg}^2 S_{2dhg}^2}{M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2}},$$

$$m_{dh} = \frac{(C - C_0)\, v_d \sqrt{M_{dh} \left( M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 \right)}}{Y_d \sqrt{c_{1dh}} \sum_{e=1}^{D} v_e Y_e^{-1} \sum_{i=1}^{H_e} \sqrt{M_{ei}}\, Z_{ei}},$$

where

$$Z_{ei} = \sqrt{c_{1ei} \left( M_{ei} S_{1ei}^2 - \sum_{k=1}^{M_{ei}} N_{eik} S_{2eik}^2 \right)} + \sqrt{c_{2ei} \sum_{k=1}^{M_{ei}} N_{eik}^2 S_{2eik}^2}$$

and $v = (v_1, \ldots, v_D)^T$ is the eigenvector connected with the largest eigenvalue of the matrix

$$F = (C - C_0)^{-1} A B^T - \operatorname{diag}(E),$$

where $A = (A_1, \ldots, A_D)^T$, $B = (B_1, \ldots, B_D)^T$ and $E = (E_1, \ldots, E_D)^T$, provided that

$$M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 > 0 \tag{5}$$

for each $d = 1, \ldots, D$, and $h = 1, \ldots, H_d$.

Proof. To prove Theorem 1, a procedure developed by Niemiro and Wesołowski (2001) may be used. It was recently applied in sample allocation between domains and strata by Kozak and Zieliński (2005). Consider the following Lagrange function:

$$L = \varphi - \sum_{d=1}^{D} \lambda_d \left[ \frac{1}{Y_d^2} \sum_{h=1}^{H_d} \frac{1}{m_{dh}} \left( u_{dh} + \frac{w_{dh}}{n_{dh}} - x_{dh} \right) - \frac{1}{Y_d^2} \sum_{h=1}^{H_d} M_{dh} S_{1dh}^2 - g_d^2 \varphi^2 \right] - \alpha \left[ \sum_{d=1}^{D} \sum_{h=1}^{H_d} m_{dh} (c_{1dh} + n_{dh} c_{2dh}) - (C - C_0) \right] \tag{6}$$

where $\lambda_d$ and $\alpha$ are the Lagrange multipliers, $w_{dh} = M_{dh} \sum_{g=1}^{M_{dh}} N_{dhg}^2 S_{2dhg}^2$, $u_{dh} = M_{dh}^2 S_{1dh}^2$, $x_{dh} = M_{dh} \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2$, and $Y_d$ is the population total of $Y$ in the $d$th domain. Differentiating (6) with respect to $m_{dh}$, $n_{dh}$, $\lambda_d$, and $\alpha$ and solving the obtained equations yields the results presented in Theorem 1. A detailed proof may be obtained from the authors upon request.

Remark 1. If any of the conditions (5) or any of the following conditions

$$2 \le n_{dh} \le N_{dh}; \quad 2 \le m_{dh} \le M_{dh} \quad \text{for } d = 1, \ldots, D,\ h = 1, \ldots, H_d, \tag{7}$$

is not fulfilled, the values of $n_{dh}$ and $m_{dh}$ from Theorem 1 are either not real numbers or not admissible, so they are not optimum. In such a case, the optimum $n_{dh}$ and $m_{dh}$ are the solution of the following numerical problem:

$$\text{minimize } f\{(n_1, m_1), \ldots, (n_D, m_D); \varphi\} = \varphi,$$

where $n_d = (n_{d1}, \ldots, n_{dH_d})^T$ and $m_d = (m_{d1}, \ldots, m_{dH_d})^T$ for $d = 1, \ldots, D$, subject to:

$$\frac{1}{Y_d^2} \sum_{h=1}^{H_d} \frac{M_{dh}}{m_{dh}} \left[ (M_{dh} - m_{dh}) S_{1dh}^2 + \frac{1}{n_{dh}} \sum_{g=1}^{M_{dh}} N_{dhg} (N_{dhg} - n_{dh}) S_{2dhg}^2 \right] = g_d^2 \varphi^2,$$

$$\sum_{d=1}^{D} \sum_{h=1}^{H_d} m_{dh} (c_{1dh} + n_{dh} c_{2dh}) = C - C_0,$$

$$M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 > 0 \quad \text{for each } d = 1, \ldots, D, \text{ and } h = 1, \ldots, H_d,$$

$$2 \le n_{dh} \le N_{dh}; \quad 2 \le m_{dh} \le M_{dh} \quad \text{for } d = 1, \ldots, D,\ h = 1, \ldots, H_d.$$
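The parts of this section that do not depend on the eigenvector $v$ can be sketched in a few lines of Python: the closed-form $n_{dh}$ of Theorem 1, the check of condition (5), the bounds (7), and the cost function (3). The function names and argument layouts below are hypothetical and chosen for illustration only. The computation of $m_{dh}$ is deliberately left out, because it requires the vectors $A$, $B$ and $E$, whose components are not given here.

```python
import math

def optimal_n(c1, c2, M, S1_sq, psus):
    """Closed-form n_dh from Theorem 1 for one section domain d x stratum h.

    psus: list of (N_dhg, S2dhg^2) pairs, one per PSU in the stratum.
    Returns None when condition (5) fails, since the root would not be real.
    """
    slack = M * S1_sq - sum(N * S2 for N, S2 in psus)  # left-hand side of (5)
    if slack <= 0:
        return None
    numerator = sum(N ** 2 * S2 for N, S2 in psus)
    return math.sqrt(c1 / c2) * math.sqrt(numerator / slack)

def within_bounds(n, m, M, psus):
    """Conditions (7): 2 <= n_dh <= N_dh and 2 <= m_dh <= M_dh."""
    N_dh = sum(N for N, _ in psus)  # number of SSUs in the stratum
    return 2 <= n <= N_dh and 2 <= m <= M

def total_cost(C0, sections):
    """Cost function (3); sections holds (m_dh, n_dh, c1dh, c2dh) tuples."""
    return C0 + sum(m * (c1 + n * c2) for m, n, c1, c2 in sections)
```

When `optimal_n` returns `None` or `within_bounds` returns `False` for any section, the closed-form allocation cannot be used and the numerical problem of Remark 1 has to be solved instead.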
4 Optimizing a design subject to constraints connected with domain precisions

Here we consider a question dual to the problem presented in the previous section. We aim at minimizing the total survey cost $C$ given in (3) subject to

$$\frac{1}{Y_d^2} \sum_{h=1}^{H_d} \frac{M_{dh}}{m_{dh}} \left[ (M_{dh} - m_{dh}) S_{1dh}^2 + \frac{1}{n_{dh}} \sum_{g=1}^{M_{dh}} N_{dhg} (N_{dhg} - n_{dh}) S_{2dhg}^2 \right] = \delta_d^2, \quad d = 1, \ldots, D, \tag{8}$$

where $\delta_d$ is the fixed value of the coefficient of variation of $\hat{Y}_d$. Thus, this time we consider a design in which we look for optimum values of $n_{dh}$ and $m_{dh}$ for which the constraint (8) is fulfilled and the cost (3) is minimum.

Theorem 2. When a population $U$ is subdivided into $D$ domains and stratified two-stage sampling with fixed sample size of secondary sampling units from primary sampling units is to be applied within the domains, under a cost function (3), given survey costs $C_0$, $c_{1dh}$ and $c_{2dh}$, and under the condition (8) (for $\delta_d$ being fixed), the minimum total survey cost $C$ is obtained when for $d = 1, \ldots, D$, $h = 1, \ldots, H_d$,

$$n_{dh} = \sqrt{\frac{c_{1dh}}{c_{2dh}}} \sqrt{\frac{\sum_{g=1}^{M_{dh}} N_{dhg}^2 S_{2dhg}^2}{M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2}},$$

$$m_{dh} = \frac{\sqrt{M_{dh} \left( M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 \right)}}{Y_d^2 \sqrt{c_{1dh}}} \cdot \frac{\sum_{i=1}^{H_d} D_i}{\delta_d^2 + Y_d^{-2} \sum_{h=1}^{H_d} M_{dh} S_{1dh}^2},$$

where

$$D_i = \sqrt{c_{1di} M_{di} \left( M_{di} S_{1di}^2 - \sum_{k=1}^{M_{di}} N_{dik} S_{2dik}^2 \right)} + \sqrt{c_{2di} M_{di} \sum_{k=1}^{M_{di}} N_{dik}^2 S_{2dik}^2},$$

provided that

$$M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 > 0 \tag{9}$$

for each $d = 1, \ldots, D$, and $h = 1, \ldots, H_d$.

Proof. Consider the following Lagrange function

$$L = C_0 + \sum_{d=1}^{D} \sum_{h=1}^{H_d} m_{dh} (c_{1dh} + n_{dh} c_{2dh}) + \sum_{d=1}^{D} \lambda_d \left[ \frac{1}{Y_d^2} \sum_{h=1}^{H_d} \frac{1}{m_{dh}} \left( u_{dh} + \frac{w_{dh}}{n_{dh}} - x_{dh} \right) - \frac{1}{Y_d^2} \sum_{h=1}^{H_d} M_{dh} S_{1dh}^2 - \delta_d^2 \right] \tag{10}$$

where $\lambda_d$ are the Lagrange multipliers and $u_{dh}$, $w_{dh}$, and $x_{dh}$ are the same as defined in the previous section. Differentiating (10) with respect to $m_{dh}$, $n_{dh}$, and $\lambda_d$ and solving the obtained equations leads to the results presented in Theorem 2. A detailed proof may be obtained from the authors upon request.
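The allocation of Theorem 2 for a single domain can be checked numerically with the following minimal sketch. It assumes the quantities $u_{dh} - x_{dh}$ and $w_{dh}$ of the Lagrange function (10), returns real-valued (unrounded) $n_{dh}$ and $m_{dh}$, and does not check the bounds (7). All function names and the dictionary layout are hypothetical. The helper `achieved_cv` plugs the allocation back into the constraint (8), which gives a quick consistency check.

```python
import math

def section_terms(s):
    """Return (u_dh - x_dh, w_dh) for one stratum; raise if (9) fails."""
    sum_NS = sum(N * S2 for N, S2 in s["psus"])
    sum_N2S = sum(N ** 2 * S2 for N, S2 in s["psus"])
    a = s["M"] * (s["M"] * s["S1_sq"] - sum_NS)      # u_dh - x_dh
    if a <= 0:
        raise ValueError("condition (9) is violated")
    return a, s["M"] * sum_N2S                        # w_dh

def allocate_min_cost(Y_d, delta_d, strata):
    """Real-valued (n_dh, m_dh) per stratum of one domain, Theorem 2.

    strata: list of dicts with keys "M", "S1_sq", "c1", "c2" and
    "psus" -> list of (N_dhg, S2dhg^2) for all PSUs (hypothetical layout).
    """
    terms = [section_terms(s) for s in strata]
    # sum over strata of sqrt(c1 (u - x)) + sqrt(c2 w)
    D_sum = sum(math.sqrt(s["c1"] * a) + math.sqrt(s["c2"] * w)
                for s, (a, w) in zip(strata, terms))
    denom = delta_d ** 2 + sum(s["M"] * s["S1_sq"] for s in strata) / Y_d ** 2
    allocation = []
    for s, (a, w) in zip(strata, terms):
        n = math.sqrt(s["c1"] * w / (s["c2"] * a))
        m = math.sqrt(a / s["c1"]) * D_sum / (Y_d ** 2 * denom)
        allocation.append((n, m))
    return allocation

def achieved_cv(Y_d, strata, allocation):
    """Coefficient of variation implied by variance (2) for an allocation."""
    v = 0.0
    for s, (n, m) in zip(strata, allocation):
        within = sum(N * (N - n) * S2 for N, S2 in s["psus"]) / n
        v += s["M"] / m * ((s["M"] - m) * s["S1_sq"] + within)
    return math.sqrt(v) / Y_d
```

In practice the returned values would still have to be rounded to integers and compared with the bounds (7) before use.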
Remark 2. If any of the conditions (9) or any of the conditions (7) is not fulfilled, the values of $n_{dh}$ and $m_{dh}$ from Theorem 2 are either not real numbers or not admissible, so they are not optimum. In such a case, the optimum $n_{dh}$ and $m_{dh}$ are the solution of the following numerical problem:

$$\text{minimize } f\{(n_1, m_1), \ldots, (n_D, m_D)\} = C_0 + \sum_{d=1}^{D} \sum_{h=1}^{H_d} m_{dh} \left( c_{1dh} + n_{dh} c_{2dh} \right),$$

where $n_d = (n_{d1}, \ldots, n_{dH_d})^T$ and $m_d = (m_{d1}, \ldots, m_{dH_d})^T$ for $d = 1, \ldots, D$, subject to:

$$\frac{1}{Y_d^2} \sum_{h=1}^{H_d} \frac{M_{dh}}{m_{dh}} \left[ (M_{dh} - m_{dh}) S_{1dh}^2 + \frac{1}{n_{dh}} \sum_{g=1}^{M_{dh}} N_{dhg} (N_{dhg} - n_{dh}) S_{2dhg}^2 \right] = \delta_d^2,$$

$$M_{dh} S_{1dh}^2 - \sum_{g=1}^{M_{dh}} N_{dhg} S_{2dhg}^2 > 0 \quad \text{for each } d = 1, \ldots, D, \text{ and } h = 1, \ldots, H_d,$$

$$2 \le n_{dh} \le N_{dh}; \quad 2 \le m_{dh} \le M_{dh} \quad \text{for } d = 1, \ldots, D,\ h = 1, \ldots, H_d.$$

References

Niemiro, W., Wesołowski, J. (2001), Fixed precision optimal allocation in two-stage sampling, Applicationes Mathematicae 23, 73-82.
Kozak, M. (2005), On stratified two-stage sampling: Optimum stratification and sample allocation between strata and sampling stages, Model Assisted Statistics and Applications 1(1), 23-29.
Kozak, M., Zieliński, A. (2005), Sample allocation between domains and strata, International Journal of Applied Mathematics and Statistics 3, 19-40.
Särndal, C. E., Swensson, B., Wretman, J. (1992), Model Assisted Survey Sampling (Springer-Verlag, New York).
Singh, S. (2003), Advanced Sampling Theory with Applications. How Michael "Selected" Amy (Kluwer Academic Publishers, The Netherlands).