Optimal allocation algorithm for a multi-way - CROS

Transcription

Optimal allocation algorithm for a multi-way - CROS
Optimal allocation algorithm for a multi-way
stratification design
Paolo Righi1, Piero Demetrio Falorsi2
1
Italian National Statistical Institute, e-mail: parighi@istat.it
Italian National Statistical Institute, e-mail: falorsi@istat.it
2
Abstract
The paper shows a sampling allocation algorithm for obtaining planned sample size for
domains belonging to different partitions of the population guaranteeing the sampling
errors of domain estimates be lower than given thresholds. The algorithm is innovative
because may be applied when a multi-way stratified design is used. That kind of design is
useful when the overall sample size is bounded and consequently the standard solution of
using a stratified sample with the strata given by cross-classification of variables defining
the different partitions (one-way design) is not feasible since the number of strata is
larger than the overall sample size. Moreover, the one-way designs could define too
detailed strata with a small number of population units, leading to a heavy response
burden for such units that have to be drawn in every survey occasion. Multi-way design
overcomes this critical aspect and it allows to allocate the sample in a more efficient way
of one-way design, because it has not constraints on the number of sample units in the
cross-classified strata. The algorithm is implemented to cover a multivariate-multidomain
allocation usually considered in the large scale surveys.
Keywords: Optimal allocation, Multi-way stratification, Large scale survey
1. Introduction
Large scale surveys in Official Statistics usually produce estimates for a set of parameters
by a huge number of highly detailed estimation domains. These domains generally define
not nested partitions of the target population. When the domain indicator variables are
available for each sampling unit at the sampling framework level, the survey sampling
planner could try to select a sample covering each domain. Using this kind of design
direct estimates can be obtained for each domain. Furthermore, it can be possible to
control the sampling errors at domain level as well. A standard solution to obtain planned
sample sizes for the domains of two or more partitions is to use a stratified simple
random sampling in which strata are identified by cross-classification of variables
defining the different partitions. In the following, this design will be denoted as crossclassification design or one-way design. However, in many practical situations the crossclassification design is unsuitable since it needs the selection of at least a number of
sampling units as large as the product of the number of categories of the stratification
variables. Cochran well illustrates (1977, p. 124) this problem giving a clear example in
which the cross-classification design is unfeasible. In some circumstances even though it
may be applied, the cross-classification design is quite expensive. For instance, in the
business surveys, following the European Council Regulation on Structural Business
Statistics (SBS) establishes that the parameters of interest refer to estimation domains
defined by three different partition of the population of enterprises (Economic activity
class NACE classification, Economic activity group NACE classification by Size class,
Economic activity division NACE classification by Region). In Italy, the total number of
estimation domains is about 1,800; while the number of non-empty strata of the crossclassification design is larger than 37,000.
In the social context, an example regards the Italian Graduates’ Career Survey conducted
by the Italian National Statistical Institute (ISTAT). This survey disseminates estimates
on the employment status of graduates three years after the degree at level of several
types of domains: gender, course, group of course, university center, defining an overall
number of domains greater than 600. The usual sample design considers the crossclassification of all the variables defining the domains, obtaining more than 2,200 strata.
There are further drawbacks in the cross-classification design such as: a greater statistical
burden due to the high level of detail entailing domains with very few units that will be
contacted in every survey occasion (this is the case of the SBS surveys); an
ineffectiveness of the sample allocation when we have to select al least two units per
stratum (for obtaining unbiased variance estimates) because in some strata, according to
the used allocation method, a not integer or less than two sample units could be required.
In order to overcome some problems of cross-classification designs, an easy strategy is to
drop one or more stratifying variables or to group some of the categories. Nevertheless,
some planned domains become unplanned and some of them can have small or null
sample size.
A second approach is to use a multi-way stratified sampling design. This design keeps
under control the sample size in all the domains without using cross-classification
approach. There are several methods implementing multi-way design, but, usually, in the
large scale surveys they have problem of application (Falorsi et al., 2006). This is not the
case of the cube method (Deville and Tillé, 2004) that may select a random sample of
multi-way stratified design for a large population and a lot of domains. Given the
inclusion probability vector, the method implemented by an algorithm selects a sample
satisfying the expected sample size at domain level. In this context, the paper illustrates a
sampling allocation algorithm useful for obtaining planned sample size (or equivalently
the inclusion probability vector) for domains belonging to different partitions of the
population and in order to guarantee the sampling errors of domain estimates be lower
than given thresholds. The algorithm covers the multivariate-multidomain problem is
implemented to take into account the selection is made by means of the cube method. In
the following, section 2 introduces the parameter of interest, section 3 describes the
sampling strategy, section 4 gives the details of the proposed algorithm, while section 4 is
devoted to a Monte Carlo simulation on real survey data.
2. Parameters of interest
Let us denote with U a population of N elements and with b a specific partition of U
(b=1,…, B) in which b-th partition defines M b different non overlapping domains, U bd
(d=1, …, M b ) , of size N bd being
∑d =1 N bd = N
Mb
and, finally let ∑b =1 M b = Q be the
B
overall number of domains.
Let yr ,k and bd δ k denote respectively the value of the r-th (r =1, …, R) variable of
interest in the k-th population unit and the domain membership indicator, being
if k ∈ U bd and bd δ k = 0 , otherwise. Let us suppose that the
each unit in the population.
The parameters of interest are the M = Q × R domains totals
t =
bd r
∑
k ∈U
y r ,k
bd
δk =
∑
k ∈U bd
y r ,k
bd δ k
bd δ k
=1
values are known for
(r = 1,…,R ; b=1,…, B; d=1,…, M b ).
(2.1)
The expression (2.1.) defines a multivariate-multidomain problem since there are R
variables of interest (multivariate aspect) and Q > 1 domains (multidomain aspect).
3. Sampling strategy
We describe the sampling allocation algorithm for a multi-way stratification design with
the Horvitz-Thompson (H-T) estimator. However, applications of generalized regression
estimator is straightforward (Falorsi and Righi, 2008).
Multi-way stratification designs can be treated in the context of the balanced sampling.
Following the design based (or model assisted) approach a sample is balanced when the
H-T estimates of the auxiliary variable totals are equal to their known population totals
(Deville and Tillé, 2004).
For defining the balanced sampling we introduce the general definition of sampling
design as a probability distribution p(.) on the set S of all the subset s of the population U
such that ∑s∈S p ( s ) = 1 , where p(s) is the probability of the sample s to be drawn. Each
set s may be represented by the outcome λ ′ = (λ1 , ..., λk , ..., λ N ) of a vector of N random
variables. Let π ′ = (π 1 ,..., π k ,..., π N ) be the vector of inclusion probabilities, where
π = E p ( λ ) = ∑s∈S p( s )λ , being E p (⋅) the expected value over repeated sampling. Let
z ′k = ( z1k ,..., z hk ,..., zQk ) be a vector of Q auxiliary variables available for each population
unit. The sampling design p(s) with inclusion probabilities π is said to be balanced with
respect to the Q auxiliary variables if and only if tˆz ,ht = t z (balancing equations) for all
s∈S such that p(s)>0 where tˆz,ht = ∑k∈U z k λ k a k
denote the H-T estimates of
t z = ∑k∈U z k being a k = 1 / π k .
We assume that a vector of inclusion probabilities π is such that
∑π
k ∈U bd
k
= nbd
(b=1,…, B; d=1,…, M b ),
(3.1)
where nbd is the sample size of the domain U bd , being nbd an integer number. Multiway stratification design represents a special case of balanced design where for unit k the
auxiliary variable vector is given by
b =1
b= B
647
48 647
48
z′k = (0,..., π k ,...,0,..., 0,..., π k ,...,0) = π k (11δ k ,..., bd δ k ,..., BM B δ k ) = π k δ′k .
14444244443
(3.2)
Q
The expression (3.2) defines the z k as vectors of (Q-B) zeros and with B entries equal to
π k in the places indicating the domains which the unit k belongs to. When defining the
z k vector as (3.2), if condition (3.1) holds, the selection of sample satisfying the system
of balancing equations, guarantees that the nbd values are non random quantities.
Deville and Tillé (2004) proposed the cube method that allows one the random selection
of balanced samples involving a large set of auxiliary variables of type (3.2) and with
respect to different vectors of inclusion probabilities. The cube method is implemented
by an enhanced algorithm for large data sets (Chauvet and Tillé, 2006) available in a free
software
code
that
may
be
downloaded
in
the
website
http://www.insee.fr/fr/nom_df_met/outils_stat/cube/accueil_cube.htm . It is also available
an R-package at http://cran.r-project.org/web/packages/sampling/index.html .
4. Allocation algorithm for multi-way stratification design
The aim of paper is to define procedure defining the inclusion probabilities π k , and the
domain sample sizes nbd , which attempts to minimize the overall sample size, n,
guaranteeing that the sampling variances are lower than prefixed level of precision
thresholds, bd Vr : V p ( bd tˆr ,ht | tˆz ,ht = t z ) ≤ bd Vr (b=1,…, B; d=1,…, M b ; r=1,…, R) where
bd Vr
is the variance threshold and V p ( bd tˆr ,ht | tˆz ,ht = t z ) is the variance of the H-T
estimator bd tˆr ,ht of bd t r given tˆz ,ht = t z (balanced sample).
We use an approximated expression of the variance of balanced sampling given by
Deville and Tillé (2005), assuming that, through Poisson sampling, the vector (tˆr ,ht , tˆz′,ht )′
has approximately a multinormal distribution. The authors suggest the sampling variance
can be approximated by
V p ( bd tˆr ,ht | tˆz , ht = t z ) = V p ( bd tˆr ,ht + (t z − tˆz ,ht )′ B z , y ) = V p ( bd tˆr , ht − tˆz′,ht B z , y ) =
= V p ( ∑ a k ( bd δ k y r ,k − z ′k B z , y )) ≅
k∈s
where
bd
N
∑
N − Q k∈U
 1


− 1 bdη r2,k , (4.1)
πk

η = ( bd δ k yr , k − z′k B z , y ) 2 with
2
r ,k
B z , y = [∑k ∈U z k z ′k (1 / π k − 1)] −1
∑
k ∈U bd
δ k z k yr ,k (1 / π k − 1) .
In order to define the inclusion probabilities π k a two steps procedure is executed: in the
first step, denoted as optimization, the preliminary inclusion probabilities, π k′ , are
determined solving a minimum constrained problem; in the second step, denoted as
calibration, the final inclusion probabilities, π k , are obtained as a slight modification of
the π k′ . The calibration step assures the domain sample sizes nbd are integers.
As illustrated in the following, the π k values may be expressed as implicit functions of
the unknown residuals
bd
η r2,k . But, in real survey context, the determination of the
inclusion probabilities π k may be done using the predictions bd ηˆr2,k instead of bd η r2,k .
This is a general problem concerning the planning of the sampling designs, because the
variances are generally unknown quantities that may be suitably estimated.
We describe the allocation algorithm first assuming bd η r2,k as they were known. The
allocation takes into account a purely design based variance. Later we consider the
uncertainty on the bd ηˆ r2,k values leading to consider the anticipated variance (Isaki and
Fuller, 1982).
4.1. Sampling allocation algorithm when
bd
η r2,k are known
First step: Optimization
The inclusion probabilities π k′ can be defined as solution of the following non linear
programming problem with N unknowns, π k′ , and ( N + Q × R ) constraints



Min  ∑ π k′ 
 k∈U 



 1

(4.2)
 N
 − 1 bdη r2, k ≤ bdVη , r (b = 1,..., B; d = 1,..., M b ; r = 1,..., R)
 N − Q k∑
∈U  π k′



0 < π k′ ≤ 1 (k = 1,..., N )
The problem (4.2) without the inequalities on the inclusion probabilities has been dealt
with by Bethel (1989) that invokes the Kuhn-Tucker theorem to show that there exists a
solution to the above problem. He describes a simple algorithm and discusses its
convergence properties. Chromy (1987) develops an algorithm, suitable for automated
spreadsheets but without an explicit proof that always converges. Originally the
algorithm it has been for the multivariate allocation in stratified surveys. In general
allows one to find the unknown values vh > 0 (h=1,2…) which represent the solution of
the following non linear problem Min (∑hν h ) under the constraints
∑h Arh
ν h ≤ Ar ,
where Arh and Ar (r=1,2,…) are known positive quantities.
In our context we have to take into account two modifications. The first is needed for
guaranteeing the inequalities 0 < π k′ ≤ 1 ( k = 1, ..., N ) are respected. The second
modification is related to the dependence of
2
bd η r ,k
on B z , y that is function of the
unknown terms. For our purpose we develop the square of
bd
η r2, k = bd δ k yr2, k + π k2 g k2 − 2 π k
bd
2
bd η r ,k
δ k yr , k g k ,
being g k = δ′k B z , y . Then the inequality in (4.2) is equivalent to
as
 1

N

bd δ k yk2 ≤ bdV y ,r .
−
1
∑
k
′
N −Q
πk

where
(4.3)
[
]
N
∑k 2 (1 − π k′ )bd δ k yr ,k g k − ∑k π k′ (1 − π k′ ) g k2 . (4.4)
N −Q
To face this problem we launch Chromy algorithm several time as follows. We fix the
inclusion probabilities at initial values (for instance 0.5) in the right side of the (4.4) and
we find the solution of the (4.3) by means of the Chromy algorithm. In the iterative
process the bdV y ,r do not change. When the solution is found the new set of inclusion
Vy , r =bdVη , r +
bd
probabilities are plugged in the
V y ,r and the Chromy algorithm is launch again. When
bd
the bdV y ,r do not modify from a lunch to another the inclusion probabilities are defined.
After the Initialization, the Chromy algorithm finds the inclusion probabilities, in each
launch, by iterating the two actions of Calculus and Check. As far as the convergence
issue is concerned, it is worthwhile to note that the Chromy’s algorithm have been mostly
used for stratified sampling design, and indeed, the documentation refers to stratified
samples. Let us note that the modification of the Chromy’s algorithm, herein proposed,
treats the sampling units as strata and the resulting allocation, being fractional, defines
the inclusion probabilities.
Initialization: at initial iteration ( τ = 0 ), set τ γ k = 1 (k=1, …, N).
Calculus: the generic iteration ( τ = 1,2,... ) consists of a sequence of steps denoted with
u ( = 0, 1, 2, ...) .
N
τ ,u
2 τ
− At initial step ( u = 0 ), set bd
φr = 1 and calculate bdτ V0r =
∑
bdη r ,k γ k .
N − Q k∈U
− At subsequent steps (u=1,2,…), calculate the values of the following equations
1/ 2
Mb R
B

N
2 
τ ,u
.
π k = (1−τ γ k ) + τ γ k
∑
∑
∑
bd φr bdη r , k 
N − Q b =1 d =1 r =1


N
1
τ ,u
τ ,u
τ
τ ,u
2 τ
∑
bd Vr =
bdη r ,k γ k , and bd V r′ = bd V r + bd V0 r .
τ ,u
N − Q k∈U
πk
τ ,u
(4.5)
(4.6)
− If the following two conditions:
τ ,u
τ ,u
τ ,u
(4.7)
bd Vr′ ≤ bd Vr and bd φr ( bd Vr′ − bd Vr ) = 0 ,
are respected (for all b=1, …, B; d=1, …, M b ; r=1, …, R) then the action of
Calculus stops and the inclusion probabilities τ π k are those calculated in equation
(4.5). Otherwise, the updated quantities
τ ,u +1
bd
τ ,u +1
bd
φr are computed
τ ,u
φr = bd
φr [τbd,u Vr /(τbd,u Vr′ − bdτ Vr )] 2
and the equations (4.5) and (4.6) are calculated at u+1, over and over again with
τ , u +1
τ ,u
bd φr replacing bd φr until conditions (4.7) are respected.
Check: if the condition τ π k ≤ 1 is true for all k , then the algorithm stops and the π k′
values are set equal to π k′ =τ π k . Otherwise the
τ +1
γ k = 1 if τ π k ≤ 1 and
τ
γ k values are updated as
τ +1
γ k = 0 if τ π k > 1. The calculus is iterated at τ + 1 with
τ +1
γk
τ
replacing γ k .
A SAS macro that allows one to solve the problem (4.2) has been developed by the
authors of this paper and may be released on demand.
4.2. Sampling allocation algorithm when
2
bd η r ,k
are predicted
In practice, yr ,k and bd η r2,k are unknown and we must use a prediction in the algorithm
achieved by means of a statistical model. The algorithm has to take into account the
model uncertainty. Let ξ be the superpopulation model
yk ,r + uk
 yk ,r = ~
,

2
2
Eξ (u k ) = 0 ∀k ; Eξ (u k ) = σ k ; Eξ (u k , ul ) = 0 ∀k ≠ l
being ~yk ,r = x′k β . We consider a generic model and do not make any reference to the
previous model underlines the balanced sampling.
Then, bd η r ,k = bd δ k ( ~
yr ,k + uk ) − z′k B z , y and an upward approximation of the anticipated
variance is given by
 1

 1

AV p ( bd tˆr , ht | tˆz ,ht = t z ) =& ∑ k ∈U bd η~r2,k  − 1 + ∑k ∈U bd δ k σ r2,k  − 1
(4.8)
πk

πk

The (4.8) outlines in this case the input variance has a term of variability depending on
the explicative power of the prediction model.
5. Simulation
We have made a Monte Carlo experiment for an evaluation of the performances of the
algorithm, in particular to verify the convergence properties. We have used a
subpopulation of universe of the Italian Graduates’ Career Survey already introduced in
section 1. The survey is based on about 300,000 of graduates and produces estimates for
8 types of domains (8 partition of the population). In particular two very detailed
partitions are not nested, while the rest of partitions are aggregation of these two domain
types. Then the current stratification is based on the cross-classification given by the
variables degree by sex (first partition) and university by subject area degree (second
partition). More than 6,000 strata are obtained while for the two partitions we have
respectively about 160 and 90 domains. The sample allocation is defined to control the
sampling errors of two driving variables: the employed status (yes/no), denoted by y1 and
for the not employed graduates actively seeking work (yes/no) variable, denoted by y2 .
The sample allocation uses the standard Chromy algorithm according to Bethel approach
(1989). The H-T estimator is currently used to estimate the totals of the two variables at
domain level. In the simulation, we have selected a subpopulation of 3.427 graduates
covering respectively 20 and 15 domains of the two partitions. The subpopulation
distributions on the two partitions are skewed as the overall population. Table 1 shows
the number of domains by the percentage of graduates of the subpopulation.
Table 1: Number of domains by percentage class of graduates
Class percentage of graduates in a domain
Partition
>50%
>1%
15%⁄8%
8%⁄5%
5%⁄1%
First
1 (50.6%)
2
0
10
7
Second
1(53.3%)
2
1
9
1
Total
20
15
We may see that more the 50% of graduates belong to a single domain in the first and
second partition. Moreover, for the first partition 10 domains include a percentage of
graduates between 5% and 1%. In the second partition the number of domains with such
percentage is 9. The proposed approach needs a prediction of the binary variables y1
and y2 . We use a logistic model exploiting some auxiliary variables known at population
level. We introduce briefly the models for the two variables because the aim of the paper
is mainly to verify the convergence of the algorithm. Nevertheless, the choice of the
prediction model is crucial for reducing the AV in (4.8). For the two models we used the
following auxiliary variables: degree mark, sex, age class and an aggregation of subject
area degree. The aggregation in the last variable is different in two models. Then we
fixed the precision thresholds in terms of coefficient of variation (Table 2). These values
have been really given for the last sample allocation of the survey.
Table 2: Coefficient of variation threshold (%) for the
y1 and y 2 variables by the types of domains
Partition
y1
y2
First
13.2
21.0
Second
12.2
20.0
Allocation when bd η r2,k are known
Although we are in an ideal condition, the experiment resembles the case of a use of a
highly explicative model. The algorithm converges after six launches obtaining a sample
size of about 171 units rounded by calibration to 182. Table 3 gives expected coefficient
of variation according to (4.1) and the values obtained by means a Monte Carlo
experiment. In the simulation a 1,000 balanced sample of 182 units with the inclusion
probabilities resulting by the algorithm have been selected. For the sake of brevity we
show the results for the first partition (Table 3).
Table 3: Sample allocation, expected and simulated Coefficient of variation (%) of the HT estimates of the totals of y1 and y2 in the first partition of domains
First partition
domain
Population
size
1
91
Sample
size
10
Expected
CV(%) y1
7.2
Expected
CV(%)
y2
9.3
Simulated CV(%) y1
7.6
Simulated CV(%)
y2
9.7
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
97
52
380
136
3
65
64
1733
14
339
33
120
8
150
2
18
22
18
82
Allocation when
2
bd η r ,k
When the terms
2
bd η r ,k
11
7
25
9
2
9
9
20
4
20
7
9
3
12
1
5
5
5
9
8.7
1.3
7.4
3.1
7.2
6.4
2.9
9.2
0.1
9.7
1.7
2.4
0.2
8.6
1.8
5.8
0.1
1.7
7.0
14.5
7.0
7.7
9.3
13.0
6.9
14.9
16.0
12.2
13.3
13.0
6.8
11.5
11.2
11.2
15.9
5.0
9.3
10.9
9.0
1.4
7.9
3.3
9.0
6.5
3.1
9.7
0.1
10.1
1.8
2.5
0.2
8.6
2.6
6.0
0.1
1.8
7.2
14.4
7.2
7.9
9.4
16.2
6.9
15.1
15.7
12.7
13.1
13.3
7.0
12.2
10.9
15.7
15.9
5.2
9.5
10.9
are predicted
are predicted we take into account the expression (4.8). In this
experiment we used an estimation of σ r2,k given by ~yr ,k (1 − ~yr ,k ) being ~yr ,k the yes
probability for the yr variable. The algorithm converges after seven launches with about
699 units even though after two launches the sample size was already around 706 units.
The calibration step fixes the sample size to 707 units. Table 4 shows the sample
allocation and the values of the coefficient of variation (%) computed according to the
(4.5) for the first partition. The empirical coefficient of variation base on 1,000 Monte
Carlo samples is shown as well. The evidences of the results confirm the (4.8) gives an
upward AV with respect to the empirical values of the simulation.
6. Conclusion
Commonly, when the National Statistical Institute (NSI) plan a sampling survey, want to
observe population units belonging to all domains of interest. Moreover, the NSI have the
objective to define the minimum sample size in such a way that the estimates are reliable
for each domain. That represents a sample allocation problem that several algorithms are
able to solve. However, generally they are based on a one-way stratified design because it
guarantees that the sample covers all the domains. Nevertheless, such design can be
expensive and in some cases unfeasible. A multi-way design can be the alternative
solution. In the paper, we propose an algorithm for the sample allocation for such kind of
design assuming that the random selection in the domain is made by the cube method.
The algorithm defines the sample size for domains belonging to different partitions of the
population guaranteeing the sampling errors of domain estimates be lower than given
thresholds. On the other hand the cube method overcomes the problem of the random
selection for obtaining planned sample sizes in domains without using the cross-classified
strata. In fact other multi-way selection methods have problem in case of large scale
surveys. The innovative sampling strategy has been evaluated in a Monte Carlo
simulation on real survey data, that has highlighted the following conclusion: the
analytical expressions of the algorithm input variances are corrected; the proposed
algorithm converges in the analyzed data.
Table 4: Sample allocation, expected and simulated coefficient of variation (%) of the HT estimates of the totals of y1 and y2 in the first partition of domains. Case of bd η r2,k
estimated
First partition
domain
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Population
size
91
97
52
380
136
3
65
64
1733
14
339
33
120
8
150
2
18
22
18
82
Sample
size
40
40
28
90
52
3
32
31
110
11
78
19
44
7
39
2
14
17
13
37
Expected
CV(%) y1
8.9
9.6
9.1
7.0
6.3
0.0
10.4
7.5
5.6
8.1
9.6
9.5
7.5
7.2
11.3
0.0
12.1
8.9
10.5
13.1
Expected
CV(%)
y2
21.1
21.1
21.1
16.2
21.0
0.0
21.0
19.2
21.1
19.5
17.7
20.7
21.0
16.9
20.9
0.0
19.3
21.0
21.0
16.1
Simulated CV(%) y1
7.3
8.0
7.8
5.4
5.1
0.0
8.8
6.2
4.4
6.9
7.5
7.4
6.1
6.2
9.3
0.0
10.4
6.8
9.0
10.5
Simulated CV(%)
y2
17.6
16.3
16.1
12.6
16.7
0.0
16.9
16.2
17.1
16.7
14.4
16.6
16.5
15.8
17.0
0.0
17.7
19.5
18.2
13.1
References
Bethel, J. (1989) Sample Allocation in Multivariate Surveys, Survey Methodology, 15,
47-57
Chauvet G., Tillé Y. (2006) A Fast Algorithm of Balanced Sampling, Computational
Statistics, 21, 53-62
Chromy J. (1987). Design Optimization with Multiple Objectives, Proceedings of the
Survey Research Methods Section. American Statistical Association, 194-199
Cochran W.G. (1977) Sampling Techniques, Wiley, New York
Deville J.-C., Tillé Y. (2004) Efficient Balanced Sampling: the Cube Method, Biometrika,
91, 893-912
Deville J.-C., Tillé Y. (2005) Variance approximation under balanced sampling, Journal
of Statistical Planning and Inference, 128, 569-591
Falorsi P. D., Righi P. (2008) A Balanced Sampling Approach for Multi-way
Stratification Designs for Small Area Estimation, Survey Methodology, 34, 223-234
Falorsi P. D., Orsini D., Righi P., (2006) Balanced and Coordinated Sampling Designs
for Small Domain Estimation, Statistics in Transition, 7, 1173-1198
Isaki C.T. and Fuller W.A. (1982) Survey design under a regression superpopulation
model. Journal of the American Statistical Association, 77, 89-96