SAE malnutrition - Office for National Statistics

Small Area Estimation of Child Malnutrition
Assessing the Omission of Maternal Anthropometrics
Ericka G. Rascon-Ramirez∗
University of Essex
Preliminary Version - Please do not cite
July, 2012
Abstract
This paper assesses the methodology of Elbers, Lanjouw and Lanjouw (2003) for generating Small Area Estimates (SAE) of child malnutrition. Using Monte Carlo simulations, I assess the performance of this estimation under the omission of relevant variables for modelling child's height. An empirical application confirms the potential bias of SAE of malnutrition when maternal anthropometrics are not controlled for. These results highlight the implications for child malnutrition rankings based on SAE when targeting social programme benefits at the community level.
Keywords: small area estimation, malnutrition, social programme targeting, Monte Carlo simulation.
1 Introduction
Malnutrition has been examined in depth by social scientists seeking to understand cognitive and physical development from early childhood. Epidemiologists have proposed several indicators to capture the multidimensionality of this phenomenon. For instance, stunting (low height-for-age), underweight (low weight-for-age), anaemia (iron deficiency), and wasting (low weight-for-height) are the indicators most commonly used by researchers to understand and describe child malnutrition. The accurate identification of malnourished children has taken on a key role in the public policy of developing countries, in order to efficiently target the vulnerable and most-at-risk population. The relevance of this welfare indicator has increased in the past fifteen years with the presence of social programmes aiming at tackling poverty and equalizing opportunities in the long term.
∗ I thank the Ministry of Social Development (SEDESOL) and the National Institute of Statistics and Geography (INEGI) in Mexico for making available the Conteo 2005 at the individual level. The data preparation and preliminary results of the first Nutrition Map in Mexico were carried out while I was working for the Development Research Group at the World Bank (DECRG). I also thank Peter Lanjouw and Joao Santos Silva for their valuable comments and recommendations, as well as Marco Francesconi and Cheti Nicoletti for their support and guidance as supervisors at the University of Essex. The views presented here should not be considered as those of the World Bank or any of its affiliates. E-mail address: erasco@essex.ac.uk.
As a result, data sources and methodologies have been developed to help identify the poor population and malnourished children. Although the collection of nutrition, income, and consumption indicators through household surveys provides a more detailed picture of welfare dynamics and current performance, these data are not representative enough to allocate public resources accurately at the community level. Therefore, the main challenge for statisticians and economists has been the development of methodologies allowing the estimation of welfare indicators with statistical accuracy at geographical levels not allowed by household surveys.
The set of methods for accurately obtaining indicators for subpopulations without statistical representation in sample surveys is commonly grouped under the field known as Small Area Estimation. Rao (2003) provides a comprehensive description of the methods and theories in this field, where two types of modelling are identified for producing Small Area Estimates (SAE). The first is based on area-level models relating small area means to area-specific auxiliary variables, and the second is grounded on unit-level models relating unit values to unit-specific auxiliary variables. This paper focuses on the unit-level methodology suggested by Elbers, Lanjouw, and Lanjouw (2003, henceforth ELL), which has been predominantly used by governments, in collaboration with the World Bank, for identifying poor and malnourished populations in developing countries. Although this framework has been widely used for the production of Poverty Maps to help identify poor households at small area levels, our main objective is to assess this methodology in the nutritional context.
Using census records and a household survey, ELL suggest three steps for obtaining SAE of welfare indicators at geographical levels not allowed by the current sample surveys: 1) comparability between census and survey variables; 2) modelling of the welfare indicator using a household survey; and 3) computation of the welfare indicator in census records (head count ratio, inequality, among others). Although each step matters for accurately estimating welfare indicators at small area levels, the success of the modelling of the welfare indicator relies on the availability and comparability of data sources, as well as on the prediction power of observed variables.
Although the construction of Poverty and Nutrition Maps relies on a good prediction model, as well as on a set of cluster-level variables accounting for within-geographic heterogeneity, SAE of malnutrition may pose a different challenge during the modelling stage. For instance, empirical evidence has highlighted the relevance of parental anthropometrics for predicting child nutrition. Rubalcava and Teruel (2004) shed light on the relationship of parental anthropometrics and cognitive ability with child's height using the Mexican Family Life Survey (MxFLS). Their results point to an overestimation of the maternal schooling coefficient of about 70 per cent when maternal height and cognitive ability are omitted from the model. Along the same lines, Subramanian et al. (2009) have shown that maternal height is inversely associated with child mortality and anthropometric failure (underweight, wasting, stunting and anaemia). In another medical study for developing countries, Black et al. (2008) have pointed out maternal short stature as a risk factor for caesarean delivery, largely related to cephalopelvic disproportion.
In spite of the lack of experiments to better understand the main determinants of child's height, social and medical researchers agree on the relationship between parental anthropometrics and child nutrition, highlighting the strong association between long-term nutritional indicators such as maternal and child height. Despite the empirical evidence supporting this relationship, the field of SAE has not explored the potential consequences for SAE of child malnutrition when relevant parental variables are omitted in the modelling stage of child anthropometrics.
Following the evidence provided by these studies, this paper analyzes the implications for SAE of child malnutrition when maternal anthropometrics are not considered as a covariate in modelling child's height. The assessment of the ELL method becomes relevant when observable and comparable variables between census and survey are not able to reflect maternal height variation. Having a good set of covariates with high prediction power for child height depends on the possibility of obtaining a set of variables reflecting the current sociodemographic and economic conditions of the household, as well as information on genetic endowments, pregnancy conditions, and long-term characteristics of the household. For instance, individual exposure to long spells of illness, scarcity of food for long periods, exposure to inappropriate water and sanitation services, and the intergenerational transmission of health, among other unobservable variables, have been shown to be crucial for modelling long-term health indicators such as height-for-age and head circumference at birth.
Although risk factors may intervene differently in the genetic manifestation of child's height, the introduction of some of these variables in the model may reduce the household effect related to genetic endowments. The omission of a well-known relevant variable (such as maternal height) is likely to be mitigated in the presence of explanatory variables that reflect long-term conditions of the family background. However, long-term risk factors are rarely collected in sample surveys and are practically non-existent in census records. Hence, if the data do not allow us to identify variables strongly related to the genetic component and long-term conditions of the family or child, this unobserved component will be part of the household effect. Stronger correlations between the omitted variable and observed covariates may help to better predict the variable of interest, and to estimate a less biased SAE of malnutrition when the omitted variable is not strictly random.
It is worth mentioning that the main challenge of the ELL method is a good prediction of child's height, not the unbiased estimation of the regression coefficients. Thus, if maternal height can be reflected through the coefficients of the covariates available in both sources (census and survey), the omission of this relevant covariate will not be a concern for the final estimates of malnutrition in small areas. However, when the omitted covariate cannot be reflected by the set of comparable variables, and we know from the survey that maternal anthropometrics are still highly significant in the modelling of child's height, assessing the omission of maternal height becomes crucial for understanding the potential bias of SAE of child malnutrition when this covariate is not random.
To assess the consequences of omitting a relevant variable in child nutrition modelling, Monte Carlo simulations are used to characterize hypothetical relationships between maternal height (the omitted variable) and child's height, as well as between maternal height and the covariates of child's height. Monte Carlo results shed light on the bias of child's height z-score at the enumeration area level (small area), and an empirical application reinforces these findings.
This article is organized as follows: Section II provides a brief description of child malnutrition and mapping; Section III explains the stages of the ELL methodology; Section IV discusses SAE of child's height assuming different relationships between child's height, maternal anthropometrics and child's covariates using Monte Carlo simulations; Section V presents an empirical case using the Mexican Health and Nutrition Survey (ENSANUT 2006) and the Mexican Census (Conteo 2005); Section VI discusses the public policy implications of omitting maternal anthropometrics in SAE modelling; and Section VII summarizes the main findings of this study and concludes.
2 Child Malnutrition and Mapping
The identification of deprived areas has been one of the main concerns of developing countries when targeting social programmes. For instance, census records have been widely used to produce community profiles of infrastructure and socioeconomic characteristics, from which indices of deprivation and unsatisfied basic needs are constructed. However, although these measures may be highly related to child nutrition, they are not necessarily reliable for obtaining nutrition profiles at the community level.
For this reason, the construction of SAE of nutrition aims at obtaining the nutrition indicator of interest at geographical levels not allowed by household surveys, but relevant for social programme targeting. The ELL method has become a useful tool for identifying malnourished children, using census records to capture the heterogeneity between and within communities, and household surveys to link the welfare indicator of interest to the covariates common to census and survey.
The ELL method has been widely used for targeting purposes since the early 2000s, and nowadays around 45 developing countries use it for generating poverty estimates at the local level (see Araujo et al. (2008); Bedi et al. (2007); Hentschel et al. (2000); Lopez et al. (2008); Elbers et al. (2004)). Some of them have complemented these exercises by producing SAE of nutrition indicators (see Fujii (2005) and Simler (2006)).
Although information regarding access to piped water and sewage, as well as gastrointestinal disease incidence, may be highly related to child malnutrition, these indicators may not be sufficient for capturing it. For instance, information regarding food prices, the availability and variety of food in the community, as well as the condition of the child during early childhood and pregnancy, may be crucial for understanding long-term nutrition indicators such as child's height and weight. For this reason, the use of SAE has become essential for estimating nutrition indicators at local levels with better precision than that permitted by the sample design of household surveys.
3 Small Area Estimation
3.1 ELL Approach
This study is based on the standard methodology of Small Area Estimation developed by Elbers et al. (ELL: 2000; 2002; 2003). Although Fujii (2005) suggests an adaptation of the methodology for modelling anthropometric indicators based on a simultaneous equation framework, this study focuses on the original ELL method, a single-model approach. The basic idea of this procedure is to use survey data to project welfare indicators onto census records. Once these indicators are obtained at the individual or household level in the census, they are aggregated at geographical levels for which the survey is not representative. To better understand the ELL methodology, a brief description is given in the context of SAE of child malnutrition. The stages suggested by ELL (2003) are: 1) comparability between census and survey variables; 2) modelling of the welfare indicator of interest using the survey; and 3) computation of welfare indicators on census records.
To understand the method using anthropometric indicators, the process is described as if we were interested in producing a Nutrition Map using the standard ELL methodology.
Stage zero. Grouping and Comparability between Data Sources
The aggregation of states or districts to define geographical regions has been convenient for modelling in Poverty and Nutrition Mapping; it allows administrative divisions with similar socio-economic features to be grouped in advance. This grouping is useful when we have a small number of observations at the level for which the survey is representative (for instance, state, district or sub-district level), or when the survey is representative only at the region level. In any case, the aggregation criterion must follow the representativity of the health or nutrition survey. After defining the geographical partition for modelling, we proceed with the selection of comparable variables between census and survey.
To define a set of strictly comparable variables, the definition of each variable should be compared between census and survey based on their questionnaires. Subsequently, statistical comparisons between census and survey are carried out controlling for the sample design of the survey. The selection of continuous variables is complemented with distributional comparisons using Kolmogorov-Smirnov tests.
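The distributional comparison for a continuous variable can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it ignores the survey design weights mentioned above, the function names `ks_statistic` and `ks_critical_value` are hypothetical, the data are simulated, and the 5 per cent critical value uses the usual large-sample approximation.

```python
import math
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to a 5% level."""
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical data: one continuous variable in a census and in a survey
rng = random.Random(2005)
census_values = [rng.gauss(4.5, 1.5) for _ in range(2000)]
survey_values = [rng.gauss(4.5, 1.5) for _ in range(300)]

d = ks_statistic(survey_values, census_values)
comparable = d < ks_critical_value(len(survey_values), len(census_values))
print(f"D = {d:.3f}, treated as comparable: {comparable}")
```

A variable would be kept as a candidate covariate only if the statistic stays below the critical value; with a complex survey design, a weighted empirical distribution would be needed instead.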
Stage one. Anthropometric modelling
Once comparable variables between census and survey are identified, the anthropometric indicator (height-for-age, weight-for-age, or other) is modelled using the survey. The basic idea is to estimate a GLS model using as covariates only those variables comparable between both sources. Hence, the main model has the following composition:
$$H^{s}_{ch} = X_{ch}\beta + U^{s}_{ch} \qquad (1)$$
where $H^{s}_{ch}$ denotes the height z-score of child $s$ of household $h$ belonging to cluster $c$, $X_{ch}$ household and individual characteristics, and $U^{s}_{ch}$ the error component.
The latter may be decomposed into a cluster effect and a household effect:
$$U^{s}_{ch} = \eta_c + \varepsilon_{ch} \qquad (2)$$
The decomposition of the error term helps to obtain the household error vector, which is modelled in a second stage and used for the final GLS estimate.
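One simple way to carry out the decomposition in equation (2) is sketched below, assuming that the cluster effect is estimated as the within-cluster mean of the first-stage residuals and the household effect as the deviation from that mean. The function name `decompose_residuals` and the toy numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean

def decompose_residuals(residuals, clusters):
    """Split residuals u_ch into a cluster effect (the cluster mean of the
    residuals) and a household effect (the deviation from that mean)."""
    by_cluster = defaultdict(list)
    for u, c in zip(residuals, clusters):
        by_cluster[c].append(u)
    eta = {c: mean(values) for c, values in by_cluster.items()}
    eps = [u - eta[c] for u, c in zip(residuals, clusters)]
    return eta, eps

# Toy residuals for four households in two clusters
eta_hat, eps_hat = decompose_residuals([1.0, 3.0, -2.0, 0.0], ["a", "a", "b", "b"])
print(eta_hat, eps_hat)
```

The household deviations returned here are the quantities that would be modelled in the second stage.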
Stage two. Computation of the anthropometric measure
Once the best model has been selected, the empirical distributions of $\beta$, $\eta_c$ and $\varepsilon_{ch}$ are used for obtaining $r$ random draws to be used in census records, estimating the welfare indicator at the individual level:
$$\tilde{H}^{s}_{ch,(r)} = X_{ch}\tilde{\beta}_{(r)} + \tilde{\eta}_{c,(r)} + \tilde{\varepsilon}_{ch,(r)} \qquad (3)$$
The $r$ replications provide $r$ values of the welfare indicator for each census record. As equation (3) shows, using the same covariates selected for the final model in the survey, each covariate is multiplied by the corresponding element of the drawn $\beta$ vector, and each type of error is added. Cluster and household components are treated as random for obtaining the projection of child's height in the census records.
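The projection in equation (3) can be sketched with a single covariate. The code below is a simplified illustration under stated assumptions: draws are taken from normal distributions rather than from the empirical distributions described above, and the function name `simulate_area_means`, its parameters and the toy census are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def simulate_area_means(census, beta_hat, beta_se, sigma_eta, sigma_eps,
                        reps=100, seed=0):
    """census: list of (cluster_id, x) records.
    Returns {cluster_id: simulated mean z-score averaged over replications}."""
    rng = random.Random(seed)
    clusters = sorted({c for c, _ in census})
    totals = defaultdict(float)
    for _ in range(reps):
        beta_r = rng.gauss(beta_hat, beta_se)                        # one draw of beta
        eta_r = {c: rng.gauss(0.0, sigma_eta) for c in clusters}     # one draw per cluster
        simulated = defaultdict(list)
        for c, x in census:
            eps_r = rng.gauss(0.0, sigma_eps)                        # one draw per record
            simulated[c].append(x * beta_r + eta_r[c] + eps_r)
        for c in clusters:
            totals[c] += mean(simulated[c])
    return {c: totals[c] / reps for c in clusters}

# Toy census with two clusters
census_demo = [("a", 1.0), ("a", 3.0), ("b", 2.0)]
print(simulate_area_means(census_demo, beta_hat=0.5, beta_se=0.1,
                          sigma_eta=0.3, sigma_eps=0.8, reps=250))
```

Averaging over replications gives the point estimate for each area; the spread across replications would be the basis for its standard error.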
3.2 Omitted Variable Problem in SAE
Several developing countries have constructed SAE based on the variables available in survey and census datasets. In fact, in SAE the modelling of the welfare indicator is based on those variables strictly comparable between both sources. Even though empirical studies may point to additional relevant variables for modelling the welfare indicator of interest, those covariates are likely to be ignored if they are not available in the census. This issue has not been properly addressed in the literature on Poverty Mapping, given that it is less clear what sort of variables must appear in the final model for obtaining SAE. However, when we try to model anthropometric indicators, whatever the condition of the country, it is very likely that parental anthropometrics play a relevant role in child's health and nutrition, especially for long-term indicators such as height-for-age, birth weight, and head circumference, among others. The ELL methodology has indirectly addressed the omitted variable problem, mainly for constructing Poverty Maps, by claiming that the residual remaining after controlling for highly significant individual and cluster variables is purely random. Therefore, by treating household and cluster components as random, ELL simulate the welfare indicator at the enumeration area (or geographical level of interest).
However, the omission of a relevant variable has not been the main concern of the ELL methodology, given that the inclusion of variables at the cluster level helps to significantly reduce the cluster effect. Hence, after an extensive use of variables at the cluster level, it is assumed that the remaining residual can be treated as a random effect.
In spite of the extensive literature on Poverty Mapping showing the importance of cluster variables for reducing the variance of the final SAE, it is not evident that the same logic applies to anthropometric variables. The main reason lies in the empirical evidence regarding the impact of parental anthropometrics on child's development. Hence, in this context, the omission of a relevant variable is likely to be detrimental if it is not reflected through the observable and comparable variables between census and survey.
Acknowledging that parental anthropometrics may be shaped by characteristics not necessarily related to the covariates of child's height leads us to consider different scenarios for the relationship between child's and maternal covariates. If we assume the omitted variable is maternal anthropometrics, the following scenarios may be plausible: 1) maternal height covariates ($Z_{ch}$) highly correlated with child's ($X_{ch}$); 2) maternal height covariates ($Z_{ch}$) weakly correlated with child's ($X_{ch}$); and 3) no correlation between maternal and child's covariates, $E[Z_{ch}, X_{ch}] = 0$. Each scenario explores different magnitudes of the coefficient of maternal height on child's height.
The relevance of the omission of a variable will depend on the correlation between child's and mother's height covariates, as well as on the significance and magnitude of the coefficient of maternal height on child's height.
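The role of this correlation can be made explicit with the standard omitted-variable algebra. The sketch below assumes a linear model in which maternal height $H^{m}_{ch}$ enters child's height with coefficient $\delta$, and a linear projection of maternal height on the child's covariates; the symbols $\gamma$ and $v_{ch}$ are introduced here for illustration only.

```latex
\begin{align*}
H^{m}_{ch} &= X_{ch}\gamma + v_{ch}, \qquad E[X_{ch}' v_{ch}] = 0 \\
H^{s}_{ch} &= X_{ch}\beta + H^{m}_{ch}\delta + U^{s}_{ch} \\
           &= X_{ch}(\beta + \gamma\delta) + \left(v_{ch}\delta + U^{s}_{ch}\right)
\end{align*}
```

Under these assumptions, a model that omits maternal height recovers $\beta + \gamma\delta$ rather than $\beta$, which is harmless for prediction. What is lost is the term $v_{ch}\delta$, which moves into the residual: the stronger the correlation between maternal height and $X_{ch}$, the smaller the variance of $v_{ch}$, and the larger $\delta$, the more this term matters. If $v_{ch}$ varies systematically across small areas, treating the residual as purely random may bias the area means.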
3.3 Solutions to the Omitted Variable Problem
3.3.1 Two-Step Small Area Estimation
This study suggests an alternative approach for considering relevant variables in modelling child's anthropometrics. Using the ELL methodology, I suggest the small area estimation of child malnutrition through a two-step imputation. Firstly, the omitted variable has to be identified in the survey. Several specifications of the anthropometric model have to be tested in order to ensure a significant relationship between the variable of interest, which is not available in the census, and the relevant covariate omitted from the model because it has no counterpart in the census. The identification of the omitted covariate should be clearly supported by the empirical literature.
Hence, if we follow the empirical evidence regarding child's height modelling, we may think that the true model has the following structure:
$$H^{s}_{ch} = X_{ch}\beta + H^{m}_{ch}\delta + U^{s}_{ch} \qquad (4)$$
where $H^{s}_{ch}$ is child's height, $X_{ch}$ is a set of household and child's characteristics, $H^{m}_{ch}$ represents maternal height, and $U^{s}_{ch}$ is a random component composed of cluster $\eta_c$ and household $\varepsilon_{ch}$ elements:
$$U^{s}_{ch} = \eta_c + \varepsilon_{ch} \qquad (5)$$
Even though maternal height is a relevant variable for child's height modelling, the standard ELL methodology does not allow us to estimate child's height based on model equation (4), because maternal height is not available in the census. As a consequence, instead of leaving maternal height out of the child's model, we suggest a two-step imputation. The first step is to impute maternal height with comparable variables between census and survey; after obtaining maternal height values for each census record, we use the imputed variable as a covariate in the final model of child's height.
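The two-step imputation can be sketched with ordinary least squares in place of the GLS model and with one covariate per step. This is a simplified illustration under stated assumptions: the child model is fitted in the survey with observed maternal height and then applied to the census with imputed height, the error simulation of the ELL method is left out, and the helper names `ols` and `two_step_predict` and the noise-free toy data are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients for design matrix X (list of rows) and response y."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def two_step_predict(survey, census):
    """survey rows: (x, z, mother_height, child_z); census rows: (x, z).
    Returns one predicted child z-score per census row."""
    # Step 1: maternal height on a covariate observed in both sources
    theta = ols([[1.0, z] for _, z, _, _ in survey],
                [hm for _, _, hm, _ in survey])
    # Child model fitted in the survey, where maternal height is observed
    beta = ols([[1.0, x, hm] for x, _, hm, _ in survey],
               [h for _, _, _, h in survey])
    # Step 2: impute maternal height in the census and use it as a covariate
    predictions = []
    for x, z in census:
        hm_hat = theta[0] + theta[1] * z
        predictions.append(beta[0] + beta[1] * x + beta[2] * hm_hat)
    return predictions

# Noise-free toy survey so the fitted coefficients are easy to check by hand
survey_demo = []
for x, z in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]:
    hm = 150.0 + 2.0 * z
    survey_demo.append((x, z, hm, -10.0 + 0.5 * x + 0.05 * hm))
pred = two_step_predict(survey_demo, [(1.0, 2.0)])
print(pred)
```

In practice the first step only helps if it uses covariates that carry information about maternal height beyond what the child's covariates already contain.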
4 Monte Carlo Simulation Exercise
4.1 Data Generating Process
We simulate child's height for a census of 100,000 records for which the following geographical partitions are identified: states (11), municipalities (870), and enumeration areas (2870).1 The Data Generating Process (DGP) for simulating child's height follows the structure shown in (4), in which maternal height is described by:
$$H^{m}_{ch} = Z_{ch}\theta + \tau^{m}_{ch} \qquad (6)$$
where $H^{m}_{ch}$ denotes the height of mother $m$, belonging to household $h$ in cluster $c$. Clusters may refer to enumeration areas, localities, or another low geographical partition, usually defined at a lower geographical level than the final geographical target. $Z_{ch}$ is a set of household and individual characteristics, and $\tau^{m}_{ch}$ an error component decomposed into a cluster effect $\phi_c$ and a household effect $\upsilon_{ch}$.
We assume $\mathrm{Cov}(\phi_c, \upsilon_{ch}) = 0$, $\mathrm{Cov}(\upsilon_{ch}, \upsilon_{ch'}) = 0$, $\mathrm{Cov}(Z_{ch}, \phi_c) = 0$, and $\mathrm{Cov}(Z_{ch}, \upsilon_{ch}) = 0$ for all $h \neq h'$. However, it is possible to have some correlation between mother's and child's covariates.
Because our objective is to assess the standard ELL methodology under the omission of a relevant variable for predicting child malnutrition, we analyze two scenarios that encompass the prediction power observed in real data: $R^2 = 0.25$ and $R^2 = 0.45$. For these cases, we explore the bias of SAE when maternal height is not included, and when child's covariates are positively correlated with maternal covariates. For the latter, we inspect one case in which child's covariates have a low correlation with mother's covariates ($E[X'_{ch}, Z'_{ch}] = 0.25$) and one in which the correlation is high ($E[X'_{ch}, Z'_{ch}] = 0.65$).
1 This geographical structure has been taken from the original Census records used for the empirical section of this study.
Our scenarios assume different contributions of mother's height to the variance of child's height, for which we assume normal distributions for all the components of equation (4). For each case mentioned above, we fix the distributional parameters and change the corresponding coefficient of maternal height ($\delta$) in order to change the contribution of the variable of interest ($H^{m}_{ch}$).2 To draw a more realistic picture in the DGP, child's covariates were split to isolate the potential common variables between mother and child, which, to help distinguish them, we call $X'_{ch}$ in the child's model and $Z'_{ch}$ in the mother's. As common variables we define education $\sim N(10, 5)$ and income $\sim N(2500, 801)$; and as the remaining variables of child's DGP we have maternal age $\sim N(28, 36)$ and potable water at the community level $\sim N(50, 35)$. The variable of interest $H^{m}_{ch}$ has been split into an observed component $Z_{ch} \sim N(155, 28)$ and an unobserved component $\tau^{m}_{ch} \sim N(0, 9.6)$. As equation (2) shows, child's residual is split into a cluster effect $\eta_c \sim N(0, 1)$ and a household effect $\varepsilon_{ch} \sim N(0, 1)$.3 Finally, our dependent variable follows a standard normal distribution, $H^{s}_{ch} \sim N(0, 1)$.
To assess SAE under the omission of maternal height in child's height models, 5 per cent random samples were drawn from a census of 100,000 observations based on child's DGP (4); the geographical stratification was respected by sampling at the locality level. Random draws were taken from the same seed with which the census was created. Each sample survey has around 5,000 observations, and for each scenario we constructed 25 surveys. The true values of child's z-score were created by multiplying the values derived from the above distributions by the coefficients defined for each scenario:
$$E(H^{s}_{ch} \mid X_{ch}, Z_{ch}) = E(X_{ch})\beta + E[Z_{ch}\theta + \phi_c + \upsilon_{ch}]\delta + E(\eta_c + \varepsilon_{ch}) \qquad (7)$$
Constructing the means at the enumeration area level using child's z-score from (7), we obtain the true values against which the final SAE will be compared.
2 To increase the relevance of $H^{m}_{ch}$, each variable maintained the same coefficient, with the exception of a "support variable" called income. This variable was used for increasing the relevance of maternal height without changing the prediction power of the model or other parameters.
3 The contribution of the error terms is modified by multiplying both of them by a fixed coefficient. For simplicity, we assume that both effects affect child's height z-score in the same manner; however, this may not be the case with real data.
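The data generating process described in this subsection can be sketched on a small scale. The code below is an illustration only and does not reproduce the paper's calibration: the second parameter of each normal distribution is read as a variance, the error $\tau^{m}_{ch}$ is split equally between cluster and household components, covariates are centred so that the z-score stays near zero, only one child covariate (education) is used, and the coefficient values and the function name `simulate_census` are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import mean

def simulate_census(n_records=2000, n_areas=40, beta=0.05, theta=1.0,
                    delta=0.04, seed=2005):
    """Return (records, area_means) for a toy census of child height z-scores."""
    rng = random.Random(seed)
    # Cluster effects for the child (eta) and maternal (phi) equations
    eta = [rng.gauss(0.0, 1.0) for _ in range(n_areas)]
    phi = [rng.gauss(0.0, math.sqrt(4.8)) for _ in range(n_areas)]
    records = []
    for i in range(n_records):
        ea = i % n_areas
        x = rng.gauss(10.0, math.sqrt(5.0))       # education (child covariate)
        z = rng.gauss(155.0, math.sqrt(28.0))     # observed part of maternal height
        upsilon = rng.gauss(0.0, math.sqrt(4.8))  # maternal household effect
        h_m = z * theta + phi[ea] + upsilon       # maternal height, as in (6)
        eps = rng.gauss(0.0, 1.0)                 # child household effect
        # Child z-score, as in (4), with centred covariates (an assumption)
        h_s = (x - 10.0) * beta + (h_m - 155.0) * delta + eta[ea] + eps
        records.append({"ea": ea, "x": x, "z": z, "h_m": h_m, "h_s": h_s})
    by_area = defaultdict(list)
    for rec in records:
        by_area[rec["ea"]].append(rec["h_s"])
    return records, {ea: mean(values) for ea, values in by_area.items()}

records_demo, area_means_demo = simulate_census(n_records=200, n_areas=10, seed=1)
print(len(records_demo), len(area_means_demo))
```

The dictionary of area means plays the role of the true values; a survey would then be drawn from `records_demo` and the SAE computed without the `h_m` column.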
4.2 Monte Carlo Results
Based on the scenarios described above, we calculate the mean of the MSE of the estimated z-score of child's height at the enumeration area level. This indicator is constructed by using the average child's z-score calculated from equation (7) and the SAE obtained by using the ELL standard methodology:
$$MSE_{ea} = \frac{1}{R}\sum_{r=1}^{R} \frac{\left|\bar{Y}^{ell}_{ea,r} - \bar{Y}^{true}_{ea}\right|}{\left|\bar{Y}^{true}_{ea}\right|} \qquad (8)$$
where $MSE_{ea}$ denotes the mean squared error at the enumeration area (EA) level; $\bar{Y}^{ell}_{ea,r}$ is the mean of the z-score of the child at the EA level for each replication $r$, and $\bar{Y}^{true}_{ea}$ is the true value of the z-score based on our Monte Carlo exercises. The survey derived from each $r$ generates 250 values for each individual using the ELL method. The $R$ replications correspond to the 25 surveys considered for each case described above, based on $R^2$ and the correlation between common variables and maternal height.4
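Reading equation (8) as the average over the $R$ replications of the absolute deviation of the SAE from the true area mean, relative to the absolute true mean, the indicator for one enumeration area can be computed as in the sketch below. The function name `relative_abs_error` and the numbers are illustrative.

```python
def relative_abs_error(estimates, truth):
    """Average absolute deviation of the replicated SAE from the true
    enumeration-area mean, relative to the absolute true mean."""
    if truth == 0:
        raise ValueError("the indicator is undefined when the true mean is zero")
    return sum(abs(e - truth) for e in estimates) / (len(estimates) * abs(truth))

# Two replicated estimates for an area whose true mean z-score is -1.0
value = relative_abs_error([-1.1, -0.9], -1.0)
print(round(value, 6))
```

Because the true mean appears in the denominator, areas with a true mean z-score close to zero produce very large values, which is worth keeping in mind when summarizing the indicator across areas.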
Tables 1 and 2 present two scenarios assuming a prediction power of 0.25 and 0.45, and different cases regarding the correlation between maternal height and the covariates it shares with child's z-score. The mean $MSE_{ea}$, or bias, is expressed as the relative change in the bias with respect to the best approach we may consider for reducing the bias. Intuitively, although the prediction power is quite low, either 0.25 or 0.45, according to our scenarios the best approach would be to include maternal height in child's modelling. As we would expect, the higher the correlation between maternal and child's covariates, the smaller the bias of the final SAE at the enumeration area level.
Because the prediction power assumed for these exercises is quite low, although similar to that found in real data, biased SAE of child malnutrition are expected even for our best scenario ($R^2$). Although this may be the case if the error component is not purely random at the enumeration area level, we may be able to reduce the bias by finding significant child's covariates related to mother's height, or by including maternal height as a covariate of child's z-score through a two-step SAE.
Tables 1 and 2 assume a prediction power of 0.45, for which maternal height contributes 25 and 75 per cent of the total variance of child's z-score. For the case of a 25 per cent contribution, we observe that excluding maternal height from child's model biases the final enumeration area mean of child's z-score by only 4 per cent for the first quartile, the median and the third quartile. The existence of a high correlation between common covariates of child's z-score and maternal height reduces the bias from 4 to 1 per cent. However, although this high correlation reduces the bias, there are still some gains from including maternal height. In addition, Table 2 shows that when the contribution of maternal height is quite high (75 per cent of the total variance), not including maternal height in the model when there is no correlation between mother's and child's covariates increases the bias by almost 20 per cent. A high correlation between common covariates reduces the bias to 6 per cent in this case.
4 The number of random surveys will be increased for generating the final standard errors of SAE using Tarozzi's ado-files.
5 Empirical Evidence using Mexican Data
Using the Mexican Health and Nutrition Survey (ENSANUT-2006) and the Mexican Census records (CONTEO-2005), this section provides further evidence regarding the consequences of omitting a crucial variable in child's height modelling. To this end, I obtain SAE of child's height with and without maternal height in the model. Because maternal anthropometric indicators are not available in census data, we suggest a two-stage SAE approach in which maternal height is estimated in a first step and then used as a covariate in the final model of child's height. The following subsection details the characteristics of our data, as well as the model selected for carrying out this exercise.
Based on equation (4), we estimate the child's height z-score using as $X_{ch}$ household and child's characteristics (age, sex, birth order, dwelling characteristics, parental schooling and age, municipality and locality information). Following equation (6), $H^{m}_{ch}$ contains two types of variables: one is contained in the child's covariate matrix, and the other is not related to the child's DGP.
5.1 Data and Variables
This study models height for children between 0 and 5 years old in two geographical regions of Mexico. Following the classification suggested by the National Council of Population (CONAPO), we select the regions identified as Regions of Very High and High Vulnerability, considering rural localities.5 Using principal components, the index of vulnerability (or "marginalization") considers the following variables: education (illiteracy and population without complete primary school); dwelling (occupants without piped water, without drainage and sanitary service, with soil floor, without electricity and with overcrowding conditions); income (employed population earning up to two minimum salaries); and location (population living in localities with less than 5,000 inhabitants). The regions considered for this study are those with a high incidence of stunting for children under 5 years old: Very High Vulnerability (Chiapas, Guerrero, and Oaxaca) and High Vulnerability (Campeche, Hidalgo, Michoacán, Puebla, Tabasco, Veracruz, San Luis Potosi, and Yucatan). Because the incidence is significantly higher in rural regions, the current exercise is carried out using children living in rural localities (under 2,500 inhabitants), where the first region presents 23 per cent of stunted children and the second 13 per cent.6
5 This study uses the data preparation carried out for constructing the Nutrition Map 2005 financed by the Mexican Government and produced during my work at the World Bank. The original Nutrition Map was achieved for the 32 states of Mexico.
Once the geographical regions were defined, concepts, distributions and summary statistics were compared between the ENSANUT (2006) and the CONTEO (2005). To classify a pair of variables as statistically similar, we compared distributions and summary statistics between survey and census records, controlling for the survey design.7 The selected household and individual characteristics were the following: type of flooring, asset possession, water source, sanitary service, piped water inside the dwelling and type of drainage; and household size, kinship, sex, age, social security, indigenous identification, literacy, schooling attendance and schooling level. In addition, municipality and locality variables were included in the child's model.
Dwelling and population characteristics at the locality level were extracted from the ITER 2005, which consists of a dataset of variables at the locality level constructed by the Institute of Statistics, Geography and Information Technology (INEGI) from census records. The Mexican Ministry of Social Development, the Ministry of Environment and Natural Resources, as well as the National Committee of Water Provision provided the following variables at the locality level: longitude, latitude, altitude, annual precipitation, type of vegetation, type of agriculture, tree coverage, soil erosion, and climate.
5.1.1 Definition of Height per Age
Anthropometric indicators were constructed based on the international standards suggested by the World Health Organization (WHO) in 2006. In 1993, the WHO carried out an exhaustive revision of the use and interpretation of the anthropometric benchmarks used since the 1970s, known as the NCHS/WHO international reference population. These international reference growth curves were formulated in the 1970s and were originally planned to serve as a reference for the USA population. After this revision, it was concluded that the NCHS/WHO growth references were not adequately reflecting the growth of children in early stages. From 1997 to 2003 the WHO Multicentre Growth Reference Study (MGRS) was implemented to develop growth references for children under 5 years old. The MGRS collected weight, height and head circumference measures from 8440 breast-fed children of different ethnic and cultural backgrounds. The countries considered for the new references were: USA (California), Oman (Muscat), Norway (Oslo), Brazil (Pelotas), Ghana (Accra) and India (South Delhi). Selected children were living in favorable socio-economic conditions and healthy environments (disease incidence was low). Around 80 per cent of mothers followed the WHO recommendations, some of which related to non-smoking habits during and after pregnancy, as well as breast-feeding. Based on the anthropometric indicators collected from these children, new growth references were constructed to identify underweight, stunted and wasted children under 5 years old.
6 The national incidence is 12.5 per cent of children under 5 years old.
7 Primary sample units (PSU) and geographical strata were provided by INEGI (Institute of Statistics, Geography and Information Technology).
For the purposes of this study, the new international benchmarks were used to compare each child in the survey with his or her "healthy" counterpart of the same age and gender. The dependent variables used for modelling were transformed into height z-scores for children under 5 years old. This transformation expresses the number of standard deviations that an individual is away from the international reference according to gender and age.
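As a minimal sketch of this transformation, the snippet below computes a height z-score and flags stunting (z-score < -2). The function names and the reference median and standard deviation are illustrative assumptions, not values taken from the WHO (2006) tables; in practice the reference values depend on the child's exact age and sex.

```python
def height_z_score(height_cm: float, ref_median_cm: float, ref_sd_cm: float) -> float:
    """Number of reference standard deviations between a child's height and the reference median."""
    if ref_sd_cm <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (height_cm - ref_median_cm) / ref_sd_cm


def is_stunted(z_score: float, cutoff: float = -2.0) -> bool:
    """A child is classified as stunted when the height z-score falls below the cutoff."""
    return z_score < cutoff


# Hypothetical reference values for one age-sex group (NOT taken from the WHO tables).
example_z = height_z_score(height_cm=78.0, ref_median_cm=87.0, ref_sd_cm=4.0)
example_flag = is_stunted(example_z)
```

A child exactly two standard deviations below the reference median is not flagged, because the stunting criterion used in this paper is a strict inequality (z-score < -2).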
5.2 Estimation of Child's Height Z-Score
5.2.1 Relevance of Cluster Identification
Before starting the modelling stage, the geographical level defined as the cluster was identified by looking at the hierarchical composition of each level, the identical codification in both sources (CONTEO and ENSANUT), and the number of children under 5 years old per level. Administrative divisions are based on the following hierarchical levels for the two regions considered: states (11), municipalities (1,557), localities (47,451), and manzanas (254,262).8 Due to the significant number of "manzanas" (enumeration areas) with only one child under 5 years old in the survey, the cluster effect was defined one level above. We selected localities as clusters to avoid estimating between-cluster heterogeneity from just a few individuals inside each cluster. Defining the cluster at the locality level implies that the variation of estimates will be mainly explained by this level, besides the household effect. However, the possibility of additional variation at higher levels is not neglected; therefore, the anthropometric models include state and municipality level variables to capture higher level variation. Identifying different levels for the cluster effect, Elbers et al. (2008) demonstrate that the contribution of the correlation at higher levels is very low. Nevertheless, it is worth trying some of these variables during the modelling stage, given that this is purely an empirical question.
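The check behind this choice of cluster level can be sketched as follows. The record layout, the identifiers and the toy data are hypothetical; the sketch only shows how the share of clusters holding a single surveyed child can be compared across candidate levels.

```python
from collections import Counter


def singleton_share(cluster_ids):
    """Share of clusters that contain exactly one surveyed child."""
    counts = Counter(cluster_ids)
    if not counts:
        raise ValueError("no children supplied")
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Hypothetical survey records: (locality id, manzana id) for each child under 5.
children = [
    ("loc1", "m1"), ("loc1", "m2"), ("loc1", "m3"),
    ("loc2", "m4"), ("loc2", "m4"), ("loc3", "m5"),
]

share_by_level = {
    "manzana": singleton_share([manzana for _, manzana in children]),
    "locality": singleton_share([locality for locality, _ in children]),
}
```

In this toy data most manzanas hold a single child while most localities hold several, which is the kind of pattern that motivates moving the cluster definition one level up.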
8 At the national level the geographical partition contains: 32 states, 2,451 municipalities, 284,485 localities and 1,183,678 manzanas.
5.2.2 Child's Height Modelling and SAE without using Maternal Height
Stage one of the modelling was carried out with OLS regressions on the set of comparable variables selected in the previous stage. The modelling and simulations were carried out using children under 5 years old in the survey and census. According to the National Institute of Statistics, Geography and Information Technology (INEGI), 52 per cent of Mexican localities have children under 5 years old, and only half of them are concentrated in localities with more than 100 inhabitants.
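A simplified sketch of such a first stage is given below: an OLS fit followed by a split of the residuals into a locality (cluster) component and a within-locality remainder. The data are synthetic, the function names are hypothetical, and the sketch uses plain OLS only; it is not the specification estimated in this paper.

```python
import numpy as np


def fit_ols(y, X):
    """OLS coefficients and residuals for y = [1, X] beta + u (an intercept is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals


def split_residuals(residuals, cluster_ids):
    """Split residuals into a cluster mean and a within-cluster remainder."""
    cluster_ids = np.asarray(cluster_ids)
    cluster_effect = np.zeros_like(residuals)
    for c in np.unique(cluster_ids):
        mask = cluster_ids == c
        cluster_effect[mask] = residuals[mask].mean()
    return cluster_effect, residuals - cluster_effect


# Synthetic data standing in for the survey sample (all values are made up).
rng = np.random.default_rng(0)
n_clusters, per_cluster = 50, 20
clusters = np.repeat(np.arange(n_clusters), per_cluster)
X = rng.normal(size=(n_clusters * per_cluster, 2))
eta = rng.normal(scale=0.5, size=n_clusters)[clusters]  # locality effect
eps = rng.normal(scale=1.0, size=len(clusters))         # child-level error
y = -1.0 + X @ np.array([0.4, -0.3]) + eta + eps

beta_hat, u_hat = fit_ols(y, X)
eta_hat, eps_hat = split_residuals(u_hat, clusters)
```

The two residual components add up to the OLS residual by construction, so their variances give a first impression of how much unexplained variation sits at the locality level.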
5.3 Child's Height Modelling and SAE using Maternal Height
As Section 3 has discussed, the omission of a relevant variable for child anthropometrics may mislead policy makers in the task of tackling malnutrition. Although current SAE methodologies do not aim to provide causal interpretations of the individual, household and community characteristics considered in the model, it is crucial to understand the nature of the welfare indicator we aim to obtain at small area levels. As has been mentioned throughout this study, the empirical literature has highlighted not only the strong relationship between infrastructure and malnutrition, but also the strong and significant relationship between parental anthropometrics and child health.
Using the data described above, we first modelled the height of children under five considering individual, household, locality, and municipality variables available in census and survey data (see Appendix I for further details regarding the model [not available for this draft version]). To obtain the best model, different specifications were tried based on the criterion of high predictive power, measured by $R^2$. The selected model presents an $R^2$ of 0.27, which is quite high for anthropometric models, as Fujii (2005) also points out in his work for Cambodia.
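The selection criterion can be illustrated with a short sketch that compares candidate specifications by their $R^2$. The data are synthetic and the specification names are hypothetical; they do not correspond to the models tried in this study.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS regression of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic outcome and covariates (made up for illustration).
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=n)

# Hypothetical candidate specifications, given as column subsets of X.
specifications = {"spec_a": [0], "spec_b": [0, 1], "spec_c": [0, 1, 2]}
scores = {name: r_squared(y, X[:, cols]) for name, cols in specifications.items()}
best = max(scores, key=scores.get)
```

Because $R^2$ cannot decrease when regressors are added to a nested specification, the largest model always scores at least as high in such a comparison, which is worth keeping in mind when the criterion is applied to nested candidates.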
Looking at Figures 1 and 2 of Appendix II, we analyze two states belonging to the geographical regions comprising the 11 states. These figures show the percentage of stunted children under 5 years old (z-score < -2) at the municipality level. Maps on the left show the SAE obtained using the standard ELL method; the second maps show the results of a two-step SAE that considers maternal height in the child's model. Figure 1 shows the municipalities of the state of Chiapas (south of Mexico), for which considering maternal height makes no difference to the incidence of stunting (z-score < -2) at the municipality level. However, when we look at the state of Hidalgo (centre of Mexico) in Figure 2, we observe a more heterogeneous picture. The two-step SAE shows a higher number of municipalities with a high value of the average child's z-score. The municipalities with high stunting are scattered all over the state, in contrast to what we observe under the standard ELL method.
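The municipality-level incidence shown in such maps can be sketched as the share of children with a simulated z-score below -2, averaged over simulation draws. The function name, the array layout (simulations in rows, census children in columns) and the toy numbers are assumptions made for illustration only.

```python
import numpy as np


def stunting_incidence(sim_z, municipality_ids, cutoff=-2.0):
    """Mean and std. dev. across simulations of the share of children with z < cutoff.

    sim_z has shape (n_simulations, n_children).
    """
    municipality_ids = np.asarray(municipality_ids)
    below = sim_z < cutoff
    estimates = {}
    for m in np.unique(municipality_ids):
        # One stunting share per simulation draw for this municipality.
        shares = below[:, municipality_ids == m].mean(axis=1)
        estimates[m.item()] = (shares.mean(), shares.std(ddof=1))
    return estimates


# Toy example: 3 simulated draws for 4 census children in two hypothetical municipalities.
sim_z = np.array([
    [-2.5, -1.0, -2.1, -3.0],
    [-1.9, -0.5, -2.4, -2.2],
    [-2.3, -2.6, -1.8, -2.7],
])
municipalities = ["A", "A", "B", "B"]
result = stunting_incidence(sim_z, municipalities)
```

Each entry of `result` holds a point estimate and a simulation-based measure of its dispersion, so two sets of estimates (with and without maternal height) could be compared municipality by municipality.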
Although the states of Chiapas and Hidalgo are both classified as having high levels of poverty, they perform differently after maternal anthropometrics are included. It is likely that child nutrition is easier to map from infrastructure and household variables in states that have not experienced dramatic changes in social mobility, especially if long-term variables, such as indigenous identification, are available in the data. However, in states with more heterogeneous compositions, facing economic and social changes such as migration and social mobility, it is plausible that the accurate representation of long-term indicators of child malnutrition becomes more complex. Hence, for regions or states presenting significant socioeconomic changes, it may be relevant to consider a two-step SAE if the main purpose is to estimate indicators clearly related to long-term circumstances.
6 Conclusion
Using Monte Carlo simulations that assume different contributions of maternal height to child's height, we observe that higher contributions of the omitted variable (maternal height) may significantly bias the final estimates of child's height z-scores at small area levels. Unless the omitted variable has a strong relationship with the child's covariates, the ELL method does not provide enough adjustment to correct the bias produced by the omission of maternal height (which has been assumed not to be purely random in this study).
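The mechanism can be illustrated with a deliberately simplified, individual-level simulation. It is not the Monte Carlo design used in this paper and does not reproduce Tables 1 and 2: the data-generating process, the parameter values and the function name are assumptions made for the sketch. An outcome depends on an included covariate x and an omitted variable h with correlation rho; a model fitted on a "survey" sample without h is used to predict a "census" sample, and its MSE is compared with that of the model that includes h.

```python
import numpy as np


def relative_mse(rho, gamma, n_survey=2000, n_census=20000, seed=0):
    """Out-of-sample MSE of the model omitting h, relative to the model including h."""
    rng = np.random.default_rng(seed)

    def draw(n):
        x = rng.normal(size=n)
        # h is built so that corr(x, h) = rho and var(h) = 1.
        h = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = 0.5 * x + gamma * h + rng.normal(size=n)
        return x, h, y

    def design(*cols):
        return np.column_stack([np.ones(len(cols[0])), *cols])

    xs, hs, ys = draw(n_survey)  # "survey": outcome observed
    xc, hc, yc = draw(n_census)  # "census": outcome to be predicted

    b_short, *_ = np.linalg.lstsq(design(xs), ys, rcond=None)     # omits h
    b_full, *_ = np.linalg.lstsq(design(xs, hs), ys, rcond=None)  # includes h

    mse_short = np.mean((yc - design(xc) @ b_short) ** 2)
    mse_full = np.mean((yc - design(xc, hc) @ b_full) ** 2)
    return mse_short / mse_full


ratios = {rho: relative_mse(rho, gamma=1.5) for rho in (0.0, 0.25, 0.65)}
```

In this toy design, with unit error variance, the ratio approaches 1 + gamma^2 (1 - rho^2) in large samples, so it shrinks toward one as the omitted variable becomes more strongly correlated with the included covariate and grows with the contribution of the omitted variable.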
An empirical application provides evidence supporting a two-step SAE following the criterion suggested by ELL. Using the Mexican Census (2005) and the Mexican Survey of Health and Nutrition (2006), this study shows how the Nutrition Map at the state level with municipality partitions may vary when maternal height is considered in the prediction of child's height. Although the two states analyzed are classified by the Mexican Government as having a similar poverty incidence (and we may expect similar levels of malnutrition), the omission of maternal height has a significantly different effect on the final estimates of child's height (z-score) at the municipality level. These differences may reflect the fact that the state of Chiapas is mainly characterized as a state with low social mobility and a high concentration of indigenous communities, whereas the state of Hidalgo displays higher mobility through rural-urban migration.
This would seem to indicate that the exclusion of a relevant variable, in this case maternal height, may be detrimental for local level estimates of child malnutrition if the region or state shows relevant socioeconomic changes. However, areas with stable social mobility and low economic growth present no significant changes in the malnutrition ranking of municipalities after taking maternal height into consideration. In conclusion, the implementation of the method outlined in this study, together with further research in this area, could help produce more accurate maps of child malnutrition for those areas with high socioeconomic variability, enabling policy makers to target their efforts more effectively in the future.
7 Appendix I: Tables
Table 1: MC Exercises: Relative Mean of the MSE at the enumeration area level
Contribution of Maternal Height: 25 per cent of Child's Height Z-score Variance

Corr[X_ch, H^m_ch]    1st Quartile    Median    3rd Quartile
0                     3.94            3.83      3.64
0.25                  3.56            3.45      3.29
0.65                  1.00            0.98      1.07
Table 2: MC Exercises: Relative Mean of the MSE at the enumeration area level
Contribution of Maternal Height: 75 per cent of Child's Height Z-score Variance

Corr[X_ch, H^m_ch]    1st Quartile    Median    3rd Quartile
0                     18.4            17.9      16.4
0.25                  16.1            15.7      14.4
0.65                  5.9             5.8       5.6

8 Appendix II: Figures and Graphics
Figure 1: State of Chiapas: Z-scores of Height for Children under 5 years old
Figure 2: State of Hidalgo: Z-scores of Height for Children under 5 years old