SAE malnutrition - Office for National Statistics
Small Area Estimation of Child Malnutrition
Assessing the Omission of Maternal Anthropometrics

Ericka G. Rascon-Ramirez ∗
University of Essex

Preliminary Version - Please do not cite
July, 2012

Abstract

This paper assesses the methodology of Elbers, Lanjouw and Lanjouw (2003) for generating Small Area Estimates (SAE) of child malnutrition. Using Monte Carlo simulations, I assess the performance of this estimation when variables relevant for modelling child's height are omitted. An empirical application confirms the potential bias of SAE of malnutrition when maternal anthropometrics are not controlled for. These results highlight the implications for child malnutrition rankings based on SAE when targeting social programme benefits at the community level.

Keywords: small area estimation, malnutrition, social programme targeting, Monte Carlo simulation.

∗ I thank the Ministry of Social Development (SEDESOL) and the National Institute of Statistics and Geography (INEGI) in Mexico for making available the Conteo 2005 at the individual level. The data preparation and preliminary results of the first Nutrition Map in Mexico were carried out while I was working for the Development Research Group at the World Bank (DECRG). I also thank Peter Lanjouw and Joao Santos Silva for their valuable comments and recommendations, as well as Marco Francesconi and Cheti Nicoletti for their support and guidance as supervisors at the University of Essex. The views presented here should not be considered as those of the World Bank or any of its affiliates. E-mail address: erasco@essex.ac.uk.

1 Introduction

Malnutrition has been examined in depth by social scientists to better understand cognitive and physical development from early childhood. Several indicators have been put forward by epidemiologists to understand the multidimensionality of this phenomenon.
For instance, stunting (low height-for-age), underweight (low weight-for-age), anaemia (iron deficiency), and wasting (low weight-for-height) are the indicators most commonly used by researchers to describe and understand child malnutrition. The accurate identification of malnourished children has taken on a key role in the public policy of developing countries, which seek to target the vulnerable and most at-risk population efficiently. The relevance of this welfare indicator has increased over the past fifteen years with the spread of social programmes that aim to tackle poverty and equalize opportunities in the long term. As a result, data sources and methodologies have been developed to help identify poor populations and malnourished children. Although nutrition, income, and consumption indicators collected through household surveys provide a detailed picture of welfare dynamics and current performance, these data are not representative at the community level and so cannot support an accurate allocation of public resources there. Therefore, the main challenge for statisticians and economists has been to develop methodologies that estimate welfare indicators with statistical accuracy at geographical levels below those for which household surveys are representative.

The set of methods for obtaining accurate indicators for subpopulations without statistical representation in sample surveys makes up the field known as Small Area Estimation. Rao (2003) provides a comprehensive description of the methods and theories in this field, identifying two types of models for producing Small Area Estimates (SAE). The first is based on area-level models relating small area means to area-specific auxiliary variables, and the second on unit-level models relating unit values to unit-specific auxiliary variables.
Thus, the focus of this paper is the unit-level methodology suggested by Elbers, Lanjouw, and Lanjouw (2003, henceforth ELL), which has been predominantly used by governments, in collaboration with the World Bank, to identify poor and malnourished populations in developing countries. Although this framework has been widely used to produce Poverty Maps that help identify poor households at small area levels, our main objective is to assess the methodology in the nutritional context. Using census records and a household survey, ELL suggests three steps for obtaining SAE of welfare indicators at geographical levels not allowed by current sample surveys: 1) comparability between census and survey variables; 2) modelling of the welfare indicator using a household survey; and 3) computation of the welfare indicator in census records (head count ratio, inequality, among others). While each step matters for accurately estimating welfare indicators at small area levels, the success of the modelling step relies on the availability and comparability of data sources, as well as on the predictive power of the observed variables.

Although the construction of Poverty and Nutrition Maps relies on a good prediction model, together with a set of cluster-level variables that account for heterogeneity within geographic areas, SAE of malnutrition may pose a different challenge at the modelling stage. For instance, empirical evidence has highlighted the relevance of parental anthropometrics for predicting child nutrition. Rubalcava and Teruel (2004) shed light on the relationship of parental anthropometrics and cognitive ability with child's height using the Mexican Family Life Survey (MxFLS). Their results show that the coefficient on maternal schooling is overestimated by about 70 per cent when maternal height and cognitive ability are omitted from the model. In the same vein, Subramanian et al.
(2009) have shown that maternal height is inversely associated with child mortality and anthropometric failure (underweight, wasting, stunting and anaemia). In another medical study for developing countries, Black et al. (2008) point to maternal short stature as a risk factor for caesarean delivery, largely related to cephalopelvic disproportion. Despite the lack of experiments that would clarify the main determinants of child's height, social and medical researchers agree on the relationship between parental anthropometrics and child nutrition, highlighting the strong association between long-term nutritional indicators such as maternal and child height.

Despite this empirical evidence, the SAE literature has not explored the potential consequences for SAE of child malnutrition when relevant parental variables are omitted at the modelling stage of child anthropometrics. Following the evidence from these studies, this paper analyzes the implications for SAE of child malnutrition when maternal anthropometrics are not included as a covariate in the model of child's height. The assessment of the ELL method becomes relevant when the observable variables that are comparable between census and survey cannot reflect variation in maternal height. Obtaining a set of covariates with high predictive power for child height depends on finding variables that reflect the current sociodemographic and economic conditions of the household, as well as information on genetic endowments, pregnancy conditions, and long-term characteristics of the household.
For instance, individual exposure to long spells of illness, scarcity of food for long periods, exposure to inadequate water and sanitation services, and the intergenerational transmission of health, among other unobservable variables, have been shown to be crucial for modelling long-term health indicators such as height-for-age and head circumference at birth. Although risk factors may intervene differently in the genetic manifestation of child's height, introducing some of these variables in the model may reduce the household effect related to genetic endowments. The omission of a well-known relevant variable (such as maternal height) is likely to be mitigated when the explanatory variables reflect long-term conditions of the family background. However, long-term risk factors are rarely collected in sample surveys and are practically non-existent in census records. Hence, if the data do not allow us to identify variables strongly related to the genetic component and long-term conditions of the family or child, this unobserved component will be part of the household effect. Stronger correlations between the omitted variable and the observed covariates may help to predict the variable of interest better and to estimate a less biased SAE of malnutrition when the omitted variable is not strictly random.

It is worth mentioning that the main challenge of the ELL method is a good prediction of child's height, not the unbiased estimation of the regression coefficients. Thus, if maternal height can be reflected through the coefficients of the covariates available in both sources (census and survey), the omission of this relevant covariate will not be a concern for the final estimates of malnutrition in small areas.
However, when the set of comparable variables cannot reflect the omitted covariate, and the survey shows that maternal anthropometrics remain highly significant in the model of child's height, assessing the omission of maternal height becomes crucial for understanding the potential bias of SAE of child malnutrition when this covariate is not random.

To assess the consequences of omitting a relevant variable from the child nutrition model, Monte Carlo simulations are used to characterize hypothetical relationships between maternal height (the omitted variable) and child's height, as well as between maternal height and the covariates of child's height. The Monte Carlo results shed light on the bias of child's height z-score at the enumeration area level (small area), and an empirical application reinforces these findings.

This article is organized as follows: Section II provides a brief description of child malnutrition and mapping; Section III explains the stages of the ELL methodology; Section IV discusses SAE of child's height under different relationships between child's height, maternal anthropometrics and child's covariates using Monte Carlo simulations; Section V presents an empirical case using the Mexican Survey of Nutrition and Health (ENSANUT 2006) and the Mexican Census (Conteo 2005); Section VI discusses the public policy implications of omitting maternal anthropometrics in SAE modelling; and Section VII summarizes the main findings of this study and concludes.

2 Child Malnutrition and Mapping

The identification of deprived areas has been one of the main concerns of developing countries when targeting social programmes. For instance, community profiles of infrastructure and socioeconomic characteristics produced from census records have been widely used for constructing indices of deprivation and unsatisfied basic needs.
However, although these measures may be highly related to child nutrition, they are not necessarily reliable for obtaining nutrition profiles at the community level. For this reason, the construction of SAE of nutrition aims to obtain the nutrition indicator of interest at geographical levels not allowed by household surveys but relevant for social programme targeting. The ELL method has become a useful tool for identifying malnourished children: census records capture the heterogeneity between and within communities, and household surveys provide the link between the welfare indicator of interest and the covariates common to census and survey. The ELL method has been widely used for targeting purposes since the early 2000s, and nowadays around 45 developing countries use it to generate poverty estimates at the local level (see Araujo et al. (2008); Bedi et al. (2007); Hentschel et al. (2000); Lopez et al. (2008); Elbers et al. (2004)). Some of them have complemented these exercises by producing SAE of nutrition indicators (see Fujii (2005) and Simler (2006)).

Although information on access to piped water and sewage, as well as on the incidence of gastrointestinal disease, may be highly related to child malnutrition, these indicators may not be sufficient to capture it. For instance, information on food prices, the availability and variety of food in the community, and the condition of the child during pregnancy and early childhood may be crucial for understanding long-term nutrition indicators such as child's height and weight. For this reason, SAE has become essential for estimating nutrition indicators at local levels with better precision than that permitted by the sample design of household surveys.

3 Small Area Estimation

3.1 ELL Approach

This study is based on the standard methodology of Small Area Estimation developed by Elbers et al. (ELL: 2000; 2002; 2003).
Although Fujii (2005) suggests an adaptation of the methodology for modelling anthropometric indicators based on a simultaneous equation framework, this study focuses on the original ELL method, a single-model approach. The basic idea of this procedure is to use survey data to project welfare indicators onto census records. Once these indicators are obtained at the individual or household level in the census, the indicator is aggregated to geographical levels for which the survey is not representative. To make the ELL methodology easier to follow, a brief description is given in the context of SAE of child malnutrition. The stages suggested by ELL (2003) are: 1) comparability between census and survey variables; 2) modelling of the welfare indicator of interest using the survey; and 3) computation of welfare indicators on census records. To explain the method using anthropometric indicators, the process is described as if we were interested in producing a Nutrition Map using the standard ELL methodology.

Stage zero. Grouping and Comparability between Data Sources

Aggregating states or districts into geographical regions has been convenient for modelling in Poverty and Nutrition Mapping, because it gathers in advance administrative divisions with similar socio-economic features. This grouping is useful when there are few observations at the level for which the survey is representative (for instance, state, district or sub-district level), or when the survey is representative only at the regional level. In any case, the aggregation criterion must follow the representativeness of the health or nutrition survey. After defining the geographical partition for modelling, we proceed with the selection of comparable variables between census and survey. To define a set of strictly comparable variables, the definition of each variable should be compared between census and survey on the basis of their questionnaires.
Subsequently, statistical comparisons between census and survey are carried out controlling for the sample design of the survey. The selection of continuous variables is complemented with distributional comparisons using Kolmogorov-Smirnov tests.

Stage one. Anthropometric modelling

Once comparable variables between census and survey are identified, the anthropometric indicator (height-for-age, weight-for-age, or other) is modelled using the survey. The basic idea is to estimate a GLS model using as covariates only those variables comparable between both sources. Hence, the main model has the following composition:

$$H^{s}_{ch} = X_{ch}\beta + U^{s}_{ch} \qquad (1)$$

where $H^{s}_{ch}$ denotes the height z-score of child $s$ in household $h$ belonging to cluster $c$, $X_{ch}$ household and individual characteristics, and $U^{s}_{ch}$ the error component. The latter may be decomposed into a cluster and a household effect:

$$U^{s}_{ch} = \eta_{c} + \varepsilon_{ch} \qquad (2)$$

The decomposition of the error term yields the household error vector, which is modelled in a second stage and used for the final GLS estimate.

Stage two. Computation of the anthropometric measure

Once the best model is selected, the empirical distributions of $\beta$, $\eta_{c}$ and $\varepsilon_{ch}$ are used to obtain $r$ random draws, which are applied to census records to estimate the welfare indicator at the individual level:

$$\tilde{H}^{s}_{ch,(r)} = X_{ch}\tilde{\beta}_{(r)} + \tilde{\eta}_{c,(r)} + \tilde{\varepsilon}_{ch,(r)} \qquad (3)$$

The $r$ replications provide $r$ values of the welfare indicator for each census record. As equation (3) shows, using the same covariates selected for the final model in the survey, each covariate is multiplied by the corresponding element of the drawn $\beta$ vector, and each type of error is added. Cluster and household components are treated as random when projecting child's height onto the census records.

3.2 Omitted Variable Problem in SAE

Several developing countries have constructed SAE based on the available variables in survey and census datasets.
In fact, in SAE the modelling of the welfare indicator is based only on those variables that are strictly comparable between both sources. Even though empirical studies may point to additional variables relevant for modelling the welfare indicator of interest, those covariates are likely to be ignored if they are not available in the census. This issue has not been properly addressed in the literature on Poverty Mapping, given that it is less clear what sort of variables must appear in the final model for obtaining SAE. However, when we try to model anthropometric indicators, whatever the condition of the country, parental anthropometrics are very likely to play a relevant role in child's health and nutrition, especially for long-term indicators such as height-for-age, birth weight, and head circumference, among others.

The ELL methodology has addressed the omitted variable problem indirectly, mainly for constructing Poverty Maps, by claiming that the residual remaining after controlling for highly significant individual and cluster variables is purely random. Therefore, by treating household and cluster components as random, ELL simulates the welfare indicator at the enumeration area (or the geographical level of interest). The omission of a relevant variable has not been the main concern of the ELL methodology, because the inclusion of cluster-level variables helps to reduce the cluster effect significantly. Hence, after an extensive use of variables at the cluster level, it is assumed that the remaining residual can be treated as a random effect. In spite of the extensive literature on Poverty Mapping showing the importance of cluster variables for reducing the variance of the final SAE, it is not evident that the same logic applies to anthropometric variables. The main reason lies in the empirical evidence regarding the impact of parental anthropometrics on child's development.
Hence, in this context, the omission of a relevant variable is likely to be detrimental if it is not reflected through the observable variables that are comparable between census and survey. Acknowledging that parental anthropometrics may depend on characteristics not necessarily related to the covariates of child's height leads us to consider different scenarios for the relationship between child's and maternal covariates. If we assume the omitted variable is maternal anthropometrics, the following scenarios are plausible: 1) maternal height covariates ($Z_{ch}$) highly correlated with child's covariates ($X_{ch}$); 2) maternal height covariates ($Z_{ch}$) weakly correlated with child's covariates ($X_{ch}$); and 3) no correlation between maternal and child's covariates, $E[Z_{ch}, X_{ch}] = 0$. Each scenario explores different magnitudes of the coefficient of maternal height on child's height. The relevance of the omission will depend on the correlation between the covariates of child's and mother's height, as well as on the significance and magnitude of the coefficient of maternal height on child's height.

3.3 Solutions to the Omitted Variable Problem

3.3.1 Two-Step Small Area Estimation

This study suggests an alternative approach for taking into account relevant variables when modelling child's anthropometrics. Using the ELL methodology, I suggest estimating child malnutrition in small areas through a two-step imputation. Firstly, the omitted variable has to be identified in the survey. Several specifications of the anthropometric model have to be tested to confirm a significant relationship between the variable of interest, which is not available in the census, and the relevant covariate that is omitted from the model because it has no counterpart in the census. The identification of the omitted covariate should be clearly supported by the empirical literature.
Hence, if we follow the empirical evidence on modelling child's height, we may think that the true model has the following structure:

$$H^{s}_{ch} = X_{ch}\beta + H^{m}_{ch}\delta + U^{s}_{ch} \qquad (4)$$

where $H^{s}_{ch}$ is child's height, $X_{ch}$ is a set of household and child's characteristics, $H^{m}_{ch}$ represents maternal height, and $U^{s}_{ch}$ is a random component composed of cluster ($\eta_{c}$) and household ($\varepsilon_{ch}$) elements:

$$U^{s}_{ch} = \eta_{c} + \varepsilon_{ch} \qquad (5)$$

Even though maternal height is a relevant variable for modelling child's height, the standard ELL methodology does not allow us to estimate child's height based on equation (4), because maternal height is not observed in the census. As a consequence, instead of leaving maternal height out of the child's model, we suggest a two-step imputation. The first step is to impute maternal height using the variables comparable between census and survey; after obtaining maternal height values for each census record, we use the imputed variable as a covariate in the final model of child's height.

4 Monte Carlo Simulation Exercise

4.1 Data Generating Process

We simulate child's height for a census of 100,000 records in which the following geographical partitions are identified: states (11), municipalities (870), and enumeration areas (2870).[1] The Data Generating Process (DGP) for simulating child's height follows the structure shown in (4), in which maternal height is described by:

$$H^{m}_{ch} = Z_{ch}\theta + \tau^{m}_{ch} \qquad (6)$$

where $H^{m}_{ch}$ denotes the height of mother $m$, belonging to household $h$ in cluster $c$. Clusters may refer to enumeration areas, localities, or another low-level geographical partition, usually defined at a lower geographical level than the final geographical target. $Z_{ch}$ is a set of household and individual characteristics, and $\tau^{m}_{ch}$ an error component decomposed into a cluster effect $\phi_{c}$ and a household effect $\upsilon_{ch}$. We assume $Cov(\phi_{c}, \upsilon_{ch}) = 0$, $Cov(\upsilon_{ch}, \upsilon_{ch'}) = 0$, $Cov(Z_{ch}, \phi_{c}) = 0$, and $Cov(Z_{ch}, \upsilon_{ch}) = 0$ for all $h \neq h'$. However, some correlation between mother's and child's covariates is possible.
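The two-step imputation described above can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the coefficient values, the single covariates `x` and `z`, the sample sizes, the absence of a cluster effect, and the small helper `ols`. The second step is fitted in the survey on observed maternal height and then applied to the census with imputed maternal height, which is one possible reading of the procedure. Only point predictions are produced; the random draws of the ELL simulation stage are left out.

```python
import random

def ols(rows, y):
    """Least-squares coefficients, intercept first, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
THETA, BETA, DELTA = 3.0, 0.3, 0.05  # illustrative values, not the paper's

# Simulated "census": child covariate x, maternal covariate z,
# maternal height hm (as in eq. 6) and child's height z-score h (as in eq. 4).
census = []
for _ in range(20000):
    x, z = rng.gauss(0, 1), rng.gauss(0, 1)
    hm = 155 + THETA * z + rng.gauss(0, 3)
    h = BETA * x + DELTA * (hm - 155) + rng.gauss(0, 1)
    census.append((x, z, hm, h))

# The "survey" is a random subsample in which hm and h are observed.
survey = rng.sample(census, 2000)

# Step 1: model maternal height on the comparable covariate z (survey only).
a0, theta_hat = ols([(z,) for _, z, _, _ in survey],
                    [hm for _, _, hm, _ in survey])

# Step 2: model child's height on x and observed maternal height (survey only).
b0, beta_hat, delta_hat = ols([(x, hm) for x, _, hm, _ in survey],
                              [h for _, _, _, h in survey])

# Census projection: impute maternal height from z, then predict child's height.
h_pred = [b0 + beta_hat * x + delta_hat * (a0 + theta_hat * z)
          for x, z, _, _ in census]
```

In this sketch the imputed maternal height adds information only because `z` is not among the child's covariates; if the same variables entered both models, the imputed value would be a linear combination of covariates already in the child's model.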
Because our objective is to assess the standard ELL methodology under the omission of a variable relevant for predicting child malnutrition, we analyze two scenarios that encompass the predictive power observed in real data: $R^{2} = 0.25$ and $R^{2} = 0.45$. For these cases, we explore the bias of SAE when maternal height is not included, and when child's covariates are positively correlated with maternal covariates. For the latter, we inspect a case in which child's covariates have a low correlation with mother's covariates ($E[X'_{ch}, Z'_{ch}] = 0.25$) and a case in which the correlation is high ($E[X'_{ch}, Z'_{ch}] = 0.65$).

[1] This geographical structure has been taken from the original Census records used for the empirical section of this study.

Our scenarios assume different contributions of mother's height to the variance of child's height, and we assume normal distributions for all the components of equation (4). For each case mentioned above, we fix the distributional parameters and change the coefficient of maternal height ($\delta$) in order to change the contribution of the variable of interest ($H^{m}_{ch}$).[2] To draw a more realistic picture in the DGP, child's covariates were split into variables potentially common to mother and child and the remaining variables; to distinguish the common variables, we call them $X'_{ch}$ in the child's model and $Z'_{ch}$ in the mother's. As common variables we define education $\sim N(10, 5)$ and income $\sim N(2500, 801)$; the remaining variables in the child's DGP are maternal age $\sim N(28, 36)$ and potable water at the community level $\sim N(50, 35)$. The variable of interest $H^{m}_{ch}$ has been split into an observed component $Z_{ch} \sim N(155, 28)$ and an unobserved component $\tau^{m}_{ch} \sim N(0, 9.6)$.
As equation (2) shows, the child's residual is split into a cluster effect $\eta_{c} \sim N(0, 1)$ and a household effect $\varepsilon_{ch} \sim N(0, 1)$.[3] Finally, our dependent variable follows a standard normal distribution, $H^{s}_{ch} \sim N(0, 1)$.

To assess SAE under the omission of maternal height in models of child's height, 5 per cent random samples were drawn from the census of 100,000 observations based on the child's DGP (4); the geographical stratification was respected by sampling at the locality level. Random draws were taken from the same seed with which the census was created. Each sample survey has around 5,000 observations, and for each scenario we constructed 25 surveys. The true values of child's z-score were created by multiplying the values derived from the above distributions by the coefficients defined for each scenario:

$$E(H^{s}_{ch} \mid X_{ch}, Z_{ch}) = E(X_{ch})\beta + E[Z_{ch}\theta + \phi_{c} + \upsilon_{ch}]\delta + E(\eta_{c} + \varepsilon_{ch}) \qquad (7)$$

Constructing the means at the enumeration area level using child's z-score from (7), we obtain the true values against which the final SAE will be compared.

[2] To increase the relevance of $H^{m}_{ch}$, each variable kept the same coefficient with the exception of a "support variable" called income. This variable was used to increase the relevance of maternal height without changing the predictive power of the model or other parameters.

[3] The contribution of the error terms is modified by multiplying both of them by a fixed coefficient. For simplicity, we assume that both effects affect child's height z-score in the same manner; however, this may not be the case with real data.

4.2 Monte Carlo Results

Based on the scenarios described above, we calculate the mean of the MSE of the estimated z-score of child's height at the enumeration area level.
This indicator is constructed using the average child's z-score calculated from equation (7) and the SAE obtained with the standard ELL methodology:

$$MSE_{ea} = \frac{1}{R}\sum_{r=1}^{R} \frac{\left|\bar{Y}^{ell}_{ea,r} - \bar{Y}^{true}_{ea}\right|}{\left|\bar{Y}^{true}_{ea}\right|} \qquad (8)$$

where $MSE_{ea}$ denotes the mean squared error at the enumeration area (EA) level; $\bar{Y}^{ell}_{ea,r}$ is the mean of the child's z-score at the EA level for replication $r$, and $\bar{Y}^{true}_{ea}$ is the true value of the z-score based on our Monte Carlo exercises. The survey corresponding to each replication $r$ generates 250 values for each individual using the ELL method. The $R$ replications correspond to the 25 surveys considered for each case described above, defined by $R^{2}$ and by the correlation between common variables and maternal height.[4]

Tables 1 and 2 present two scenarios assuming predictive power of 0.25 and 0.45, and different cases for the correlation between maternal height and the common covariates of child's z-score. The mean $MSE_{ea}$, or bias, is expressed as the relative change in the bias with respect to the best approach we may consider for reducing it. Intuitively, although the predictive power is quite low in our scenarios, either 0.25 or 0.45, the best approach would be to include maternal height in the child's model. As we would expect, the higher the correlation between maternal and child's covariates, the smaller the bias of the final SAE at the enumeration area level. Because the predictive power assumed for these exercises is quite low, although similar to that found in real data, biased SAE of child malnutrition are to be expected even in our best scenario ($R^{2}$). Although this may be the case when the error component is not purely random at the enumeration area level, we may be able to reduce the bias by finding significant child's covariates related to mother's height, or by including maternal height as a covariate of child's z-score through a two-step SAE.
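As a minimal sketch of the indicator in equation (8), the function below averages the absolute relative deviation of the replicated EA means from the true EA mean. It assumes that equation (8) is this average over the R replications; the function name, the guard against a zero true mean, and the example numbers are hypothetical and are not taken from the paper's simulations.

```python
def relative_bias(ea_means_by_replication, true_ea_mean):
    """Average absolute relative deviation of replicated EA means from the truth."""
    if true_ea_mean == 0:
        # The measure divides by |true mean|, so it is undefined at zero.
        raise ValueError("true mean is zero; the relative measure is undefined")
    n_rep = len(ea_means_by_replication)
    total = sum(abs(est - true_ea_mean) for est in ea_means_by_replication)
    return total / (n_rep * abs(true_ea_mean))

# Hypothetical EA with a true mean z-score of -1.0 and four replicated estimates.
example = relative_bias([-1.1, -0.9, -1.2, -1.0], -1.0)
```

Because the measure divides by the absolute true mean, enumeration areas whose true mean z-score is close to zero produce very large values, which is worth keeping in mind when comparing quartiles of the indicator across areas.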
Tables 1 and 2 assume a predictive power of 0.45, with maternal height contributing 25 and 75 per cent of the total variance of child's z-score. For the case of a 25 per cent contribution, we observe that excluding maternal height from the child's model biases the final enumeration area mean of child's z-score by only 4 per cent at the first quartile, the median and the third quartile. A high correlation between the common covariates of child's z-score and maternal height reduces the bias from 4 to 1 per cent. However, although this high correlation reduces the bias, there are still some gains from including maternal height. In addition, Table 2 shows that when the contribution of maternal height is quite high (75 per cent of the total variance), excluding maternal height from the model when there is no correlation between mother's and child's covariates increases the bias by almost 20 per cent. A high correlation between common covariates reduces the bias to 6 per cent in this case.

[4] The number of random surveys will be increased for generating the final standard errors of SAE using Tarozzi's ado-files.

5 Empirical Evidence using Mexican Data

Using the Mexican Health and Nutrition Survey (ENSANUT-2006) and the Mexican Census records (CONTEO-2005), this section provides further evidence on the consequences of omitting a crucial variable when modelling child's height. To this end, I obtain SAE of child's height with and without maternal height in the model. Because maternal anthropometric indicators are not available in census data, we suggest a two-stage SAE approach in which maternal height is estimated in a first step and then used as a covariate in the final model of child's height. The following subsection details the characteristics of our data, as well as the model selected for carrying out this exercise.
Based on equation (4), we estimate the child's height z-score using as $X_{ch}$ household and child's characteristics (age, sex, birth order, dwelling characteristics, parental schooling and age, municipality and locality information). Following equation (6), $H^{m}_{ch}$ depends on two types of variables: one is contained in the child's covariate matrix, and the other is not related to the child's DGP.

5.1 Data and Variables

This study models the height of children between 0 and 5 years old for two geographical regions in Mexico. Under the classification suggested by the National Council of Population (CONAPO), we select the regions identified as Regions of Very High and High Vulnerability, considering rural localities.[5] Using principal components, the index of vulnerability (or "marginalization") considers the following variables: education (illiteracy and population without complete primary school); dwelling (occupants without piped water, without drainage and sanitary service, with soil floor, without electricity and with overcrowding conditions); income (employed population earning up to two minimum salaries); and location (population living in localities with fewer than 5,000 inhabitants). The regions considered for this study are those with a high incidence of stunting among children under 5 years old: Very High Vulnerability (Chiapas, Guerrero, and Oaxaca) and High Vulnerability (Campeche, Hidalgo, Michoacán, Puebla, Tabasco, Veracruz, San Luis Potosi, and Yucatan). Because the incidence is significantly higher in rural regions, the current exercise is carried out using children living in rural localities (under 2,500 inhabitants), where 23 per cent of children are stunted in the first region and 13 per cent in the second.[6]

[5] This study uses the data preparation carried out for constructing the Nutrition Map 2005, financed by the Mexican Government and produced during my work at the World Bank. The original Nutrition Map was produced for the 32 states of Mexico.
6 Once geographical regions were dened, comparisons of concepts, distributions and summary statistics were achieved between the ENSANUT (2006) and CONTEO (2005). For classifying a pair of variables statistically similar, we compare distributions and summary statistics between survey and census records controlling for the survey design.7 The selected household and individual characteristics were the following: type of ooring, asset possession, water source, sanitary service, piped water inside the dwelling and type of drainage; and household size, kinship, sex, age, social security, indigenous identication, literacy, schooling attendance and schooling level. In addition, municipality and locality variables were included in the child's model too. Dwelling and population characteristics at locality level were extracted from the ITER 2005 which consist on a dataset of variables at the locality level constructed by the Institute of Statistics, Geography and Information Technology (INEGI) from census records. The Mexican Ministry of Social Development, the Ministry of Environment and Natural Resources, as well as the National Committee of Water Provision provided the following variables at the locality level: longitude, latitude, altitude, annual precipitation, type of vegetation, type of agriculture, tree coverage, soil erosion, and climate. 5.1.1 Denition of Height per Age Anthropometric indicators were constructed based on the international standards suggested by the World Health Organization (WHO) in 2006. In 1993, the WHO carried out an exhaustive revision of the use and interpretation of the anthropometric benchmarks used since the 1970s known as NCHS/WHO international reference population. The international reference growth curves were formulated in the 1970s originally planned to serve as a reference for the USA population. 
After this review, it was concluded that the NCHS/WHO growth references did not adequately reflect the growth of children in early stages. From 1997 to 2003 the WHO Multicentre Growth Reference Study (MGRS) was implemented to develop growth references for children under 5 years old. The MGRS collected weight, height and head circumference measures from 8440 breast-fed children of different ethnic and cultural backgrounds. The countries considered for the new references were: USA (California), Oman (Muscat), Norway (Oslo), Brazil (Pelotas), Ghana (Accra) and India (South Delhi). Selected children were living in favorable socio-economic conditions and healthy environments (disease incidence was low). Around 80 per cent of mothers followed the WHO recommendations, some of them related to non-smoking habits during and after pregnancy, as well as breast-feeding. Based on the anthropometric indicators collected from these children, new growth references were constructed to identify underweight, stunted and wasted children under 5 years old.

6 The national incidence is 12.5 per cent of children under 5 years old.

7 Primary sample units (PSU) and geographical strata were provided by INEGI (Institute of Statistics, Geography and Information Technology).

For the purposes of this study, the new international benchmarks were used, comparing each child in the survey with his or her "healthy" counterpart of the same age and gender. The dependent variables used for modelling were transformed into height z-scores for children under 5 years old. This transformation expresses the number of standard deviations by which an individual differs from the international reference for his or her gender and age.
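A minimal sketch of this transformation, assuming the reference is summarized by LMS parameters (Box-Cox power L, median M, coefficient of variation S) for each sex and age. Whether the study computed its z-scores in exactly this way is not stated here, and the reference values below are placeholders chosen for illustration, not figures from the WHO tables. The names `REFERENCE`, `lms_zscore`, `height_for_age_z` and `is_stunted` are hypothetical; the stunting cut-off of -2 is the one used in this paper.

```python
import math

# Placeholder reference parameters keyed by (sex, age in months).
# These numbers are illustrative only and are NOT taken from the WHO tables.
REFERENCE = {
    ("F", 24): {"L": 1.0, "M": 86.0, "S": 0.04},
    ("M", 24): {"L": 1.0, "M": 87.0, "S": 0.04},
}


def lms_zscore(value, L, M, S):
    """Z-score under the LMS form; the log form applies when L equals 0."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


def height_for_age_z(height_cm, sex, age_months, reference=REFERENCE):
    """Height-for-age z-score relative to the reference for this sex and age."""
    p = reference[(sex, age_months)]
    return lms_zscore(height_cm, p["L"], p["M"], p["S"])


def is_stunted(haz, cutoff=-2.0):
    """Flag a child as stunted when the z-score is below the cut-off."""
    return haz < cutoff


haz = height_for_age_z(78.0, "F", 24)
print(round(haz, 2), is_stunted(haz))
```

With L equal to 1 the expression reduces to (height - M) / (M * S), that is, the distance from the reference median measured in reference standard deviations.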
5.2 Estimation of Child's Height Z-Score

5.2.1 Relevance of Cluster Identification

Before starting the modelling stage, the geographical level defined as the cluster was identified by looking at the hierarchical composition of each level, the identical codification in both sources (CONTEO and ENSANUT), and the number of children under 5 years old per level. Administrative divisions are based on the following hierarchical levels for the two regions considered: states (11), municipalities (1,557), localities (47,451), and manzanas (254,262).8 Due to the significant number of "manzanas" (enumeration areas) with only one child under 5 years old in the survey, the cluster effect was defined one level above. We selected localities as clusters to avoid estimating between-cluster heterogeneity with just a few individuals inside each cluster. Identifying clusters at the locality level implies that the variation of estimates will be mainly explained by this level, besides the household effect. However, the possibility of additional variation at higher levels is not neglected; therefore, anthropometric models consider state and municipality level variables to capture higher level variation. Identifying different levels for the cluster effect, Elbers et al. (2008) demonstrate that the contribution of the correlation at higher levels is very low. Nevertheless, it is worth trying some of these variables during the modelling stage, given that this is purely an empirical question.

8 At the national level the geographical partition contains: 32 states, 2,451 municipalities, 284,485 localities and 1,183,678 manzanas.

5.2.2 Child's Height Modelling and SAE without using Maternal Height

Stage one of the modelling was carried out using OLS regressions with the set of comparable variables selected in the previous stage. The modelling and simulations were carried out using children under 5 years old in the survey and census.
According to the National Institute of Statistics, Geography and Information Technology (INEGI), 52 per cent of Mexican localities have children under 5 years old, and only half of them are concentrated in localities with more than 100 inhabitants.

5.3 Child's Height Modelling and SAE using Maternal Height

As Section 3 has discussed, the omission of a relevant variable for child anthropometrics may mislead policy makers in the task of tackling malnutrition. Although current SAE methodologies do not aim to provide causal interpretations of the individual, household and community characteristics considered in the model, it is crucial to understand the nature of the welfare indicator we aim to obtain at small area levels. As has been mentioned throughout this study, the empirical literature has highlighted not only the strong relationship between infrastructure and malnutrition, but also the strong and significant relationship between parental anthropometrics and child health. Using the data described above, we first modelled the height of children under five considering individual, household, locality, and municipality variables available in census and survey data (see Appendix I for further details regarding the model [not available for this draft version]). To obtain the best model, different specifications were tried based on the criterion of high predictive power ($R^2$). The selected model has an $R^2$ of 0.27, which is quite high for anthropometric models, as Fujii (2005) also points out in his work for Cambodia. Figures 1 and 2 of Appendix II show two states belonging to the geographical regions comprising 11 states. These figures show the percentages of stunting for children under 5 years old (z-score < -2) at the municipality level.
Maps on the left show the SAE obtained using the standard ELL method, and the second set shows the results after carrying out a two-step SAE that considers maternal height in the child's model. Figure 1 shows the municipalities of the state of Chiapas (south of Mexico), for which considering maternal height does not make any difference to the incidence of stunting (z-score < -2) at the municipality level. However, when we look at the state of Hidalgo (centre of Mexico) in Figure 2, we observe a more heterogeneous picture. The two-step SAE shows a higher number of municipalities with a high value of the average child's z-score. The municipalities with high stunting are scattered all over the state, in contrast to what we observe under the standard ELL method. Although the states of Chiapas and Hidalgo are both classified as having high levels of poverty, they perform differently after maternal anthropometrics are included. It is likely that child nutrition will be easier to map from infrastructure and household variables in states that have not experienced dramatic changes in social mobility, especially if long-term variables, such as indigenous identification, are available in the data. However, for states with more heterogeneous compositions, facing economic and social changes such as migration and social mobility, it is plausible that accurately representing long-term indicators of child malnutrition becomes more complex. Hence, for regions or states presenting significant socioeconomic changes, it may be relevant to consider a two-step SAE if the main purpose is to estimate indicators clearly related to long-term circumstances.

6 Conclusion

Using Monte Carlo simulations assuming different contributions of maternal height to child's height, we observe that higher contributions of the omitted variable (maternal height) may significantly bias the final estimates of child's height z-scores at small area levels.
Unless the omitted variable has a strong relationship with the child's covariates, the ELL method does not provide enough adjustment to remove the bias produced by the omission of maternal height (which has been assumed not to be purely random in this study). An empirical application provides evidence supporting a two-step SAE following the criterion suggested by ELL. Using the Mexican Census (2005) and the Mexican Survey of Health and Nutrition (2006), this study shows how the Nutrition Map at the state level with municipality partitions may vary when maternal height is considered in the prediction of child's height. Although the two states analyzed are classified by the Mexican Government as having a similar poverty incidence (and we may expect similar levels of malnutrition), the omission of maternal height has a significantly different effect on the final estimates of child's height (z-score) at the municipality level. These differences may reflect the fact that Chiapas is mainly characterized by low social mobility and a high concentration of indigenous communities, whereas Hidalgo displays higher mobility through rural-urban migration. This would seem to indicate that the exclusion of a relevant variable, in this case maternal height, may be detrimental to local level estimates of child malnutrition if the region or state shows relevant socioeconomic changes. However, areas with stable social mobility and low economic growth present no significant changes in the malnutrition ranking of municipalities after taking maternal height into consideration. In conclusion, the implementation of the method outlined in this study, together with further research in this area, could help produce more accurate maps of child malnutrition for areas with high socioeconomic variability, enabling policy makers to target their efforts more effectively in the future.
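The mechanism behind the Monte Carlo finding summarized in this conclusion can be illustrated with a stylized simulation. The Python sketch below is only an assumption-laden stand-in: it uses plain OLS plug-in predictions rather than the ELL estimator, a single common covariate, and hypothetical parameter values (`gamma`, the 0.3 slope, area and sample sizes), so it does not reproduce the design or the figures reported in Tables 1 and 2. It compares the area-level MSE of predicted mean z-scores when maternal height `h` is omitted against the MSE when it is included, for several values of Corr[x, h].

```python
import random


def ols(rows, y):
    """OLS coefficients (intercept first) from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def relative_mse(rho, gamma=0.6, n_areas=100, n_census=100, n_survey=10, seed=1):
    """Area-level MSE omitting h, divided by the MSE including h."""
    rng = random.Random(seed)
    census = []
    for _ in range(n_areas):
        xa, wa = rng.gauss(0, 1), rng.gauss(0, 1)      # area-level components
        area = []
        for _ in range(n_census):
            x = (xa + rng.gauss(0, 1)) / 2 ** 0.5      # common covariate, variance 1
            w = (wa + rng.gauss(0, 1)) / 2 ** 0.5      # part of h unrelated to x
            h = rho * x + (1 - rho ** 2) ** 0.5 * w    # Corr[x, h] = rho
            y = 0.3 * x + gamma * h + rng.gauss(0, 1)  # child's z-score
            area.append((x, h, y))
        census.append(area)
    # The "survey" is the first n_survey children of every area.
    survey = [rec for area in census for rec in area[:n_survey]]
    ys = [r[2] for r in survey]
    b_omit = ols([(r[0],) for r in survey], ys)
    b_full = ols([(r[0], r[1]) for r in survey], ys)
    se_omit = se_full = 0.0
    for area in census:
        n = len(area)
        truth = sum(r[2] for r in area) / n
        p_omit = sum(b_omit[0] + b_omit[1] * r[0] for r in area) / n
        p_full = sum(b_full[0] + b_full[1] * r[0] + b_full[2] * r[1] for r in area) / n
        se_omit += (p_omit - truth) ** 2
        se_full += (p_full - truth) ** 2
    return se_omit / se_full


for rho in (0.0, 0.25, 0.65):
    print(rho, round(relative_mse(rho), 2))
```

Because the same random draws are used for every value of `rho`, differences across the printed ratios come from the correlation alone: the more the common covariate proxies for maternal height, the smaller the penalty for omitting it.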
7 Appendix I: Tables

Table 1: MC Exercises: Relative Mean of the MSE at the enumeration area level
Contribution of Maternal Height: 25 per cent of Child's Height Z-score Variance

Corr[X_ch, H_ch^m]   1st Quartile   Median   3rd Quartile
0                    3.94           3.83     3.64
0.25                 3.56           3.45     3.29
0.65                 1.00           0.98     1.07

Table 2: MC Exercises: Relative Mean of the MSE at the enumeration area level
Contribution of Maternal Height: 75 per cent of Child's Height Z-score Variance

Corr[X_ch, H_ch^m]   1st Quartile   Median   3rd Quartile
0                    18.4           17.9     16.4
0.25                 16.1           15.7     14.4
0.65                 5.9            5.8      5.6

8 Appendix II: Figures and Graphics

Figure 1: State of Chiapas: Z-scores of Height for Children under 5 years old

Figure 2: State of Hidalgo: Z-scores of Height for Children under 5 years old