√K-Consistent Semiparametric Estimators of a Dynamic Panel Sample Selection Model∗
George-Levi Gayle
Dept. of Economics, University of Pittsburgh

Christelle Viauroux
Dept. of Economics, University of Pittsburgh

Previous Draft: December 2002
Present Draft: January 2003

Abstract

This paper considers the problem of identification and estimation in panel data sample selection models with a binary selection rule, when the latent equations contain possibly predetermined variables, lags of the dependent variables, and unobserved individual effects. The selection equation contains lags of the dependent variables from the latent equations and other variables that may be predetermined relative to the latent equations. We derive a set of conditional moment restrictions, which are then exploited to construct two-step sieve GMM estimators for the parameters of the main equation and a nonparametric estimator of the sample selection term. In the first step, the unknown parameters of the selection equation are consistently estimated using a transformation approach in the spirit of Berkson's minimum chi-square method and a kernel estimator for the selection probability. In the second step, these estimates are used to construct a sieve GMM estimator for the unknown finite-dimensional parameters and the unknown function in the "selection bias" term. The estimators are √K-consistent and asymptotically normal.

JEL Classification: C33, C34
Keywords: Predetermined Variable, Efficiency Bound, Series Estimators, Kernel Estimator, Single Index Models.

∗ Incomplete. Please do not quote without permission. We would like to thank Whitney Newey and Mehmet Caner for helpful suggestions and comments. All errors are naturally ours.

1 Introduction

Panel data are very useful in applied research. Not only do they allow researchers to study the intertemporal behavior of individuals, they also enable them to control for the presence of unobserved permanent individual heterogeneity.
At present, there exists a large body of literature on panel data models with unobserved individual effects that enter additively in the regression model (see, for example, Hsiao, 1986, and Matyas and Sevestre, 1996). In recent years, considerable advances in the panel data literature have been made in the direction of linear models that allow for the presence of lags of the dependent variable and other predetermined variables, as well as in the direction of "static" limited dependent variable models that contain only strictly exogenous variables. These are reviewed in Arellano and Honore (1999), who also describe results for dynamic non-linear panel data models, for which much less is known.

Moreover, it is well known that parameter estimates from short panels jointly estimated with individual specific effects can be seriously biased when the explanatory variables are only predetermined as opposed to strictly exogenous. This situation includes models with lagged dependent variables as well as other models in which the explanatory variables are Granger caused by the endogenous variables. In linear models with additive effects, the standard response to this problem has been to consider IV estimates that exploit the lack of correlation between future errors in first differences and lagged values of the variables (see, e.g., Anderson and Hsiao, 1981, Holtz-Eakin, Newey and Rosen, 1988, or Arellano and Bond, 1991). However, far fewer results are available on limited dependent variable models with predetermined variables, that is, models in which current values of the explanatory variables are influenced by past values of the dependent variables. The economic literature contains many important situations in which this is the case. For example, consider Euler equations for household consumption (Zeldes, 1989, Runkle, 1991, and Keane and Runkle, 1992) or company investment (Bond and Meghir, 1994).
In these cases, the explanatory variables include variables in the agents' information sets. However, these variables would be correlated with past shocks and hence with past values of the dependent variables. Another example, which will be of particular interest, is models of life-cycle behavior, for example a model of female labor supply. The assumption that all the explanatory variables are exogenous would mean, for instance, that the current number of children does not depend on past labor supply decisions, which is unlikely if theoretical models of life-cycle behavior are to be taken seriously (see, e.g., Gayle and Miller, 2002, or Browning, 1992).

Sample selection is a problem frequently encountered in applied research. It arises as a result of either self-selection by the individuals under investigation, or sample selection decisions made by data analysts. A classic example, studied in the seminal work of Gronau (1974) and Heckman (1976), is female labor supply, where hours worked are observed only for those women who decide to participate in the labor force. Failure to account for sample selection is well known to lead to inconsistent estimation of the behavioral parameters of interest, as these are confounded with parameters that determine the probability of entry into the sample. In recent years, a vast amount of econometric literature has been devoted to the problem of controlling for sample selectivity. Methods for dealing with sample selectivity are well known in the cross-section case. Recently, this problem was analyzed by Kyriazidou (1997 and 1999) in "static" and "dynamic" panel data models respectively. In both papers, however, it was assumed that the explanatory variables were strictly exogenous. As mentioned above, and as illustrated by the classical female labor supply case of Gronau (1974) and Heckman (1976), this may not be a valid assumption in many applications.
Her estimators also have a very serious limitation, namely that the selection equation does not contain the lagged continuous endogenous variable or other predetermined variables. Moreover, those estimators converge at a rate slower than √K. Chen (1998) proposed an estimator of a panel selection model which is √K-consistent; however, this model is static and does not allow for predetermined variables in either the selection equation or the structural equation.

In this paper we consider the problem of estimating a more general dynamic panel data sample selection model, where the latent equations and the binary selection equation each include additive unobservable individual specific effects and explanatory variables which may be predetermined with respect to the endogenous variable in the latent equations, and which are dynamic in that they depend on the lagged endogenous variable of the latent equation. The binary selection equation also depends on lags of the observed endogenous variable of the latent equation.

To estimate the binary selection equation, we adopt a transformation approach in the spirit of Berkson's minimum chi-square method for cross-section binary choice models with grouped data, which has recently been used by Chen (1998) to estimate a panel sample selection model, and we modify it to fit our case. As a result, we obtain an artificial panel data partial linear regression model in which the parameters of interest appear linearly, the nonlinear component is an additive function of unknown form of the selection propensity scores, and the dependent variable is the observed endogenous variable of the latent equation lagged one period. Consequently, we use a least squares type approach for the resulting partial linear model, as in Andrews (1991), Chen (1998), Donald (1995) and Newey (1988). The nonlinear component is estimated nonparametrically using a series expansion, with the selection probabilities replaced by nonparametric kernel estimates.
The number of basis functions in the series approximation increases while the bandwidth in the kernel estimation decreases as the number of cross-section units in the sample increases, allowing the approximation to become arbitrarily close. In addition, as in Chen (1998), the additive structure of the nonlinear component is specifically taken into account in the series approximation, leading to a weaker identification condition as well as a likely efficiency gain.

In the latent equations, we adopt the classical view of the "selection bias" term as an unknown function of the selection propensity score or of parameters summarizing the selection process. Consequently, the structure of the outcome equation can be expressed as a panel data partial linear regression for the particular subsample for which the outcome variable is observed for at least two consecutive periods. In order to account for the lagged dependent variable and other predetermined variables, we derive moment conditions following Arellano and Bover (1995). We then use a conditional moment sieve estimator to estimate the structural parameters simultaneously with the "selection bias" term. In this approach the nonlinear sample selection component is approximated by a series expansion, and the propensity score is replaced by its double index representation with the first-stage estimates inserted in place of the true parameters.

The paper is organized as follows. The next section describes the model, some empirical examples, identification, and the derivation of the moment conditions. Section 3 describes the estimation procedure in more detail. Large sample properties of the selection equation estimators are investigated in Section 4. Section 5 investigates the large sample properties of the second-stage estimator. Section 6 concludes with some discussion. All proofs are collected in a mathematical appendix.

2 Model

The most typical concern in empirical work using panel data has been the presence of unobserved heterogeneity.
Heterogeneity across economic agents may arise, for example, as a result of different preferences, endowments or attributes. These permanent individual characteristics are commonly unobservable, or may simply not be measurable due to their qualitative nature. Failure to account for such individual-specific effects may result in biased and inconsistent estimates of the parameters of interest. The simultaneous presence of sample selection and unobserved heterogeneity has been noted in empirical work (as, for example, in Hausman and Wise (1979), Nijman and Verbeek (1992), Rosholm and Smith (1994), Altug and Miller (1998) and Gayle and Miller (2002)).

In recent years, considerable attention has been placed in the panel data literature on dynamic linear models that allow for the presence of lags of the dependent variable and other predetermined variables (see, for example, Ahn and Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1998)). As noted in Arellano and Honore (1999), there is an abundance of results for the dynamic linear model, but much less is known for the dynamic non-linear panel model. A seminal contribution in that regard is Kyriazidou (2001), who considered the problem of identification and estimation in panel data sample selection models with a binary selection rule (Type 2 Tobit models in the terminology of Amemiya (1985)) when the latent equations contain strictly exogenous variables, lags of the dependent variables and additive unobserved individual effects. Her model does not allow for the possibility of predetermined variables in either the selection equation or the latent equations. However, in many empirical applications one might think that some variables in both the selection equation and the latent equations are predetermined relative to the latent dependent variable.
For example, as pointed out in Arellano and Honore (1999), the assumption that current values of the regressors are not influenced by past values of the dependent variable and the error term is often unrealistic. Here we say that a regressor is predetermined in the model if the current error is uncorrelated with past values of the dependent variable and with current and past values of the regressors, but feedback effects from lagged dependent variables (or lagged errors) to current and future values of the explanatory variable are not ruled out. Empirical examples of these situations include Euler equations for household consumption (Zeldes, 1989, Runkle, 1991, Keane and Runkle, 1992) or for company investment (Bond and Meghir, 1994), in which variables in the agents' information sets are uncorrelated with current and future idiosyncratic shocks but not with past shocks, together with the assumption that the empirical model's errors are given by such shocks. An example which is more directly related to our present model is the effect of children on female labor supply and labor force participation decisions. In this context, assuming that children are strictly exogenous is much stronger than the assumption of predeterminedness, since it would require us to maintain that labor supply plans have no effect on fertility decisions at any point in the life cycle (see, for example, Gayle and Miller, 2002).

Kyriazidou (2001) also rules out the possibility of feedback effects from the latent equation to the selection equation. Continuing with the previous example on female labor supply and labor force participation, this assumption rules out the possibility that the decision to participate in the labor force today depends on the number of hours worked in the past. Such dependence has been noted in empirical work; for example, Gayle and Miller (2002) found that with time-nonseparable preferences over leisure there will be persistence in the labor participation decisions of females.
This paper seeks to contribute to the state of the art in panel data models by studying the identification and estimation of a model which accounts for the above issues. The model under consideration has the form

$$v^{*}_{fq} = \rho_0 v^{*}_{fq-1} + u^{*}_{fq}\beta + \alpha^{*}_{f} + \varepsilon^{*}_{fq} \tag{1}$$

$$v_{fq} = a_{fq}\, v^{*}_{fq} \tag{2}$$

$$a_{fq} = 1\{\phi_0 a_{fq-1} + \phi_1 v_{fq-1} + t_{fq}\gamma + \eta_f + r_{fq} \geq 0\} \tag{3}$$

where $a_{fq} \in \{0, 1\}$, $f = 1, \dots, K$ and $q = 1, \dots, Q$. Throughout the paper, $K$ is considered to be large relative to $Q$. It is assumed that the sample starts at date $q = 0$ and that $a_{f0}$ and $v_{f0}$ are observed, although the model is not specified for the initial period. In the model given by (1)-(3), $\rho_0, \phi_0, \phi_1 \in \mathbb{R}$, $\beta \in \mathbb{R}^{h}$, $\gamma \in \mathbb{R}^{n}$, and $u^{*}_{fq}$ and $t_{fq}$ are vectors of explanatory variables (with possibly common elements), while $\alpha^{*}_{f}$ and $\eta_f$ are unobservable time-invariant individual-specific effects that are possibly correlated with each other as well as with the errors and the regressors. The dependent variable and the regressors in (1) are latent variables whose observability depends on the outcome of the indicator variable $a_{fq}$. It is assumed that $(a_{fq}, t_{fq})$ is always observed, while $(v^{*}_{fq}, u^{*}_{fq})$ is observed only if $a_{fq} = 1$. In other words, the "selection" variable $a_{fq}$ determines whether the $f$-th observation in equation (1) is censored or not. Thus, the observed sample consists of quadruples $(a_{fq}, t_{fq}, v_{fq}, u_{fq})$, where $u_{fq} = a_{fq} u^{*}_{fq}$.

An important feature of this model, as with the model in Kyriazidou (2001), is that although $u^{*}_{fq}$ and $t_{fq}$ may contain common variables, the two vectors do not coincide, which rules out the censored regression model (the Type 1 Tobit model) as a special case of the model considered in this paper. The reason is that our semiparametric identification scheme for the continuous outcome equation requires that the selection equation contain at least one variable that is not included in the outcome equation. This is the standard exclusion restriction in the literature on semiparametric identification of Type 2 Tobit models.
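To make the data structure in (1)-(3) concrete, the following is a minimal simulation sketch. It is not the authors' design: the normal distributions, the coefficient values, the treatment of the initial period, and the function name `simulate_panel` are all assumptions chosen only for illustration, with scalar regressors in place of the vectors of the text.

```python
import numpy as np

def simulate_panel(K=500, Q=6, rho0=0.5, beta=1.0, phi0=0.3, phi1=0.2,
                   gamma=1.0, seed=0):
    """Simulate one panel from a model shaped like (1)-(3).

    All distributions and parameter values are illustrative assumptions.
    Column 0 is the initial period q = 0, generated ad hoc because the
    model is not specified for that period.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=K)                    # outcome-equation effect
    eta = 0.5 * alpha + rng.normal(size=K)        # selection effect, correlated with alpha
    u_star = rng.normal(size=(K, Q + 1))          # latent outcome regressor
    t = rng.normal(size=(K, Q + 1))               # selection regressor (always observed)
    eps = rng.normal(size=(K, Q + 1))             # outcome error
    r = 0.5 * eps + rng.normal(size=(K, Q + 1))   # selection error, correlated with eps

    v_star = np.zeros((K, Q + 1))
    a = np.zeros((K, Q + 1), dtype=int)
    v = np.zeros((K, Q + 1))

    # Ad hoc initial condition (an assumption of this sketch).
    v_star[:, 0] = alpha + eps[:, 0]
    a[:, 0] = (eta + r[:, 0] >= 0).astype(int)
    v[:, 0] = a[:, 0] * v_star[:, 0]

    for q in range(1, Q + 1):
        # Equation (1): latent outcome with a lagged latent dependent variable.
        v_star[:, q] = rho0 * v_star[:, q - 1] + beta * u_star[:, q] + alpha + eps[:, q]
        # Equation (3): selection depends on lagged selection and lagged observed outcome.
        index = phi0 * a[:, q - 1] + phi1 * v[:, q - 1] + gamma * t[:, q] + eta + r[:, q]
        a[:, q] = (index >= 0).astype(int)
        # Equation (2): the outcome is observed only when selected.
        v[:, q] = a[:, q] * v_star[:, q]

    u = a * u_star                                # observed regressor u = a * u_star
    return a, t, v, u
```

The returned arrays correspond to the observed quadruples $(a_{fq}, t_{fq}, v_{fq}, u_{fq})$; the latent $v^{*}_{fq}$ and $u^{*}_{fq}$ are deliberately not returned, mirroring what an analyst would actually see.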
The model under consideration may be relevant, for example, for estimating intertemporal labor supply responses to changes in wage rates and non-labor income, or the joint behavior of female labor supply and fertility. Intertemporal substitution of labor has been studied in many empirical applications, as it pertains to aggregate fluctuations and human capital accumulation. Dynamic models of labor supply of the form of (1) can be found in Hotz et al. (1988), Altug and Miller (1998), and Gayle and Miller (2002). These studies have found that such dynamic models yield intertemporal labor supply elasticities of substitution higher than models that assume intertemporal separability. Most studies consider only interior solutions; however, when looking at female labor supply it is clear that one has to consider the relevance of corner solutions. See Heckman (1993) for a recent survey of this literature. If one is interested in the participation decision alone, models of discrete choice that incorporate state dependence of the form of (3) have been used to account for the presence of human capital accumulation (Heckman (1981), Altug and Miller (1998) and Gayle and Miller (2002)). These models are also used to analyze search costs (Eckstein and Wolpin (1990) and Hyslop (1999)). A number of papers (Cogan (1981), Hanoch (1980), Hausman (1980), Altug and Miller (1998) and Gayle and Miller (2002)) have also found considerable evidence of fixed costs associated with working, implying that a Type 2 Tobit specification may be more appropriate for analyzing labor supply. While the model studied in Kyriazidou (2001) incorporates most of the salient features of all these strands of literature, it would be difficult to derive her model directly from a structural dynamic utility maximization problem. The reason is that such a problem would typically introduce the lagged $v_{fq}$ (or $v^{*}_{fq}$) among the covariates of the selection equation. Our model overcomes this limitation and hence could be directly derived from a structural dynamic utility maximization problem.

2.1 Identification

In order to identify our model, we make the following assumptions.

A1: $\{\varepsilon^{*}_{fq}\}_{q=1}^{Q}$, conditional on $v^{*}_{fq-1}$ and $a_{fq}$, is independent of $u^{*}_{fq}$, is stationary over time, is i.i.d. over individuals, and is independent of $\alpha^{*}_{f}$.

Although the strict stationarity assumption on $\{\varepsilon^{*}_{fq}\}_{q=1}^{Q}$ is stronger than the second-moment restrictions found in the standard literature on dynamic linear panel data models, it is standard in nonlinear semiparametric panel data models (see, for example, Manski (1987) and Honore (1992, 1993)). It is, however, less restrictive than the similar assumption in Kyriazidou (2001).

A2: $\eta_f = \delta_0' \bar{t}^{\,b}_{f} + \xi_f$, where the $\xi_f$ are i.i.d., $t^{b}_{fq}$ are the strictly exogenous components of $t_{fq}$, and

$$\bar{t}^{\,b}_{f} = \frac{1}{Q} \sum_{q=1}^{Q} t^{b}_{fq} \tag{4}$$

An assumption of this type is frequently found in panel data models and is usually called the Mundlak specification. It is very flexible in that it allows for the possibility of either a fixed or a random effect. This is a cost we have to pay in our work relative to Kyriazidou (2001), since she does not make any assumption in this regard. We think, however, that this is a small cost to pay for the fact that we obtain a √K-consistent estimator.

Let us define an additional indicator, $A_{fq} = a_{fq}\, a_{fq-1}$. Then, in the spirit of the classical literature on sample selection, we reformulate the model defined by equations (1)-(3) as

$$v_{fq} = \rho_0 v_{fq-1} + u_{fq}\beta + \alpha_f + \lambda_{fq} + \tilde{\varepsilon}_{fq} \tag{5}$$

where

$$\lambda_{fq} = B[\varepsilon^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] \tag{6}$$

and, by construction, $B[\tilde{\varepsilon}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] = 0$. The term $\lambda_{fq}$ is analogous to the selection term, or inverse Mills ratio, in the classical sample selection literature, but it is defined conditional on $A_{fq}$ to account for the dynamics, that is, the lagged dependent variables.
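The two data constructions used in this subsection, the time averages in (4) and the consecutive-selection indicator $A_{fq} = a_{fq} a_{fq-1}$, can be sketched as follows. The array layout (individuals in rows, periods in columns) and the function names are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def mundlak_average(t_b):
    """Time average of the strictly exogenous selection regressors, as in (4).

    t_b has shape (K, Q) for a scalar regressor or (K, Q, dim) for a vector;
    the average is taken over the period axis.
    """
    t_b = np.asarray(t_b, dtype=float)
    return t_b.mean(axis=1)

def consecutive_selection(a):
    """Indicator A_fq = a_fq * a_{fq-1} for q = 1, ..., Q.

    a has shape (K, Q + 1) with column 0 holding the initial period q = 0,
    so the result has shape (K, Q).
    """
    a = np.asarray(a)
    return a[:, 1:] * a[:, :-1]

# Small usage example with two individuals and four observed periods.
a_demo = np.array([[1, 1, 0, 1],
                   [0, 1, 1, 1]])
t_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.0, 0.0, 2.0, 2.0]])
A_demo = consecutive_selection(a_demo)   # [[1, 0, 0], [0, 1, 1]]
tbar_demo = mundlak_average(t_demo)      # [2.5, 1.0]
```

Only the observations with $A_{fq} = 1$ enter the outcome-equation moments, which is why the indicator is built before any estimation step.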
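Indeed, taking conditional expectations in the latent equation,

$$B[v^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] = \rho_0 v_{fq-1} + u_{fq}\beta + \alpha_f + B[\varepsilon^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] \tag{7}$$

When $A_{fq} = 1$, we have both $\phi_0 a_{fq-1} + \phi_1 v_{fq-1} + t_{fq}\gamma + \eta_f + r_{fq} \geq 0$ and $\phi_0 a_{fq-2} + \phi_1 v_{fq-2} + t_{fq-1}\gamma + \eta_f + r_{fq-1} \geq 0$. Noting this and assumption A1, we can write the selection bias term as

$$B[\varepsilon^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] = B\!\left[\varepsilon^{*}_{fq} \;\middle|\; \begin{array}{l} \phi_0 a_{fq-1} + \phi_1 v_{fq-1} + t_{fq}\gamma + \eta_f + r_{fq} \geq 0, \\ \phi_0 a_{fq-2} + \phi_1 v_{fq-2} + t_{fq-1}\gamma + \eta_f + r_{fq-1} \geq 0 \end{array}\right] \tag{8}$$

Then, using assumption A1, we obtain

$$B[\varepsilon^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] = \lambda\big(\phi_0 a_{fq-1} + \phi_1 v_{fq-1} + t_{fq}\gamma + \delta_0' \bar{t}^{\,b}_{f},\;\; \phi_0 a_{fq-2} + \phi_1 v_{fq-2} + t_{fq-1}\gamma + \delta_0' \bar{t}^{\,b}_{f}\big) \tag{9}$$

This implies that, following the standard restrictions for semiparametric conditional moment models, we can now identify and estimate the rest of the parameters in the model.

2.2 Moment Conditions

In order to allow for predetermined variables along with exogenous ones, we now assume that the vector of right-hand-side variables $u_{fq}$, $f = 1, \dots, K$, $q = 1, \dots, Q$, may include time-invariant variables $w_f$ plus other strictly exogenous and predetermined variables. Accordingly, let $u_{fq} \equiv (w_f, u^{b}_{fq}, u^{m}_{fq})$, whose components refer to time-invariant, strictly exogenous and predetermined variables respectively. For each category, we introduce the partitions $w_f \equiv (w_{1f}', w_{2f}')'$, $u^{b}_{fq} \equiv (u^{b}_{1fq}, u^{b}_{2fq})$ and $u^{m}_{fq} \equiv (u^{m}_{1fq}, u^{m}_{2fq})$, with the first subsets denoting the variables that are uncorrelated with $\alpha_f$. Following Arellano and Bover (1995), these assumptions on the latent model lead to the following moment restrictions:

$$B[\varepsilon^{*}_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1}] = 0, \quad q \leq Q \tag{10}$$

Note that these conditions imply a lack of serial dependence.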
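Let us see whether the equivalent condition holds in the transformed model:

$$B[\tilde{\varepsilon}_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, \alpha^{*}_{f}] = B[\varepsilon^{*}_{fq} - \lambda_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, \alpha^{*}_{f}] \tag{11}$$

Using the law of iterated expectations, we have that

$$\begin{aligned} & B[\varepsilon^{*}_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, \alpha^{*}_{f}] \\ &\quad = B\big[\, B[\varepsilon^{*}_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] \;\big|\; w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fq-1},\, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq-1},\, v^{*}_{f1}, \dots, v^{*}_{fq-2} \big] \\ &\quad = B[\lambda_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fq-1},\, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq-1},\, v^{*}_{f1}, \dots, v^{*}_{fq-2}] \end{aligned} \tag{12}$$

Then, by substituting the definition of $\lambda_{fq}$ into the second term of equation (11), we obtain

$$\begin{aligned} & B[\lambda_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, \alpha^{*}_{f}] \\ &\quad = B\big[\, B[\lambda_{fq} \mid A_{fq} = 1,\, v^{*}_{fq-1},\, u^{*}_{fq},\, \alpha^{*}_{f}] \;\big|\; w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fq-1},\, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq-1},\, v^{*}_{f1}, \dots, v^{*}_{fq-2} \big] \end{aligned} \tag{13}$$

Equation (11) is then equal to equation (12) minus equation (13), which is zero because $\lambda_{fq}$ is already a function of the inner conditioning set. Finally, we have the following moment conditions for the transformed model:

$$B[\tilde{\varepsilon}_{fq} \mid w^{*}_{f},\, u^{*b}_{f1}, \dots, u^{*b}_{fQ},\, u^{*m}_{f1}, \dots, u^{*m}_{fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, \alpha^{*}_{f}] = 0 \tag{14}$$

Then, as in the standard literature, we assume that the individual-specific effect is mean independent of at least a part of the explanatory variables. This can be formalized as

$$B[\alpha^{*}_{f} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ},\, v^{*}_{f1}, \dots, v^{*}_{fq-1}] = 0 \tag{15}$$

Combining equation (10) and equation (15) gives

$$B[\alpha^{*}_{f} + \varepsilon^{*}_{fq} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1}] = 0 \tag{16}$$

or equivalently

$$B[v^{*}_{fq} - \rho_0 v^{*}_{fq-1} - u^{*}_{fq}\beta \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1}] = 0 \tag{17}$$

As before, these moment conditions are of no use in their present form, since this is a latent model and the selection equation determines whether or not the variables involved are observed.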
In the selected sample we have

$$\begin{aligned} & B[\alpha_{f} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1] \\ &\quad = B\big[\, B[\alpha_{f} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ},\, v^{*}_{f1}, \dots, v^{*}_{fq-1}] \;\big|\; A_{fq} = 1 \big] = B[0 \mid A_{fq} = 1] = 0 \end{aligned} \tag{18}$$

Combining equation (14) and equation (18), we obtain the feasible moment condition

$$B[\alpha_{f} + \tilde{\varepsilon}_{fq} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1] = 0 \tag{19}$$

or equivalently

$$B[v_{fq} - \rho_0 v_{fq-1} - u_{fq}\beta - \lambda_{fq} \mid w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1] = 0 \tag{20}$$

There are other moment conditions that one could derive by placing further restrictions on the variance over time or on the possible pattern of serial correlation, but we choose not to pursue them here.

3 Estimation

As in the cross-section case, semiparametric estimation can mean different degrees of sophistication and the relaxation of different assumptions relative to, say, a fully parametric model. The aim of this paper, however, is to make just enough (reasonable) assumptions to secure √K-consistency, even though the resulting estimates are not as efficient as those of a fully specified parametric model.

We use a two-step procedure, which is prevalent in the standard semiparametric literature. The procedure starts by specifying the parametric parts of the model and by ordering the whole sample of $QK$ observations such that we have pairs of individual data with two consecutive periods of observed outcomes, i.e. $A_{fq} = 1$. Then, using a step-one estimator over the full sample of $QK$ observations, we estimate the selection equation. Once estimates of the selection equation are obtained, a step-two estimator applied to the appropriate participation subsample of observations supplies the estimates of the structural (outcome) equation. As is well known, many step-two estimators are able to identify only the slope coefficients of the structural equation.
Therefore, an additional intercept estimator has to be applied to the participation subsample if an estimate of the intercept term of the structural equation is required. We do not pursue that estimate here and leave it for future work. As the step-one and step-two estimators are separate building blocks, they may be combined in different ways. However, as the properties of the estimates of the structural parameters depend on the attributes of both estimators and on their conjunction, a sensible combination should be chosen. So instead of specifying the properties that the first-step estimator should possess, we propose one that has them.

3.1 Estimation of the Selection Equation

The first step in estimating the selectivity model is to analyze the selection assignment mechanism through which non-random samples emerge and which introduces the selectivity bias into the structural equations. This connection between some explanatory variables and discrete assignment to one of a finite number of categories, here two, is captured by the binary selection equation (3) over the full sample of observations. The two versions of the model can be linked through the propensity score

$$M_{fq} \equiv B[a_{fq} \mid a_{fq-1},\, v_{fq-1},\, t_{fq},\, \bar{t}^{\,b}_{f}] = M[a_{fq} = 1 \mid a_{fq-1},\, v_{fq-1},\, t_{fq},\, \bar{t}^{\,b}_{f}]$$

Equation (3), which is the most general form of the propensity score, could be estimated nonparametrically by a multivariate kernel as

$$\hat{B}[a_{fq} \mid a_{fq-1},\, v_{fq-1},\, t_{fq},\, \bar{t}^{\,b}_{f}] = \frac{\displaystyle\sum_{p=1,\, p \neq q}^{Q} \sum_{g=1,\, g \neq f}^{K} a_{gp}\, \frac{1}{e}\, H\!\left(\frac{T^{c}_{1fq} - T^{c}_{1gp}}{e}\right) 1\{T^{a}_{1fq} = T^{a}_{1gp}\}}{\displaystyle\sum_{p=1,\, p \neq q}^{Q} \sum_{g=1,\, g \neq f}^{K} \frac{1}{e}\, H\!\left(\frac{T^{c}_{1fq} - T^{c}_{1gp}}{e}\right) 1\{T^{a}_{1fq} = T^{a}_{1gp}\}} \tag{21}$$

where $T_{1fq} \equiv (T^{c}_{1fq}, T^{a}_{1fq})$, $T^{c}_{1fq} \equiv (v_{fq-1}, t^{c}_{fq}, \bar{t}^{\,c}_{f})$ and $T^{a}_{1fq} \equiv (a_{fq-1}, t^{a}_{fq}, \bar{t}^{\,a}_{f})$, and where the superscript $c$ (resp. $a$) denotes continuous (resp. discrete) variables. The parameter $e$ is an appropriately chosen bandwidth (smoothing parameter). This estimator is consistent under only weak smoothness assumptions on $C_{r \mid a_{q-1}, v_{q-1}, t}$. Yet, as expected, this estimator's generality does not come without shortcomings.
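A simplified version of a kernel propensity-score estimator of the type in (21) can be sketched as below. This is an illustrative assumption-laden sketch rather than the paper's estimator: it pools observations into one array, uses a product Epanechnikov kernel (the paper does not fix a kernel here), and leaves out only the observation's own row rather than the same individual and the same period. The names `epanechnikov` and `kernel_propensity` are hypothetical.

```python
import numpy as np

def epanechnikov(x):
    """Second-order Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x ** 2), 0.0)

def kernel_propensity(a, T_cont, T_disc, bandwidth):
    """Leave-one-out kernel estimate of E[a | T_cont, T_disc].

    a       : (n,) array of 0/1 selection indicators
    T_cont  : (n, j) continuous conditioning variables (smoothed over)
    T_disc  : (n, k) discrete conditioning variables (exact cell matching)
    Returns an (n,) array; entries are NaN where no neighbour has
    positive weight.
    """
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    # Scaled pairwise differences of the continuous regressors: (n, n, j).
    diff = (T_cont[:, None, :] - T_cont[None, :, :]) / bandwidth
    weights = np.prod(epanechnikov(diff), axis=2)          # product kernel
    # Indicator that two observations fall in the same discrete cell.
    same_cell = np.all(T_disc[:, None, :] == T_disc[None, :, :], axis=2)
    weights = weights * same_cell
    np.fill_diagonal(weights, 0.0)                         # leave-one-out
    denom = weights.sum(axis=1)
    numer = weights @ a
    return np.divide(numer, denom, out=np.full(n, np.nan), where=denom > 0)
```

Because the estimate is a weighted average of 0/1 outcomes with nonnegative weights, it always lies in $[0, 1]$ when defined; a higher-order kernel would lose this property, which is one practical trade-off of the bias reduction it offers.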
The nonparametric estimates are susceptible to the curse of dimensionality, since the selection equation is likely to contain many regressors $T_{1fq}$. This leads to high inaccuracy, which is particularly serious in as much as the succeeding estimation steps build on those outcomes. Besides the low precision, the nonparametric approach is only able to identify $B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}]$ and not the distribution $C_{r \mid a_{q-1}, v_{q-1}, t}$, which contains useful information about the selection process. This latter shortcoming, however, may be acceptable if it is not the selection process itself that is of interest but only its impact on the structural equation. In that case the propensity score is sufficient for identification of the structural relationships; this is similar to the estimators of Ahn and Powell (1993) and Robinson(), which rest on nonparametric estimates of the selection equation in the cross-section case.

Alternatively, the selection equation (3) can be estimated by leaving the conditional distribution $C_{r \mid a_{q-1}, v_{q-1}, t}$ unspecified and identifying the selection equation via some additional conditions on $\eta_f$ and $r_{fq}$. This class of estimators has as a common feature the single index assumption on $\eta_f$ and $r_{fq}$. As is the case in all binary choice models, for this model to be identified it must contain at least one continuous variable; here this is satisfied by construction, with $v_{fq-1}$ fulfilling that role. It also needs a scale normalization; here we set $\phi_1 = 1$. With A2, the model becomes

$$a_{fq} = 1\{\phi_0 a_{fq-1} + v_{fq-1} + t_{fq}\gamma + \delta_0' \bar{t}^{\,b}_{f} + \xi_f + r_{fq} \geq 0\} \tag{22}$$

Then the selection propensity score $M_{fq} \equiv B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}, \bar{t}^{\,b}_{f}]$ becomes

$$M_{fq} = M\big[-(\xi_f + r_{fq}) \leq \phi_0 a_{fq-1} + v_{fq-1} + t_{fq}\gamma + \delta_0' \bar{t}^{\,b}_{f}\big] \tag{23}$$

$$\phantom{M_{fq}} = C_q\big(\phi_0 a_{fq-1} + v_{fq-1} + t_{fq}\gamma + \delta_0' \bar{t}^{\,b}_{f}\big) \tag{24}$$

where $C_q(\cdot)$ is the cumulative distribution function of $-(\xi_f + r_{fq})$, assumed to be monotone increasing.
Then, by inverting the distribution function, we have

$$\phi_0 a_{fq-1} + \phi_1 v_{fq-1} + t_{fq}\gamma + \delta_0' \bar{t}^{\,b}_{f} = C_q^{-1}(M_{fq}) \tag{25}$$

Making the standard identification restrictions from the semiparametric model and normalizing $\phi_1$ to one, we obtain the following partial linear pseudo regression:

$$v_{fq-1} = \tilde{\phi}_0 a_{fq-1} + t_{fq}\tilde{\gamma} + \tilde{\delta}_0' \bar{t}^{\,b}_{f} + C_q^{-1}(M_{fq}) \tag{26}$$

where $\tilde{\gamma} = -\gamma$, $\tilde{\phi}_0 = -\phi_0$ and $\tilde{\delta}_0 = -\delta_0$. Let us also denote $\theta \equiv (\tilde{\phi}_0, \tilde{\gamma}', \tilde{\delta}_0')'$, $T_{fq} \equiv (a_{fq-1}, t_{fq}, \bar{t}^{\,b}_{f})$ and $\psi_{fq} \equiv C_q^{-1}(M_{fq})$. Assume first that $M_{fq}$ is known and let $n^{I}(M_{fq}) \equiv (n_{1I}(M_{fq}), \dots, n_{II}(M_{fq}))'$ be a vector of $I$ approximating basis functions. Let us denote $n_{fq} \equiv n^{I}(M_{fq})$, $N^{*} \equiv (n_{11}, \dots, n_{1,Q-2}, \dots, n_{K1}, \dots, n_{K,Q-2})'$, $T \equiv (T_{11}, \dots, T_{1,Q-2}, \dots, T_{K1}, \dots, T_{K,Q-2})$ and $\tilde{v} \equiv (v_{11}, \dots, v_{1,Q-2}, \dots, v_{K1}, \dots, v_{K,Q-2})$. Then, using the Frisch-Waugh theorem, we obtain an infeasible estimator of $\theta$, denoted $\hat{\theta}^{*}$:

$$\hat{\theta}^{*} \equiv [T' J^{*}_{m} T]^{-1} [T' J^{*}_{m} \tilde{v}] \tag{27}$$

where $J^{*}_{m} \equiv F - N^{*}(N^{*\prime} N^{*})^{-1} N^{*\prime}$ and $F$ denotes the identity matrix. Substituting the nonparametric estimator of $M_{fq}$ from equation (21) into equation (27) transforms the infeasible estimator into a feasible one.

3.2 Estimation of the Outcome Equation

We present a general GMM framework for the estimation of the outcome equation. The estimation proceeds from equation (20). First we write $\lambda_{fq} = \lambda(\tilde{M}_{fq})$, where $\tilde{M}_{fq} \equiv (M_{fq}, M_{fq-1})$. Suppose for the moment that the $M_{fq}$'s were known. Then, following the literature on series estimators (see Andrews (1991) and Newey (1989)), we first approximate $\lambda_{fq}$ by a series of basis functions. That is, we let

$$\lambda_{fq} \equiv \sum_{p=1}^{\kappa_K} e_p(\tilde{M}_{fq})\, \pi_p \tag{28}$$

where $\{e_p(\cdot);\, p = 1, 2, \dots\}$ is a set of basis functions and $\pi^{(K)} \equiv (\pi_1, \dots, \pi_{\kappa_K})'$ are unknown parameters. This transforms equation (20) into

$$B\Big[v_{fq} - \rho_0 v_{fq-1} - u_{fq}\beta - \sum_{p=1}^{\kappa_K} e_p(\tilde{M}_{fq})\, \pi_p \;\Big|\; w^{*}_{f},\, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ},\, u^{*m}_{1f1}, \dots, u^{*m}_{1fq},\, v^{*}_{f1}, \dots, v^{*}_{fq-1},\, A_{fq} = 1,\, t_{fq},\, t_{fq-1}\Big] \approx 0 \tag{29}$$

for $\kappa_K$ large enough. Based on this we can define a GMM estimator. Let us now define some notation that simplifies the subsequent presentation.
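As an aside on the first-step estimator in equation (27), the partialling-out computation can be sketched in a few lines. The sketch assumes a simple power-series basis in the propensity score (the paper does not commit to a specific basis in this passage), a pooled data layout, and hypothetical function names; residuals are obtained by least squares rather than by forming the projection matrix explicitly.

```python
import numpy as np

def power_basis(M, n_terms):
    """Power-series basis (1, M, M^2, ...) with n_terms columns (an assumed choice)."""
    M = np.asarray(M, dtype=float)
    return np.column_stack([M ** k for k in range(n_terms)])

def partial_linear_ols(v_lag, T, M, n_terms=4):
    """Partialling-out estimator in the spirit of equation (27).

    v_lag : (n,) dependent variable of the pseudo regression
    T     : (n, d) regressors entering linearly
    M     : (n,) propensity scores (true or estimated)
    Both v_lag and T are residualised on the series basis in M, and the
    residuals are regressed on each other (Frisch-Waugh).
    """
    N = power_basis(M, n_terms)

    def residualise(X):
        coef, *_ = np.linalg.lstsq(N, X, rcond=None)
        return X - N @ coef

    T_res = residualise(np.asarray(T, dtype=float))
    v_res = residualise(np.asarray(v_lag, dtype=float))
    theta, *_ = np.linalg.lstsq(T_res, v_res, rcond=None)
    return theta
```

Replacing `M` by kernel estimates of the propensity score gives the feasible counterpart described in the text; the number of basis terms plays the role of $I$ and would need to grow with the sample size.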
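Let $\vartheta_0 \equiv (\rho_0, \beta_0')'$ denote the true parameter vector, which belongs to a compact subset $\Gamma$ of $\mathbb{R}^{h+1}$, and let $U_{fq} \equiv (v_{fq-1}, u_{fq})$. Writing $\hat{\theta}$ for the first-step estimate of the selection-equation coefficients, define the following moment functions:

$$j_{1fqg}(\vartheta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, v^{*}_{fq-g}\, \big(v_{fq} - U_{fq}'\vartheta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big), \quad q = 2, \dots, Q,\; g = 1, \dots, q-1 \tag{30}$$

$$j_{2fqg}(\vartheta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, u^{*b}_{1fq-g}\, \big(v_{fq} - U_{fq}'\vartheta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big), \quad q = 2, \dots, Q,\; g = 1, \dots, q-1 \tag{31}$$

$$j_{3fqg}(\vartheta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq+g}\, u^{*b}_{1fq+g}\, \big(v_{fq} - U_{fq}'\vartheta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big), \quad q = 2, \dots, Q,\; g = 1, \dots, Q-q \tag{32}$$

$$j_{4fqg}(\vartheta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, u^{*m}_{1fq-g}\, \big(v_{fq} - U_{fq}'\vartheta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big), \quad q = 2, \dots, Q,\; g = 1, \dots, q-1 \tag{33}$$

$$j_{5fq}(\vartheta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq}\, w^{*}_{f}\, \big(v_{fq} - U_{fq}'\vartheta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big), \quad q = 2, \dots, Q \tag{34}$$

Together these form the vector of moment conditions. Let us define $w_{fq}$ as a $1 \times n_q$ vector of instruments for period $q$. Note that, because of the nature of panel data models, the number of instruments differs from period to period. For ease of notation, let us also write the moment conditions as

$$j_{fq}(\vartheta_0, \lambda(\cdot)) \equiv w_{fq}'\, \big(v_{fq} - \rho_0 v_{fq-1} - u_{fq}\beta - \lambda(\hat{\theta}' T_{fq},\, \hat{\theta}' T_{fq-1})\big) \tag{35}$$

For convenience, let $n = \max\{n_2, \dots, n_Q\}$ and define a $(Q-1) \times n$ matrix of instruments $W_f$ as

$$W_f = \begin{pmatrix} W_{f2} & 0 & \cdots & 0 \\ 0 & W_{f3} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & W_{fQ} \end{pmatrix} \tag{36}$$

a $(Q-1) \times 1$ vector of dependent variables

$$V_f = (v_{f2}, \dots, v_{fQ})' \tag{37}$$

a $(Q-1) \times 1$ vector of lagged dependent variables

$$V_{f,-1} = (v_{f1}, \dots, v_{f,Q-1})' \tag{38}$$

a $(Q-1) \times m$ matrix of independent variables

$$U_f = (u_{f2}', \dots, u_{fQ}')' \tag{39}$$

and a $(Q-1) \times 1$ vector of selection correction terms

$$\lambda(\hat{\theta} T_f) = \big(\lambda(\hat{\theta}' T_{f2},\, \hat{\theta}' T_{f1}), \dots, \lambda(\hat{\theta}' T_{fQ},\, \hat{\theta}' T_{f,Q-1})\big)'$$

Then we have the following orthogonality conditions for estimating the outcome equation:

$$j_f(\vartheta, \lambda(\cdot)) \equiv W_f'\big(V_f - \rho V_{f,-1} - U_f\beta - \lambda(\hat{\theta} T_f)\big), \quad \forall f = 1, \dots, K$$

Let us formally assume $\lambda(\cdot) \in \Lambda$. Then the natural estimator is the GMM estimator of the form

$$\inf_{\vartheta \in \Gamma,\; \lambda \in \Lambda} \left[\frac{1}{K} \sum_{f=1}^{K} j_f(\vartheta, \lambda(\hat{\theta} T_f))\right]' \Sigma(U)^{-1}_{K} \left[\frac{1}{K} \sum_{f=1}^{K} j_f(\vartheta, \lambda(\hat{\theta} T_f))\right] \tag{40}$$

where $\Sigma(U)^{-1}_{K}$ is the weighting matrix, chosen such that $\operatorname{plim}_{K \to \infty} \Sigma(U)^{-1}_{K} = \Sigma(U)^{-1}$, a nonstochastic matrix. Just as we did in the first stage, we want to replace $\Lambda$ with an approximation by a series estimator.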
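The quadratic form in (40) can be illustrated with a generic GMM objective. The sketch below is not the paper's estimator: it uses a plain linear moment function $Z'(y - Xb)$ as a stand-in for $j_f$, takes the weighting matrix as given, and all function names are hypothetical. In the paper's setting the columns of `X` would also contain the series terms approximating the selection correction.

```python
import numpy as np

def gmm_objective(params, moment_fn, data, weight):
    """Quadratic-form GMM criterion: gbar' W gbar, with gbar the sample mean of the moments."""
    g = moment_fn(params, data)          # (K, n_moments), one row per individual
    gbar = g.mean(axis=0)
    return float(gbar @ weight @ gbar)

def linear_moments(params, data):
    """Stand-in moment function: instruments times the linear residual."""
    y, X, Z = data
    resid = y - X @ params
    return Z * resid[:, None]

def linear_gmm(y, X, Z, weight):
    """Closed-form minimiser of the criterion for linear moments:
    (X'Z W Z'X)^{-1} X'Z W Z'y.
    """
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ weight @ XZ.T, XZ @ weight @ (Z.T @ y))
```

With more instruments than parameters the minimised criterion is generally positive, and the choice of `weight` affects efficiency but not consistency, which is the role played by $\Sigma(U)^{-1}_{K}$ above.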
To do so formally, we first define a set of approximating functions ΛK, called a sieve, constructed so that it is dense in the underlying parameter space, i.e. Λ ∈ limK→∞ ΛK. Then our estimator becomes
inf ∈Γ ∈ΛK [ (1/K) Σ_{f=1}^{K} jf (' (bTf )) ]′ Σ(U)K−1 [ (1/K) Σ_{f=1}^{K} jf (' (bTf )) ] (41)
This estimator is now a standard estimator as in Chen and Shen (1998) and is somewhat similar to that of Ai and Chen (2001). However, it differs in several regards. First, unlike Ai and Chen (2001), we do not have to estimate the conditional expectation .jf (), since it is given by the definition of our model.

4 Asymptotic Properties of Selection Estimates
In this section, we derive the large sample properties of the estimators defined in the previous section. Let $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ for a matrix $A$. Let Vf = (vf1, …, vfQ)′, Uf = (uf1, …, ufQ)′, tf = (tf1, …, tfQ)′, af = (af1, …, afQ) and Tf = (tf, af). We make the following assumptions:

Assumption 4.1: The vectors (af Vf Uf Tf ) satisfying (??)-(3) are independently and identically distributed across individuals, with finite fourth-order moments for each component. The cumulative distribution functions Cq (), q = 1, …, Q, are strictly monotonic.

Let %(T1fq | T a1fq ) be the conditional density function of T1 given T1a, and %0 (T1a ) the probability density of T1a.

Assumption 4.2: For each T1 = (T1 T1a ) ∈ T1: (i) %(T1 | T1a ) is bounded away from zero. (ii) %(T1 | T1a ), 0 (T1 T1a ) and Mq (T1 T1a ) are continuously differentiable to order p in T1 ∈ T 1 for q = 1, …, Q. (iii) The number of points of the support of T1a ∈ T1a is finite.

Assumption 4.3: The kernel function h(r) has bounded support, is symmetric and continuously differentiable, and is of order p:
$\int h(r)\,dr = 1$, $\int r^{i} h(r)\,dr = 0$ if $0 < |i| < p$, and $\int r^{i} h(r)\,dr \neq 0$ if $|i| = p$,
where $r^{i} = r_1^{i_1} r_2^{i_2} \cdots r_j^{i_j}$ for $r = (r_1, r_2, \ldots, r_j)$ and $|i| = i_1 + i_2 + \cdots + i_j$ for $i = (i_1, i_2, \ldots, i_j)$ a vector of nonnegative integers.

Assumption 4.4: The bandwidth sequence $e_K$ is chosen such that $K^{1/2} e_K^{j} / \ln K \to \infty$ and $K e_K^{2p} \to 0$ as $K \to \infty$.

Let
$P_{TT} = B[(T - B^{*}(T \mid M))(T - B^{*}(T \mid M))']$ (42)

Assumption 4.5: The matrix PT T is nonsingular.

Let MT be a compact interval such that Mq (T ) ∈ MT for T ∈ T, q = 1, …, Q.

Assumption 4.6: (i) Mq (T ) ∈ MT is continuously distributed with density bounded away from zero for all T ∈ T, q = 1, …, Q. (ii) fq (M ) is continuously differentiable of order max{p +1 p+1}, and B ∗ (T | M ) is continuously differentiable of order max{p p}.

Assumption 4.7: $K I^{-2p} \to 0$ and $K^{1/2} I^{5} [(K e_K^{j})^{-1} \ln K + e_K^{2p}] \to 0$ as $K \to \infty$.

Assumption 4.1 describes the model and the data. Assumption 4.2 contains smoothness and boundedness conditions on the distribution of the regressors in the selection equation. Assumption 4.3 states that the kernel function used in the estimation of the selection propensity score is of order p; this is standard in the nonparametric and semiparametric literature. This assumption, along with Assumption 4.1 and the requirement on the bandwidth sequence in Assumption 4.4, ensures the fast rate of uniform convergence of the selection propensity scores Mbfq and the existence of an asymptotic linear representation of certain weighted averages of Mbfq − Mfq. Assumption 4.5, along with the monotonicity and stationarity conditions of Assumption 4.1, provides the identification conditions for and (0 ) respectively. Assumption 4.5(i) rules out any deterministic relationship between (Mfq ) and Tfq. The boundedness and smoothness conditions in Assumption 4.6 are useful for controlling the bias of the series estimators. Assumption 4.7 restricts the rate of growth of the number of terms I in the series approximation, taking the first-step kernel estimation into account. Although we could estimate the model without invoking the single-index assumption, we derive the asymptotic properties only for the model with the single-index assumption. In reality this is a three-step procedure, but the first two steps are treated as one step for convenience.
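As an illustration of the order condition placed on the kernel in Assumption 4.3, the following sketch checks the moment conditions exactly for one particular univariate kernel. The kernel used, k4(u) = (105/64)(1 − 5u² + 7u⁴ − 3u⁶) on [−1, 1] (a fourth-order kernel built from the biweight), is an assumption made here for illustration and is not taken from the paper; the function name kernel_moment is likewise hypothetical. A kernel of order four integrates to one, has vanishing first, second and third moments, and has a nonzero fourth moment.

```python
from fractions import Fraction

# Coefficients of k4(u) = (105/64) * (1 - 5u^2 + 7u^4 - 3u^6) on [-1, 1],
# stored as {power: coefficient}. Illustrative choice, not the paper's kernel.
K4_COEFFS = {
    0: Fraction(105, 64),
    2: Fraction(-525, 64),
    4: Fraction(735, 64),
    6: Fraction(-315, 64),
}

def kernel_moment(order, coeffs=K4_COEFFS):
    """Exact integral of u**order * k(u) over [-1, 1] for a polynomial kernel."""
    total = Fraction(0)
    for power, coef in coeffs.items():
        n = power + order
        if n % 2 == 0:  # odd powers integrate to zero on a symmetric interval
            total += coef * Fraction(2, n + 1)
    return total

# Moments of order 0 through 4.
moments = [kernel_moment(i) for i in range(5)]
print(moments)
```

Exact rational arithmetic is used so that the vanishing moments are exactly zero rather than zero up to rounding. This particular polynomial and its first derivative both vanish at u = ±1, so the kernel is continuously differentiable on the whole real line, in line with the smoothness requirement of the assumption.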
Theorem 4.1: Under ° under°Assumptions 4.1 -4.7: ° ° (i) °bK − 0 ° = lm(1) √ (ii) K(bK − 0 ) ⇒ K(0 S ) where −1 − (a) S ≡ PT T S1 PT T 0 (b) S1 = B[8 PQ1f 8 1f ] (c) 8 1f ≡ q=2 (Tfq − B ∗ (Tfq | Mfq )(Ψa1fq 9 fq )0 fq (d) 9 fq ≡ afq − Mfq (e) Ψa1fq = //Mfqfq Proof. See Technical Appendix 5 Asymptotic Properties of the structural Estimates 17 Recall jf (' (Tf 0 ) Vf Uf ) B [jf ('0 0 (Tf ≡ Wf0 (Vf − Vf−1 − Uf − ( Tf Tf−1 )) 0 ) Vf Uf )] =0 (43) (44) √ b ∃ bK suchh that ³ K ³→ 0 at´a rate faster than ´i K Then, B jf ' Tf bK Vf Vf−1 Uf Wf = 0, where ≡ (' ) ∈ Υ ≡ Γ ⊗ Λ, where Γ is an infinite dimensional compact subspace of OH+1 and Λ is an infinite dimensional space. Let VK ≡ Γ⊗ΛK be a sequence of approximating spaces, such that {ΥK } is dense in Υ as K → ∞, that is for any ∈ Υ, there exists ΠK ∈ ΥK , such that a ( ΠK ) → 0 as K → ∞ where a is a pseudo distance. Next we formally define the second stage estimator. We first introduce some definitions, then present a set of sufficient conditions for consistency and Asymptotic normAity. Let ≡ ( ) and 0 ≡ (0 00 0 ) We define the (pathwise) directional derivatives of jf (' ) with respect to evaluated at 0 at the direction [ − 0 ]; j0 [ − 0 ] = −Wf0 Vf−1 [ − 0 ] − Wf0 Uf [ − 0 ] − Wf0 [ − 0 ] and for any 1 2 , we define a Fisher-like metric k1 − 2 k as q k1 − 2 k = B{j0 [ − 0 ]}0 Σ(U)−1 B{j0 [ − 0 ]} (45) (46) Here our pseudo distance a will be the distance induce by the Fisher-like metric. p p Definition 1 b is ; K −consistent ( under the Fisher-like metric) for if ; K kb − 0 k → 0 p 0 in probability , denoted as kb − 0 k = lm (1. ; K ) Assumption 5.1:. (Uf Tf ) ∈ a compact set with non-empty interior. Assumption 5.2: B[jf (' )] = 0 holds iff = 0 a sufficient condition for this to hold is 1- B [Wf0 ([Vf−1 Uf ] − B [(Vf−1 Uf ) |Tf Tf−1 ])] is nonsingular, i.e. 
for any < 6= 0 there is no measurable function c(Tf Tf−1 ) such that c(Tf 0 Tf−1 0 ) = [Vf−1 Uf ]0 < This is similar to the condition imposed by Newey (1999) and is the selection instrumental variable version of Robinson”(1988) identification condition for additive semiparametric regression. 18 Assumption 5.3:(i) Σ(U)K = Σ(U) + lm(; K ) uniformly over (Uf Tf ) ∈ (ii) there exist some positive constant 1 and 2 such that 1 ≤ jfj (Σ(U)) ≤ max (Σ(U)) ≤ 2 for all (Uf Tf ) ∈ (iii) there exist some positive constant 1 and 2 such that 1 ≤ jfj (Σl (U)) ≤ max (Σ(U)) ≤ 2 for all (Uf Tf ) ∈ (iv) for some positive value max (S o{j0 [ΠK >∗ ] ≤ Assumption 5.4: (i) B[Wf0 Uf ] 3 ∞ ,B[Wf0 Vf−1 ] B[Wf ] 3 ∞ for all , 2and sup so [j ( 0) {∈ΥK :k 0 −k≤} − j (0 0 )] ≤ ∂1 2 (47) for all small , 0 Assumption 5.5: (i) For any ∈ ΥK , there exists ΠK ∈ ΥK such that k − ΠK k = l(; K ) as K → ∞ p Denote zK ≡ {j[]0 Σ(U)−1 j[] : ∈ ΥK , k − 0 k ≤ ; K and κK ∈ (0 1] as a measure of the size of the sieve space: ½ ¾ Z κp √ 1 log K(zK B)aB ≤ L( K) (48) κK = inf κ ∈ (0 1] : 2 κ κ κ2 where K(zK B) is the minimum balls of radius B required to cover zK under the Fisher-like metric. (ii) κK = l(; K ) Here we are using the I2 metric entropy with bracketing to measure the size of a space. Denote kk2 as the I2 −norm on z for an element in z. Let L2 be T , the completion of z under kk2 For any given ° ° 0, if there exists P (T K) = ª © i r ° r ° i r i e1 e1 eK eK ⊂ L2 such that max °eg − eg ° ≤ T , and if for any e ∈ V there exists a g ∈ {1 K} with eig 1≤g≤K ≤e≤ K (T V ) = log (min {K : P (T K)}) erg 2 a e M , then (49) is defined as the bracketing I2 −metric entropy of the space z Assumption 5.1 restricts the condition regressors (Uf Tf ) to be bounded. This is not necessary for the results that follows since one can always trim large values as was done in the first stage. 
Assumption 5.2 is a global identification condition; it can be stated in terms of more primitive conditions, which we are working on at the moment. Assumption 5.3(i) requires that the estimator of the weighting matrix converge to the weighting matrix uniformly over the regressors at a rate faster than ; K. This assumption is not very restrictive and can be satisfied by many estimators; for example, the identity weighting matrix satisfies it. Assumption 5.3(ii) requires the weighting matrix to be bounded above and below. This is a standard assumption in the econometric literature; for example, it is commonly found in the weighted least squares literature. Assumption 5.4 contains bounded moment conditions, which are standard in econometric models. Assumption 5.5 contains conditions related to the sieve approximation of 0 ∈ Λ by ΠK 0 ∈ ΛK. Assumption 5.5(i) requires that the sieve approximation error shrink to zero at a rate faster than ; K, while Assumption 5.5(ii) requires that the sieve space ΛK not be too large. See, for example, Fenton and Gallant (1996) for Hermite polynomials and Newey (1997) for splines and power series.

Theorem 5.1 Under Assumptions 4.1 and 4.1-4.5,
‖b K − 0 ‖ = Op (max{; K , ‖0 − ΠK 0 ‖})
Proof. See Appendix.

Let Λ ≡ Γ × Λ̃ denote the linear completion of the space Γ × Λ under the Fisher-like metric ‖·‖, and let ⟨·, ·⟩ denote the inner product induced by the norm ‖·‖ on Λ. A linear functional c : Λ → R is bounded (i.e.
continuous) if and only if |c() − c(0 )| 3∞ k − 0 k k− 0 k,0∈Λ sup (50) Then following Chen and Shen(1998) among others used the Reisz representation theorem which states that for any bounded linear functional c : Λ → R, there exists a representor k>∗ k ≡ |c() − c(0 )| 3∞ k − 0 k k− 0 k,0∈Λ sup (51) Under some weak conditions and any linear the functional c (), one can established the following link of c(b ) − c(0 ) to the otherwise directional derivatives of the sample criterion function: K √ 1 X K(c (b ) − c( 0 )) = − √ jf 0 [> ∗ ]0 Σ(Uf )−1 jf (0 ) + lm(1) K f=1 (52) S o(c ) = B{jf 0 [>∗ ]0 Σ(Uf )−1 Σ0 (U)Σ(Uf )−1 jf 0 [>∗ ]} (53) √ ) − c( 0 )) is Then the Lindeberg-Levy central limit theorem implies that K(c (b asymptotically normally distributed with mean zero and variance S o(c): 20 with Σ0 (U) ≡ S√o[jf (0 )] We then apply this approach to derive the asymptotic ' − ' 0 ) Note that for any fixed non-zero ; ∈ Rh+1 ,c() = ; 0 ' distribution of K(b is bounded so we need to provide sufficient to conditions to ensure: (a) c() = ; 0 ' is bounded so we can compute its corresponding representor >∗ = (> ∗ >∗ ) ∈ Λ e and j [>∗ ] in the variance formula and the linkage equation(52) holds. ≡ Γ×Λ f 0 f ∗−0 . Note that W f can be exTo simplify notation let W = ∗ − 0 and W=e pressed as a linear combination of itself. Hence, the space is the same as the linear combination of itself, and we can always replace − 0 by −B(' − '0 ) in equation 45. Let AB(U) the matrix valued function AB(U) = −Wf0 Vf−1 [ − 0 ] − Wf0 Uf [ − 0 ] − Wf0 B[' − ' 0 ] (54) Then we have j0 [ − 0 ] = AB(U)(' − ' 0 ) (55) and Fisher-like norm k − 0 k = (' − ' 0 )0 B[AB(U)0 Σ(Uf )−1 AB(U)](' − ' 0 ) (56) Also note that B[AB(U)0 Σ(Uf )−1 AB(U)] is quadratic in B, therefore there exists a Q f such that B ∗ ≡ (B ∗1 B ∗! ) ∈ !g=1 W (57) B[AB(U)0 Σ(Uf )−1 AB(U) − B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)]] Q! f is positive semi-definite for any B ≡ (B 1 B ! 
) ∈ g=1 W Then for c() = ; 0 ' we have
‖>∗ ‖ ≡ sup {‖ − 0 ‖ > 0, ∈ Λ} |c() − c(0 )| / ‖ − 0 ‖ = ; 0 B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)]−1 ; (58)
Thus, c() = ; 0 ' is bounded if and only if B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)] is finite and positive-definite. Given this, we can find the Riesz representor >∗ = (> ∗ > ∗ ) ∈ Λ ≡ Γ × Λ̃ for the bounded linear functional c () = ; 0 ' as:
> ∗ = (B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)])−1 ; (59)
> ∗ = −B ∗ () (B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)])−1 ; (60)
Hence
jf 0 [> ∗ ] = A"∗ (U) (B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)])−1 ; (61)
Assuming that the linkage equation (52) holds, and substituting (61) into the general variance formula, we obtain that √K(; 0 b ' − ; 0 '0 ) is asymptotically normally distributed with mean zero and variance ; 0 S −1 ;, where
S −1 ≡ (B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)])−1 × (B{A"∗ (U)0 Σ(U)−1 Σl (U)Σ(U)−1 A"∗ (U)}) × (B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)])−1 (62)
Since ; is arbitrary, √K(b ' − ' 0 ) is asymptotically normally distributed with variance S −1. Note that Σl (U) is the variance of the moment condition, which takes into consideration the correction for the first-stage equation.

Theorem 5.2: Under Assumptions 5.1 and 6.1-6.5, √K(b ' − '0 ) ⇒ N(0, S −1 ).
Proof. See Appendix.

6 Conclusion
In this paper, we consider the problem of identification and estimation in panel sample selection models with a binary selection rule when the latent equations may contain predetermined variables, lags of the dependent variables, and additive unobserved individual effects. The selection equation contains lags of its own dependent variable and lags of the outcome dependent variables, along with individual effects. Under the assumptions of stationarity and strict monotonicity of the selection equation error distribution function, we derive a set of conditional moment restrictions, which are used to construct semiparametric GMM estimators that are √K-consistent and asymptotically normal under a set of mild regularity conditions.
An advantage of this approach is that it does not require any assumptions on the parametric form of the distribution of the unobservables conditional on the observed covariates and the initial conditions. It also estimates a model that is more general than existing models in the panel data literature and that has many uses. Finally, unlike previously proposed estimation procedures, it is √K-consistent. At present we are designing Monte Carlo experiments, which will be used to study the finite sample performance of our estimator.

7 Technical Appendix
7.1 Proof of 1st Stage Estimator
Let denote a generic constant, which may take on different values in different situations. Let ° q ° ° # I ° ; q (I) = sup °/ nq (M )° 0 q (Tq ) |#q |= where: £ ¤ I (M ) 0 nqI (M ) = n1I (M1q ) nK Kq I (M ) where n I (M ) = >n I (M ) where > is nonsingular since a and nqI (M ) = nq∗ q∗ q nonsingular transformation would not affect nonparametric estimates Mq = (M1q MK q ) Eq = (Eq Eq−1 ) and Eq and Eq−1 are two nonnegative integers and q q / # nqI (M ) = / |# | nqI (M ) /Mq#q Also, for a measurable function c (Mq ) and a nonnegative integer r, let ¯ q ¯ ¯ ¯ |c (Mq )|r = max ¯/ # c (Mq )¯ 0 q (T ) q |# |≤r Mq ¯ ¯ q and |c (Mq )|r equals infinity if / # c (Mq ) does not exist for some ¯Eq ¯ ≤ r. Let min (F) and max (F) denote the minimum and the maximum eigenvalues of a symmetric matrix F. Let 0 f = 0 f1 0 f2 0 fQ −2

Lemma 1 Under Assumptions 4.1, 4.2, 4.3 and 4.4, we have ¯ ¯ ³ ´ ¯ ¯ sup ¯Mbfq − Mfq ¯ 0 f = Lm (ln K)1.2 (Kej )−1.2 + ep f and ´ ³ Mbfq − Mfq 0 f = 1 (K −1)ej K Q P P q6=g g=f (agq − Mfq ) H ³T 1fq −T1gp eK ´ Q ¡ ¢ ¡ ¢ P a % Ta % T1fq |T1fq 0 1fq q6=g uniformly in f for q = 1, …, Q 1{T a =T a } 1fq 1gq ³ ´ +Lm K 1.2
Proof. The details are omitted here.
The interested reader is referred to Newey (1994) or Chen (1998) Lemma 2 Under Assumptions 1 to 4, and suppose that (T ) is continuously dif of order p , then ferentiable in T1fq K K ´ ³ 1 X 1 X √ (Tfq ) Mbfq − Mfq 0 f = √ (Tfq ) 0 f >fq + lm (1) K f=1 K f=1 Proof. The details are omitted here. The reader is referred to Chen (1998). The following useful results are adapted from Newey (1993). If Assumption 7 holds, then for the polynomial series-based approximating functions n I (M ) and eh (M ) specified above, we have: a- For each I and h, there are nonsingular matrices > and£F such that n∗I (M ) =¤ I >n (M£) and eh∗ (M ) = Feh¤(M ), the smallest eigenvalues of B n∗I (M ) n I (M )0 0 (T ) and B eh∗ (M ) eh (M )0 0 (u) is bounded away from zero uniformly in I and H respectively. b- For each I and H respectively, there are ( T ) and ( u ) such that: ¯ ¯ ¯ ¯ ¯ (M ) − eh∗ (M ) 0 ¯ 0 (T ) 3 h−p 0 ¯ ¯ ¯ (M ) − eh∗ (M ) and ¯ ¯ (M ) − n∗0I (M ) ¯ ¯ (M ) − n∗0I (M ) ¯ ¯ −p+1 0 ¯ 0 (T ) 3 h 1 ¯ ¯ 3 I−p 0 ¯ ¯ 3 I−p+1 1 ° ° c- For each nonnegative a, max °/ r n∗I (M )° 0 (T ) ≤ I1+2a |r|=a Proof. ( of Theorem 1)Following Newey, Powell and Vella (2002), we can write that n I (M ) = n∗I (M ) since a nonsingular transformation would not affect nonparametric estimates based on power series expansion. 24 Let P = B [0 f nf nf ] where nf is the individual element of nI (M ), it then suffices to prove our results with P = F Recall that ¢ ¡ ¢ ¡ b = T 0 Jm T −1 T 0 Jm ve ¢ ¡ b = T 0 Jm T −1 T 0 Jm (T + ) where =( 1 0 1 K0K) = C −1 (M ) ¢ ¡ b = + T 0 Jm T −1 T 0 Jm ¢ ¡ b − = T 0 Jm T −1 T 0 Jm and ° ° ° °¡ ¢−1 0 ° °b ° ° 0 J T T J − = T ° ° ° m m ° Since kkis a continuous operator, we can use the continuous mapping theorem m and it is sufficient to show that (T 0 Jm T )−1 T 0 Jm → 0And by the Slutsky ¡ ¢−1 m −1 m theorem, that K1 T 0 Jm T → PT T (i) and K1 T 0 Jm → 0 (ii). 
(i) Following Chen (1989), we can show that 1 1 0 0 ∗ ∗−1 ∗0 T Jm T = T 0 Jm∗ T − (Π1 Π−1 2 Π1 − Π1 Π2 Π1 ) K K where Π1 = T N.K, Π∗1 = T N∗ .K, Π2 = N0 N.K, Π∗2 = N∗0 N∗ .K with N∗ = ∗ ∗ 0 ) and J ∗ = (F − N∗ (N∗0 N∗ )N∗0 ) (n1 0 1 n2∗ 0 2 nK K m ∗ = Π∗ Π∗−1 Also, Π3 = Π1 Π−1 and Π 3 1 2 2 From the Cauchy InequAity and Lemma 3, ° ° K °1 X ° ° ° Tf nf0 0 f ° kΠ1 k = ° °K ° f=1 !1.2 à !1.2 à K K X 1 1 X kTf k2 0 f knf k2 0 f ≤ K K f=1 f=1 ³ ´ = Lm I1.2 25 since 1 K K P f=1 kTf k2 0 f = Lm (1) and from Lemma 3, 1 K K P f=1 knf k2 0 f = Lm (I) In the same way, from Lemma 1 and 3, ° ° K °1 X ° ° ° Tf (nf − nf∗ )0 0 f ° kΠ1 − Π∗1 k = ° °K ° f=1 !1.2 à !1.2 à K K 1 X 1 X 2 ∗ 2 kTf k 0 f knf − nf k 0 f ≤ K K f=1 f=1 = Lm (G 1 ) ¶ µ with G 1 = max ; q1 G 0 and G 0 = (ln K)1.2 (Kej )−1.2 + ep 1≤q≤Q Moreover, kΠ1 − Π∗1 k ≥ kΠ1 k − kΠ∗1 k hence ³ ´ kΠ1 k ≤ kΠ∗1 k + Lm (G 1 ) = Lm I1.2 Similarly, we have that K 1 X ∗ kΠ2 − Π2 k ≤ knf − nf∗ k2 0 f + K f=1 à K 1 X knf − nf∗ k2 K f=1 kΠ2 − Π∗2 k = Lm (G 2 ) µ ¶2 µ ¶ 2 1.2 max ; q1 G 0 with G 2 = max ; q1 G 0 + I 1≤q≤Q !1.2 à K 1 X ∗ 2 knf k K f=1 !1.2 1≤q≤Q From Lemma 4 and the proof of Lemma 5 in Newey (1993), and the result astated above, one can show tm → 1 that: ° −1 ° ° ¡ ¢ ° °Π − Π∗−1 ° = °Π−1 Π∗−1 − Π−1 Π∗2 ° 2 2 2 2 2 ¡ ¢ ¡ ¢ ∗ max Π∗−1 kΠ2 − Π2 k ≤ max Π−1 2 2 = Lm (G 2 ) which implies that 26 1- 2- ° ° ° ¡ ¢° °(Π1 − Π∗ ) Π−1 − Π∗−1 ° ≤ kΠ1 − Π∗ k °Π−1 − Π∗−1 ° 1 1 2 2 2 2 = Lm (G 1 G 2 ) ° ° ∗ ∗−1 ° kΠ3 − Π∗3 k = °Π1 Π−1 2 − Π1 Π2 ¡ ¢ ¢ ¡ + (Π°1 − Π∗1 ) Π−1 + Π∗1 Π−1 Π∗−1 Π∗−1 with Π3 − Π∗3 = (Π¡1 − Π∗1¢) Π∗−1 2 2 − 2 2 − 2 ° ° ° ∗ k+kΠ∗ k °Π−1 − Π∗−1 °+kΠ − Π∗ k °Π−1 − Π∗−1 ° kΠ kΠ3 − Π∗3 k ≤ max Π∗−1 − Π 1 1 1 1 1 2 2 2 2 2 = Lm (G 3 ) with G 3 = G 1 + I1.2 G 2 + G 1 G 2 3° ° ° kΠ∗3 k = °Π∗1 Π∗−1 2 ¡ ¢ ∗ ≤ kΠ1 k max Π∗−1 2 ¡ ¢ = Lm I1.2 ¡ ¢ kΠ3 k ≤ kΠ3 − Π∗3 k + kΠ∗3 k = Lm I1.2 Similarly, ∗ ∗ ∗ ∗ ∗ ∗ kΠ3 Π01 − Π∗3 Π∗0 1 k ≤ k(Π3 − Π3 ) Π1 k + kΠ3 (Π1 − Π1 )k + k(Π3 − Π3 ) (Π1 − Π1 )k = Lm (G 4 ) = lm (1) 1.2 where G 
4 = G 3 I + G 1 I1.2 + G 1 G 3 Therefore, 1 0 1 T Jm T = T 0 Jm∗ T − lm (1) K K By the Law of Large Numbers, £ ¤ 1 0 ∗ m T Jm T → B (T − B ∗ [T |M ])0 (T − B ∗ [T |M ]) = PT T K (ii) ° ³ ´° ° °1 0 ° T Jm Mb ° = lm (1) ° °K ° ° ³ ´° h ³ ´ i° ° ° ° °1 0 b b ° T Jm Mb ° = ° 1 T 0 Jm M −N ° ° °K ° °K ° ° °³ ³ ´ ´° ° 1 0 °° b − Nb ° °° ≤ ° M J T ° m °K ° °1.2 ° ³ ´ ° ° ° ° ° −1.2 ° 1 0 ° ° Mb − Nb° = K J T T ° m ° °K = K −1.2 I−p lm (1) → 0 as K → +∞ 27 Hence, ° ° °b ° − ° ° = lm (1) Next, we show that ¶−1 ³ ´ ´ µ1 √ ³ 1 0 √ T 0 JMb Mb K b− T JMb T = K K ³ ´ Mb converges in distribution to a So, now it suffices to show that √1K T 0 JM normal random variable with mean zero. We already have that: ³ ´ ³ ´ 1 1 1 √ T 0 Jm Mb = √ T 0 Jmb (M ) + √ T 0 Jmb a Mb − M Mb − M K K K ³ ³ ´´ 1 1 √ T 0 Jm (M ) = √ T 0 Jmb (M ) − Mb + lm (1) K K ³ ´ Mb around M Let ∗ia be the second term, that is a mean value expansion of Then, ° ° °1 0 ° ° T Jm (M ) + 1 T 0 Jmb ∗ia ° = lm (1) °K ° K Therefore, µ 1 1 0 √ T Jmb (M ) = − √ T 0 Jmb K K ¡ ¢ Note √1K T 0 Jmb − Jm ∗ia Let ∗ ia ¡ ¢ 1 − √ T 0 Jmb − Jm K ∗ ia ¶ ¡ ¢ 1 P21 = √ T 0 Jmb − Jm 1bm K ¡ ¢ 1 P22 = √ T 0 Jmb − Jm 2bm K For P21 , we have: ´ 1 ³ c0 − Π∗1 Π−1 N0 1bm Π1 Π−1 N P21 = √ 2 2 K · ³ ´0 ¢ 1 ¡ −1 c0 −1 b Π1 Π−1 N − N = √ − Π Π + Π Π N 1 1 1b m 2 2 2 K 28 + lm (1) ¸ 1b m ° 0 °N ° K Q −2 °X X ° ° ° = ° nfq 1b m ° ° ³ ´ ° ° b a1fq Mfq−1 − Mfq−1 0 fq ° ° f=1 q=2 ¶ µ °³ ´° ° ° ≤ max ; q0 (I) K (Q − 2) max ° Mbfq−1 − Mfq−1 ° 0 fq 1≤q≤Q −2 = Lm and °³ ´0 ° b−N ° N ° µµ ¶ ¶ max ; q0 (I) KG 0 and 1≤f≤K 1≤q≤Q −2 ° K Q −2 ° ° °X X ³ ´ ° ° ° ° ° (b nfq − nfq ) a1fq Mbfq−1 − Mfq−1 0 fq ° 1Mb ° = ° ° ° f=1 q=2 ¶ µ °³ ´° ° ° ≤ max ; q2 (I) K(Q − 4) max ° Mbfq−1 − Mfq−1 ° 0 fq 2≤q≤Q −2 = Lm Therefore, ° °c0 °N 1≤q≤Q −2 µµ 1≤f≤K ¶ ¶ 2 max ; q1 (I) KG 0 2≤q≤Q −2 °³ ° ° ´0 ° ° ° ° 0 ° b ° ° + °N b ° N − N ≤ ° b b ° 1M ° 1M 1M µ µµ ¶ µ ¶ ¶¶ max ; q0 (I) G 0 + max ; q1 (I) G 20 = Lm K 2≤q≤Q −2 2≤q≤Q −2 µµ ¶ µ ¶ ¶ 1 G K max ; q0 (I) G 0 + max ; q1 (I) G 
20 kP21 k = Lm √ 3 2≤q≤Q −2 2≤q≤Q −2 K +I1.2 K; (I) G 2 q1 0 = lm (1) Similarly, we can show that P22 = lm (1), hence P2 = lm (1) We now consider P1 Note that P1 = P11 + P12 + P13 where 1 P11 = √ (T − B ∗ (T |M ))0 K Mb 29 1 P12 = √ JM B ∗ (T |M )0 1Mb K 1 P13 = √ (T − B ∗ (T |M ))0 (F − JM ) Mb K For P13 we have 1 kP13 k = √ (T − B ∗ (T |M ))0 (F − JM ) 1Mb °K ° ° 1 °° ° 0 ∗ ° ° b° ≤ ° √ (T − B (T |M )) (F − JM )° ° 1M K °³ ´°2 ° ° ≤ Lm K −1.2 I1.2 K (Q − 4) max ° Mbfq−1 − Mfq−1 ° 0 fq 1≤q≤Q −2 1≤f≤K ³ ´ = Lm I1.2 G 0 = lm (1) For P12 by the result stated from Newey (1993) c-, we have: ° ° ° ° °° 1 ° √ kP12 k = °JM B ∗ (T |M )0 ° ° ° K Mb ° ° ° °° 1 0 ∗ ° ° = JM B (T |M ) − N T ° ° √K ³ ´ ≤ Lm K 1.2 I−p G 0 = lm (1) ° ° ° Mb ° By Lemma 2, we can show that: P11 = = = = ¢ 1 ¡ √ (T − B ∗ (T |M )) 2Mb K K Q −2 1 XXh √ (Tfq − B ∗ (Tfq |Mfq Mfq−1 )) K f=1 q=2 ³ ´ i + a1fq Mbfq−1 − Mfq−1 0 fq K Q −2 1 XX √ (Tfq − B ∗ (Tfq |Mfq Mfq−1 )) ( K f=1 q=2 K 1 X √ ; 1f + lm (1) K f=1 30 a1fq ³ ´ Mbfq−1 − Mfq−1 a1fq > 1fq + a2fq > 2fq ) + lm (1) Consequently, we obtain √ ³ K bK − 0 ´ K 1 X −1 √ = PT ; 1f + lm (1) T K f=1 By the Linberg-fuller Central Limit theorem, we get the desired result. 7.2 Proof of 2nd Stage Estimator Proof. Proof ³ of´theorem h P 5.1 i h P i0 K −1 1 b b b Let IK K ≡ K1 K f=1 jf (' ( Tf )) (−Σ(U)K ) K f=1 jf (' ( Tf )) this is defined this way such that the minimization problem becomes a maximization problem with out loss of generAity. 
By definition of b K and the fact that ΠK 0 ∈ ΥK M [k0 − b K k ≥ uK ] " = M ∗ = M∗ ≤ M∗ sup {k 0 −k≥K ∈ΥK } " " sup # n ³ ´ ³ ´o ³ ´ ³ ´ IK bK − IK 0 bK ≥ IK b K bK − IK 0 bK {k 0 −k≥K ∈ΥK } {IK () − IK (0 ) − IK (0 ) − IK ()} ≥ IK (b K ) − IK (0 ) − IK (0 ) − IK () sup {k 0 −k≥K ∈ΥK } {IK () − IK (0 ) − IK (0 ) − IK ()} ≥ IK (ΠK 0 ) − IK (0 ) − IK ( 0 ) − IK () ≤ M1 + M2 + M3 # # with M1 ≡ M M3 ∗ " (uK )2 {IK ( 0 ) − IK (0 0 )} ≥ − sup 3 {k 0 −k≥K ∈VK } " # # ³ ´ ³ ´ (u )2 ¡2¢ K ≡ M IK 0 bK − IK ΠK 0 bK ≥ − L K 3 # " ³ ´ ³ ´ (u )2 K ≡ M IK 0 bK − IK ΠK 0 bK ≥ 3 31 (63) (64) (65) M2 {j ( 0 ) − j (0 0 )} {k0 −k≥K ∈ΥK } ³ ´ ≡ M∗ 2 1 H (0 ) − 3 (uK ) ≥ inf {k 0 −k≥K ∈ΥK } " ≤ M∗ sup 1 MK {j ( 0 ) − j (0 0 )} ≥ (uK )2 sup 3 {k 0 −k≥K ∈ΥK } # where H ( 0 ) ≡ K −1 K X B (jf ( 0 f=1 = B [j ( 0) − j ( 0) − j ( 0 )) 0 )] Then, by corollary 1 of Chen and Shen (1998, pp298), there exists constants ∂,∂ , 0 such that for any u ≥ 1 and any integer K : " 1 RK [j ( ) − j ( 0 )] ≥ (uK )2 sup M∗ 3 {k0 −k≥K ∈ΥK } i h ≤ n∂ exp −∂K (uK )2 # Then, i h 2 M2 ≤ n∂ exp −∂K (uK ) " (66) 2 # {IK ( 0 ) − IK (0 0 )} ≥ − (u3K ) {k 0 −k≥K ∈ΥK } ³ ´ ³ ´i ³ ´ h Next, we bound M1 : Since B IK 0 bK − IK ΠK 0 bK = H 0 ΠK 0 bK ≤ M1 ≡ M ∗ sup 2K and u ≥ 1 : 32 " ³ ´ ³ ´ h ³ ´ ³ ´i (u )2 K ≤ M IK 0 bK − IK ΠK 0 bK − B IK 0 bK − IK ΠK 0 bK ≥ 9 ° K h ³ ´ ³ ´i ° ° P ° " # ° ° b b 2 − j − j Π K K K ) (u 0 0 ° ° K ≤ B ° f=1 h ³ K ´ ³ ´i ° ° ° 9 ° −B j 0 bK − j ΠK 0 bK ° M1 Then, M1 ≤ ∂3 K −1.2 u−1 −1 K (67) Next, we bound M3 : M3 ≤ M ∗ " ³ IK 0 bK ´ ³ ´ h − IK ( 0 0 ) − B IK 0 bK − IK ( 0 ° ´ i ° K h ³ ° P ° " #− ° ° b j − j ( ) K 0 (uK )2 0 0 ° ° ≤ B ° f=1 h ³ K i ° ´ ° ° 9 b ° −B j 0 K − j (0 0 ) ° i (uK )2 ) ≥ 0 9 # This completes the proof. 1 We will first let K = K − 2 r∗ = ±> ∗ and ∗ = + k r∗ Then − ΠK ∗ = −K ΠK r∗ The fallowing lemmas will be useful later for proving asymptotic normAity and consistency of our estimator. 
Let denote oj [ − l ] = j() − jl [ − l ]0 K 1 X IK 0 [ − l ] ≡ jf 0 [>∗ ]0 Σ(Uf )−1 jf (0 ) = 0 K f=1 " # #0 " K K X 1 X 1 jf (' (bTf )) Σ(U)−1 jf (' (bTf )) IK [] ≡ K K K f=1 and f=1 O[ l ] ≡ IK [] − IK [l ] − IK 0 [ − l ] = IK [] 33 # Lemma 3 Under Assumption 5.1-5.6 and Theorem 1 we have K K 1 X 1 X 1 −1 ∗ 0 j [ΠK > ] Σ(Uf )K jf (0 ) = j [>∗ ]0 Σ(Uf )−1 jf (0 ) + lm( √ ) K f=1 f 0 K f=1 f 0 K Proof. The details are omitted here. The interested reader is referred to Ai and Chen(1999) Lemma 4 Under assumption 4.1-4.6 and 5.1-5.6, we have uniformly over ∈ ΛK ¡ ¢ IK jl [ − l ]0 Σ(U)−1 jl [ΠK r∗ ] = lm (k ) Proof. The details are omitted here. The interested reader is referred to Ai and Chen(1999) Lemma 5 Under assumptions 4.1-4.6, 5.1-5.6 we have O[ l ] − O[Π ∗ l ] − B{O[ l ] − O[Π ∗ l ]} = lm(2K ) Proof. The details are omitted here. The interested reader is referred to Ai and Chen(1999) Lemma 6 Under assumptions 4.1 -5.6, we have uniformly over ∈ ΛK 1 B[IK [] − IK [Π∗ ] = [kΠ ∗ − k2 + k − l k] + lm(2K ) 2 Proof. The details are omitted here. The interested reader is referred to Ai and Chen(1999 Proof. of Theorem 5.1) For H , 1 ³ p ´ ≤ M sup IK [] ≥ IK [Π] M kb − 0 k ≥ H2 ; K √ kb − 0 k≥H2 ≤ M ' K ∈ΛK sup √ kb − 0 k≥H2 ' K ∈ΛK IK [] − IK [0 ] ≥ IK [Π] − IK [0 ] To show that this go to zero , we notice that all conditions A.1-A.4 in Chen and Shen(1998) are trivially satisfied given Assumptions 5.1-5.5. Hence this tends to zero by the theorem 1 in Chen and Shen (1998). 34 Proof. 
of Theorem 5.2)Let b ∗ = b + K r∗ and ] = IK [ΠK b ∗ ] + IK 0 [b − ΠK b ∗ ] + O[(b l ] − O[ΠK b ∗ l ] IK [b = IK [ΠK b ∗ ] + IK 0 [b − ΠK b ∗ ] + O[(b l ] − O[ΠK b ∗ l ] +B{IK [b ] − IK [ΠK b ∗ ]} − B{O[(b l ] − O[ΠK b ∗ l ]} This implies that 1 1 ] = IK [ΠK b ∗ ] + [kΠb − l k] + IK 0 [b − ΠK b ∗ ] + lm( ) ∗ − 0 k2 + kb IK [b 2 K Looking at the second term on the right hand side of the above equation, we have ∗ − k2 + kb − l k2 + 2K kr∗ k2 kΠb ∗ − l k2 = kΠb +2 hΠb ∗ − ∗ b − l i + 22K hΠb ∗ − ∗ r∗ i +2K hb − l r∗ i Note that |hΠb ∗ − ∗ b − l i| = K |hΠr∗ − r∗ b − l i| ∗ ∗ ≤ K kΠr − r k × kb − l k = lm(2K ) and ∗ − ∗ r∗ i| = 2K |hΠr∗ − r∗ r∗ i| = lm(2K ) 2K |hΠb It follows then that kΠb ∗ − l k2 = lm(2K ) + kb − l k2 + 2K hb − l r∗ i and ] − IK [ΠK b ∗ ] − IK 0 [b − ΠK b ∗ ] = K hb − l r∗ i + lm(2K ) IK [b By definition of b , the IK [b ] − IK 0 [b − ΠK b ∗ ] ≥ 0 By Theorem 5.1 we have 1 kb − l k = lm( √K ) It follows that 0 ≤ IK 0 [b − ΠK b ∗ ] + K hb − l r∗ i + lm(2K ) 35 Note that b − ΠK b ∗ = −K ΠK r∗ Which means that and − ΠK b ∗ ] = −K IK 0 [ΠK r∗ ] IK 0 [b − l r∗ i + lm(2K ) 0 ≤ −K IK 0 [ΠK r∗ ] + K hb Therefore − l r∗ i + lm(2K ) 0 ≥ K IK 0 [ΠK r∗ ] − K hb Since this holds for r∗ = ±>∗ we obtain ¯ ¯ ∗ ¯ ¯IK [ΠK r∗ ] − hb − r i = lm(K ) l 0 This proves that ¯ ¯ 1 ¯IK [ΠK r∗ ] = hb − l r∗ i¯ + lm( √ ) 0 K Hence for any fixed non-zero ; ∈ RH+1 we have ; 0 (b ' K − '0 ) = − K K 1X 1 X 1 jf 0 [>∗ ]0 Σ(Uf )−1 jf (0 ) + √ k K K f=1 f=1 substituting for >∗ we obtain K √ ¡ ¢−1 1 X √ K(b ' K −' 0 ) = − B[A"∗ (U)0 Σ(Uf )−1 A" ∗ (U)] A"∗ (U)0 Σ(Uf )−1 jf ( 0 )+lm(1) K f=1 The theorem now follows from applying a standard CLT for i.i.d. data. References [1] Ahn H. and J.L. Powell(1993): Semiparametric estimation of censored Selection Models with a Nonparametric Selection Mechanism, Journal of Econometrics, 58 3- 29. [2] Ai, C and X. Chen(1999): Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions, Department of Economics, NYU. 
[3] Altug, S. and R. A. Miller (1998): The Effect of Work Experience on Female Wages and Labour Supply, Review of Economic Studies, 45-85.
[4] Anderson, T. W. and C. Hsiao (1981): Estimation of Dynamic Models with Error Components, Journal of the American Statistical Association, Vol. 76, 598-606.
[5] Andrews, D. W. K. (1987): "Consistency in Nonlinear Econometric Models: A Generic Uniform Law of Large Numbers," Econometrica, 55, 1465-1472.
[6] ______ (1988): "Laws of Large Numbers for Dependent Non-identically Distributed Random Variables," Econometric Theory, 4, 458-467.
[7] ______ (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Regression Models," Econometrica, Vol. 59, 307-347.
[8] ______ (1992): "Generic Uniform Convergence," Econometric Theory, 8, 241-257.
[9] ______ (1993): "Tests for Parameter Instability and Structural Change With Unknown Change Point," Econometrica, 61, 821-856.
[10] ______ (1994a): "Asymptotics for Semiparametric Econometric Models Via Stochastic Equicontinuity," Econometrica, 62, 43-72.
[11] ______ (1994b): "Empirical Process Methods in Econometrics," Chapter 2 in Handbook of Econometrics, Vol. 4, New York: North Holland.
[12] Arellano, M. and S. Bond (1991): Some Tests of Specification for Panel Data: Monte-Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, Vol. 58, 277-297.
[13] Arellano, M. and O. Bover (1995): Another Look at the Instrumental Variable Estimation of Error Component Models, Journal of Econometrics, 68, 29-51.
[14] Arellano, M. and R. Carrasco (2002): Binary Choice Panel Data Models with Predetermined Variables, CEMFI, Madrid.
[15] Arellano, M. and B. Honore (2001): Panel Data Models: Some Recent Developments, in J. Heckman and E. Leamer (eds.), Handbook of Econometrics, Vol. 5, Ch. 53, North Holland.
[16] Billingsley, P. (1995): Probability and Measure. New York: Wiley.
[17] Bond, S. and C. Meghir (1994): Dynamic Investment Models and the Firm's Financial Policy, Review of Economic Studies, Vol. 61, 197-222.
[18] Browning, M. (1992): Children, Household, Economic Behavior, Journal of Economic Literature, Vol. 30, 1434-1476.
[19] Chen, S. (1998): √K-consistent estimation of a panel Data Sample Selection Model, Department of Economics, The Hong Kong University of Science and Technology.
[20] Chen, S. (1999): Semiparametric estimation of heteroscedastic Binary Choice Sample Selection Models Under Symmetry, The Hong Kong University of Science and Technology.
[21] Chen, X. and X. Shen (1998): Sieve Extremum Estimates for Weakly Dependent Data, Econometrica, 66, 289-314.
[22] Davidson, J. (1994): Stochastic Limit Theory: An Introduction for Econometricians. New York: Oxford University Press.
[23] Donald, S. G. (1995): Two-Step Estimation of Heteroscedastic Sample Selection Models, Journal of Econometrics, 65, 347-380.
[24] Fenton, V. and Gallant (1996): Convergence Rate of SNP Density Estimators, Econometrica, 64, 719-727.
[25] Gayle, G-L and Robert A. Miller (2002): Life-cycle Fertility Behavior and Human Capital Accumulation, Working Paper, Department of Economics, University of Pittsburgh.
[26] Gronau, R. (1974): Wage Comparisons - A Selectivity Bias, Journal of Political Economy, Vol. 82, 1119-1144.
[27] Heckman, J. (1976): Common Structures of Statistical Models of Truncations Sample Selection and Limited Dependent Variables, and a Simple Estimator for Such Models, Annals of Economic and Social Measurement, Vol. 5, 475-492.
[28] Holtz-Eakin, D., Newey and H. S. Rosen (1988): Estimating Vector Autoregressions with Panel Data, Econometrica, Vol. 56, 1371-1395.
[29] Hotz, V. Joseph and Robert A. Miller (1988): An Empirical Analysis of Life Cycle Fertility and Female Labour Supply, Econometrica, Vol. 56, no. 1, 91-118.
[30] Honore, B. and E. Kyriazidou (1998): Panel Data Discrete Choice Models With Lagged Dependent Variables, Princeton University.
[31] Jennrich, R. I. (1969): "Asymptotic Properties of Nonlinear Least Squares Estimators," Annals of Mathematical Statistics, 40, 633-643.
[32] Keane, M. and D. Runkle (1992): On the Estimation of Panel Data Models with Serial Correlation when Instruments are Not Strictly Exogenous, Journal of Business and Economic Statistics, Vol. 10, 1-9.
[33] Kyriazidou, E. (1997): Estimation of Panel Data Sample Selection Model, Econometrica, Vol. 65, 1335-1364.
[34] Kyriazidou, E. (1999): Estimation of Dynamic Panel Data Sample Selection Model, Unpublished Manuscript, Department of Economics, University of Chicago.
[35] Hsiao, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press.
[36] Matyas, L. and P. Sevestre, editors: Econometrics of Panel Data. Kluwer Academic Publishers.
[37] Newey, W. (1988): Two-Step Series Estimation of Sample Selection Models, Princeton University.
[38] Newey, W. (1997): Convergence rates and asymptotic normality for series estimators, Journal of Econometrics, Vol. 79, 147-168.
[39] Newey, W., J. L. Powell and F. Vella (2002): Nonparametric estimation of Triangular Simultaneous equations models, Econometrica.
[40] Pollard, D. (1984): Convergence of Stochastic Processes. New York: Springer-Verlag.
[41] Powell, J. L., J. H. Stock and T. M. Stoker (1989): Semiparametric Estimation of Weighted Average Derivatives, Econometrica, 57, 1403-1430.
[42] Powell, J. L. (1994): Estimation of Semiparametric Models, in Handbook of Econometrics, Vol. 4, 2444-2523, eds. R. F. Engle and D. L. McFadden, Amsterdam: North-Holland.
[43] Runkle, D. (1991): Liquidity Constraints and Permanent Income Hypothesis: Evidence from Panel Data, Journal of Monetary Economics, Vol. 27, 73-98.