Logit and Probit Models with Discrete Dependent Variables
Why Do We Need A Different Model Than Linear Regression?

Appropriate estimation of relations between variables depends on selecting an appropriate statistical model. There are many different types of estimation problems in political science.

Continuous variables where the experiment can be viewed as draws from a normal distribution.

Continuous variables where the experiment is draws from some other distribution.

Continuous variables where the distribution is truncated or censored.

Discrete variables - For example, we might model labor force participation, whether to vote for or against, purchase or not purchase, run for office or not run for office, etc.

Models of this type are sometimes called qualitative response models, because the dependent variables are discrete, rather than continuous. There are several types of such models including the following.

Type of Qualitative Response Models

Qualitative dichotomy (e.g., vote/not vote type variables)- We equate "no" with zero and "yes" with 1. However, these are qualitative choices and the coding of 0-1 is arbitrary. We could equally well code "no" as 1 and "yes" as zero.

Qualitative multichotomy (e.g., occupational choice by an individual)- Let 0 be a clerk, 1 an engineer, 2 an attorney, 3 a politician, 4 a college professor, and 5 other. Here the codings are mere categories and the numbers have no real meaning.

Rankings (e.g., opinions about a politician's job performance)- Strongly approve (5), approve (4), don't know (3), disapprove (2), strongly disapprove (1). The values that are chosen are not quantitative, but merely an ordering of preferences or opinions. The difference between outcomes is not necessarily the same from 5 to 4 as it is from 2 to 1.

Count outcomes.
Dichotomous Dependent Variables

There are various problems associated with estimating a model with a dichotomous dependent variable under the assumption of a statistical experiment that draws from a normal distribution, i.e., using regression.

Obviously the statistical experiment is not draws from a normal distribution, but from a Bernoulli distribution. Thus, estimation is likely to be inefficient. It is also theoretically inconsistent with the nature of the statistical experiment.

The dependent variable is discrete and truncated on both ends at 0 and 1. This leads to a number of other serious problems. Consider first a graph of the data in a typical sample of Bernoulli experiments.

[Figure: Linear Probability Model. Y, P(Y) plotted against X.]

Note that a linear regression line through the actual data cuts through the data at the point of greatest concentration on each end. The observations will lie close to the regression line only if the X variable is also Bernoulli distributed. This means that measures of fit or hypothesis tests involving the squared errors will not be meaningful. The regression line will seldom lie near the data.

Relatedly, this feature also means that the residuals from the linear probability model will be dichotomous and heteroskedastic, rather than normal, raising questions about hypothesis tests.

When y=1, the residual will depend on X and be:

$$e_i = 1 - (\hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki})$$

When y=0, the residual will depend on X and be:

$$e_i = -(\hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki})$$

Note that the residuals change systematically with the values of X. This implies what is termed endogeneity. They are also not distributed normally.

We could "fix" the heteroskedasticity by estimating the linear probability model using weighted least squares. However, the problem with this model runs deeper.
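The two-valued residuals described above can be seen in a minimal sketch. The data below are hypothetical (a 0-1 outcome that switches at X = 0, invented for illustration), and the one-regressor least squares formula stands in for a full regression routine.

```python
# Hypothetical data, invented for illustration: x runs from -100 to 100,
# with y = 0 below zero and y = 1 from zero upward.
xs = [float(x) for x in range(-100, 101, 10)]
ys = [0.0 if x < 0 else 1.0 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least squares with one regressor: slope = cov(x, y) / var(x).
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# At any given x the residual can take only one of two values:
#   1 - fitted  (when y = 1)   or   -fitted  (when y = 0).
for x, y, f, e in zip(xs, ys, fitted, residuals):
    print(f"x={x:7.1f}  y={y:.0f}  fitted={f:6.3f}  residual={e:6.3f}")

# Error variance implied by a Bernoulli outcome whose P(y=1) equals the
# fitted value. It changes with x, which is the heteroskedasticity, and it
# turns negative wherever the fitted value falls outside [0, 1].
implied_variance = [f * (1.0 - f) for f in fitted]
print(min(implied_variance), max(implied_variance))
```

Because the weights for weighted least squares would be built from these implied variances, fitted values outside the unit interval make that correction awkward in practice.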
We must be able to interpret the fitted values from this model as probabilities, since the expected value of a 0-1 variable is the probability that y=1. However, the graph of the linear probability model suggests further problems. Observe that some of the predicted probabilities lie above 1 and below zero. This is not consistent with the rules of probability.

We could truncate the model at 0 and 1 to "fix" this problem. However, note that probability, according to this model, is assumed to change in linear fashion with changes in X. Yet, this may not be consistent with reality in many real world situations.

For example, consider the probability of home ownership as a function of income.

Suppose we have prospective buyers with income around 10k per year. If we change their income by 1k, how much does the probability that they will own a home change?

Suppose we have prospective buyers with income around 30k. If we change their income by 1k, how much does the probability that they will own a home change?

Suppose we have prospective buyers with income around 80k. If we change their income by 1k, how much does the probability that they will own a home change?

In practice, there are many situations where the probability of a yes outcome follows an S-shaped curve, rather than the straight line assumed by the linear probability model.

Non-Linear Probability Models

To begin, assume the appropriate statistical experiment. The statistical experiment is draws from a Bernoulli distribution. The probability model from the Bernoulli distribution is given:

$$f(y_i \mid p) = p^{y_i} (1 - p)^{1 - y_i}$$

where p is a parameter reflecting the probability that y=1.

The issue then becomes how to specify the probability that y=1. We noted above that this probability often follows an S-shaped curve. In other words, the probability that y=1 remains small until some threshold is crossed, at which point it switches rapidly to remain large after the threshold. This suggests a cumulative distribution function.
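The Bernoulli probability model above can be written out directly. The following is a minimal sketch, assuming a single constant p (before p is tied to X) and a small hypothetical sample; the function names and the grid search are illustrative choices, not part of the original slides.

```python
import math


def bernoulli_pmf(y, p):
    """f(y | p) = p**y * (1 - p)**(1 - y) for y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)


def log_likelihood(ys, p):
    """Sum of log f(y_i | p) over independent Bernoulli draws."""
    return sum(math.log(bernoulli_pmf(y, p)) for y in ys)


# Hypothetical sample: six "yes" outcomes out of ten draws.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

# Crude grid search over p in 0.01, 0.02, ..., 0.99.
grid = [k / 100 for k in range(1, 100)]
best_p = max(grid, key=lambda p: log_likelihood(sample, p))

print("share of ones in the sample:", sum(sample) / len(sample))
print("p with the highest log-likelihood on the grid:", best_p)
```

With a constant p, the value that best fits the sample is the share of ones. Logit and probit keep the same likelihood but let p vary with X through a cumulative distribution function.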
Two different cumulative distribution functions are commonly used in this situation: the cumulative standard normal distribution (probit) and the cumulative logistic distribution (logit).

Probit- The cumulative standard normal distribution function is given:

$$P(Y=1) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt = \Phi(z), \qquad z = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki}$$

Logit- The cumulative logistic function for logit is grounded in the concept of odds. Let the log odds that y=1 be given:

$$\ln\left(\frac{P}{1-P}\right) = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki} = z$$

Then solving for the probability that y=1 we have:

$$\begin{aligned}
\frac{P}{1-P} &= e^{z} \\
P &= (1-P)e^{z} = e^{z} - Pe^{z} \\
P + Pe^{z} &= e^{z} \\
P(1 + e^{z}) &= e^{z} \\
P &= \frac{e^{z}}{1 + e^{z}}
\end{aligned}$$

[Figure: Probit/Logit. P(Y) plotted against z for the probit and logit curves.]

Choosing Between Logit/Probit- In the dichotomous case, there is no basis in statistical theory for preferring one over the other. In most applications it makes no difference which one uses. If we have a small sample the two distributions can differ significantly in their results, but they are quite similar in large samples.

Various $R^2$ measures have been devised for Logit and Probit. However, none is a measure of the closeness of observations to an expected value as in regression analysis. All are ad hoc.

Hypothesis testing

t or z test- We can test the significance of the individual coefficients simply using the point estimates and standard errors (square roots of the diagonal elements of the asymptotic covariance matrix of estimates). Form a z or t test by taking

$$t_{N-k} = \frac{\hat{\beta}_k - \beta_{k0}}{s_{\hat{\beta}_k}}$$

where $\beta_{k0}$ is the hypothesized value of the coefficient (usually zero) and $s_{\hat{\beta}_k}$ is its estimated standard error.

Confidence Intervals

Interpretation

Interpreting Dichotomous Logit and Probit Coefficients- The actual coefficients in a logit or probit analysis are limited in their immediate interpretability. The signs are meaningful, but the magnitudes may not be, particularly when the variables are in different metrics. Above all, note that you cannot interpret the coefficients directly in terms of units of change in y for a unit change in x, as in regression analysis.
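The probit and logit functions and the z test described above can be sketched with the Python standard library alone. The function names are illustrative, and the coefficient and standard error in the example are hypothetical numbers, not estimates from any data.

```python
import math


def probit_cdf(z):
    """Cumulative standard normal distribution, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Cumulative logistic distribution, e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """ln(P / (1 - P)), the inverse of logit_cdf."""
    return math.log(p / (1.0 - p))


def z_statistic(beta_hat, std_err, beta_null=0.0):
    """(estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / std_err


# Both curves are S-shaped and bounded by 0 and 1.
for z in (-10, -4, -2, -1, 0, 1, 2, 4, 10):
    print(f"z={z:4d}  probit={probit_cdf(z):.4f}  logit={logit_cdf(z):.4f}")

# Solving the log odds for P and taking log odds again recovers z.
print(log_odds(logit_cdf(1.5)))

# Hypothetical coefficient 0.8 with standard error 0.25, tested against zero.
z_stat = z_statistic(0.8, 0.25)
p_value = 2.0 * (1.0 - probit_cdf(abs(z_stat)))  # two-sided, normal approximation
print(z_stat, p_value)
```

The two curves are drawn on different scales (the standard logistic distribution has a larger variance than the standard normal), so coefficients from the two models are not directly comparable even when the fitted probabilities are close.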
There are various approaches to imparting substantive meaning into logit and probit results, including:

Probability calculations

Graphical methods

First differences

First partial derivatives
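Three of these approaches can be sketched for a logit model of home ownership on income. The coefficients b0 and b1 are hypothetical values chosen for illustration (income measured in thousands), not estimates from data, and the function names are likewise assumptions.

```python
import math

# Hypothetical logit coefficients: intercept and slope on income (in thousands).
b0, b1 = -4.0, 0.1


def prob_own(income):
    """Probability calculation: P(y=1) = e^z / (1 + e^z), with z = b0 + b1 * income."""
    z = b0 + b1 * income
    return 1.0 / (1.0 + math.exp(-z))


def first_difference(income_from, income_to):
    """Change in predicted probability when income moves between two values."""
    return prob_own(income_to) - prob_own(income_from)


def marginal_effect(income):
    """First partial derivative for logit: dP/dx = b1 * P * (1 - P)."""
    p = prob_own(income)
    return b1 * p * (1.0 - p)


# The same 1k change in income moves the probability by different amounts
# depending on where the buyer starts on the S-shaped curve.
for income in (10, 30, 80):
    print(
        f"income={income:3d}k  P(own)={prob_own(income):.4f}  "
        f"first difference (+1k)={first_difference(income, income + 1):.4f}  "
        f"partial derivative={marginal_effect(income):.4f}"
    )
```

Printing prob_own over a range of incomes gives the points needed for the graphical approach. Under these assumed coefficients the effect of an extra 1k is largest near the middle of the curve and small at both ends, which is why a single coefficient cannot be read as a fixed change in probability.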