What is Statistics? Introduction to Bayesian Methods in Mplus
Transcription
What is Statistics?

- Statistics is about uncertainty.
- "To err is human, to forgive divine, but to include errors in your design is statistical." (Leslie Kish, 1977 Presidential Address, A.S.A.)

Mplus Users Group, 27 October 2010

Uncertainty in Classical Statistics

- Uncertainty = the sampling distribution.
- We estimate the population parameter θ by θ̂.
- Imagine drawing an infinity of samples; the distribution of θ̂ over these samples is its sampling distribution.
- But we have only one sample.

Inference in Classical Statistics

What does a 95% confidence interval actually mean?
- Estimate θ̂ and its sampling distribution; estimate the 95% confidence interval.
- Over an infinity of samples, 95% of these intervals contain the true population value.
- But we have only one sample: we never know whether our present estimate θ̂ and its confidence interval are among those 95% or not.

What does a 95% confidence interval NOT mean?
- "We have a 95% probability that the true population value is within the limits of our confidence interval."
- We only have the aggregate assurance that in the long run 95% of our confidence intervals contain the true population value.

Uncertainty in Bayesian Statistics

- Uncertainty = a probability distribution for the population parameter.
- In classical statistics the population parameter has one single true value; we only happen not to know it.
- In Bayesian statistics we imagine a distribution of possible values of the population parameter.
- Each unknown parameter must have an associated probability distribution: before we have data, the prior distribution; after we have data, the posterior distribution = f(prior + data).

Inference in Bayesian Statistics

- The posterior distribution is used to find an estimate for θ and a confidence interval:
  θ̂ = mode, median, or mean of the posterior;
  confidence interval = central 95% region of the posterior (the credibility interval; percentile method).
- Posterior = f(prior + data): the prior distribution influences the posterior, so Bayesian statistical inference depends partly on the prior.
- The prior does not depend on the data (in empirical Bayes it does…).

Components of Bayesian Inference

- Prior distribution: a probability distribution that quantifies the uncertainty about the unknown parameters.
- Likelihood function: relates all variables into a full probability model.
- Posterior distribution: the information about the unknown parameters after the data have been used to update the prior.

Inference in Bayesian Statistics

Bayesian statistical inference depends partly on the prior, so: which prior?
- Technical considerations: a conjugate prior (the posterior belongs to the same distribution family as the prior); a proper prior (a real probability distribution).
- Fundamental consideration: an informative prior or an ignorance prior?
- Total ignorance does not exist: all priors add some information to the data.
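To make the conjugacy bullet concrete (a worked example of my own, not on the slides, reading the second parameter of the normal distributions used a few slides ahead as a variance): with a normal prior for a mean μ and a normal likelihood with known variance, the posterior is again normal, and the update is easiest to write with precisions (inverse variances):

$$ \mu \sim N(\mu_0, \sigma_0^2), \quad \bar{y} \mid \mu \sim N(\mu, \sigma_y^2) \;\Rightarrow\; \frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma_y^2}, \qquad \mu_1 = \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{\bar{y}}{\sigma_y^2} \right) $$

For example, prior N(8, 0.25) and data N(5, 0.4) give posterior precision 1/0.25 + 1/0.4 = 6.5, hence posterior variance 1/6.5 ≈ 0.15 and posterior mean (8·4 + 5·2.5)/6.5 ≈ 6.8, i.e. the posterior N(6.8, 0.15) shown below.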
Inference in Bayesian Statistics

- The posterior distribution is used to find an estimate for θ and a confidence interval:
  θ̂ = mode (comparable to the maximum likelihood estimate); confidence interval = central 95% region.
- This assumes a simple posterior distribution, so that we can compute its characteristics.
- In realistically complex models, the posterior is often intractable.

Bayes' Theorem

Bayes' theorem relates the probability of the Hypothesis (H) and of the Data (D):

$$ \Pr(H \mid D) = \frac{\Pr(D \mid H)\,\Pr(H)}{\Pr(D)} $$

http://en.wikipedia.org/wiki/Bayes’ theorem

Likelihood

- The information about μ contained in the data is represented in the likelihood function.
- Data: N(5, 0.4).

Bayesian Inference in a Nutshell

- Prior distribution p(θ) on the parameters θ.
- Likelihood of the data y given the parameter values: f(y | θ).
- Bayes' theorem puts it all together:

$$ p(\theta \mid y) = \frac{f(y \mid \theta)\,p(\theta)}{f(y)} $$

- The posterior distribution is proportional to the likelihood × the prior distribution.

Prior Distribution

- Bayesian analysis requires the specification of a prior distribution for μ, representing the knowledge about μ before observing the data.
- Data: N(5, 0.4); Prior: N(8, 0.25).

Posterior Distribution

- The posterior distribution combines the information in the prior and the likelihood; it represents the knowledge about μ after observing the data.
- Data: N(5, 0.4); Prior: N(8, 0.25); Posterior: N(6.8, 0.15).

An Uninformative (Vague) Prior

- Data: N(5, 0.4); Prior: N(5, 2); Posterior: N(5, 0.33).

Computational Issues in Bayesian Statistics

"…in realistically complex models, the posterior is often intractable…" What does 'intractable' mean?

$$ p(\theta \mid y) = \frac{f(y \mid \theta)\,p(\theta)}{f(y)}, \qquad f(y) = \int f(y \mid \theta)\,p(\theta)\,d\theta $$

- To calculate the posterior distribution we must integrate a function that is generally very complex; in complex models the posterior is often intractable (impossible to compute exactly).
- Solution: approximate the posterior by simulation. Simulate many draws from the posterior distribution, then compute the mode, median, mean, 95% interval, et cetera from the simulated draws.
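Stated slightly more formally (my gloss on the slide): given draws θ^(1), …, θ^(T) from the posterior p(θ | y), the posterior mean is estimated directly from the draws,

$$ \hat{E}(\theta \mid y) = \frac{1}{T} \sum_{t=1}^{T} \theta^{(t)} $$

and the 95% credibility interval runs from the 2.5th to the 97.5th percentile of the sorted draws (the percentile method mentioned earlier). No integral has to be solved; the integration is replaced by sampling.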
Bayesian Statistics Reaches Parts that Other Statistics Do Not Reach…

Why Bayesian statistics? It can do some things that cannot be done in classical statistics:
- Valid in small samples; Maximum Likelihood is not ("Asymptotically we are all dead…", Novick).
- Always proper estimates: no negative variances.
- Complex models with complex constraints.
- Estimation including missing data, e.g. including a model for the missingness; each missing data point is just another parameter to estimate…
- Multiple Imputation (MI) of complex data.
- Estimation of scores for latent variables, which are all missing…

Why Bayesian Statistics?

Any disadvantages except the computational burden? Yes: prior information introduces bias. The estimates are biased, but hopefully more precise.

"In a corner of the forest,
Dwells alone my Hiawatha,
Permanently cogitating
On the normal law of error,
Wondering in idle moments
Whether an increased precision
Might perhaps be rather better,
Even at the risk of bias,
If thereby one, now and then,
Could register upon the target"
(Kendall, 1959, 'Hiawatha designs an experiment'; italics mine)

Simulating the Posterior Distribution

- Markov Chain Monte Carlo (MCMC): given a draw from a specific probability distribution, MCMC produces a new pseudorandom draw from that distribution.
- Gibbs sampling, Metropolis-Hastings.
- …then repeat, repeat, repeat…
- The distributions are typically multivariate.

MCMC Issues: Burn In

- Sequence of draws Z(1), Z(2), …, Z(t) from the target distribution f(Z).
- Even if Z(1) is not from f(Z), the distribution of Z(t) approaches f(Z) as t → ∞.
- So, for an arbitrary starting value Z(1), if t is sufficiently large, Z(t) is a draw from the target distribution f(Z).
- MCMC must therefore run t 'burn in' iterations before it reaches the target distribution f(Z); having good starting values helps.
- How many iterations are needed to converge on the target distribution? Diagnostics: examine a graph of the burn in, try different starting values, run several chains in parallel.

MCMC Issues: Monitoring

- How many iterations must be monitored? That depends on the required accuracy.
- Problem: successive draws are correlated.
- Diagnostics: graph successive draws; compute autocorrelations; Raftery-Lewis: nhat = the minimum number of iterations for a quantile; Brooks-Draper: nhat = the minimum number of iterations for the mean.
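One common way to quantify the cost of correlated draws, not named on the slide, is the effective sample size: T monitored draws with lag-k autocorrelations ρ_k carry roughly the information of

$$ T_{\text{eff}} = \frac{T}{1 + 2 \sum_{k=1}^{\infty} \rho_k} $$

independent draws. The higher the autocorrelations, the more iterations must be monitored (or thinned) to reach the same accuracy.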
Summing Up

- Probability: degree of belief.
- Prior: what is known before observing the data.
- Posterior: what is known after observing the data (prior + data).
- Informative prior: a tool to include subjective knowledge.
- Non-informative prior: tries to express the absence of prior knowledge; the posterior is mainly determined by the data.
- MCMC methods: simulation (sampling) techniques to obtain the posterior distribution and all posterior summary measures.

Bayesian Methods in Current Software

- BUGS: Bayesian inference Using Gibbs Sampling; very general, the user must set up the model.
- MLwiN: special implementation for multilevel regression.
- Mplus 6.1: very general.
- NORM, Amelia: Multiple Imputation.
- R packages: LearnBayes, R2Winbugs, MCMCpack.

Example: Confirmatory Factor Analysis

TITLE: CFA using ML on Holzinger/Swineford data
DATA: FILE IS "Grant.dat";
VARIABLE: NAMES ARE visperc cubes lozenges paragrap sentence wordmean gender;
  USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean;
ANALYSIS: TYPE IS GENERAL; ESTIMATOR IS ML;
MODEL: spatial BY visperc@1 cubes lozenges;
  verbal BY paragrap@1 sentence wordmean;
OUTPUT: sampstat standardized;

Selected Output, ML Estimation

Chi-Square Test of Model Fit: Value 3.663, Degrees of Freedom 8, P-Value 0.8862
Loglikelihood: H0 Value -2575.128, H1 Value -2573.297
Information Criteria: Number of Free Parameters 19; Akaike (AIC) 5188.256; Bayesian (BIC) 5244.814; Sample-Size Adjusted BIC 5184.692
RMSEA: Estimate 0.000; 90 Percent C.I. [0.000, 0.046]; Probability RMSEA <= .05: 0.957
CFI/TLI: CFI 1.000, TLI 1.026
SRMR (Standardized Root Mean Square Residual): Value 0.024

Bayesian CFA, Minimalist Setup

TITLE: CFA using Bayes on Holzinger/Swineford data
DATA: FILE IS "Grant.dat";
VARIABLE: NAMES ARE visperc cubes lozenges paragrap sentence wordmean gender;
  USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean;
ANALYSIS: TYPE IS GENERAL;
  ESTIMATOR IS BAYES;                  ! choose Bayes
MODEL: spatial BY visperc@1 cubes lozenges;
  verbal BY paragrap@1 sentence wordmean;
OUTPUT: sampstat standardized tech8;   ! tech8 shows the iterations on screen
PLOT: TYPE IS PLOT2;                   ! plots to monitor convergence

Bayesian CFA, Selected Output

TESTS OF MODEL FIT
Bayesian Posterior Predictive Checking using Chi-Square:
  95% Confidence Interval for the Difference Between the Observed and the Replicated Chi-Square Values: -23.549, 22.553
  Posterior Predictive P-Value: 0.500 (the proportion of observed chi-squares larger than the replicated ones)
Information Criterion: Number of Free Parameters 19; Deviance (DIC) 5187.886 (a "Bayesian AIC"); Estimated Number of Parameters (pD) 17.703

Assessing Convergence of the MCMC Chain on the Correct Distribution

- Burn-in: Mplus deletes the first half of the chain.
- Run multiple chains (Mplus default: 2); the PSR statistic compares the variance within chains to the variance between chains and must be close to 1.
- Graphical evaluation: plots of the chain.

(Figures: Autocorrelation Plot; Trace Plot, Two Chains; Posterior Distribution, Kernel Plot.)

Some Mplus Commands for Bayesian Estimation

- POINT is mode (default), median, or mean.
- CHAINS is 2 (4 is nicer).
- STVALUES is unperturbed (default), perturbed, or ML.
- ALGORITHM is Gibbs (default) or MH.
- PROCESSORS is 1 (2 is generally faster).
- THIN is 1 (use every #th iteration).
- BCONVERGENCE is 0.05 (the PSR criterion); make it stricter if convergence seems a problem. I prefer 0.01 for more precision (more stable).

Let's analyze!

An Example where Bayesian Analysis Is an Improvement

- The intervention program ATLAS (Adolescent Training and Learning to Avoid Steroids) was administered to high school football players to prevent the use of anabolic steroids. Data are from 861 high school football players.
- Example from Bengt Muthén, Bayesian Analysis in Mplus: A Brief Introduction (incomplete draft, Version 3, May 17, 2010). Data from MacKinnon, D.P., Lockwood, C.M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99-128.
- The focus is the indirect effect of the intervention on nutrition via the perceived severity of using steroids. The ML estimate is 0.02 (S.E. 0.011), p = 0.056, ns.

ATLAS Example, cntd.

Bayesian Posterior Predictive Checking using Chi-Square:
  95% Confidence Interval for the Difference Between the Observed and the Replicated Chi-Square Values: -4.579, 5.914
  Posterior Predictive P-Value: 0.583

MODEL RESULTS (default settings)
                           Estimate  Posterior S.D.  One-Tailed P-Value  95% C.I. Lower 2.5%  Upper 2.5%
New/Additional Parameters
  INDIRECT                 0.018     0.012           0.010               0.002                0.045

MODEL RESULTS (chains = 4; bconvergence = 0.01)
                           Estimate  Posterior S.D.  One-Tailed P-Value  95% C.I. Lower 2.5%  Upper 2.5%
New/Additional Parameters
  INDIRECT                 0.019     0.011           0.011               0.002                0.043
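Why can Bayes and ML disagree here? Formalizing the point the next slide makes (my addition): for the indirect effect a·b, the ML test uses a symmetric normal approximation with the usual delta-method (Sobel) standard error

$$ SE_{ab} \approx \sqrt{\,b^2\,SE_a^2 + a^2\,SE_b^2\,} $$

but the product of two approximately normal estimates has a skewed, non-normal distribution. The Bayesian credibility interval follows the skewed posterior of a·b directly, which is why the interval [0.002, 0.045] above excludes zero while the symmetric ML interval (roughly 0.02 ± 1.96 × 0.011, which covers zero) does not.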
ATLAS Example, Setup

Why is the Bayes estimate significant and the ML estimate not? Because the indirect effect does not have a nice symmetric distribution!

TITLE: Mediation using Bayesian analysis
  ATLAS data from MacKinnon et al.;
DATA: file = mbr2004atlas.dat;
VARIABLE: names = obs group severity nutrit;
  usevariables = group - nutrit;
ANALYSIS: estimator = bayes; processors = 2;
  chains = 4; bconvergence = 0.01;
MODEL: severity on group (a);
  nutrit on severity (b) group;
MODEL CONSTRAINT: new(indirect);
  indirect = a*b;           ! with Bayes there is currently no INDIRECT or VIA command
OUTPUT: tech8 standardized;
PLOT: type = plot2;

How about the Prior?

- Most software by default uses uninformative or vague priors (but all priors add some information).
- Default priors in Mplus 6: Normal with μ = 0 and σ² very large (10^10); for variances, Inverse Gamma with α (shape) = -1 and β (scale) = 0. (Figure slides showing these default priors.)

Let's analyze!

Two-level CFA, Sibling Data

- 37 families, 187 children; scores on 6 intelligence tests.
- Multilevel structure: children nested in families.
- Problematic because of the small family-level sample size.
- Example from Hox, Multilevel Analysis, 1st Ed., 2002. Source: Van Peet, A.A.J. (1992). De potentieeltheorie van intelligentie [The potentiality theory of intelligence]. Amsterdam: University of Amsterdam, unpublished Ph.D. thesis.

(Figure: path diagram of the CFA for the sibling data, with a Between part and a Within part.)

ML Estimation, 2-level CFA

TITLE: two level factor analysis Van Peet data, using ML estimator;
DATA: FILE IS ggkind.dat;
VARIABLE: NAMES ARE famnr wordlist cards figures matrices animals occup;
  CLUSTER IS famnr;
ANALYSIS: TYPE IS TWOLEVEL; ESTIMATOR IS ML;
MODEL: %BETWEEN%
  general BY wordlist cards figures matrices animals occup;
  %WITHIN%
  numeric BY wordlist cards matrices;
  percept BY occup animals figures;
OUTPUT: SAMPSTAT STANDARDIZED;

MODEL RESULTS (ML)

Residual Variances   Estimate   S.E.    Est./S.E.   Two-Tailed P-Value
WORDLIST              1.598     1.323    1.208      0.227
CARDS                 3.871     1.769    2.188      0.029
FIGURES               2.315     1.496    1.548      0.122
MATRICES             -0.160     0.673   -0.237      0.813
ANIMALS               1.085     1.400    0.775      0.438
OCCUP                 5.705     1.988    2.870      0.004

Bayes Estimation, 2-level CFA

TITLE: two level factor analysis Van Peet data, using Bayes estimator;
DATA: FILE IS ggkind.dat;
VARIABLE: NAMES ARE famnr wordlist cards figures matrices animals occup;
  CLUSTER IS famnr;
ANALYSIS: TYPE IS TWOLEVEL; ESTIMATOR IS Bayes; STVALUES = ML;
MODEL: %BETWEEN%
  general BY wordlist cards figures matrices animals occup;
  %WITHIN%
  numeric BY wordlist cards matrices;
  percept BY occup animals figures;
OUTPUT: SAMPSTAT STANDARDIZED;

MODEL RESULTS (Bayes)

Residual Variances   Estimate   Posterior S.D.   One-Tailed P-Value   95% C.I. Lower 2.5%   Upper 2.5%
WORDLIST              4.618     2.235            0.000                1.322      9.820
CARDS                 3.991     2.369            0.000                0.434      9.468
FIGURES               2.150     1.625            0.000                0.201      6.299
MATRICES              0.712     0.669            0.000                0.101      2.604
ANIMALS               3.104     1.891            0.000                0.676      7.825
OCCUP                 6.389     2.511            0.000                2.888     12.652

Note that under ML the MATRICES residual variance is negative, while under Bayes all variance estimates are proper.

Specifying Your Own Prior

- Informative priors are used to incorporate prior knowledge.
- This is flexible in Mplus, but (Win)BUGS is more flexible and offers more tools.
- Angels fear to tread here… (= be careful): in models with small variance parameters and small samples, the posterior is often sensitive to the choice of prior, especially the variance estimates. Do a sensitivity analysis (try different priors), as sketched below.
- Tihomir Asparouhov and Bengt Muthén, Bayesian Analysis of Latent Variable Models using Mplus (Version 3), August 11, 2010; file BayesAdvantages18.pdf on www.statmodel.com.
- Evans, Hastings, & Peacock (2000). Statistical Distributions. New York: Wiley.
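A sketch of such a sensitivity analysis for the sibling data (my construction, combining this model with the MODEL PRIORS syntax shown on the next slides; the label names and the alternative prior values are arbitrary illustrations, not recommendations):

MODEL: %BETWEEN%
  general BY wordlist cards figures matrices animals occup;
  wordlist-occup (bv1-bv6);            ! label the between-level residual variances
  %WITHIN%
  numeric BY wordlist cards matrices;
  percept BY occup animals figures;
MODEL PRIORS: bv1-bv6 ~ IG(0.001, 0.001);  ! rerun with, e.g., IG(1, 1) and compare posteriors

If the estimates and intervals for the small between-level variances change noticeably between runs, the data carry little information about them, and the reported results should be interpreted with care.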
Specifying Your Own Prior, cntd.

Parameters must be labeled; priors are then specified via these labels. Use the same distribution, but give it a different shape to represent the prior information.

For loadings:

MODEL: F BY y1-y4* (p1-p4);
  F@1;
MODEL PRIORS: p1-p4 ~ N(0,5);

For a variance:

MODEL: F BY y1-y4;
  F (var);
MODEL PRIORS: var ~ IG(0.001,0.001);

IG(.001,.001) is the BUGS default variance prior.

Multiple Imputation in Mplus using Bayesian Estimation
Mplus Users Group, 27 October 2010

Important Distinctions

- Missing Completely At Random (MCAR): the missing values are a random sample of all values (not related to any observed or unobserved variable).
- Missing At Random (MAR): the missing values are a random sample of all values conditional on other observed variables (not related to the unobserved value itself, but related to other observed variables).
- Not Missing At Random (NMAR): the missingness is related to the unobserved (missing) value. (Little & Rubin, 1987, p. 14)

Consequences for Analysis

- MCAR or MAR: ignorable. Analyze all observed data and ignore the missing data; MAR is ignorable if a proper model and estimation method are used.
- NMAR: nonignorable/informative. Construct a model for both the observed data and the missingness process (difficult, sensitive to misspecification, requires ML).

Mplus Approaches to Incomplete Data

- Full Information ML estimation: fine, assumes MAR; may require numerical integration (slow or impossible).
- WLS estimation: fine, assumes MCAR.
- Bayesian estimation (the missing data are estimated): fine, assumes MAR.
- Any estimation method on Multiply Imputed data: fine, assumes MAR; Mplus can combine the results automatically, and Mplus 6.* can generate the MI datasets.

Single versus Multiple Imputation

- Imputation = fill the holes in the data, usually with the best possible estimate, followed by a standard analysis. This overestimates the sample size and thus underestimates the error.
- Multiple Imputation (MI) = do this m times, with a randomly chosen estimate from the distribution of possible estimates, followed by m standard analyses; the m outcomes are then combined. The variation over the m imputations restores the error.

Multiple Imputation: Imputation Step

(Diagram: a cases 1…n by variables 1…p data matrix with observed ('!') and missing ('?') entries; m different imputed data sets are created.)

Multiple Imputation: Analysis Step

- Do the standard complete-data analysis m times.
- Combine the results.

How Can We Create Imputations?

- Parametric method (the Mplus approach): specify a model for the complete data; for each missing data point, estimate the predictive distribution of the missing data and impute with a random value from this distribution.
- Nonparametric method: group similar cases into adjustment cells; for each missing data point, collect the non-missing cases from the same adjustment cell and impute with the value of a randomly selected non-missing case.
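The "combine the results" step in the analysis step above follows Rubin's rules, which the slides rely on but do not spell out: with parameter estimates Q̂_j and their squared standard errors Û_j from the m analyses,

$$ \bar{Q} = \frac{1}{m} \sum_{j=1}^{m} \hat{Q}_j, \qquad \bar{U} = \frac{1}{m} \sum_{j=1}^{m} \hat{U}_j, \qquad B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \hat{Q}_j - \bar{Q} \right)^2 $$

$$ T = \bar{U} + \left( 1 + \frac{1}{m} \right) B $$

The pooled estimate is Q̄ with total variance T. The between-imputation variance B is exactly the component that single imputation throws away, which is how MI "restores the error."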
- Thus Bayesian estimation is used: the missing values are imputed multiple times by taking random draws from their posterior distribution.
- Which model? The Mplus 6.1 default is the full covariance matrix; the Mplus 6.0 default is the specified H0 model.

How Many Imputations?

An estimator based on m < ∞ imputations has efficiency

$$ \left( 1 + \frac{\gamma}{m} \right)^{-1} $$

with γ = the fraction of missing information. Note that γ is the fraction of missing information, which is not the same as the fraction of missing data.

Efficiency (in %) by number of imputations m and fraction of missing information γ:

m      γ = .1   .3   .5   .7   .9
3          97   91   86   81   77
5          98   94   91   88   85
10         99   97   95   93   92
20        100   99   98   97   96

About 5 Is Often Enough!

- Graham, J.W., Olchowski, A.E., & Gilreath, T.D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213.
- They advise m > 10.

Example of Missing Data Analysis

- GPA data (Hox, 2002, 2010): 200 college students, GPA measured on 6 occasions.
- Time-varying covariate: Job (number of hours worked in an off-campus job).
- Time-invariant covariates: HighGPA, Sex.
- Variant with the GPA variables incomplete: MAR, the missingness depends on the previous GPA measure. Artificial data, SPSS files GPA2 and GPA2Mis.

(Figure: path diagram of the latent curve model for GPA.)

Mplus Setup, Incomplete Data (ML)

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE: NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ! TYPE IS MISSING;
  ESTIMATOR IS ML;
MODEL: interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: SAMPSTAT STANDARDIZED;

Mplus Setup, Incomplete Data (Bayes)

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE: NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ESTIMATOR IS BAYES; PROCESSORS = 2; BCONVERGENCE = 0.01;
MODEL: interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: TECH8 STANDARDIZED;
PLOT: TYPE IS PLOT2;

Estimated Means and Variances

                      Complete ML   Incomplete ML   Incomplete Bayes
Intercept Mean        2.60          2.61            2.61
Intercept Variance    0.04          0.03            0.04
Slope Mean            0.11          0.10            0.10
Slope Variance        0.003         0.004           0.005

Mplus Setup, Multiple Imputation of 10 Data Sets

No MODEL command: the imputation is from the saturated model.

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE: NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ESTIMATOR IS BAYES; PROCESSORS = 2; BCONVERGENCE = 0.01;
  TYPE = BASIC;
DATA IMPUTATION: NDATASETS = 10;
  IMPUTE = gpa1 - gpa6;
  SAVE = gpa2imp*.dat;
OUTPUT: TECH8 STANDARDIZED;
PLOT: TYPE IS PLOT2;
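A hypothetical variant of the imputation run above (my sketch, not on the slides): MI allows auxiliary variables in the imputation step, so sex and highgpa can be used to create the imputations even though the growth model itself does not include them; essentially only the USEVARIABLES line changes. The title and output file name are made up for illustration.

TITLE: MI for GPA data with auxiliary variables (sketch);
DATA: FILE IS "gpa2mis.dat";
VARIABLE: NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE sex highgpa gpa1 - gpa6;   ! auxiliaries added to the imputation model
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ESTIMATOR IS BAYES; TYPE = BASIC;
DATA IMPUTATION: NDATASETS = 10;
  IMPUTE = gpa1 - gpa6;                       ! only the GPA variables are imputed
  SAVE = gpa2impaux*.dat;                     ! hypothetical file name

This exploits the MI advantage noted below: information in variables outside the analysis model can still improve the imputations.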
Mplus Setup, Multiple Imputation of 10 Data Sets (from the Specified Model)

Imputation from the specified model. The imputation model must be at least as complex as the analysis model!

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE: NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ESTIMATOR IS BAYES; PROCESSORS = 2; BCONVERGENCE = 0.01;
MODEL: interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
DATA IMPUTATION: NDATASETS = 10;
  IMPUTE = gpa1 - gpa6;
  SAVE = gpa2imp*.dat;
OUTPUT: TECH8 STANDARDIZED;
PLOT: TYPE IS PLOT2;

Mplus Setup, Multiple Imputation Analysis

Analysis of the 10 MI datasets:

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2implist.dat";
  TYPE = IMPUTATION;
VARIABLE: NAMES ARE gpa1 - gpa6;
ANALYSIS: ESTIMATOR = ML;
MODEL: interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: SAMPSTAT STANDARDIZED;

Estimated Means and Variances

                      Incomplete ML   Incomplete Bayes   Incomplete MI (ML)
Intercept Mean        2.61            2.61               2.61
Intercept Variance    0.03            0.04               0.03
Slope Mean            0.10            0.10               0.10
Slope Variance        0.004           0.005              0.004

Multiple Imputation versus Likelihood-Based Procedures

ML procedures:
+ efficient
- model specific
- complicated
+ if ML estimation is slow or impossible, use Bayes

MI procedures:
+ general; uses standard complete-data techniques (which need not be likelihood-based)
+ possible to use auxiliary data in the imputation step
- complicated

Background Reading, Incomplete Data

- McKnight, P.E., McKnight, K.M., Sidani, S., & Figueredo, A.J. (2007). Missing Data: A Gentle Introduction. London: Guilford Press. (Very nice.)
- Schafer, J.L., & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177. (Very readable.)
- Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall. (Great, but very technical.)