WOMAT
Workshop On Multivariate Analysis Today
Programme and Book of abstracts

Scientific organisers: Frank Critchley (OU), Bing Li (Penn State), Hannu Oja (Turku)
Local organisers: Sara Griffin, Tracy Johns, Radka Sabolova, Germain Van Bever

Contents

Programme
Talk abstracts
  Yanyuan Ma: A Validated Information Criterion (VIC) to Find the Structural Dimension
  João Branco: High dimensionality: the trouble with Mahalanobis distance
  Tim Cannings: Random projection ensemble classification
  Kjersti Aas: Pair-copula constructions – even more flexible than copulas
  Sara Fontanella: A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
  Shahin Tavakoli: Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
  Lutz Duembgen: New Algorithms for M-Estimation of Multivariate Scatter and Location
  Jim Smith: Chain event graphs for discrete multivariate processes
  John Kent: Some new perspectives on partial least squares
Poster abstracts
  Comparison of statistical methods for multivariate outliers detection
  On point estimation of the abnormality of a Mahalanobis distance
  Sparse Linear Discriminant Analysis with Common Principal Components
  Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection
  Hilbertian Fourth Order Blind Identification
Programme

9:00 Yanyuan Ma (University of South Carolina): A Validated Information Criterion to Find the Structural Dimension
9:30 João Branco (CEMAT, IST, Lisbon): High dimensionality: the trouble with Mahalanobis distance
10:00 Tim Cannings (Cambridge): Random projection ensemble classification
10:30 Coffee & Poster Session
11:00 Kjersti Aas (Norwegian Computing Centre): Pair-copula constructions – even more flexible than copulas
11:30 Sara Fontanella (The Open University): A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
12:00 Shahin Tavakoli (Cambridge): Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
12:30 Lutz Duembgen (Bern): New Algorithms for M-Estimation of Multivariate Scatter and Location
13:00 Lunch and Poster Session
14:30 Jim Smith (Warwick): Chain event graphs for discrete multivariate processes
15:00 John Kent (Leeds): Some new perspectives on partial least squares
15:30 Roundtable Discussion
16:00 Tea and Departures

Talk abstracts

A Validated Information Criterion to Find the Structural Dimension
Yanyuan Ma, University of South Carolina
E-mail: yanyuanma@stat.sc.edu

A crucial component in performing sufficient dimension reduction is to determine the structural dimension of the reduction model. We propose a novel information-criterion-based method to achieve this, whose special feature is that, when examining the goodness-of-fit of the current model, the evaluation draws on an enlarged candidate model. Although the procedure does not require estimation under the enlarged model with dimension k + 1, the decision on how well the current model with dimension k fits relies on the validation provided by the enlarged model. This leads to the name validated information criterion, denoted VIC(k). The method differs from existing model selection methods based on information criteria.
It breaks free from the dependence on the connection between dimension reduction models and their corresponding matrix eigenstructures, a connection which relies heavily on a linearity condition that we no longer assume. Its consistency is proved and its finite sample performance is demonstrated numerically. (Joint work with Xinyu Zhang.)

High dimensionality: the trouble with Mahalanobis distance
João Branco, Ana M. Pires, CEMAT, Instituto Superior Técnico, Lisbon

The recent massive production of high-dimensional data has brought great difficulties and concomitant challenges to statistics, since its usual methods were not designed to cope with this kind of data. High dimensionality triggers the curse of dimensionality, and the unexpected behaviour of some statistical tools may surprise even those aware of the intricacies of spaces with a large number of dimensions. We look at the Mahalanobis distance, a tool that is crucial to the functioning of traditional multivariate statistical methods, and see how it behaves as p approaches n and when p is greater than n. Can the Mahalanobis distance keep the fundamental role in high-dimensional spaces that it has in low-dimensional spaces (p ≪ n)? And if it does not, what are the consequences? We will attempt to answer these questions.

Random projection ensemble classification
Timothy I. Cannings and Richard J. Samworth, Statistical Laboratory, University of Cambridge

We introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case presented here, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error.
Our random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. We provide theoretical understanding to justify the methodology, and a simulation comparison with several other popular high-dimensional classifiers reveals its excellent finite-sample performance.

Pair-copula constructions – even more flexible than copulas
Kjersti Aas, Norwegian Computing Centre

A copula is a multivariate distribution with standard uniform marginal distributions. While the literature on copulas is substantial, most of the research is still limited to the bivariate case. However, some years ago hierarchical copula-based structures were proposed as an alternative to the standard copula methodology. One of the most promising of these structures is the pair-copula construction (PCC). The PCC modelling scheme is based on a decomposition of a multivariate density into a cascade of pair copulae, applied to the original variables and to their conditional and unconditional distribution functions. Each pair copula can be chosen arbitrarily, and the full model can exhibit complex dependence patterns such as asymmetry and tail dependence. In this talk I will give an introduction to pair-copula constructions and apply the methodology to a 19-dimensional financial data set.

A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
Sara Fontanella, N. Trendafilov, P. Valentini, L. Fontanella

In the last decades, sparse modelling has inspired many studies in different research fields, such as statistics, machine learning and bioinformatics. Its importance is due to the following main advantages: first, it enhances the interpretability of the results; second, it reflects reality, as any real-world system is sparse; and third, predictive performance is improved, since sparsity helps prevent overfitting.
In this work, we consider sparse modelling in the context of two multivariate statistical techniques: Factor Analysis (FA) and Multidimensional Item Response Theory (MIRT). They are strongly related to each other in terms of modelling, despite the different types of data they are applied to. FA is a well-known model-based multivariate technique used to describe observed continuous variables by means of a smaller set of latent factors. Item response theory (IRT) models the probability of a correct response (to a test, questionnaire, etc.) as a function of disjoint sets of parameters, related respectively to the person and the item. MIRT is its multidimensional extension.

Both FA and MIRT suffer from solution/factor indeterminacy. In particular, the main issue to be addressed is the rotational invariance of the final solution: for a given set of data, any orthogonal transformation of the matrix of parameters would produce the same covariance structure. In this context, we show that sparsity plays a double role: on one side it improves the interpretability of the results, while on the other side it allows the rotational indeterminacy to be overcome.

To this end, we follow a Bayesian approach to sparse modelling. The prior belief in sparsity is modelled by a sparsity-inducing prior distribution on the parameters. In this context, a popular choice is to apply spike and slab priors, which present several computational advantages. A spike and slab prior assumes that the parameters of interest are mutually independent, each with a two-point mixture distribution made up of a degenerate distribution at zero (the spike), to provide strong shrinkage near zero, and a uniform flat distribution (the slab), to allow signals to escape strong shrinkage. The performance of the considered methods is evaluated through simulation studies.
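The two-point mixture described in this abstract can be sketched by simulation. This is only a minimal illustration, not the authors' implementation: the function name `sample_spike_and_slab` and the parameters `slab_prob` and `slab_half_width` are hypothetical, and the slab is assumed here to be uniform on a bounded symmetric interval so that it is a proper distribution.

```python
import random


def sample_spike_and_slab(n_params, slab_prob, slab_half_width, rng):
    """Draw independent parameters from a spike and slab mixture.

    With probability slab_prob a value comes from the slab, assumed here to be
    Uniform(-slab_half_width, slab_half_width); otherwise it comes from the
    spike, a point mass at exactly zero.
    """
    draws = []
    for _ in range(n_params):
        if rng.random() < slab_prob:
            draws.append(rng.uniform(-slab_half_width, slab_half_width))
        else:
            draws.append(0.0)
    return draws


# Hypothetical settings: 20% of the parameters are expected to be non-zero.
rng = random.Random(0)
loadings = sample_spike_and_slab(10_000, slab_prob=0.2,
                                 slab_half_width=3.0, rng=rng)
share_zero = sum(1 for value in loadings if value == 0.0) / len(loadings)
print(f"share of exact zeros: {share_zero:.3f}")
```

Most draws are exactly zero, which is the sense in which such a prior encodes a belief in sparsity, while the slab leaves room for the few parameters that carry signal.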
Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
Shahin Tavakoli, Statslab, University of Cambridge

We consider the problem of studying the dynamics of DNA minicircles that are vibrating in solution. At a large scale, DNA minicircles are modelled as elastic rods, and the problem of understanding their dynamics can be recast as the problem of estimating the second order structure of a stationary functional time series (FTS). We tackle this problem by a frequency domain approach, where we estimate the spectral density operators (or spectra) of the DNA minicircle. We then carry out hypothesis tests to compare the spectra of two specific DNA minicircles. The comparison is broken down into a hierarchy of stages: at a global level, we compare the spectral density operators of the two DNA minicircles, across frequencies and curvelength, based on a Hilbert-Schmidt criterion; then, we localize any differences to specific frequencies; and, finally, we further localize any differences along the length of the DNA minicircles, i.e. in physical space. A hierarchical multiple testing approach guarantees control of the averaged false discovery rate over the selected frequencies. In this sense, we are able to attribute any differences to distinct dynamic (frequency) and spatial (curvelength) contributions.

Keywords. Functional Data Analysis; Spectral Analysis; DNA Minicircle; Molecular Dynamics; Multiple Testing.

New Algorithms for M-Estimation of Multivariate Scatter and Location
Lutz Duembgen, Bern

We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and other algorithms.
The main idea is to utilize local parametrizations of scatter matrices via matrix exponentials, with a corresponding second order Taylor expansion of the target functional, and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially. This talk is based on joint work with Klaus Nordhausen (Turku) and Heike Schuhmacher (Bern).

Chain event graphs for discrete multivariate processes
Jim Smith, Warwick

Statistical models of multivariate discrete processes often need to express various hypotheses about how events might unfold and associated hypotheses about the symmetries within these unfoldings. A natural way to express such hypotheses is via a statistical model on a finite set of atoms structured around collections of different probability trees with different symmetries. One such family is the class of Chain Event Graphs (CEGs). This family contains the class of discrete Bayes Nets as a very special case. It can be shown that most inferential techniques used for Bayesian Networks readily translate to this new family because of their modular form. Furthermore, because different models in the class can be associated with families of polynomials, the inferential implications of one hypothesis against another can be elegantly analysed. In this talk I will present some recent results associated with CEGs and the challenges they bring to effective model choice. This is joint work with two PhD students, Christiane Gorgen and Rodrigo Collazo.

Some new perspectives on partial least squares
John Kent, Department of Statistics, University of Leeds

Partial least squares (PLS) is a regularization technique in high-dimensional multiple regression analysis. It has sometimes had a somewhat dubious reputation in mainstream statistics.
Part of the reason seems to be that the methodology was originally proposed in terms of an algorithm, and only later was it noticed that it can be viewed as an attempt to fit a particular statistical model, the Krylov model. In this talk we describe how the Krylov model can be formulated most simply in the setting of inverse regression and how the PLS estimator can be viewed as an approximate MLE for this model. We then describe some comparisons with the exact MLE under this model.

Poster abstracts

Comparison of statistical methods for multivariate outliers detection
Aurore Archimbaud¹, Klaus Nordhausen² & Anne Ruiz-Gazen¹
¹ Gremaq (TSE), Université Toulouse 1 Capitole, E-mail: aurore.archimbaud@ut-capitole.fr, anne.ruiz-gazen@tse-fr.eu
² Department of Mathematics and Statistics, University of Turku, E-mail: klaus.nordhausen@utu.fi

In this poster, we are interested in detecting outliers, such as manufacturing defects, in multivariate numerical data sets. Several unsupervised methods based on robust and non-robust covariance matrix estimators exist in the statistical literature. Our first aim is to exhibit the links between three outlier detection methods: the Invariant Coordinate Selection method as proposed by Caussinus and Ruiz-Gazen (1993) and generalized by Tyler et al. (2009), the method based on the Mahalanobis distance as detailed in Rousseeuw and Van Zomeren (1990), and the robust Principal Component Analysis (PCA) method with its diagnostic plot as proposed by Hubert et al. (2005).

Caussinus and Ruiz-Gazen (1993) proposed a Generalized PCA which diagonalizes one scatter matrix relative to another: $V_1 V_2^{-1}$, where $V_2$ is a more robust covariance estimator than $V_1$, the usual empirical covariance estimator. These authors compute scores by projecting all the observations $V_2^{-1}$-orthogonally on some of the components, and high scores are associated with potential outliers.
We note that computing Euclidean distances between observations using all the components is equivalent to computing robust Mahalanobis distances according to the matrix $V_2$ using the initial data. Tyler et al. (2009) generalized this method and called it Invariant Coordinate Selection (ICS). Contrary to Caussinus and Ruiz-Gazen (1993), they diagonalize $V_1^{-1} V_2$, which leads to the same eigenelements but to different scores that are proportional to each other. As explained in Tyler et al. (2009), the method is equivalent to a robust PCA with a scatter matrix $V_2$ after making the data spherical using $V_1$. However, the Euclidean distances between observations based on all the components of ICS now correspond to Mahalanobis distances according to $V_1$ and not to $V_2$.

Note that each of the three methods leads to a score for each observation, and high scores are associated with potential outliers. We compare the three methods on some simulated and real data sets and show in particular that the ICS method is the only method that permits a selection of the relevant components for detecting outliers.

Keywords. Invariant Coordinate Selection; Mahalanobis distance; robust PCA.

Bibliography
[1] Caussinus, H. and Ruiz-Gazen, A. (1993), Projection pursuit and generalized principal component analysis, In New Directions in Statistical Data Analysis and Robustness (eds S. Morgenthaler, E. Ronchetti and W. A. Stahel), 35–46, Basel: Birkhäuser.
[2] Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005), ROBPCA: a new approach to robust principal component analysis, Technometrics, 47(1), 64–79.
[3] Rousseeuw, P. J. and Van Zomeren, B. C. (1990), Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85(411), 633–639.
[4] Tyler, D. E., Critchley, F., Dümbgen, L. and Oja, H. (2009), Invariant coordinate selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 549–592.
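The two distance equivalences stated in the outlier detection abstract above can be checked with a short derivation. It is a sketch under assumptions that the abstract does not spell out: the eigenvectors $a_1, \dots, a_p$ of $V_1 V_2^{-1}$ are taken to be normalised so that $a_j^\top V_2^{-1} a_k = \delta_{jk}$, the ICS transformation $B$ is taken to satisfy $B V_1 B^\top = I_p$, and the symbols $A$, $B$, $z_i$ and $x_i$ are introduced here for illustration only.

```latex
% Generalized PCA: scores on all p components, A = (a_1, ..., a_p)
\begin{align*}
A^\top V_2^{-1} A = I_p
  &\;\Longrightarrow\; A A^\top = V_2, \qquad z_i = A^\top V_2^{-1} x_i, \\
\lVert z_i - z_k \rVert^2
  &= (x_i - x_k)^\top V_2^{-1} A A^\top V_2^{-1} (x_i - x_k)
   = (x_i - x_k)^\top V_2^{-1} (x_i - x_k).
\end{align*}

% ICS: the data are first made spherical with respect to V_1
\begin{align*}
B V_1 B^\top = I_p
  &\;\Longrightarrow\; B^\top B = V_1^{-1}, \qquad z_i = B x_i, \\
\lVert z_i - z_k \rVert^2
  &= (x_i - x_k)^\top B^\top B \, (x_i - x_k)
   = (x_i - x_k)^\top V_1^{-1} (x_i - x_k).
\end{align*}
```

Both identities rely on $A$ and $B$ being square, that is on keeping all $p$ components; with a subset of components the Euclidean distance between scores is in general no longer a Mahalanobis distance.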
On point estimation of the abnormality of a Mahalanobis distance
Fadlalla G. Elfadaly¹, Paul H. Garthwaite¹ & John R. Crawford²
¹ The Open University, Email: Fadlalla.Elfadaly@open.ac.uk
² University of Aberdeen

When a patient appears to have unusual symptoms, measurements or test scores, the degree to which this patient is unusual becomes of interest. For example, clinical neuropsychologists sometimes need to assess how a patient with some brain disorder or a head injury differs from the general population or some particular subpopulation. This is usually based on the patient’s scores in a set of tests that measure different abilities. Then, the question is “What proportion of the population would give a set of test scores as extreme as that of the patient?” The abnormality of the patient’s profile of scores is expressed in terms of the Mahalanobis distance between the patient’s profile and the average profile of the normative population. The degree to which the patient’s profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual.

This presentation will focus on forming an estimator of this proportion using a normative sample. The estimators that are examined include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method. Simulations show that some estimators, including the commonly used plug-in maximum likelihood estimators, can have substantial bias for small or moderate sample sizes. The polynomial approximations yield estimators that have low bias, with the quadrature method marginally to be preferred over Bernstein polynomials. Moreover, in simulations the median estimators have nearly zero median error.
This latter estimator has much to recommend it when unbiasedness is not of paramount importance, while the quadrature method is recommended when bias is the dominant issue.

Keywords. Bernstein polynomials; Mahalanobis distance; median estimator; quadrature approximation; unbiased estimation.

Sparse Linear Discriminant Analysis with Common Principal Components
Tsegay G. Gebru & Nickolay T. Trendafilov
Department of Mathematics and Statistics, The Open University, UK

Linear discriminant analysis (LDA) is a commonly used method for classifying a new observation into one of g populations. However, in high-dimensional classification problems the classical LDA performs poorly. When the number of variables is much larger than the number of observations, the within-group covariance matrix is singular, which leads to unstable results. In addition, the large number of input variables needs considerable reduction, which nowadays is addressed by producing sparse discriminant functions. Here, we propose a method to tackle the (low-sample) high-dimensional discrimination problem by using common principal components (CPC). LDA based on CPC is a general approach to the problem because it does not need the assumption of equal covariance matrices across groups. We find sparse CPCs by modifying the stepwise estimation method proposed by Trendafilov (2010). Our aim is to find a few important sparse discriminant vectors which are easily interpretable. For numerical illustrations, the method is applied to some known real data sets and compared to other methods for sparse LDA.

Bibliography
[1] Trendafilov, N.T. Stepwise estimation of common principal components. Computational Statistics and Data Analysis 54:3446-3457, 2010.

Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection
Radka Sabolová¹˒², H. Oja³, G. Van Bever¹ & F. Critchley¹
¹ MCT Faculty, The Open University, Milton Keynes
² Email: radka.sabolova@open.ac.uk
³ Turku University

It is a remarkable fact that, using any pair of scatter matrices, invariant coordinate selection (ICS) can recover the Fisher linear discriminant subspace without knowing group membership (see [5]). The subspace is found by using two different scatter matrices $S_1$ and $S_2$ and the joint eigendecomposition of one scatter matrix relative to the other. In this poster, we focus on the two-group normal subpopulation problem and discuss the optimal choice of such a pair of scatter matrices in terms of asymptotic accuracy of recovery. The first matrix is fixed as the covariance matrix, while the second one is chosen within a one-parameter family based on powers of the squared Mahalanobis distance, indexed by α ∈ R. Special cases of this approach include Fourth Order Blind Identification (FOBI, see [1]) and Principal Axis Analysis (PAA, see [4]).

The use of two scatter matrices in discrimination was studied by [2] and later elaborated in [3], who proposed generalised PCA (GPCA) based on a family of scatter matrices with decreasing weight functions of a single real parameter β > 0. They then discussed the appropriate choice of β, while concentrating on outlier detection. Their form of weight function and the consequent restriction to β > 0 imply downweighting outliers. In our approach, on the other hand, considering any α ∈ R also allows us to upweight outliers. Further, we may, in addition to the outlier case, study mixtures of subpopulations. Theoretical results are underpinned by an extensive numerical study.

The UK-based authors thank the EPSRC for their support under grant EP/L010429/1.

Bibliography
[1] Cardoso, J.-F. Source Separation Using Higher Moments. Proceedings of IEEE international conference on acoustics, speech and signal processing, 2109-2112.
[2] Caussinus, H. and Ruiz-Gazen, A.
Projection pursuit and generalized principal component analyses. New Directions in Statistical Data Analysis and Robustness, 35-46.
[3] Caussinus, H., Fekri, M., Hakam, S. and Ruiz-Gazen, A. A monitoring display of multivariate outliers. Computational Statistics & Data Analysis, 2003, 44, 237–252.
[4] Critchley, F., Pires, A. and Amado, C. Principal Axis Analysis. Technical report, Open University, 2006.
[5] Tyler, D., Critchley, F., Dümbgen, L. and Oja, H. Invariant Co-ordinate Selection. J. R. Statist. Soc. B., 2009, 71, 549–592.

Hilbertian Fourth Order Blind Identification
Germain Van Bever¹˒², B. Li³, H. Oja⁴, R. Sabolová¹ & F. Critchley¹
¹ MCT Faculty, The Open University, Milton Keynes
² Email: germain.van-bever@open.ac.uk
³ Penn State University
⁴ Turku University

In the classical Independent Component (IC) model, the observations $X_1, \dots, X_n$ are assumed to satisfy $X_i = \Omega Z_i$, $i = 1, \dots, n$, where the $Z_i$'s are i.i.d. random vectors with independent marginals and $\Omega$ is the mixing matrix. Independent component analysis (ICA) encompasses the set of all methods aiming at unmixing $X = (X_1, \dots, X_n)$, that is, estimating a (non-unique) unmixing matrix $\Gamma$ such that $\Gamma X_i$, $i = 1, \dots, n$, has independent components. Cardoso ([1]) introduced the celebrated Fourth Order Blind Identification (FOBI) procedure, in which an estimate of $\Gamma$ is provided, based on the regular covariance matrix and a scatter matrix based on fourth moments. Building on robustness considerations and generalizing FOBI, Invariant Coordinate Selection (ICS, [2]) was originally introduced as an exploratory tool generating an affine invariant coordinate system. The obtained coordinates, however, are proved to be independent in most IC models.

Nowadays, functional data (FD) are occurring more and more often in practice, and relatively few statistical techniques have been developed to analyze this type of data (see, for example, [3]).
Functional PCA (FPCA) is one such technique, which focuses on dimension reduction with very few theoretical considerations. We propose an extension of the FOBI methodology to the case of Hilbertian data, FD being the go-to example used throughout. When dealing with distributions on Hilbert spaces, two major problems arise: (i) the scatter operator is, in general, non-invertible and (ii) there may not exist two different affine equivariant scatter functionals. Projections on finite dimensional subspaces and Karhunen-Loève expansions are used to overcome these issues and provide an alternative to FPCA. More importantly, we show that the proposed construction is Fisher consistent for the independent components of an appropriate Hilbertian IC model and enjoys the affine invariance property.

This work is supported by the EPSRC grant EP/L010429/1.

Keywords. Invariant Coordinate Selection; Functional Data; Symmetric Component Analysis; Independent Component Analysis.

Bibliography
[1] Cardoso, J.-F. (1989), Source Separation Using Higher Moments. Proceedings of IEEE international conference on acoustics, speech and signal processing, 2109-2112.
[2] Tyler, D., Critchley, F., Dümbgen, L. and Oja, H. (2009), Invariant Co-ordinate Selection. J. R. Statist. Soc. B., 71, 549–592.
[3] Ramsay, J. and Silverman, B.W. (2006), Functional Data Analysis, 2nd edn. Springer, New York.
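The classical FOBI procedure recalled in the abstract above, which combines the covariance matrix with a scatter matrix based on fourth moments, can be sketched for two-dimensional data. This is an illustrative sketch only and not the Hilbertian construction proposed in the poster: the names `fobi_2d`, `rotation_diagonalising` and `second_moments`, the mixing matrix `omega` and the simulated sources are all assumptions. It follows the usual description of FOBI: whiten the data with the covariance matrix, then rotate to the eigenvectors of the mean of ||y||² y yᵀ.

```python
import math
import random


def rotation_diagonalising(a, b, d):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, d]].

    Returns (l1, l2, c, s): eigenvalue l1 has eigenvector (c, s) and
    eigenvalue l2 has eigenvector (-s, c).
    """
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2.0 * b * c * s + d * s * s
    l2 = a * s * s - 2.0 * b * c * s + d * c * c
    return l1, l2, c, s


def second_moments(points, weights=None):
    """Weighted mean of the outer products x x^T, returned as (a, b, d)."""
    n = len(points)
    if weights is None:
        weights = [1.0] * n
    a = sum(w * x * x for w, (x, y) in zip(weights, points)) / n
    b = sum(w * x * y for w, (x, y) in zip(weights, points)) / n
    d = sum(w * y * y for w, (x, y) in zip(weights, points)) / n
    return a, b, d


def fobi_2d(data):
    """Estimate a 2 x 2 unmixing matrix Gamma (as a list of rows) by FOBI."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centred = [(x - mx, y - my) for x, y in data]

    # Step 1: whitening matrix W with W cov W^T = I.
    l1, l2, c, s = rotation_diagonalising(*second_moments(centred))
    W = [[c / math.sqrt(l1), s / math.sqrt(l1)],
         [-s / math.sqrt(l2), c / math.sqrt(l2)]]
    white = [(W[0][0] * x + W[0][1] * y, W[1][0] * x + W[1][1] * y)
             for x, y in centred]

    # Step 2: fourth-moment scatter of the whitened data, mean of ||y||^2 y y^T.
    radii2 = [x * x + y * y for x, y in white]
    _, _, c4, s4 = rotation_diagonalising(*second_moments(white, radii2))

    # Step 3: Gamma = U^T W, where the columns of U are the eigenvectors.
    U_t = [[c4, s4], [-s4, c4]]
    return [[sum(U_t[i][k] * W[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical example: a uniform and a Laplace source, both with unit variance.
rng = random.Random(1)
omega = [[1.0, 0.5], [0.3, 1.0]]
sources = [(rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)),
            (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0))
           for _ in range(20000)]
mixed = [(omega[0][0] * u + omega[0][1] * v, omega[1][0] * u + omega[1][1] * v)
         for u, v in sources]
gamma = fobi_2d(mixed)
# Gamma times Omega should be close to a scaled permutation matrix.
gain = [[sum(gamma[i][k] * omega[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(gain)
```

The two sources are chosen with different kurtosis because FOBI separates components through the eigenvalues of the fourth-moment scatter, which must be distinct. The unmixing matrix is recovered only up to order, sign and scale of its rows, in line with the non-uniqueness noted in the abstract.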