Learning orthogonal projections for Isomap

Neurocomputing 103 (2013) 149–154

Yali Zheng^a, Bin Fang^a,*, Yuan Yan Tang^b, Taiping Zhang^a, Ruizong Liu^a
^a School of Computer Science, Chongqing University, Chongqing, China
^b University of Macau, Macau, China

Article history: Received 14 April 2012; received in revised form 11 September 2012; accepted 16 September 2012; available online 23 October 2012. Communicated by Liang Wang.

Abstract

We propose a dimensionality reduction technique named Orthogonal Isometric Projection (OIP). In contrast with Isomap, which learns a low-dimensional embedding by solving the problem under the classical Multidimensional Scaling (MDS) framework, we learn an explicit linear projection that captures the geodesic distance; such a projection handles new data directly and leads to a standard eigenvalue problem. We constrain the projection to be orthogonal, analyze the properties of orthogonal projections, and demonstrate their benefit: the Euclidean distance and the angle between each pair of points in the high-dimensional space equal those in the low-dimensional space, so both the global and the local geometric structure are preserved. Numerical experiments on different datasets compare OIP with several competing methods.

© 2012 Elsevier B.V. All rights reserved.

Keywords: Manifold learning; Isomap; Orthogonal projection

1. Introduction

Researchers are interested in reducing the dimensionality of digital data because digital information is highly redundant. Extracting efficient features from data can improve object classification and recognition, and simplify the visualization of data.
One main category of dimensionality reduction techniques is data embedding, which seeks a low-dimensional representation of the data to aid analysis; examples include Multidimensional Scaling (MDS) [1] and Isomap [4]. MDS is a classical embedding technique that preserves pairwise distances to obtain the low-dimensional configuration. The other category is the projection approach, which explicitly learns a linear or nonlinear mapping between the high- and low-dimensional spaces over a training set, and reduces the dimensionality of new data by applying the learned projection. Principal Component Analysis (PCA) can be used as a projection method: it learns a linear projection that maximizes the variance of the data in the low-dimensional space. PCA is identical to classical MDS when the Euclidean distance is used [1], but PCA learns an explicit projection. Linear Discriminant Analysis (LDA) likewise determines an explicit projection, by maximizing the ratio of between-class variance to within-class variance.

From the contemporary point of view, Isomap [4] and Locally Linear Embedding (LLE) [3] are pioneering manifold learning methods. Manifold learning, introduced into the dimensionality reduction field in the early 2000s, assumes that a low-dimensional manifold is embedded in the high-dimensional data. Roweis and Saul [3] proposed LLE, a nonlinear dimensionality reduction method that aims to preserve in the low-dimensional space the same local configuration of each neighborhood as in the high-dimensional space. He and Niyogi [2] found optimal linear approximations to the eigenfunctions of the nonlinear Laplacian Eigenmap, called Locality Preserving Projection (LPP), and justified it with graph Laplacian theory. Cai et al. [5] proposed a variant of Laplacianface (LPP), Orthogonal Laplacianface (OLPP), which iteratively computes orthogonal eigenvectors to compose the projection matrix. Kokiopoulou and Saad [6] analyzed and compared LPP and OLPP and proposed Orthogonal Neighborhood Preserving Projections (ONPP), which can be thought of as an orthogonal version of LLE whose projections are learned explicitly by solving a standard eigenvalue problem.

On one hand, the topological stability of manifold learning has been questioned since its introduction into dimensionality reduction [15]. On the other hand, researchers have paid much attention to developing a variety of manifold learning methods to deal with such problems. Geodesic distance computation suffers from loops when constructing a neighborhood graph; Lee and Verleysen [12] designed a nonlinear dimensionality reduction technique that breaks loops on the manifold. Choi and Choi [13] related Isomap to Mercer kernel machines to study its projection property, and eliminated critical outliers for topological stability. Dimensionality reduction techniques are reviewed in [16].

This paper is an extension of our work [8]. Motivated by Isomap [4] and Isometric Projection [9], we propose a linear projection method, called Orthogonal Isometric Projection, which is a variation of Isometric Projection. Isometric Projection, proposed by Cai et al. [9], addresses the same goal as ours. However, we constrain the projection to be orthogonal, which differs from Cai's method and leads to a standard eigenvalue problem; we also analyze the properties of orthogonal projections in manifold learning and demonstrate why orthogonal projections work better.

* Corresponding author. Tel.: +86 23 65112784. E-mail address: fb@cqu.edu.cn (B. Fang).
doi:10.1016/j.neucom.2012.09.015
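To make the contrast between embedding and projection methods concrete, the following minimal sketch (ours, not from the paper; the data and variable names are illustrative) shows PCA used as an explicit linear projection, whose key practical advantage, as noted above, is that it maps a previously unseen sample directly:

```python
import numpy as np

# Toy data: 100 samples with 10 (redundant) features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
mean = X.mean(axis=0)

# PCA as an explicit projection: top-p eigenvectors of the sample covariance.
p = 2
cov = (X - mean).T @ (X - mean) / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
V = eigvecs[:, -p:]                      # m x p projection matrix

# The learned projection applies directly to a previously unseen sample,
# which a pure embedding method (e.g., classical MDS) cannot do.
x_new = rng.normal(size=10)
y_new = V.T @ (x_new - mean)             # its p-dimensional representation
```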
The main difference between our method and Cai's orthogonal version of Isometric Projection is that we build an intuitive and reasonable objective function and solve the optimization as a standard eigenvalue problem, whereas Cai solved a generalized eigenvalue problem. We also extend our method to the supervised case and to a sparse Orthogonal Isometric Projection, and we test it on different datasets. In the following section we briefly review Isomap and Cai's Isometric Projection to set the stage for our algorithm.

2. A brief review of Isomap and Isometric Projection

2.1. Isomap

Isomap, proposed by Tenenbaum et al. [4], is one of the most popular manifold learning techniques. It seeks a Euclidean embedding of the points such that the Euclidean distance between each pair of embedded points is close to the geodesic distance between the corresponding points in the high-dimensional space. The mathematical formulation is

    \min \sum_{i,j} ( d_G(x_i, x_j) - d_E(y_i, y_j) )^2    (1)

where d_G is the geodesic distance and d_E is the Euclidean distance, or, in matrix form,

    \min \| \tau(D_G) - \tau(D_E) \|^2    (2)

where D_G is the geodesic distance matrix, D_E is the Euclidean distance matrix, and \tau is an operator that converts distances into inner products. The problem is solved under the MDS framework. Isomap assumes that a manifold exists in the high-dimensional space, and uses the geodesic distance to measure the similarity of each pair of points. However, when too few samples are given or the data are noisy, the intrinsic geometry of the data is difficult to capture by constructing the neighborhood graph.

2.2. Isometric Projection

Cai et al. [9] extended the Isomap algorithm to learn a linear projection by solving a spectral graph optimization problem. Writing Y = V^T X, they minimized the objective function

    \min \| \tau(D_G) - X^T V V^T X \|^2    (3)

To make the problem tractable, they imposed the constraint V^T X X^T V = I and rewrote the minimization as

    \arg\max_V tr( V^T X \tau(D_G) X^T V )  subject to  V^T X X^T V = I,

which is equivalent to the generalized eigenvalue problem

    X \tau(D_G) X^T V = \lambda X X^T V.

To lower the computational cost, Cai also applied regression over Y and X, called spectral regression (SR) [9,14]: Y, composed of eigenvectors of \tau(D_G), is computed first, and then

    a = \arg\min_a \sum_{i=1}^{m} (a^T x_i - y_i)^2 + \alpha \|a\|^2.    (4)

The condition V^T X X^T V = I constrains the low-dimensional embedding of the points to be orthogonal, namely Y Y^T = I.

3. Orthogonal Isometric Projection

The main idea of Orthogonal Isometric Projection is to seek an orthogonal mapping over the training dataset that best preserves the geodesic distances on a neighborhood graph. We learn a linear projection under the general Isomap framework, but with a different, and reasonable, constraint: the projection must be orthogonal. We demonstrate why orthogonal projections work better in the next section.

3.1. The objective function of OIP

Under the Isomap framework, we minimize the objective in Eq. (2). Here \tau(D_G) = -C W C / 2, where C = I_N - (1/N) e_N e_N^T is the centering matrix, e_N = [1, ..., 1]^T, and W is the Dijkstra (shortest-path) distance matrix of the k-nearest-neighbor graph over all data points. Let f(V) = \| \tau(D_G) - X^T V V^T X \|^2; we seek the linear projection

    \min_V f(V)    (5)

Let S = \tau(D_G), which is known once the neighborhood graph has been constructed from the given dataset. We have

    f(V) = tr( (S - X^T V V^T X)^T (S - X^T V V^T X) )
         = tr( S^T S - S^T (X^T V V^T X) - (X^T V V^T X)^T S + (X^T V V^T X)^T (X^T V V^T X) )
         = tr(S^T S) + tr( (X^T V V^T X)^T (X^T V V^T X) - 2 S^T (X^T V V^T X) )
         = tr( ((X^T V V^T X)^T - 2 S^T)(X^T V V^T X) ) + tr(S^T S).

So, dropping the constant tr(S^T S), the objective function (5) is equivalent to

    \min_V tr( V^T X ((X^T V V^T X)^T - 2 S^T) X^T V )  subject to  V^T V = I.    (6)

Let M = X (X^T X - 2 S^T) X^T; the problem then becomes

    \min_V tr(V^T M V)  subject to  V^T V = I,

which leads to the standard eigenvalue problem

    M V = \lambda V.    (7)

V is determined by the eigenvectors of M corresponding to the p smallest eigenvalues.

3.2. The algorithm of OIP

First we construct the neighborhood graph G. There are two common options: connect x_i and x_j if x_j \in N(x_i), where N(x_i) is the set of the k nearest neighbors of x_i; or connect them if \|x_i - x_j\|^2 \le \epsilon, where \epsilon is a small threshold. The edge between i and j can be weighted by the Gaussian kernel e^{-\|x_i - x_j\|^2}. We summarize the method in Algorithm 1.

Algorithm 1. Orthogonal Isometric Projection.
1: Construct a neighborhood graph G over the data points and compute the Dijkstra distance matrix W(i,j) = d_G(i,j) for every pair of points, where d_G(i,j) is the length of the shortest path between i and j; if no path exists, d_G(i,j) = \infty.
2: Compute \tau(D_G) = -C W C / 2, where C = I_N - (1/N) e_N e_N^T is the centering matrix.
3: Compute the eigenvectors of M = X (X^T X - 2 \tau(D_G)^T) X^T. V consists of the eigenvectors of M associated with the p smallest eigenvalues.

4. Why orthogonal projections work better

Cai et al. proposed OLPP [5] after LPP [7] by imposing orthogonality constraints. Kokiopoulou and Saad extended LLE [3] to ONPP [6] and showed the success of orthogonal projections over various datasets. Orthogonal extensions seem always to work better than the original methods. However, the existing algorithms add orthogonality constraints either to make the optimization tractable or to obtain an explicit linear projection; they do not analyze the properties of the orthogonal projections themselves, nor identify the cause of the better results. We present two properties of orthogonal projections that explain why orthogonal algorithms work well.

Lemma 1.
Suppose that x_i \in R^m is a point in the high-dimensional space, V is an m \times p projection matrix with V V^T = I, and y_i = V^T x_i \in R^p is the linear projection of x_i. Then

    \|y_i - y_j\|^2 = \|x_i - x_j\|^2.

Proof.

    \|y_i - y_j\|^2 = (y_i - y_j)^T (y_i - y_j)
                    = (V^T x_i - V^T x_j)^T (V^T x_i - V^T x_j)
                    = (V^T (x_i - x_j))^T (V^T (x_i - x_j))
                    = (x_i - x_j)^T V V^T (x_i - x_j)
                    = \|x_i - x_j\|^2.  □    (8)

Lemma 2. Suppose that x_i \in R^m is a point in the high-dimensional space, V is an m \times p projection matrix with V V^T = I, and y_i = V^T x_i \in R^p is the linear projection of x_i. Then

    angle(y_i, y_j) = angle(x_i, x_j).

Proof. The inner product of y_i and y_j equals that of x_i and x_j:

    <y_i, y_j> = y_i^T y_j = (V^T x_i)^T (V^T x_j) = x_i^T V V^T x_j = <x_i, x_j>.    (9)

For non-zero vectors y_i, y_j, x_i, x_j, the angle between y_i and y_j is therefore

    angle(y_i, y_j) = arccos( <y_i, y_j> / (\|y_i - 0_y\| \|y_j - 0_y\|) )
                    = arccos( <x_i, x_j> / (\|x_i - 0_x\| \|x_j - 0_x\|) )
                    = angle(x_i, x_j),

where 0_y denotes the p-dimensional zero vector and 0_x the m-dimensional zero vector; the norms in the two expressions are equal by Lemma 1.  □

Thus the Euclidean distance and the angle between each pair of data points in the high-dimensional space are preserved in the low-dimensional space when orthogonal projections are applied. We believe that the more such microscopic characteristics of the data points are kept during the projection, the more macroscopic properties of the data are preserved, and the less information is lost in dimensionality reduction.

5. Extensions of OIP

5.1. Supervised case

We can extend OIP to the supervised case, in which class labels are available. OIP is modified appropriately to produce a projection that carries discriminative as well as geometric information. We modify the objective function as follows, which differs from the original Isomap objective (Eq. (1)):

    \min \sum_{i,j} ( \omega_{ij} d_M(x_i, x_j) - d_E(y_i, y_j) )^2

where d_M is the geodesic distance on the manifold and \omega_{ij} is a weight that compacts x_i and x_j if they share the same label and expands them if they belong to different classes. In other words, if x_i and x_j come from the same class, we would like their projections to be closer in the low-dimensional space, which we obtain by setting 0 \le \omega_{ij} \le 1; if they come from different classes, we force the projections further apart by setting \omega_{ij} \ge 1. In matrix form, we minimize the objective function

    \min \| A \odot \tau(D_G) - \tau(D_Y) \|^2

where A \odot \tau(D_G) denotes the elementwise product [A]_{ij} [\tau(D_G)]_{ij}. Obviously, we can solve the problem in the same way as in the unsupervised case by simply taking S = A \odot \tau(D_G) instead of \tau(D_G) in Eq. (5).

5.2. Sparse OIP

Our problem ultimately solves the standard eigenvalue problem of Eq. (7), which can also be incorporated into a regression framework in the manner of Sparse PCA [11]. The optimal V consists of eigenvectors of M in Eq. (7). Since M is a real symmetric matrix, it can be decomposed as M = X̃ X̃^T. Suppose the rank of X̃ is r, and let the SVD of X̃ be

    X̃ = Ũ Σ̃ Ṽ^T;

it is easy to verify that the column vectors of Ũ are the eigenvectors of X̃ X̃^T. Let Y = [y_1, y_2, ..., y_r]_{n \times r}, whose row vectors are the sample vectors in the r-dimensional subspace, and V = [v_1, v_2, ..., v_r]_{m \times r}. The projective functions of OIP are then solved from the linear systems

    X^T v_p = y_p,  p = 1, 2, ..., r,

where v_p, the p-th column of V, is the solution of the regression problem

    v_p = \arg\min_v \sum_{i=1}^{n} (v^T x_i - y_{ip})^2,

with y_{ip} the i-th element of y_p. Similar to Zou et al. [11], we can obtain sparse solutions by adding an L1 regularizer:

    v_p = \arg\min_v \sum_{i=1}^{n} (v^T x_i - y_{ip})^2 + \alpha \|v\|_1.

The regression can be solved efficiently using the LARS algorithm.

6. Experiments

We evaluate our algorithm on the Reuters-21578 and USPS datasets, downloaded from a public website,¹ and compare it with LDA, IP, IP+SR, OLPP and ONPP, reporting average accuracies and average error rates in this section.
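The unsupervised procedure of Algorithm 1 can be condensed into a short numerical sketch. This is our illustrative implementation under stated assumptions, not the authors' code: a Floyd-Warshall pass stands in for Dijkstra when computing shortest paths, and the graph distances are squared in the double-centering step, following the standard MDS convention for τ.

```python
import numpy as np

def oip(X, n_neighbors, p):
    """Sketch of Algorithm 1 (OIP). X is the m x n data matrix, columns = samples."""
    m, n = X.shape
    # Pairwise Euclidean distances between samples.
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    # Step 1: k-nearest-neighbor graph; missing edges have infinite length.
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        idx = np.argsort(D[i])[1:n_neighbors + 1]
        W[i, idx] = D[i, idx]
        W[idx, i] = D[idx, i]
    # Shortest-path (geodesic) distances; Floyd-Warshall stands in for Dijkstra.
    for k in range(n):
        W = np.minimum(W, W[:, [k]] + W[[k], :])
    # Step 2: double centering, S = -C W^(2) C / 2 (elementwise-squared
    # distances, the standard MDS convention -- an assumption on our part).
    C = np.eye(n) - np.ones((n, n)) / n
    S = -C @ (W ** 2) @ C / 2.0
    # Step 3: eigenvectors of M for the p smallest eigenvalues.
    M = X @ (X.T @ X - 2.0 * S.T) @ X.T
    _, U = np.linalg.eigh(M)          # eigh returns ascending eigenvalues
    return U[:, :p]                   # m x p with orthonormal columns

# Toy usage: 30 points on a helix in R^3, projected to 2 dimensions.
t = np.linspace(0.0, 3.0, 30)
X = np.vstack([np.cos(t), np.sin(t), t])
V = oip(X, n_neighbors=5, p=2)
Y = V.T @ X                           # low-dimensional coordinates; V^T V = I
```

Because V comes from `eigh`, its columns are orthonormal by construction, so the constraint V^T V = I of Eq. (6) holds automatically.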
We randomly sample from each dataset 25 times to form training sets, with training rates varying from 20% to 80%, and use the remaining samples for testing. We map the points in the test sets with the projections learned from the training sets, and determine category labels with the nearest neighbor rule (k = 1) and the k-nearest-neighbor rule (k = 5); in kNN, the label of a test sample is decided by voting among its k nearest neighbors. We also report the average error rate versus the number of dimensions using half of the data for training and half for testing.

¹ http://www.zjucadcg.cn/dengcai/Data/TextData.html

Table 1
Comparison on Reuters-21578 (classification accuracy in %, mean ± standard deviation).

Training  kNN   LDA           IP            OLPP          ONPP          OIP
20%       k=1   79.53±0.78    79.75±1.03    72.65±1.26    73.60±2.09    81.48±0.32
          k=5   79.57±0.79    78.24±1.22    74.71±1.40    69.89±2.17    80.87±0.39
30%       k=1   80.39±1.06    80.11±0.69    73.41±1.47    76.94±0.82    82.96±0.63
          k=5   80.32±1.12    80.01±0.79    75.44±1.09    73.07±1.35    82.13±0.61
40%       k=1   79.93±1.59    82.07±0.94    73.40±1.13    78.52±1.18    83.76±0.34
          k=5   79.87±1.60    81.16±1.25    75.29±0.95    75.08±2.19    83.29±0.34
50%       k=1   79.69±2.12    83.04±1.23    73.48±0.81    80.11±0.84    83.83±0.35
          k=5   79.65±2.13    82.59±1.26    75.32±0.65    76.72±1.01    83.96±0.70
60%       k=1   80.06±1.84    78.79±1.34    73.61±0.93    80.63±0.40    84.62±0.63
          k=5   80.09±1.86    78.05±1.66    75.04±0.70    77.35±0.54    84.51±0.44
70%       k=1   77.87±3.48    79.45±0.71    72.77±0.85    81.71±0.84    84.94±0.96
          k=5   77.99±3.44    77.84±0.56    73.50±1.10    77.68±1.31    84.73±0.84
80%       k=1   79.14±1.57    80.16±0.63    71.70±1.88    81.94±1.53    85.09±0.71
          k=5   79.23±1.65    78.88±1.18    72.15±1.36    77.28±1.37    85.38±0.78

Fig. 1. Dimensions vs. average classification error on the Reuters-21578 dataset (curves for LDA, IP, ONPP, OLPP and OIP).

Fig. 2. USPS examples.
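The projection-then-vote classification protocol described above can be sketched as follows. This is an illustrative implementation of the kNN evaluation with synthetic stand-in data; all names are ours.

```python
import numpy as np

def knn_accuracy(Y_train, labels_train, Y_test, labels_test, k=1):
    """Label each projected test point by majority vote among its k nearest
    projected training points; return the fraction labeled correctly."""
    correct = 0
    for y, truth in zip(Y_test.T, labels_test):
        d = np.linalg.norm(Y_train - y[:, None], axis=0)
        votes = labels_train[np.argsort(d)[:k]]
        correct += int(np.bincount(votes).argmax() == truth)
    return correct / len(labels_test)

# Synthetic stand-in for projected data: two well-separated classes in 2-D.
rng = np.random.default_rng(1)
Y_train = np.hstack([rng.normal(0, 0.1, (2, 20)), rng.normal(5, 0.1, (2, 20))])
labels_train = np.array([0] * 20 + [1] * 20)
Y_test = np.hstack([rng.normal(0, 0.1, (2, 5)), rng.normal(5, 0.1, (2, 5))])
labels_test = np.array([0] * 5 + [1] * 5)
acc = knn_accuracy(Y_train, labels_train, Y_test, labels_test, k=5)
```

In the paper's experiments the columns of `Y_train` and `Y_test` would be the projections V^T x of training and test samples; the voting rule is the same for k = 1 and k = 5.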
Assume that l_i is the ground-truth label, b_i is the label assigned after dimensionality reduction, and N_test is the number of test samples. The reported accuracy is

    Acc = (1/N) \sum_{j=1}^{N} [ (1/N_test) \sum_{i=1}^{N_test} \delta(l_i, b_i) ],

where N = 25 is the number of random splits in our experiments and \delta is the Kronecker delta function.

The original Reuters-21578 corpus contains 135 categories with multiple labels; we ignore the documents with multiple labels. The data left for our experiment has 8293 samples with 18,933 features, and comes from 65 categories. Table 1 shows the comparison of our method with LDA, IP, OLPP and ONPP on Reuters-21578; the OIP column is our method, and ± denotes the standard deviation. As expected, classification accuracy increases with the number of training samples, and Fig. 1 shows that the average classification error decreases as the number of dimensions increases.

USPS is a well-known handwritten digit corpus from the US Postal Service. It contains normalized gray-scale images of size 16 × 16, in total 9298 samples with 256 features. Fig. 2 shows some examples from the USPS dataset. A human error rate estimated at 2.37% indicates that classification on USPS is a hard task [10]. Table 2 compares the classification accuracy of our method with LDA, IP, IP+SR and ONPP; our method outperforms all the other methods. Fig. 3 shows the average error versus the number of dimensions on the USPS dataset.

7. Conclusion

In this paper we propose Orthogonal Isometric Projection, which is obtained by solving a standard eigenvalue problem. We establish two properties of orthogonal projections: the microscopic characteristics of the data (Euclidean distances and angles) are kept through the projection, which explains why orthogonal techniques work better than the original ones. Our method also extends to supervised cases and to a regression framework. However, dimensionality reduction remains a challenging problem.
Tuning the parameters of manifold learning algorithms to reach their best performance remains troublesome, and manifold learning essentially needs a large, unbiased dataset to build the neighborhood graph. If enough samples are not available, does manifold learning still work? Can we generate new samples from insufficient data? These questions are left for future work.

Table 2
Comparison on USPS (classification accuracy in %, mean ± standard deviation).

Training  kNN   LDA           IP            IP+SR         ONPP          OIP
20%       k=1   89.13±0.34    92.11±0.39    93.90±0.50    91.30±0.45    95.10±0.32
          k=5   90.49±0.48    90.65±0.54    93.55±0.48    90.77±0.56    94.02±0.20
30%       k=1   90.39±0.32    93.61±0.33    94.96±0.36    92.79±0.48    95.93±0.22
          k=5   91.73±0.31    95.02±0.39    94.93±0.31    92.16±0.42    94.85±0.17
40%       k=1   91.18±0.36    94.48±0.28    95.69±0.36    93.20±0.51    96.40±0.20
          k=5   92.01±0.31    94.01±0.28    95.51±0.29    92.73±0.50    95.49±0.34
50%       k=1   91.54±0.32    94.85±0.42    95.91±0.34    93.33±0.57    96.65±0.26
          k=5   92.71±0.33    94.77±0.37    96.01±0.25    93.03±0.55    95.96±0.25
60%       k=1   91.97±0.32    95.21±0.31    96.14±0.35    93.93±0.48    97.01±0.25
          k=5   92.90±0.37    94.92±0.25    96.13±0.34    93.68±0.58    96.16±0.28
70%       k=1   92.02±0.49    95.61±0.35    96.59±0.32    93.98±0.56    97.17±0.23
          k=5   93.29±0.42    95.15±0.42    96.34±0.37    93.50±0.61    96.42±0.25
80%       k=1   92.19±0.58    95.97±0.40    96.74±0.40    94.26±0.57    97.35±0.34
          k=5   93.22±0.62    95.83±0.29    96.79±0.37    93.81±0.72    96.49±0.33

Fig. 3. Dimensions vs. average classification error on the USPS dataset (curves for LDA, IP, IP+SR, ONPP and OIP).

Acknowledgments

This work is supported by the National Science Foundations of China (61075045, 60873092, 90820306, 61173129) and the Fundamental Research Funds for the Central Universities (ZYGX2009X013).

References

[1] W.S. Torgerson, Multidimensional scaling: I. Theory and method, Psychometrika 17 (4) (1952) 401–419.
[2] X. He, P. Niyogi, Locality preserving projections, in: Proceedings of Advances in Neural Information Processing Systems, 2003, pp. 153–160.
[3] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[4] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[5] D. Cai, X. He, J. Han, H.J. Zhang, Orthogonal Laplacianfaces for face recognition, IEEE Trans. Image Process. 15 (11) (2006) 3608–3614.
[6] E. Kokiopoulou, Y. Saad, Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique, IEEE Trans. Pattern Anal. Mach. Intell. 29 (12) (2007) 2143–2156.
[7] X. He, S. Yan, Y. Hu, P. Niyogi, H.J. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[8] Y. Zheng, Y.Y. Tang, B. Fang, T. Zhang, Orthogonal isometric projection, in: Proceedings of the IEEE International Conference on Pattern Recognition, 2012.
[9] D. Cai, X. He, J. Han, Isometric projection, in: Proceedings of the National Conference on Artificial Intelligence, 2007, pp. 528–533.
[10] I. Chaaban, M. Scheessele, Human Performance on the USPS Database, Technical Report, Indiana University South Bend, 2007.
[11] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graphical Stat. 15 (2) (2006) 265–286.
[12] J.A. Lee, M. Verleysen, Nonlinear dimensionality reduction of data manifolds with essential loops, Neurocomputing 67 (2005) 29–53.
[13] H. Choi, S. Choi, Robust kernel Isomap, Pattern Recognition 40 (3) (2007) 853–862.
[14] D. Cai, X. He, J. Han, SRDA: an efficient algorithm for large-scale discriminant analysis, IEEE Trans. Knowl. Data Eng. 20 (1) (2008) 1–12.
[15] M. Balasubramanian, E.L. Schwartz, The Isomap algorithm and topological stability, Science 295 (2002) 7.
[16] L. van der Maaten, E. Postma, J. van den Herik, Dimensionality reduction: a comparative review, Comput. Inf. Sci. 10 (2) (2007) 1–35.

Yali Zheng is a Ph.D. candidate in the School of Computer Science, Chongqing University, China. She was a visiting student at the University of Pittsburgh and Carnegie Mellon University in the USA (10/2008–09/2011). She is interested in dimensionality reduction, computer vision, and computational photography. She is also an IEEE student member.

Bin Fang received the B.S. degree in Electrical Engineering from Xi'an Jiaotong University, Xi'an, China, the M.S. degree in Electrical Engineering from Sichuan University, Chengdu, China, and the Ph.D. degree in Electrical Engineering from the University of Hong Kong, Hong Kong, China. He is currently a Professor in the School of Computer Science, Chongqing University, China. His research interests include computer vision, pattern recognition, medical image processing, biometrics applications, and document analysis. He has published more than 100 technical papers and is an Associate Editor of the International Journal of Pattern Recognition and Artificial Intelligence. He is a Senior Member of the IEEE.

Yuan Yan Tang received the B.S. degree in Electrical and Computer Engineering from Chongqing University, Chongqing, China, the M.S. degree in Electrical Engineering from the Graduate School of Post and Telecommunications, Beijing, China, and the Ph.D. degree in Computer Science from Concordia University, Montreal, Canada. He is currently a Professor in the School of Computer Science at Chongqing University, a Chair Professor at the Faculty of Science and Technology of the University of Macau, and an Adjunct Professor in Computer Science at Concordia University. He is an Honorary Professor at Hong Kong Baptist University and an Advisory Professor at many institutes in China. His current interests include wavelet theory and applications, pattern recognition, image processing, document processing, artificial intelligence, parallel processing, Chinese computing and VLSI architecture. He has published more than 300 technical papers and is the author/co-author of 23 books and book chapters on subjects ranging from electrical engineering to computer science. He has served as a General Chair, Program Chair and Committee Member for many international conferences, and was the General Chair of the 18th International Conference on Pattern Recognition (ICPR'06). He is the Founder and Editor-in-Chief of the International Journal on Wavelets, Multiresolution, and Information Processing (IJWMIP) and an Associate Editor of several international journals related to pattern recognition and artificial intelligence. He is an IAPR Fellow and an IEEE Fellow, and chairs the pattern recognition committee of IEEE SMC.

Taiping Zhang received the B.S. and M.S. degrees in Computational Mathematics and the Ph.D. degree from the School of Computer Science, Chongqing University, Chongqing, China, in 1999, 2001, and 2010, respectively. He is currently an Associate Professor at the School of Computer Science, Chongqing University, and a visiting Research Fellow at the Faculty of Science and Technology, University of Macau. His research interests include pattern recognition, image processing, machine learning, and computational mathematics. He has published extensively in IEEE Transactions on Image Processing, IEEE Transactions on Systems, Man, and Cybernetics, Part B (TSMC-B), Pattern Recognition, Neurocomputing, etc. He is a member of the IEEE.

Ruizong Liu received his B.S. degree in project management from Nanjing University of Technology, Nanjing, China, in 2004, and his Ph.D. degree from the School of Computer Science, Chongqing University. He is a postdoc at the Faculty of Science and Technology, University of Macau, supervised by Yuan Yan Tang. His research interests include machine learning, pattern recognition, computer vision, and cognitive science.