Tesis Doctoral

Nuevas Técnicas de Clasificación de Imágenes Hiperespectrales
New Techniques for Hyperspectral Image Classification

Autora: Inmaculada García Dópido

DEPARTAMENTO DE TECNOLOGÍA DE LOS COMPUTADORES Y DE LAS COMUNICACIONES

Conformidad del Director: Antonio Plaza Miguel
Fdo.: 2013

Resumen

La principal contribución del presente trabajo de tesis doctoral viene dada por el diseño e implementación de nuevas técnicas de clasificación de imágenes hiperespectrales de la superficie terrestre, obtenidas de forma remota mediante sensores aerotransportados o de tipo satélite. En particular, en la presente tesis doctoral se integran por primera vez en la literatura técnicas para desmezclado espectral y de clasificación de forma combinada para mejorar el proceso de interpretación de dichas imágenes. La clasificación y el desmezclado constituyen dos campos muy activos en el análisis de imágenes hiperespectrales. Por un lado, las técnicas de clasificación se enfrentan a varios problemas debidos a la alta dimensionalidad de las imágenes y la escasez de muestras etiquetadas, lo que dificulta los procesos de clasificación supervisada o semi-supervisada que son los más ampliamente utilizados en este campo (en particular, aquellos basados en un aprendizaje activo por parte del usuario). Por otra parte, el problema de la mezcla en imágenes hiperespectrales es bastante relevante, debido a que la resolución del sensor no es lo suficientemente alta para que en un único pixel solamente se encuentre presente un material. En este sentido, las técnicas para desmezclado intentan caracterizar los diferentes materiales presentes en cada pixel. En el presente trabajo de tesis doctoral, se ha integrado la información adicional que proporcionan las técnicas para desmezclado en el proceso de clasificación con el fin de obtener métodos más eficientes adaptados al uso de imágenes hiperespectrales, y que además resulten eficientes en términos computacionales. Para validar los nuevos métodos de clasificación propuestos, se utilizan imágenes proporcionadas por sensores de observación remota de la tierra tales como AVIRIS (Airborne Visible Infra-Red Imaging Spectrometer) de NASA o ROSIS (Reflective Optics Spectrographic Imaging System) de la Agencia Alemana del Espacio (DLR).

Abstract

The main contribution of the present thesis work is the design and implementation of new techniques for the classification of remotely sensed hyperspectral images, collected by airborne or spaceborne Earth observation instruments. Specifically, in this thesis work we explore the integration (for the first time in the literature) of spectral unmixing and hyperspectral image classification techniques in synergistic fashion, with the ultimate goal of improving the analysis and interpretation of hyperspectral images by taking advantage of the properties of both techniques in combined fashion. It should be noted that spectral unmixing and classification have been two very active areas in hyperspectral imaging, but these techniques have rarely been exploited in synergistic fashion. On the one hand, classification techniques face problems related to the extremely high dimensionality of the hyperspectral data and the limited number of training samples available a priori, which makes it difficult to perform supervised or semi-supervised classification (particularly with approaches based on active learning techniques).
On the other hand, the mixture problem is very relevant in hyperspectral imaging, mainly because the spatial resolution of the sensor is often not sufficient to separate the different materials participating in a pixel. As a result, hyperspectral images are dominated by mixed pixels, and unmixing techniques are crucial for a correct interpretation and exploitation of the data. In this thesis, we explore the integration of unmixing and classification, and in particular the possibility of using unmixing approaches as an additional source of information in the classification process, with the ultimate goal of obtaining more accurate methods for the analysis of hyperspectral scenes without significantly increasing the computational complexity of the process. In order to validate the new classification methods developed in the present thesis work, we resort to hyperspectral images provided by standard and widely used instruments such as NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) or the Reflective Optics Spectrographic Imaging System (ROSIS) operated by the German Aerospace Agency (DLR).

Acknowledgement

I would like to thank the supervisor of this thesis, Antonio Plaza, for his great patience, encouragement and support over the years, and for spending a lot of his valuable time helping me. It has been a pleasure for me to work with him. I would like to thank Professor Paolo Gamba for his collaboration in some of the developments presented in this thesis, and also for providing the ROSIS data over Pavia University, Italy, along with the training and test sets. I also gratefully acknowledge his great help and support during a research visit to the University of Pavia, Italy, funded by Fondazione Cariplo. This research stay was instrumental in concluding some of the developments presented in this thesis. I would also like to thank Devis Tuia for his collaboration. I would like to thank Professor D. Landgrebe for making the AVIRIS Indian Pines hyperspectral data set available to the community, and Professor Melba Crawford at Purdue University for making the AVIRIS Kennedy Space Center data set available as well. I would also like to acknowledge the collaboration of Alberto Villa, Jun Li, Prashanth Marpu, Maciel Zortea, José Manuel Bioucas Dias and Jon Atli Benediktsson, who also contributed to some of the developments presented in this thesis work. I would like to thank my colleagues and friends in HyperComp: Javier, Gabriel, Sergio Sánchez, Daniel, Sergio Bernabé, Abel, Jorge, Nacho, Mahdi and Ben, who definitely deserve my sincere acknowledgments. I would also like to thank my friends from my village and from Cáceres. Last but not least, I would like to thank my parents (Antonia and Juan Antonio), who have trusted me and given me emotional support all my life; my sisters (Mari Loli and Esther) and my little brother (Juan Antonio), who have supported and encouraged me to continue. Finally, I would like to express special appreciation to Jesús, who has been a fundamental part of this thesis, for his help, his understanding and for making me laugh every day. This thesis work has been developed under the European Community's Marie Curie Research Training Networks Programme under reference MRTN-CT-2006-035927, Hyperspectral Imaging Network (HYPER-I-NET).
Funding from the Portuguese Science and Technology Foundation (project PEst-OE/EEI/LA0008/2011) and from the Spanish Ministry of Science and Innovation (CEOS-SPAIN project, reference AYA2011-29334-C02-02) is also gratefully acknowledged. This work was also supported in part by the Icelandic Research Fund and the University of Iceland Research Fund. The development of the thesis has also received support from the Spanish Ministry of Science and Innovation (HYPERCOMP/EODIX project, reference AYA2008-05965-C04-02). Funding from the Junta de Extremadura (local government) under project PRI09A110 is also gratefully acknowledged.

Contents

1 Introduction
1.1 Context and motivations
1.2 Objectives
1.3 Main contributions of the thesis

2 Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images
2.1 Summary
2.2 Introduction
2.3 Unmixing-based feature extraction
2.3.1 Unmixing chain #1
2.3.2 Unmixing chain #2
2.3.3 Unmixing chain #3
2.3.4 Unmixing chain #4
2.4 Experimental results
2.4.1 Hyperspectral data sets
2.4.2 Experiments
2.5 Final observations and future directions

3 A Comparative Assessment of Unmixing-Based Feature Extraction Techniques
3.1 Summary
3.2 Introduction
3.3 A new unmixing-based feature extraction technique
3.3.1 Linear spectral unmixing
3.3.2 Unsupervised unmixing-based feature extraction
3.3.3 Supervised unmixing-based feature extraction
3.4 Hyperspectral data sets
3.4.1 AVIRIS Salinas Valley
3.4.2 ROSIS Pavia University
3.5 Experimental results
3.5.1 Feature extraction techniques used in the comparison
3.5.2 Supervised classification system and experimental setup
3.5.3 Analysis and discussion of results
3.6 Final observations and future directions

4 Semi-Supervised Self-Learning for Hyperspectral Image Classification
4.1 Summary
4.2 Introduction
4.3 Proposed approach
4.3.1 Semi-supervised learning
4.3.2 Self-learning
4.4 Experimental results
4.4.1 Experiments with AVIRIS Indian Pines data set
4.4.2 Experiments with ROSIS Pavia University data set
4.5 Summary and future directions

5 A New Hybrid Strategy Combining Semi-Supervised Classification and Spectral Unmixing
5.1 Summary
5.2 Introduction
5.3 Proposed approach
5.3.1 Considered spectral unmixing chains
5.3.2 Proposed hybrid strategy
5.3.3 Active learning
5.4 Experimental results
5.4.1 Balance between classification and unmixing
5.4.2 Results for AVIRIS Indian Pines
5.4.3 Results for ROSIS Pavia University
5.5 Summary and future directions

6 Conclusions and Future Research Lines

A Publications
A.1 International journal papers
A.2 Peer-reviewed international conference papers

Bibliography

List of Figures

1.1 Increase in spectral resolution of remotely sensed data.
1.2 Concept of mixed pixels in hyperspectral image.
1.3 Flowchart illustrating the organization of this thesis.
1.4 Summary of contributions in Chapter 2.
1.5 Summary of contributions in Chapter 3.
1.6 Summary of contributions in Chapter 4.
1.7 Summary of contributions in Chapter 5.
2.1 Unmixing-based feature extraction chains #1 (spectral endmembers) and #2 (spatial-spectral endmembers).
2.2 Unmixing-based feature extraction chain #3 (chain #4 replaces endmember extraction with averaging of the signatures associated to each labeled class in the training set).
2.3 (a) False color composition of the AVIRIS Indian Pines scene. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.
2.4 Best classification results for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class).
2.5 Comparative classification results per class for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class) with MNF and unmixing chain #4.
3.1 Block diagram illustrating the unsupervised clustering followed by MTMF (CMTMFunsup) technique for unmixing-based feature extraction.
3.2 Block diagram illustrating the supervised clustering followed by MTMF (CMTMFsup) technique for unmixing-based feature extraction.
3.3 (a) False color composition of an AVIRIS hyperspectral image comprising several agricultural fields in Salinas Valley, California. (b) Ground-truth map containing 15 mutually exclusive land-cover classes.
3.4 Photographs taken at the site during data collection.
3.5 (a) False color composition of the ROSIS Pavia scene. (b) Ground-truth map containing 9 mutually exclusive land-cover classes. (c) Training set commonly used for the ROSIS Pavia University scene.
3.6 Classification results for the AVIRIS Indian Pines scene (obtained using an SVM classifier with Gaussian kernel, trained with 5% of the available samples).
3.7 Classification results for the AVIRIS Salinas Valley scene (obtained using an SVM classifier with Gaussian kernel, trained with 2% of the available samples).
3.8 Classification results for the ROSIS Pavia University scene (obtained using an SVM classifier with Gaussian kernel, trained with 50 pixels of each available ground-truth class).
3.9 Components extracted by MNF from the ROSIS Pavia University scene (ordered from left to right in terms of amount of information).
3.10 Components extracted by the CMTMFunsup feature extraction technique from the ROSIS Pavia University scene (in no specific order).
4.1 A graphical example illustrating how spatial information can be used as a criterion for semi-supervised self-learning in hyperspectral image classification.
4.2 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR (right) and probabilistic SVM (left) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0.
4.3 Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the AVIRIS Indian Pines data set using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0.
4.4 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR classifier with BT sampling and 5 labeled samples per class (80 samples in total). Two cases are displayed: one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).
4.5 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR classifier with BT sampling and 100 labeled samples per class (900 samples in total). Two cases are displayed: one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).
4.6 Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the AVIRIS Indian Pines data set using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0.
4.7 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR (right) and probabilistic SVM (left) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0.
4.8 Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0).
4.9 Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0).
5.1 Flowchart of the unmixing-based chain designated as strategy 1.
5.2 Flowchart of the unmixing-based chain designated as strategy 2.
5.3 Flowchart of the unmixing-based chain designated as strategy 3.
5.4 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set by different classifiers. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
5.5 Classification maps and OAs (in parentheses) obtained after applying different classifiers to the AVIRIS Indian Pines data set. In all cases the number of labeled samples was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300.
5.6 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set by different classifiers. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
5.7 Classification maps and OAs (in parentheses) obtained after applying different classifiers to the ROSIS Pavia University data set. In all cases the number of labeled samples was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300.

Table Index

1.1 List of acronyms used in this thesis.
2.1 Classification accuracies (percentage) and standard deviation obtained after applying the considered SVM classification system (with Gaussian and polynomial kernels) to three different types of features (original, reduced and unmixing-based) extracted from the AVIRIS Indian Pines and Kennedy Space Center scenes (ten randomly chosen training sets).
2.2 Statistical differences evaluated using McNemar's test (polynomial kernel).
3.1 Number of pixels in each ground-truth class in the four considered hyperspectral images. The number of training and test pixels used in our experiments can be derived from this table.
3.2 OA and AA (in percentage) obtained by the considered classification system for different hyperspectral scenes (AVIRIS Indian Pines and AVIRIS Kennedy Space Center) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses), and the best classification result across all methods in each experiment is highlighted in bold typeface.
3.3 OA and AA (in percentage) obtained by the considered classification system for different hyperspectral scenes (AVIRIS Salinas Valley and ROSIS Pavia University) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses), and the best classification result across all methods in each experiment is highlighted in bold typeface.
4.1 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.2 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples.
Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.3 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.4 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.5 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.6 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.7 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.8 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
5.1 OA [%] obtained for different values of parameter α in the analysis of the AVIRIS Indian Pines hyperspectral data set with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.
5.2 OA [%] obtained for different values of parameter α in the analysis of the ROSIS Pavia University scene with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.
5.3 OA, AA [%], and kappa statistic obtained using different classifiers when applied to the AVIRIS Indian Pines hyperspectral data set. The standard deviation is also reported in each case.
5.4 OA, AA [%], and kappa statistic (in parentheses) obtained using the MLR classifier when applied to the ROSIS Pavia University hyperspectral data set. The total number of labeled samples in each ground-truth class is given in parentheses.

Chapter 1
Introduction

1.1 Context and motivations

The work developed in this thesis is part of the current research lines of the Hyperspectral Computing Laboratory (HyperComp) research group at the Department of Technology of Computers and Communications, University of Extremadura. In this work, we develop new efficient algorithms for the analysis of remotely sensed hyperspectral data by integrating concepts of (supervised and semi-supervised) classification and (unsupervised) spectral unmixing [1].

Figure 1.1: Increase in spectral resolution of remotely sensed data.

Hyperspectral images are an extension of the concept of digital image. Figure 1.1 graphically illustrates the significant increase in the spectral resolution available for remote sensing data, from panchromatic (comprising a single band), to multispectral, hyperspectral and even ultraspectral, where these concepts progressively increase not only the number of spectral bands collected by the imaging instrument, but also the spectral resolution (modern instruments not only have more bands, but the bands are also narrower and closer to each other). In particular, hyperspectral images comprise hundreds of narrow spectral bands. As a result, each pixel in a hyperspectral image is formed not by a single discrete value, but by a wide range of values for the different spectral measurements recorded by the sensor or measuring instrument. The collection of all the values (one per spectral band) associated with a given pixel is called a spectral signature (see Fig. 1.2) [2].
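To make this data structure concrete, the following minimal Python sketch (added here for illustration; the array sizes are loosely modeled on the AVIRIS Indian Pines scene described later, and the data are synthetic) shows how a hyperspectral image can be handled as a three-dimensional cube, and how the spectral signature of a pixel is simply its vector of per-band values:

```python
import numpy as np

# A hyperspectral image can be handled as a 3-D array ("cube"): two
# spatial dimensions plus one spectral dimension. The sizes below are
# illustrative only (loosely modeled on the AVIRIS Indian Pines scene).
rows, cols, bands = 145, 145, 202
cube = np.random.rand(rows, cols, bands).astype(np.float32)

# The spectral signature of a pixel is its vector of values across all
# bands at one spatial location.
signature = cube[72, 64, :]
print(signature.shape)  # (202,)
```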
A hyperspectral image can thus be understood as a collection of spectroscopic measurements that provide very detailed information about the properties of the materials appearing in the scene. The number and variety of processing tasks in hyperspectral remote sensing is enormous [3]. However, the majority of algorithms can be organized according to the following specific tasks:

• Dimensionality reduction consists of reducing the dimensionality of the input hyperspectral scene to facilitate subsequent processing tasks [4].
• Target and anomaly detection consist of searching the pixels of a hyperspectral data cube for rare (either known or unknown) spectral signatures [5].
• Change detection consists of finding the significant (i.e., important to the user) changes between two hyperspectral scenes of the same geographic region [6].
• Classification consists of assigning a label (class) to each pixel of a hyperspectral data cube [3].
• Spectral unmixing consists of estimating the fraction of the pixel area covered by each material present in the scene [7].

This thesis is mainly focused on the integration of two of the aforementioned techniques: classification and spectral unmixing. These are two active areas of research in hyperspectral data interpretation. Here, we explore the possibility of using spectral unmixing concepts to complement supervised and semi-supervised classification techniques. This study represents a novel contribution in the hyperspectral imaging community, in which these techniques have traditionally been applied in separate fashion. On the one hand, spectral unmixing is a fast-growing area in which many algorithms have recently been developed to retrieve pure spectral components (endmembers) and to determine their abundance fractions in mixed pixels [7]. Specifically, in hyperspectral imaging there has been a lot of interest in addressing the problem of mixed pixels, which arise when distinct materials are combined into a homogeneous or intimate mixture. This occurs independently of the spatial resolution of the sensor [8]. In order to mitigate the impact of mixed pixels, several endmember extraction [7, 9, 10] and spectral unmixing approaches [11, 12, 13] have been developed in the literature under the assumption that a single pixel vector may comprise the response of multiple underlying materials.

Figure 1.2: Concept of mixed pixels in hyperspectral image.

On the other hand, hyperspectral image classification has also been a very active area of research in recent years [14]. Given a set of observations (i.e., possibly mixed pixel vectors), the goal of classification is to assign a unique label to each pixel so that it is well described by a given class [4]. Several techniques have been successfully used to perform hyperspectral data classification, particularly supervised techniques such as kernel methods, which can deal effectively with the Hughes phenomenon [15, 16]. However, supervised classification is generally a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios [14]. These labeled samples are generally difficult and expensive to obtain. This has fostered the development of semi-supervised techniques able to exploit unlabeled training samples that can be obtained from a (limited) set of labeled samples without significant effort/cost.
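Before moving on, the mixed-pixel problem and the unmixing task introduced above can be illustrated numerically. The following toy sketch (synthetic spectra only, not data from any real scene) builds a mixed pixel as a linear combination of three pure signatures and then recovers the abundances by unconstrained least squares, the basic operation behind the unmixing techniques discussed in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 50

# Three synthetic "pure" signatures (endmembers), one per column:
# e.g., soil, water and vegetation in the toy example of Fig. 1.2.
endmembers = rng.random((n_bands, 3))

# A mixed pixel under the linear mixture model: a weighted sum of the
# endmembers, where the weights (abundances) are the fractions of the
# pixel area covered by each material.
abundances = np.array([0.6, 0.3, 0.1])
pixel = endmembers @ abundances + 0.01 * rng.standard_normal(n_bands)

# Spectral unmixing inverts the model: given the endmembers, estimate
# the abundances. An unconstrained least-squares solution:
estimated, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
print(estimated)  # approximately [0.6, 0.3, 0.1]
```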
At this point, it is important to emphasize that the analysis of hyperspectral images is not an easy task, due to the great variability of hyperspectral signatures and the high dimensionality of the data. Another problem that may arise in the analysis of such scenes (as mentioned before) is the intrinsic nature of the pixels, which may be highly mixed. The most traditional approach in the literature to describe the phenomenon of mixture at sub-pixel levels is the linear mixture model [8]. As opposed to nonlinear unmixing, which generally requires detailed information about physical properties that may not always be available, linear spectral unmixing consists of identifying the pure spectral components or endmembers. Once the pure spectral signatures are identified, the proportion of each material in each pixel can be estimated. Abundances provide additional information about the composition of each pixel; if this information is used correctly, it may complement the results provided by traditional "hard" classification techniques. For example, Fig. 1.2 illustrates the concept of spectral mixing using a toy example. As Fig. 1.2 shows, it is very likely that the pixel labeled as "vegetation" actually corresponds to several types of vegetation, or even to a mixture of soil and vegetation. Also, the pixel labeled as "atmosphere" may be affected by atmospheric interferers that also participate in the mixture, such as clouds, which absorb only part of the radiation. This example clearly illustrates that the consideration of classification techniques alone may introduce errors in the pixel characterization from a macroscopic point of view, in particular if the pixels are assumed to be homogeneously formed by a predominant substance. To address these issues, the present thesis work focuses on the development of synergistic approaches for joint classification and spectral unmixing. Even if we adhere to a classification convention in which each pixel is assigned a single class, we believe that spectral unmixing can assist in such a process by providing an additional source of information to estimate the most accurate classification label for each pixel in the scene. The specific topics that we will discuss in this thesis work can be summarized as follows:

• The second chapter of the thesis is related to the high spectral dimensionality associated with hyperspectral scenes, which calls for the development of effective feature extraction approaches that can extract the most relevant features for classification purposes. The innovative approach explored in this chapter is the possibility of using unmixing prior to both supervised and semi-supervised classification. In this context, we explore different ways to obtain the abundances of each pure material in the hyperspectral image and to use this information to assist the classification process.
• The third chapter of the thesis expands on these concepts and evaluates several spectral unmixing chains that can be used to extract features based on abundance fractions for subsequent classification using different strategies. The evaluation is conducted using a set of highly representative hyperspectral scenes, and drawing comparisons to other state-of-the-art approaches.
• The fourth chapter of the thesis addresses the problem of the limited number of labeled training samples traditionally available in practice, which affects the design of supervised classification strategies. The procedure used to collect labeled samples is very expensive and difficult. For this purpose, new semi-supervised classification techniques (some of them based on active learning) have been developed. This area has undergone a significant evolution in terms of the models adopted in recent years. An extensive review of techniques for semi-supervised learning is available in [17]. Our proposed strategy does not require a large number of labeled samples because the classifier is trained with both labeled and unlabeled samples, which are generated without extra cost. The unlabeled samples are obtained by the classifier automatically.
• Finally, in the fifth chapter of this thesis we develop new strategies that synergistically combine hyperspectral unmixing and classification in order to exploit both sources of information in complementary fashion, thus overcoming the limitations of using these techniques in separate fashion. The result is a new framework that integrates hyperspectral unmixing into the classification process, with the possibility to control the relative weight of unmixing with regard to classification and vice versa. We show that spectral unmixing concepts can help in the semi-supervised classification process. The chapter provides ample experimental evidence supporting our claims.

1.2 Objectives

The main objective of this thesis is to develop new and efficient techniques that integrate concepts of classification and spectral unmixing, combining their advantages in synergistic fashion while minimizing the disadvantages associated with the separate application of each technique. In order to achieve this general objective, several specific objectives have also been accomplished:

1. To study existing techniques for classification (supervised and semi-supervised) of remotely sensed hyperspectral data sets, with a focus on semi-supervised and active learning techniques, evaluating their advantages and disadvantages.
2. To study existing techniques for spectral unmixing in order to evaluate their advantages and disadvantages in hyperspectral image analysis and interpretation.
3. To develop new techniques for hyperspectral image classification, based on the integration of supervised and semi-supervised classification techniques and linear spectral unmixing techniques, with the ultimate goal of analyzing the advantages that can be obtained from the joint exploitation of these approaches.
4. To evaluate the new classification techniques developed in this work, which result from the combination of traditional classification concepts and spectral unmixing concepts. In the context of semi-supervised classification, we also analyze existing active learning techniques in order to intelligently select the most informative training samples in the classification process.
5. To design, implement and validate new processing chains based on the integration of techniques for classification and spectral unmixing, thus allowing a thorough comparative study of the new techniques using real hyperspectral data sets obtained by different sensors, such as the Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS), operated by NASA's Jet Propulsion Laboratory, or the Reflective Optics Spectrographic Imaging System (ROSIS), operated by the German Aerospace Center (DLR).
6. To provide a set of recommendations of use (best practice) for the new classification techniques developed, after carefully evaluating and assessing their performance in terms of classification accuracy using standard metrics such as the overall accuracy (OA), average accuracy (AA) and kappa index [4].

1.3 Main contributions of the thesis

Figure 1.3: Flowchart illustrating the organization of this thesis.

The main contributions of the thesis are summarized in Fig. 1.3. The thesis is structured in a set of chapters which are inter-related as described in Fig. 1.3. As shown in the figure, all the newly developed techniques lie in between spectral unmixing and classification, and are designed to exploit features from either or both techniques simultaneously. In the following, we provide a description of the different chapters in which we have structured the present thesis work:

Figure 1.4: Summary of contributions in Chapter 2.

• In Chapter 2 (see Fig. 1.4), we explore the possibility of using spectral unmixing as a way to perform feature extraction prior to classification of hyperspectral data, thus addressing the imbalance between the (limited) number of training samples and the (high) spectral dimensionality of the data.
• Chapter 3 is a follow-up to the previous chapter in which different spectral unmixing chains are explored in order to determine the specific unmixing chain (and the number of features) that should be retained by spectral unmixing prior to different classification scenarios, providing recommendations about the best possible use of different chains in different application contexts. The contributions of this chapter are summarized in Fig. 1.5.

Figure 1.5: Summary of contributions in Chapter 3.

• Chapter 4 describes a new semi-supervised self-learning approach for the classification of hyperspectral images using unlabeled training samples. The new unlabeled samples are generated using spatial information, and active learning approaches are used to select the most informative samples for the classification process. Fig. 1.6 provides a summary of the contributions in this chapter.

Figure 1.6: Summary of contributions in Chapter 4.

• Chapter 5 discusses the integration of spectral unmixing and classification in order to design a new semi-supervised framework using active learning concepts. Several unmixing chains are used for this purpose, including information about mixed pixels in order to incorporate this information into the classification process. Specifically, the idea of integrating classification and spectral unmixing in simultaneous fashion is explored in this chapter. This strategy is summarized in Fig. 1.7.
Figure 1.7: Summary of contributions in Chapter 5.

To conclude this chapter, Table 1.1 provides a list of all the acronyms that have been used throughout the thesis document. Hereinafter, these acronyms will be used instead of the full terms for simplicity.

Table 1.1: List of acronyms used in this thesis.

AA: Average Accuracy [18]
ANC: Abundance Non-negativity Constraint [19]
ASC: Abundance Sum-to-one Constraint [19]
AVIRIS: Airborne Visible Infra-Red Imaging Spectrometer [20]
BT: Breaking Ties [21]
CEM: Constrained Energy Minimization [22]
CMTMFsup: Supervised Clustering followed by Mixture-Tuned Matched Filtering [23]
CMTMFunsup: Unsupervised Clustering followed by Mixture-Tuned Matched Filtering [23]
DAFE: Discriminant Analysis for Feature Extraction [14]
DBFE: Decision Boundary Feature Extraction [14]
DLR: German Aerospace Center [Online: www.dlr.de/en/]
FCunsup: Unsupervised Fuzzy Clustering [24]
FCLSU: Fully Constrained Linear Spectral Unmixing [19]
HySime: Hyperspectral Subspace Identification by Minimum Error [25]
ICA: Independent Component Analysis [26]
JADE: Joint Diagonalization of Eigenmatrices [27]
LIBSVM: Library of SVM [Online: http://www.csie.ntu.edu.tw/~cjlin/libsvm/]
LORSAL: Logistic Regression via variable Splitting and Augmented Lagrangian [28]
MAP: Maximum A Posteriori [29]
MBT: Modified Breaking Ties [29]
MLR: Multinomial Logistic Regression [30]
MNF: Minimum Noise Fraction [31]
MS: Margin Sampling [32]
MTMF: Mixture-Tuned Matched Filtering [33]
MTMFavg: Averaged Mixture-Tuned Matched Filtering [34]
MTMFsup: Supervised Mixture-Tuned Matched Filtering [34]
MTMFunsup: Unsupervised Mixture-Tuned Matched Filtering [34]
nEQB: Normalized Entropy Querying by Bagging [35]
NWFE: Non-parametric Weighted Feature Extraction [14]
OA: Overall Accuracy [18]
OSP: Orthogonal Subspace Projection [36]
PCA: Principal Component Analysis [37]
RBF: Gaussian Radial Basis Function [16]
ROSIS: Reflective Optics Spectrographic Imaging System [38]
SVM: Support Vector Machine [15]
SA: Spectral Angle [8]
SMLR: Sparse Multinomial Logistic Regression [39]
SNR: Signal-to-Noise Ratio [2]
TSVM: Transductive Support Vector Machine [40, 41]
VCA: Vertex Component Analysis [42]
VD: Virtual Dimensionality [43]

Chapter 2
Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images

2.1 Summary

Supervised classification of hyperspectral images is a very challenging task due to the generally unfavorable ratio between the number of spectral bands and the number of training samples available a priori, which results in the Hughes phenomenon. To address this issue, several feature extraction methods have been investigated in order to reduce the dimensionality of the data to an appropriate subspace without significant loss of the original information that allows for the separation of classes. In this chapter, we explore the use of spectral unmixing for feature extraction prior to supervised classification of hyperspectral data using SVMs. The proposed feature extraction strategy has been implemented in the form of four different unmixing chains, and evaluated using two different scenes collected by NASA Jet Propulsion Laboratory's AVIRIS. Experiments suggest competitive results, but also show that the definition of the unmixing chains plays an important role in the final classification accuracy.
Moreover, unlike most feature extraction techniques available in the literature, the features obtained using linear spectral unmixing are potentially easier to interpret due to their physical meaning.

(1) Part of this chapter has been published in: I. Dopido, M. Zortea, A. Villa, A. Plaza and P. Gamba, Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images, IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 4, pp. 760-764, July 2011 [JCR(2011)=1.560].

2.2 Introduction

In many studies, hyperspectral analysis techniques are divided into full-pixel and mixed-pixel classification techniques [8, 37, 44], where each pixel vector defines a spectral signature or fingerprint that uniquely characterizes the underlying materials at each site in a scene. Full-pixel classification techniques assume that each spectral signature comprises the response of one single underlying material. Often, this is not a realistic assumption. If the spatial resolution of the sensor is not fine enough to separate different pure signature classes at a macroscopic level, these can jointly occupy a single pixel, and the resulting spectral signature will be a composite of the individual pure spectra, often called endmembers in hyperspectral imaging terminology [45].

Let us denote a remotely sensed hyperspectral scene with n bands by I, in which each pixel is represented by a vector X = [x_1, x_2, \cdots, x_n] \in \Re^n, where \Re denotes the set of real numbers in which the pixel's spectral response x_k at sensor channels k = 1, \ldots, n is included. Under the linear mixture model assumption, each pixel vector can be modeled as:

X = \sum_{z=1}^{p} \Phi_z \cdot E_z + n,    (2.1)

where E_z denotes the spectral response of endmember z, \Phi_z is a scalar value designating the fractional abundance of endmember z at the pixel X, p is the total number of endmembers, and n is a noise vector. Two physical constraints can be imposed on the model described in (2.1): the ANC, i.e., \Phi_z \geq 0, and the ASC, i.e., \sum_{z=1}^{p} \Phi_z = 1 [19].

Several machine learning techniques have been applied, under the full-pixel assumption, to extract relevant information from hyperspectral data. The good classification performance exhibited by SVMs [15, 44, 46] using spectral signatures as input features can be improved by applying suitable feature extraction strategies able to reduce the dimensionality of the data to a subspace without losing the original information [47, 48]. We consider three traditional feature extraction techniques addressing the aforementioned issues (a minimal sketch of the first is given after this list):

• PCA is an orthogonal linear transformation which projects the data into a new coordinate system, such that the greatest amount of variance of the original data is contained in the first principal components [37]. The resulting components are uncorrelated.
• MNF differs from PCA in the fact that MNF ranks the obtained components according to their SNR [31].
• ICA tries to find components that are as statistically independent as possible, minimizing dependencies up to fourth order [26]. There are several strategies that can be adopted to define independence (e.g., minimization of mutual information, maximization of non-Gaussianity, etc.). In this chapter, among several possible implementations, we have chosen JADE [27], which provides a good tradeoff between performance and computational complexity when used for dimensionality reduction of hyperspectral images.
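For reference, the following minimal sketch outlines the PCA transform listed above (an illustrative implementation added here, not the code used in our experiments); as the comments note, MNF would additionally require a noise covariance estimate, and ICA replaces the variance criterion with statistical independence:

```python
import numpy as np

def pca_features(cube, n_components):
    """Minimal PCA for a (rows, cols, bands) hyperspectral cube: project
    every pixel onto the directions of greatest variance. For MNF, the
    eigenproblem would additionally involve an estimate of the noise
    covariance; ICA replaces variance with statistical independence."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    pixels -= pixels.mean(axis=0)              # center each band
    cov = np.cov(pixels, rowvar=False)         # band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # descending variance
    basis = eigvecs[:, order[:n_components]]
    return (pixels @ basis).reshape(rows, cols, n_components)

# Example: reduce a synthetic cube to its first 18 components.
cube = np.random.rand(100, 100, 202)
reduced = pca_features(cube, 18)
print(reduced.shape)  # (100, 100, 18)
```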
However, all these methods maximize the information contained in the first transformed components, relegating variations of less significant size to low-order components. If such low-order components are not preserved, small classes may be affected. The inclusion of spatial features such as morphological profiles can be used to address this issue [38, 47, 49]. In this chapter, we explore an alternative strategy focused on the use of spectral unmixing for feature extraction prior to classification. Previous efforts in this direction were presented in [50, 51], but the analysis of whether spectral unmixing can replace standard feature extraction transformations remains an unexplored topic. Although classification techniques often neglect the impact of mixed pixels in the provision of a set of final class labels, widely used benchmark data sets in the literature (e.g., the AVIRIS Indian Pines scene) are known to be dominated by mixed pixels, even if the associated ground-truth information is only available in full-pixel form. Hence, the use of spectral unmixing presents distinctive features with regard to other approaches such as PCA, MNF or ICA. First, it provides additional information for classification in hyperspectral analysis scenarios with moderate spatial resolution, since the sub-pixel composition of training samples can be used as part of the learning process of the classifier. Second, the components estimated by spectral unmixing can be physically explained as the abundances of spectral endmembers. Third, spectral unmixing does not penalize classes which are not relevant in terms of variance or SNR. Here, we design different unmixing processing chains with the goal of addressing three specific research questions:

1. Is spectral unmixing a feasible strategy for feature extraction prior to classification?
2. Does the inclusion of spatial information at the endmember extraction stage lead to better classification results?
3. Is it really necessary to estimate pure spectral endmembers for classification purposes?

We have structured the remainder of this chapter as follows. Section 2.3 describes the considered spectral unmixing chains. Section 2.4 presents different experiments specifically designed to address the research questions above and to provide a comparison between the proposed unmixing-based strategy and other feature extraction approaches in the literature. Section 2.5 concludes with some remarks and future research avenues.

2.3 Unmixing-based feature extraction

2.3.1 Unmixing chain #1

In this subsection we describe our first approach to design an unmixing-based feature extraction chain, which can be summarized by the flowchart in Fig. 2.1. First, we estimate the number of endmembers, p, directly from the original n-dimensional hyperspectral image I. For this purpose, we use two standard techniques widely used in the literature: the HySime method [25] and the VD concept [43]. Once the number of endmembers p has been estimated, we apply an automatic algorithm to extract a set of endmembers from the original hyperspectral image [9]. Here, we use the OSP technique [36], which has been shown in previous work to provide a very good trade-off between the signature purity of the extracted endmembers and the computational time needed to obtain them. Preliminary experiments conducted with other endmember extraction techniques, such as VCA [42] and N-FINDR [52], have shown very similar results in terms of classification accuracy.
Finally, linear spectral unmixing (either unconstrained or constrained) can be used to estimate the abundance of each endmember in each pixel of the scene, providing a set of p abundance maps. Then, standard SVM classification is performed on the stack of abundance fractions using randomly selected training samples.

Figure 2.1: Unmixing-based feature extraction chains #1 (spectral endmembers) and #2 (spatial-spectral endmembers).

2.3.2 Unmixing chain #2

In this subsection we introduce a variation of the unmixing-based feature extraction chain which includes spatial preprocessing prior to endmember extraction, in order to guide the endmember searching process towards those areas which are more spatially homogeneous. This approach is also represented in Fig. 2.1. The spatial preprocessing strategy adopted in this work is described in detail in [53]. As in the previous chain, the features resulting from the proposed (spatially enhanced) unmixing process are used to train an SVM classifier with a few randomly selected labeled samples. The classifier is then tested using the remaining labeled samples.

2.3.3 Unmixing chain #3

Our main motivation for introducing a third unmixing-based feature extraction chain is the fact that the estimation of the number of endmembers p in the original image is a very challenging issue. Fig. 2.2 describes a new chain in which the endmembers are extracted from the set of available (labeled) training samples instead of from the original image.

Figure 2.2: Unmixing-based feature extraction chain #3 (chain #4 replaces endmember extraction with averaging of the signatures associated to each labeled class in the training set).

This chain introduces two important variations: 1) first, as a simplification to the challenging estimation problem, the number of endmembers to be extracted is set to the total number of different classes, c, in the training set; and 2) the endmember searching process is conducted only on the training set, which reduces computational complexity. However, the number of endmembers in the original image, p, is probably different from c, the number of labeled classes. Therefore, in order to unmix the original image we need to address a partial unmixing problem (in which not all endmembers may be available a priori). A successful technique for this purpose is MTMF [33], also known as CEM [22], which combines linear spectral unmixing and statistical matched filtering. From matched filtering, it inherits the ability to map a single known target without knowing the other background endmember signatures. From spectral mixture modeling, it inherits the leverage arising from the mixed pixel model and the constraints on feasibility.

2.3.4 Unmixing chain #4

The fourth unmixing chain tested in our experiments [54] represents a slight variation of unmixing chain #3, in which the spectral signatures used for unmixing purposes are not obtained via endmember extraction but through averaging of the spectral signatures associated to each labeled class in the training set. To keep the number of estimated components low, only one component is allowed for each class. This averaging strategy produces c signatures, each representative of a labeled class, which are then used to partially unmix the original hyperspectral scene using MTMF.
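To make the partial unmixing step at the core of chains #3 and #4 more tangible, the sketch below follows the standard CEM formulation, in which the filter for a target signature d is w = R^{-1}d / (d^T R^{-1} d), with R the sample correlation matrix of the data. This is an illustrative reconstruction under that standard formulation, not the exact code used in our experiments (the MTMF variant additionally computes a mixture-tuned infeasibility score on top of the matched-filter output):

```python
import numpy as np

def class_mean_signatures(train_pixels, train_labels):
    """Chain #4 style: average the training spectra of each labeled class
    to obtain one representative signature per class."""
    return {c: train_pixels[train_labels == c].mean(axis=0)
            for c in np.unique(train_labels)}

def cem_scores(pixels, target):
    """Constrained energy minimization (CEM): estimate the response of a
    single known target signature without knowledge of the background
    endmembers, using the standard filter w = R^-1 d / (d^T R^-1 d),
    where R is the sample correlation matrix of the data.
    pixels: (n_pixels, n_bands); target: (n_bands,)."""
    R = pixels.T @ pixels / pixels.shape[0]
    Rinv_d = np.linalg.solve(R, target)
    w = Rinv_d / (target @ Rinv_d)
    return pixels @ w  # one abundance-like score per pixel

# Example with synthetic data: one CEM map per class-representative
# signature; stacking the c maps yields the features fed to the SVM.
rng = np.random.default_rng(1)
pixels = rng.random((5000, 164))
labels = rng.integers(0, 16, 5000)
means = class_mean_signatures(pixels, labels)
features = np.column_stack([cem_scores(pixels, m) for m in means.values()])
print(features.shape)  # (5000, 16)
```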
Figure 2.3: (a) False color composition of the AVIRIS Indian Pines scene. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.

2.4 Experimental results

2.4.1 Hyperspectral data sets

2.4.1.1 AVIRIS Indian Pines

The data set used in our experiments was collected by the AVIRIS sensor over the Indian Pines region in Northwestern Indiana in 1992. This scene, with a size of 145 lines by 145 samples, was acquired over a mixed agricultural/forest area, early in the growing season. The scene comprises 202 spectral channels in the wavelength range from 0.4 to 2.5 µm, with a nominal spectral resolution of 10 nm, a moderate spatial resolution of 20 meters per pixel, and 16-bit radiometric resolution. After an initial screening, several spectral bands were removed from the data set due to noise and water absorption phenomena, leaving a total of 164 radiance channels to be used in the experiments. For illustrative purposes, Fig. 2.3(a) shows a false color composition of the AVIRIS Indian Pines scene, while Fig. 2.3(b) shows the ground-truth map available for the scene, displayed in the form of a class assignment for each labeled pixel, with 16 mutually exclusive ground-truth classes. These data, including the ground-truth information, are available online², a fact which has made this scene a widely used benchmark for testing the accuracy of hyperspectral data classification algorithms.

2.4.1.2 AVIRIS Kennedy Space Center

The data set was collected by the AVIRIS sensor over the Kennedy Space Center³, Florida, in March 1996. The portion of this scene used in our experiments has dimensions of 292 × 383 pixels. After removing water absorption and low-SNR bands, 176 bands were used for the analysis. The spatial resolution is 20 meters per pixel. 12 ground-truth classes were available, where the number of pixels in the smallest class is 134 while the number of pixels in the largest class is 761.

² http://dynamo.ecn.purdue.edu/biehl/MultiSpec
³ Available online: http://www.csr.utexas.edu/hyperspectral/data/KSC/

2.4.2 Experiments

2.4.2.1 Experiment 1. Use of unmixing as a feature extraction strategy

In this experiment, we use the AVIRIS Indian Pines and Kennedy Space Center data sets to analyze the impact of imposing the ANC and ASC in abundance estimation prior to classification. For the AVIRIS Indian Pines image, we construct ten small training sets by randomly selecting 5%, 10% and 15% of the ground-truth pixels. For the AVIRIS Kennedy Space Center scene, since the smallest classes are larger than those in the Indian Pines scene, we decided to reduce the training sets even further and selected 1%, 3% and 5% of the available ground-truth pixels. Then, the three considered types of input features (original, reduced and unmixing-based) are built for the selected training samples and used to train an SVM classifier with two types of kernels: polynomial and Gaussian. The SVM was trained with each of these training subsets and then evaluated on the remaining test set. Each experiment was repeated ten times, and the mean and standard deviation of the accuracy values are reported. Kernel parameters were optimized by a grid search procedure, and the optimal parameters were selected using 10-fold cross-validation. The LIBSVM library⁴ was used in the experiments.
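This parameter optimization can be reproduced, for instance, with scikit-learn (whose SVC classifier is itself built on LIBSVM). The grid values below are illustrative assumptions rather than the ones used in the thesis, and F_train/y_train denote a hypothetical feature matrix and label vector for the selected training pixels:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    param_grid = {'C': [1, 10, 100, 1000], 'gamma': [0.01, 0.1, 1.0]}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=10)  # 10-fold CV
    # search.fit(F_train, y_train)
    # overall_accuracy = search.score(F_test, y_test)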
Table 2.1 summarizes the OAs obtained after applying the considered SVM classification system (with polynomial and Gaussian kernels) to the features extracted by unmixing chain #1 (see Fig. 2.1) from the AVIRIS scenes. The dimensionality of the input data, as estimated by a consensus between the HySime and VD methods, was p = 18 for the Indian Pines scene and p = 15 for the Kennedy Space Center scene. Chain #1 was implemented using two different linear spectral unmixing algorithms [19]: unconstrained and fully constrained; due to better accuracy and faster computation, only results for the unconstrained case are presented. The results after applying the classification system to the original spectral features, and to those extracted using PCA, MNF and ICA, are also reported. As shown by Table 2.1, the classification accuracy is correlated with the training set size (the larger the training set, the higher the classification accuracy). The good generalization ability exhibited by SVMs is demonstrated by the classification results reported for the original spectral information, even with very limited training sets. The fact that MNF is more effective than PCA and ICA for feature extraction purposes is also remarkable, since MNF has been more widely used in the context of spectral unmixing than in classification. Most importantly, Table 2.1 also reveals that the use of unmixing chain #1 as a feature extraction strategy cannot improve the classification results provided by PCA, MNF, ICA or the original spectral information. This is because endmember extraction is generally sensitive to outliers and anomalies; hence, a strategy directing the endmember search toward spatially homogeneous areas could improve the final classification results.
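Although only unconstrained results are reported, the effect of imposing the ANC can be checked with a few lines of code. The sketch below uses SciPy's non-negative least squares as one possible way of obtaining ANC-constrained abundances; it is a simplified illustration under assumed shapes, not the solver used in this thesis:

    import numpy as np
    from scipy.optimize import nnls

    def anc_abundances(X, E):
        # X: (num_pixels, num_bands); E: (num_bands, p) endmember matrix.
        # Solves min ||E a - x||^2 subject to a >= 0, one pixel at a time.
        return np.array([nnls(E, x)[0] for x in X])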
⁴ http://www.csie.ntu.edu.tw/∼cjlin/libsvm/

Table 2.1: Classification accuracies (percentage) and standard deviations obtained after applying the considered SVM classification system (with polynomial and Gaussian kernels) to three different types of features (original, reduced and unmixing-based) extracted from the AVIRIS Indian Pines and Kennedy Space Center scenes (ten randomly chosen training sets).

AVIRIS Indian Pines:

                                    Polynomial kernel                      Gaussian kernel
Type of feature (# features)   5%          10%         15%          5%          10%         15%
Original spectral info (202)   75.23±1.23  81.55±0.86  83.58±0.78   75.78±1.06  82.11±0.43  84.49±0.53
PCA (18)                       77.07±1.46  81.66±0.88  83.11±0.52   77.12±1.29  81.68±0.61  82.96±0.58
MNF (18)                       82.97±1.93  87.41±0.31  88.38±0.57   84.04±0.75  87.66±0.52  89.34±0.43
ICA (18)                       76.63±1.27  81.00±0.71  82.94±0.36   76.92±0.72  81.27±0.61  82.95±0.71
Chain #1 (18)                  74.56±1.04  79.20±1.12  80.97±0.50   74.65±0.99  79.45±0.40  80.91±0.39
Chain #2 (18)                  71.93±0.96  77.58±0.92  79.31±0.33   72.31±0.98  77.36±0.72  79.17±0.28
Chain #3 (16)                  81.32±0.84  85.56±0.84  86.83±0.55   81.78±0.62  86.12±0.66  87.40±0.82
Chain #4 (16)                  82.36±1.09  86.87±0.59  87.97±0.57   82.72±1.04  87.59±0.57  88.92±0.80

AVIRIS Kennedy Space Center:

                                    Polynomial kernel                      Gaussian kernel
Type of feature (# features)   1%          3%          5%           1%          3%          5%
Original spectral info (176)   70.97±3.32  82.53±1.63  85.71±1.40   72.26±2.42  82.91±1.38  85.50±1.35
PCA (15)                       73.52±3.69  83.26±1.26  86.11±1.16   74.66±2.94  82.54±1.70  86.28±1.46
MNF (15)                       77.01±3.77  86.85±2.19  89.59±1.89   77.94±3.48  87.43±2.11  90.01±1.52
ICA (15)                       70.09±2.91  80.28±1.73  84.59±1.50   70.39±1.58  80.79±1.60  84.58±1.58
Chain #1 (15)                  69.41±2.64  78.62±1.58  82.84±1.17   69.02±5.40  79.08±1.46  83.53±1.25
Chain #2 (15)                  67.91±3.98  78.61±3.56  84.26±1.41   68.56±4.70  83.86±1.89  83.86±1.22
Chain #3 (12)                  74.28±3.23  85.37±1.30  87.88±1.57   75.02±4.13  84.92±1.97  88.47±1.38
Chain #4 (12)                  76.10±2.49  86.38±1.40  87.84±1.28   77.53±2.58  86.57±0.97  87.72±1.13

2.4.2.2 Experiment 2. Impact of including spatial information at the endmember extraction stage

In this experiment we apply unmixing chain #2 for feature extraction prior to classification. As shown by Table 2.1, spatial preprocessing prior to endmember extraction does not lead to improved classification results with regard to chain #1 and the original spectral information. This is due to the spectral similarity of the most spatially representative classes in the considered scenes. For instance, in the AVIRIS Indian Pines scene the corn and soybean crops were very early in their growth cycle at the time of data collection, which resulted in low canopy cover over the soil (≈ 5%) [14]. Given this low canopy ground cover, the variation in spectral response among different classes is very low, and spatial information cannot significantly increase the discrimination between classes. In order to address this issue, a possible solution is to conduct the endmember extraction process in supervised fashion, taking advantage of the information contained in the available labeled samples in order to guarantee that a highly representative endmember is selected for each class.

2.4.2.3 Experiment 3. Impact of endmember purity on the final classification results

In a supervised endmember extraction framework, our first experiment is based on applying unmixing chain #3 to select endmembers only from the available training samples.
Apart from reducing computational complexity (in this case, the search for c endmembers is restricted to the pixels belonging to the training set), Table 2.1 reveals that this strategy improves the classification results reported for chains #1 and #2. However, in order to make sure that only one endmember per labeled class is used for unmixing purposes, we also apply unmixing chain #4, in which spectral averaging of the available training pixels in each class is conducted in order to produce a final set of c spectral signatures. Although averaging of endmembers can lead to degradation of spectral purity, it can also reduce the effects of noise and/or average out the subtle spectral variability of a given class, thus obtaining a more representative endmember for the class as a whole. This is illustrated by the classification results for unmixing chain #4 in Table 2.1, which outperform those reported for most other tested methods except MNF. This indicates that, in a supervised unmixing scenario, the use of spectrally pure signatures is not as important as the choice of signatures which are representative of the available training samples.

Table 2.2 shows the statistical differences (average value of ten comparisons) between traditional dimensionality reduction methods and unmixing chains #3 and #4, computed using McNemar's test [55] for the case of the polynomial kernel. The differences are statistically significant at a confidence level of 95% if |Z| > 1.96. For each pair of compared feature extraction chains, we also report how many times each chain wins/ties/loses after comparing the thematic maps obtained using the same training set. If the value of Z reported for an entry of Table 2.2 is positive and larger than 1.96, the first compared chain wins. By convention, the first element of the comparison is always the method in a row of Table 2.2 and the second is the unmixing chain in a column. It can be noticed that unmixing chains #3 and #4 always perform significantly better than PCA and ICA. MNF performs better than chain #3, while the differences with chain #4 are in general not significant. To conclude this section, Fig. 2.4 displays the best classification results (out of 10 runs) obtained after applying the SVM (trained with 10% of the available training samples) to each feature extraction strategy considered for the AVIRIS Indian Pines scene.

Table 2.2: Statistical differences evaluated using McNemar's test (polynomial kernel).

AVIRIS Indian Pines:
            Chain #3           Chain #4
5%   PCA    -9.52 (0/0/10)     -11.88 (0/0/10)
     MNF     5.22 (8/1/1)        2.05 (6/2/2)
     ICA   -10.45 (0/0/10)     -12.85 (0/0/10)
10%  PCA    -9.24 (0/0/10)     -12.40 (0/0/10)
     MNF     6.13 (10/0/0)       1.72 (5/4/1)
     ICA   -17.37 (0/0/10)     -20.26 (0/0/10)
15%  PCA    -8.86 (0/0/10)     -11.40 (0/0/10)
     MNF     5.23 (10/0/0)       1.35 (3/7/0)
     ICA    -9.28 (0/0/10)     -11.87 (0/0/10)

AVIRIS Kennedy Space Center:
            Chain #3           Chain #4
1%   PCA    -2.09 (1/2/7)       -5.73 (0/2/8)
     MNF     3.08 (6/3/1)       -0.58 (3/4/3)
     ICA    -5.32 (0/0/10)      -7.90 (0/0/10)
3%   PCA    -2.61 (1/2/7)       -5.14 (0/0/10)
     MNF     3.48 (6/2/2)        0.78 (5/1/4)
     ICA    -7.14 (0/0/10)      -8.82 (0/0/10)
5%   PCA    -3.78 (0/1/9)       -2.56 (0/4/6)
     MNF     2.42 (6/2/2)        4.04 (7/3/0)
     ICA    -5.08 (0/1/9)       -5.29 (0/0/10)
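The statistic reported in Table 2.2 can be computed directly from the two thematic maps being compared on the same test pixels. A minimal sketch (with hypothetical array names) follows:

    import numpy as np

    def mcnemar_z(y_true, map_a, map_b):
        # Z statistic comparing two classifiers on the same test set;
        # |Z| > 1.96 indicates a significant difference at the 95% level.
        a_ok = (map_a == y_true)
        b_ok = (map_b == y_true)
        f_ab = np.sum(a_ok & ~b_ok)   # correct only in map_a
        f_ba = np.sum(~a_ok & b_ok)   # correct only in map_b
        return (f_ab - f_ba) / np.sqrt(f_ab + f_ba)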
Figure 2.4: Best classification results for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class): (a) Ground-truth; (b) Original image (84.27%); (c) PCA (83.33%); (d) MNF (89.41%); (e) Chain #1 (81.29%); (f) Chain #2 (79.64%); (g) Chain #3 (87.99%); (h) Chain #4 (89.26%).

Figure 2.5: Comparative per-class classification results for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class) with MNF and unmixing chain #4.

As shown by Fig. 2.5, both MNF and chain #4 provide the best classification scores, with less confusion in heavily mixed classes such as grass/trees, and reasonable confusion between spectrally very similar classes such as corn and corn-min, or between soybeans-notill and soybeans-min.

2.5 Final observations and future directions

In this chapter, we have investigated several strategies to extract relevant features from hyperspectral scenes prior to classification. For classification scenarios using SVMs trained with relatively small subsets of labeled samples, our experimental results reveal that MNF greatly improves accuracies when compared to the better-known PCA and ICA transformations, used as unsupervised feature reduction tools prior to classification. Due to the reduced dimensionality, classification using both MNF and PCA subspaces generally improved the OA when compared to using the pixels' full original spectral signatures. The results indicate that the proposed unmixing-based feature extraction chains can provide an alternative strategy to PCA or MNF by incorporating information about the (possibly) mixed nature of the training samples during the learning stage, with the potential advantage of improved interpretability of features due to the physical nature of the extracted abundance maps. Although final classification accuracies are likely to depend on the particular data set considered, the tested chains suggest higher accuracies with respect to traditional methods, such as PCA and ICA, and accuracies comparable to MNF. Further research is needed to define an optimality criterion for designing unmixing chains as a feature reduction tool for classification purposes. A starting point might be chain #4, which indicates that, in the context of a supervised unmixing scenario, the use of spectrally pure signatures is not as important as the choice of signatures which are highly representative of the available training samples.

Chapter 3

A Comparative Assessment of Unmixing-Based Feature Extraction Techniques

3.1 Summary

Over the last few years, many feature extraction techniques have been integrated in processing chains intended for hyperspectral image classification. In the context of supervised classification, it has been shown that the good generalization capability of machine learning techniques such as the SVM can still be enhanced by an adequate extraction of features prior to classification, thus mitigating the curse of dimensionality introduced by the Hughes effect.
Recently, a new strategy for feature extraction prior to classification, based on spectral unmixing concepts, has been introduced. This strategy has shown success when the spatial resolution of the hyperspectral image is not high enough to separate different spectral constituents at a sub-pixel level. Another advantage over statistical transformations such as PCA or MNF is that unmixing-based features are physically meaningful, since they can be interpreted as the abundances of spectral constituents. However, previously developed unmixing-based feature extraction chains do not include spatial information. In this chapter, two new contributions are proposed. First, we develop a new unmixing-based feature extraction technique which integrates the spatial and the spectral information using a combination of unsupervised clustering and partial spectral unmixing. Second, we conduct a quantitative and comparative assessment of unmixing-based versus traditional (supervised and unsupervised) feature extraction techniques in the context of hyperspectral image classification. Our study, conducted using a variety of hyperspectral scenes collected by different instruments, provides practical observations regarding the utility and type of feature extraction techniques needed for different classification scenarios.⁵

⁵ Part of this chapter has been published in: I. Dopido, A. Villa, A. Plaza and P. Gamba, "A Quantitative and Comparative Assessment of Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 421-435, April 2012 [JCR(2012)=2.874].

3.2 Introduction

The rich spectral information available in remotely sensed hyperspectral images makes it possible to distinguish between spectrally similar materials [44]. However, supervised classification of hyperspectral images is a very challenging task due to the generally unfavorable ratio between the (large) number of spectral bands and the (limited) number of training samples available a priori, which results in the Hughes phenomenon [56]. As shown in [57], when the number of features considered for classification is larger than a given threshold (which is strongly application-dependent), the classification accuracy starts to decrease. Methods originally developed for the classification of lower-dimensional data sets (such as multispectral images) therefore provide poor results when applied to hyperspectral images, especially in the case of small training sets [14]. On the other hand, the collection of reliable training samples is very expensive in terms of time and cost, and the possibility to exploit large amounts of ground-truth information is not common [58]. To address this issue, a dimensionality reduction step is often performed prior to the classification process, in order to bring the information in the original space (which in the case of hyperspectral data is almost empty [14]) to the right subspace, which allows separating the classes by discarding information that is useless for classification purposes. Several feature extraction techniques have been proposed to reduce the dimensionality of the data prior to classification, thus mitigating the Hughes phenomenon.
These methods can be unsupervised (if a priori information is not available) or supervised (if available training samples are used to project the data onto a classification-optimized subspace [59, 60]). Classic unsupervised techniques include PCA [4], MNF [31] and ICA [26]. Supervised approaches comprise DAFE, DBFE and NWFE, among many others [14, 37]. In the context of supervised classification, kernel methods have been widely used due to their insensitivity to the curse of dimensionality [16]. However, the good generalization capability of machine learning techniques such as the SVM [46] can still be enhanced by an adequate extraction of relevant features to be used for classification purposes [47], especially if limited training sets are available a priori. Recently, we have investigated this issue by developing a new set of feature extraction techniques based on spectral unmixing concepts (see chapter 2 of this thesis and references [34, 48]). These techniques are intended to take advantage of spectral unmixing models [8] in the characterization of training samples, thus including additional information about sub-pixel composition that can be exploited at the classification stage. Another advantage of unmixing-based techniques over statistical transformations such as PCA, MNF or ICA is the fact that the features derived by spectral unmixing are physically meaningful, since they can be interpreted as the abundances of spectrally pure constituents. Although unmixing-based feature extraction offers an interesting alternative to classic (supervised and unsupervised) approaches, several important aspects deserve further attention [61]:

1. First, the unmixing-based chains discussed in chapter 2 do not include spatial information, which is an important source of information since hyperspectral images exhibit spatial correlation between image features.

2. Second, the study in chapter 2 suggested that partial unmixing [33, 62] could be an effective solution to deal with the likely fact that not all pure spectral constituents in the scene (needed for spectral unmixing purposes) are known a priori, but a more exhaustive investigation of partial unmixing (particularly in combination with spatial information) is needed.

3. Finally, the number of features to be extracted prior to classification was set in chapter 2 to an empirical value given by the intrinsic dimensionality of the input data. However, in the context of supervised feature extraction, the number of features to be retained is probably linked to the characteristics of the training set rather than the full hyperspectral image. Hence, a detailed investigation of the optimal number of features that need to be extracted prior to classification is highly desirable as a follow-up to the experiments conducted in chapter 2 of this thesis.

In this chapter, we address the aforementioned issues by means of two highly innovative contributions. First, a new feature extraction technique exploiting sub-pixel information is proposed. This approach integrates spatial and spectral information using unsupervised clustering in order to define spatially homogeneous regions prior to the partial unmixing stage. A second contribution of this chapter is a detailed investigation of the issue of how many (and what type of) features should be extracted prior to SVM-based classification of hyperspectral data.
For this purpose, different types of (classic and unmixing-based) feature extraction strategies, both unsupervised and supervised in nature, are considered. The remainder of the chapter is organized as follows. Section 3.3 describes a new unmixing-based feature extraction technique which integrates the spatial and the spectral information; a supervised and an unsupervised version of this technique are developed. Section 3.4 describes several representative hyperspectral scenes which have been used in our experiments, including three scenes collected by the AVIRIS [20] system over the regions of Indian Pines, Indiana; Kennedy Space Center, Florida; and Salinas Valley, California; and also a hyperspectral scene collected by ROSIS [38] over the city of Pavia, Italy. Section 3.5 provides an experimental comparison of the proposed feature extraction chains with other classic and unmixing-based approaches, using the four considered hyperspectral image scenes. Section 3.6 concludes with some remarks and hints at plausible future research lines.

3.3 A new unmixing-based feature extraction technique

This section is organized as follows. In subsection 3.3.1 we fix the notation and describe some general concepts about linear spectral unmixing, adopted as our baseline mixture model due to its simplicity and computational tractability. Subsection 3.3.2 describes an unsupervised feature extraction strategy based on spectral unmixing concepts. This strategy first performs k-means clustering, searching for as many clusters as the number of features that need to be retained. The centroids of the clusters are taken as the endmembers, and the features are then obtained by applying spectral unmixing for abundance estimation. The main objective of this chain is to mitigate a problem of endmember extraction algorithms, namely their sensitivity to outliers and to pixels with extreme reflectance values. By using an unsupervised clustering method, the extracted endmembers are expected to be more spatially significant. Finally, subsection 3.3.3 describes a modified version of the feature extraction technique in which the endmembers are searched for in the available training set instead of the entire original image. Here, our assumption is that training samples may better represent the available land-cover classes in the subsequent classification process.

3.3.1 Linear spectral unmixing

Since in this chapter we will include the spatial information together with the spectral information when describing the discussed unmixing chains, a slight abuse of notation is used here (with regard to the notation introduced in chapter 2) in order to redefine the mathematical formulation of linear spectral unmixing using the spatial coordinates of the pixels involved. Let us denote a remotely sensed hyperspectral scene with n bands by I, in which the pixel at the discrete spatial coordinates (i, j) of the scene is represented by a vector X(i, j) = [x_1(i, j), x_2(i, j), \cdots, x_n(i, j)] \in \Re^n, where \Re denotes the set of real numbers and x_k(i, j) is the pixel's spectral response at sensor channel k = 1, \ldots, n.
Under the linear mixture model assumption, each pixel vector in the original scene can be modeled using the following expression:

X(i, j) = \sum_{z=1}^{p} \Phi_z(i, j) \cdot E_z + n(i, j),    (3.1)

where E_z denotes the spectral response of endmember z, \Phi_z(i, j) is a scalar value designating the fractional abundance of endmember z at the pixel X(i, j), p is the total number of endmembers, and n(i, j) is a noise vector. An unconstrained solution to Eq. (3.1) is simply given by the following expression [2]:

\hat{\Phi}_{UC}(i, j) = (E^T E)^{-1} E^T X(i, j).    (3.2)

Two physical constraints are generally imposed on the model described in Eq. (3.1): the ANC, i.e., \Phi_z(i, j) \geq 0, and the ASC, i.e., \sum_{z=1}^{p} \Phi_z(i, j) = 1 [19]. Imposing the ASC results in the following optimization problem:

\min_{\Phi(i,j) \in \Delta} \left\{ (X(i, j) - \Phi(i, j) \cdot E)^T (X(i, j) - \Phi(i, j) \cdot E) \right\}, subject to: \Delta = \left\{ \Phi(i, j) \,\middle|\, \sum_{z=1}^{p} \Phi_z(i, j) = 1 \right\}.    (3.3)

Similarly, imposing the ANC results in the following optimization problem:

\min_{\Phi(i,j) \in \Delta} \left\{ (X(i, j) - \Phi(i, j) \cdot E)^T (X(i, j) - \Phi(i, j) \cdot E) \right\}, subject to: \Delta = \left\{ \Phi(i, j) \mid \Phi_z(i, j) \geq 0 \text{ for all } 1 \leq z \leq p \right\}.    (3.4)

As indicated in [19], a fully constrained (i.e., ASC-constrained and ANC-constrained) estimate can be obtained in the least-squares sense by solving the optimization problems in Eq. (3.3) and Eq. (3.4) simultaneously. However, in order for such an estimate to be meaningful, the spectral signatures of all endmembers, i.e., \{E_z\}_{z=1}^{p}, need to be available a priori, which is not always possible. When not all endmember signatures are available in advance, partial unmixing has emerged as a suitable alternative for solving the linear spectral unmixing problem [33].

3.3.2 Unsupervised unmixing-based feature extraction

In this subsection we describe our first approach to the design of a new unmixing-based feature extraction technique which integrates spatial and spectral information; it can be summarized by the flowchart in Fig. 3.1. First, we apply the k-means algorithm [63] to the original hyperspectral image. Its goal is to determine a set of c points, called centers, so as to minimize the mean squared distance from each pixel vector to its nearest center. The algorithm is based on the observation that the optimal placement of a center is at the centroid of the associated cluster. It starts with a random initial placement. At each stage, the algorithm moves every center point to the centroid of the set of pixel vectors for which that center is the nearest neighbor according to the SA (spectral angle) [8], and then updates the neighborhood by recomputing the SA from each pixel vector to its nearest center. These steps are repeated until the algorithm converges to a point that is a minimum for the distortion [63]. The output of k-means is a set of spectral clusters, each made up of one or more spatially connected regions. In order to determine the number of clusters (endmembers) in advance, techniques for estimating the number of endmembers, such as the VD [43] or HySime [25], can be used. In our experiments we vary the number of clusters in a certain range in order to analyze the impact of this parameter. In fact, our main motivation for using a partial unmixing technique at this point is that estimating the number of endmembers in the original image is a very challenging issue. It is possible that the actual number of endmembers in the original image, p, is larger than the number of clusters derived by k-means.
In this case, in order to unmix the original image we need to address a situation in which not all endmembers may be available a priori. It was shown in the previous chapter that the FCLSU technique does not provide accurate results in this scenario [34]. In turn, it is also possible that p ≤ c. In this case, partial unmixing has shown great success in abundance estimation [33]. Following this line of reasoning, we have decided to resort to partial unmixing techniques in this chapter. A successful technique for estimating abundance fractions in such partial unmixing scenarios is MTMF [33], also known in the literature as CEM [2, 62], which combines the best parts of the linear spectral unmixing model and the statistical matched filter model while avoiding some drawbacks of each parent method. From matched filtering, it inherits the ability to map a single known target without knowing the other background endmember signatures, unlike the standard linear unmixing model. From spectral mixture modeling, it inherits the leverage arising from the mixed pixel model and the constraints on feasibility, including the ASC and ANC requirements. It is essentially a target detection algorithm designed to identify the presence (or absence) of a specified material by producing a score of 1 for pixels wholly covered by the material of interest, while keeping the average score over the image as small as possible. It uses just one endmember spectrum (that of the target of interest) and therefore behaves as a partial unmixing method that suppresses background noise and estimates the sub-pixel abundance of a single endmember material without assuming the presence of all endmembers in the scene, as is the case with FCLSU. If we assume that E_z is the endmember to be characterized, MTMF estimates the abundance fraction \Phi(i, j) of E_z in a specific pixel vector X(i, j) of the scene as follows:

\hat{\Phi}_{MTMF}(i, j) = ((E_z^T R^{-1} E_z)^{-1} R^{-1} E_z)^T X(i, j),    (3.5)

where R is the sample correlation matrix:

R = \frac{1}{s \times l} \sum_{i=1}^{l} \sum_{j=1}^{s} X(i, j) X^T(i, j),    (3.6)

with s and l respectively denoting the number of samples and the number of lines in the original hyperspectral image.

Figure 3.1: Block diagram illustrating the unsupervised clustering followed by MTMF (CMTMFunsup) technique for unmixing-based feature extraction.

As shown by Fig. 3.1, the features resulting from the proposed unmixing-based technique, referred to hereinafter as unsupervised clustering followed by MTMF (CMTMFunsup) [23], are used to train an SVM classifier with a few randomly selected labeled samples. The classifier is then tested using the remaining labeled samples.
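To make the CMTMFunsup chain concrete, the sketch below combines k-means cluster centers with the CEM/MTMF filter of Eqs. (3.5)-(3.6). As simplifying assumptions, it uses the Euclidean metric of scikit-learn's k-means rather than the SA-based variant described above, and the function name and array shapes are hypothetical:

    import numpy as np
    from sklearn.cluster import KMeans

    def cmtmf_unsup_features(X, c):
        # X: (num_pixels, num_bands) image reshaped to 2-D; c: number of clusters.
        centers = KMeans(n_clusters=c, n_init=10).fit(X).cluster_centers_
        R = (X.T @ X) / X.shape[0]          # sample correlation matrix, Eq. (3.6)
        R_inv = np.linalg.inv(R)
        feats = np.empty((X.shape[0], c))
        for z, e in enumerate(centers):
            # CEM/MTMF filter for endmember e, Eq. (3.5).
            w = (R_inv @ e) / (e @ R_inv @ e)
            feats[:, z] = X @ w
        return feats                        # one abundance feature per center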
3.3.3 Supervised unmixing-based feature extraction

Fig. 3.2 describes a variation of the CMTMFunsup technique presented in the previous subsection in which the endmembers are extracted from the available (labeled) training samples instead of from the original image. This introduces two main properties with regard to CMTMFunsup: 1) the number of endmembers to be extracted is given by the total number of different classes, c, in the labeled samples available in the training set, and 2) the endmembers (class centers) are obtained after clustering the training set, which reduces the computational complexity significantly. The increase in computational performance comes at the expense of an additional consideration. In this scenario, it is likely that the actual number of endmembers in the original image, p, is larger than the number of different classes, c, comprised by the available labeled training samples. Therefore, in order to unmix the original image we again need to address a partial unmixing problem. Then, as shown by Fig. 3.2, standard SVM classification is performed on the stack of abundance fractions using randomly selected training samples. Hereinafter, we refer to the feature extraction technique described in Fig. 3.2 as supervised clustering followed by MTMF (CMTMFsup) [23].

Figure 3.2: Block diagram illustrating the supervised clustering followed by MTMF (CMTMFsup) technique for unmixing-based feature extraction.

3.4 Hyperspectral data sets

In order to provide a fair experimental comparison between the proposed and available feature extraction approaches, several representative hyperspectral data sets are investigated. In this chapter, we have considered four different images captured by two different sensors: AVIRIS and ROSIS. The images span a wide range of land-cover uses, from the agricultural areas of Indian Pines and Salinas, to urban zones in the town of Pavia and mixed vegetation/urban features at the Kennedy Space Center. The AVIRIS Indian Pines scene was already described in subsection 2.4.1.1, and the AVIRIS Kennedy Space Center scene in subsection 2.4.1.2; hence, here we only describe the two remaining scenes. The number of ground-truth pixels per class for all the considered hyperspectral images is given in Table 3.1.

Table 3.1: Number of pixels in each ground-truth class in the four considered hyperspectral images. The number of training and test pixels used in our experiments can be derived from this table.

AVIRIS Indian Pines: Alfalfa (54), Corn-Notill (1434), Corn-Min (834), Corn (234), Grass-Pasture (497), Grass-Trees (747), Grass-Pasture-Mowed (26), Hay-Windrowed (489), Oats (20), Soybeans-Notill (968), Soybeans-Min (2468), Soybeans-Clean (614), Wheat (212), Woods (1294), Bldg-Grass-Tree-Drives (380), Stone-Steel-Towers (95).

AVIRIS Kennedy Space Center: Scrub (761), Willow (243), Hammock (256), Oak (252), Slash Pine (161), Oak/Broadleaf (229), Hardwood Swamp (105), Graminoid Marsh (311), Spartina Marsh (520), Cattail Marsh (404), Salt Marsh (186), Mud Flats (134).

AVIRIS Salinas Valley: Broccoli Green Weeds 1 (1893), Broccoli Green Weeds 2 (3704), Fallow (1960), Fallow Rough Plow (1228), Fallow Smooth (2560), Stubble (3841), Celery (3543), Grapes Untrained (11287), Soil Vinyard Develop (6128), Corn Senesced Green Weeds (3154), Lettuce Romaine 4 weeks (984), Lettuce Romaine 5 weeks (1850), Lettuce Romaine 6 weeks (818), Lettuce Romaine 7 weeks (1003), Vinyard Untrained (7055), Vinyard Vertical Trellis (1622).

ROSIS Pavia University: Asphalt (6631), Meadows (18649), Gravel (2099), Trees (3064), Metal Sheets (1345), Bare Soil (5029), Bitumen (1330), Self-Blocking Bricks (3682), Shadow (947).

Figure 3.3: (a) False color composition of an AVIRIS hyperspectral image comprising several agricultural fields in Salinas Valley, California. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.

3.4.1 AVIRIS Salinas Valley

This scene was collected over the Salinas Valley in Southern California.
The full scene consists of 512 lines by 217 samples, with 186 spectral bands (after removal of water absorption and noisy bands) from 0.4 to 2.5 µm, a nominal spectral resolution of 10 nm, and 16-bit radiometric resolution. It was taken at low altitude with a pixel size of 3.7 meters (high spatial resolution). The data include vegetables, bare soils and vineyard fields. Fig. 3.3(a) shows a false color composition of the scene and Fig. 3.3(b) shows the available ground-truth regions, which cover about two thirds of the entire Salinas scene. Finally, Fig. 3.4 shows some photographs of selected land-cover classes taken at the imaged site at the same time the data were being collected by the sensor. Of particular interest are the relevant differences in the romaine lettuce classes resulting from different soil cover proportions.

Figure 3.4: Photographs taken at the site during data collection.

3.4.2 ROSIS Pavia University

This scene was collected by the ROSIS optical sensor over the urban area of the University of Pavia, Italy. The flight was operated by DLR in the framework of the HySens project, managed and sponsored by the European Union. The image size in pixels is 610 × 340, with a very high spatial resolution of 1.3 meters per pixel. The number of data channels in the acquired image is 115 (with a spectral range from 0.43 to 0.86 µm). Fig. 3.5(a) shows a false color composite of the image, while Fig. 3.5(b) shows nine ground-truth classes of interest, which comprise urban features, as well as soil and vegetation features. Finally, Fig. 3.5(c) shows a commonly used training set directly derived from the ground-truth in Fig. 3.5(b).

Figure 3.5: (a) False color composition of the ROSIS Pavia scene. (b) Ground-truth map containing 9 mutually exclusive land-cover classes. (c) Training set commonly used for the ROSIS Pavia University scene.

3.5 Experimental results

In this section we conduct a quantitative and comparative analysis of different feature extraction techniques for hyperspectral image classification, including unmixing-based and more traditional (supervised and unsupervised) approaches. The main goal is to use spectral unmixing and classification as complementary techniques, since the latter is more suitable for pixels dominated by a single land-cover class, while the former is devoted to the characterization of mixed pixels. Because hyperspectral images often contain areas with both pure and mixed pixels, the combination of these two analysis techniques provides a synergistic data processing approach that has been explored in previous contributions [34, 51, 64, 65, 66]. Before describing the results obtained in the experimental validation, we first describe the feature extraction techniques used in our comparison in subsection 3.5.1. Then, subsection 3.5.2 describes the adopted supervised classification system and the experimental setup. Finally, subsection 3.5.3 discusses the obtained results in comparative fashion.

3.5.1 Feature extraction techniques used in the comparison

In our classification system, relevant features are first extracted from the original image. Several types of input features have been considered in the classification experiments conducted in this chapter.
In the following, we provide an overview of the techniques used to extract features from the original hyperspectral data. A detailed mathematical description of these techniques is beyond the scope of this chapter, since most of them are algorithms well known in the remote sensing literature; only a short description of the conceptual basics of each method is given here. The techniques are divided into unsupervised approaches, if the algorithm is applied to the whole data cube, and supervised techniques, if the information associated with the training set is somehow exploited during the feature extraction step.

3.5.1.1 Unsupervised feature extraction techniques

We consider five unsupervised feature extraction techniques in this chapter. Three of them are classic algorithms available in the literature (PCA, MNF and ICA, described in the previous chapter), and the two remaining ones are based on the exploitation of sub-pixel information through spectral unmixing, including the best unsupervised method in [34] and the newly proposed CMTMFunsup technique. A brief summary of the considered unsupervised techniques follows:

• MTMFunsup first performs an MNF-based dimensionality reduction and then applies the MTMF method in order to estimate the fractional abundances of spectral endmembers extracted from the original data using the OSP algorithm [36]. In [34, 48] it is shown that MTMF outperforms other techniques for abundance estimation, such as unconstrained linear unmixing and FCLSU [19], since it can provide meaningful abundance maps by means of partial unmixing in case not all endmembers are available a priori.

• CMTMFunsup was developed to address a problem of endmember extraction algorithms, namely their sensitivity to outliers and to pixels with extreme reflectance values. By using an unsupervised clustering method such as k-means to extract the features, the extracted endmembers are expected to be more spatially significant.

• FCunsup is an extension of the k-means clustering method [67] which provides soft clusters, where a particular pixel has a degree of membership in each cluster. This strategy is faster than the two previous ones as it does not include a spectral unmixing step.

3.5.1.2 Supervised feature extraction techniques

We consider several supervised feature extraction techniques in this chapter. The first techniques considered were DAFE and DBFE [14]. However, DBFE could not be applied in the case of very limited training sets, since it requires a number of samples (for each class) larger than the number of dimensions of the original data set in order to estimate the statistics used to project the data. As will be shown in the next sections, this requirement was not satisfied in most of the experiments carried out. In turn, the results provided by DAFE were poor compared to the other methods for low numbers of training samples, hence we did not include them in our comparison. As a result, the supervised methods adopted in our comparison were NWFE and three sub-pixel techniques based on estimating fractional abundances. Two of them were already presented in [34, 48], and the third one is the CMTMFsup technique developed in this chapter.
Although a number of supervised feature extraction techniques are available in the literature [14], according to our experiments the advantage provided by supervised techniques is not always evident, especially in the case of limited training sets [68]. A brief summary of the considered supervised techniques follows:

• NWFE focuses on selecting samples near the eventual decision boundaries that best separate the classes. The main ideas of NWFE are: 1) assigning different weights to every training sample in order to compute local means, and 2) defining non-parametric between-class and within-class scatter matrices to perform feature extraction [14].

• MTMFsup is equivalent to MTMFunsup, except that the pure spectral components are searched for by the OSP endmember extraction algorithm in the training set instead of in the entire hyperspectral image. Our assumption is that training samples may better represent the available land-cover classes in the subsequent classification process [34].

• MTMFavg is equivalent to MTMFsup, except that the representative spectral signatures are obtained as the average of the signatures belonging to each class in the training set (here, the number of components to be retained by the MNF applied prior to the MTMF is varied in a given range). In this case, the OSP algorithm is not used to extract the spectral signatures, which are obtained in supervised fashion from the available training samples [34].

• CMTMFsup acts as the supervised counterpart of CMTMFunsup. It mainly differs from that technique in that the clustering process is performed on the training samples, and not on the full hyperspectral image.

3.5.2 Supervised classification system and experimental setup

In our supervised classification system, different types of input features are extracted from the original hyperspectral image prior to classification. In addition to the unsupervised and supervised feature extraction techniques described in the previous subsection, we also use the (full) original spectral information available in the hyperspectral data as input to the proposed classification system. In the latter case, the dimensionality of the input features used for classification equals n, the number of spectral bands in the original data set. When using feature extraction techniques, the number of features was varied empirically in our experiments and only the best results are reported. In all cases, a supervised classification process was performed using the SVM classifier with a Gaussian kernel (observed to perform better than other tested kernels, such as polynomial or linear). Kernel parameters were optimized by a grid search procedure, and the optimal parameters were selected using 10-fold cross-validation (selected after testing different configurations). The LIBSVM library was used in our experiments. In order to evaluate the ability of the tested methods to perform under training sets with different numbers of samples, we adopted the following training-test configurations:

• In our experiments with the AVIRIS Indian Pines data set in Fig. 2.3(a), we randomly selected 5% and 15% of the pixels in each ground-truth class in Table 3.1 and used them to build the training set. The remaining pixels were used as test pixels.
• In our experiments with the AVIRIS Salinas data set in Fig. 3.3(a), in which the smallest classes are larger when compared to those in the AVIRIS Indian Pines data set, we decided to reduce the training sets even further and selected only 2% and 5% of the available ground-truth pixels in Table 3.1 for training purposes.

• In our experiments with the AVIRIS Kennedy Space Center data set, we likewise reduced the training sets and selected only 1% and 5% of the available ground-truth pixels in Table 3.1 for training purposes.

• Finally, in our experiments with the ROSIS Pavia data set in Fig. 3.5(a), we used the training set in Fig. 3.5(c) and also a different training set made up of only 50 pixels per class in Table 3.1 for comparative purposes.

Based on the aforementioned training sets, the OA and AA were computed over the remaining test samples for each data set. Each experiment was repeated ten times to guarantee statistical consistency, and the average results after ten runs are provided. An assessment of the obtained results is reported in the following subsection.

3.5.3 Analysis and discussion of results

Table 3.2 and Table 3.3 show the OA and AA (in percentage) obtained by the considered classification system for the different hyperspectral scenes using the original spectral information as input feature, and also the features provided by the unsupervised and supervised feature extraction techniques described in subsection 3.5.1. It is important to emphasize that, in the tables, we only report the best case (i.e., the one with the highest OA) for each considered feature extraction technique, after testing numbers of extracted features ranging from 5 to 50. In all cases, this range was sufficient to observe a decline in classification OA after a certain number of features, so the numbers given in parentheses in the tables correspond to the optimal number of features for each considered feature extraction technique (in the case of the original spectral information, the number in parentheses corresponds to the number of bands of the original hyperspectral image). Finally, in order to single out the best feature extraction technique in each considered experiment, we highlight in bold typeface the best classification result observed across all tested feature extraction methods. In the previous chapter [34], the statistical significance of some of the processing chains considered in Table 3.2 and Table 3.3 was assessed using McNemar's test [55], concluding that the differences between the tested methods were statistically significant. Other similar tests are also available in the literature [69]. According to our experimental results, the same observations regarding statistical significance apply to the new processing chains included in this chapter. From Table 3.2 and Table 3.3, several conclusions can be drawn. First and foremost, we can observe that the use of supervised techniques for feature extraction is not always beneficial for improving the OA and AA, especially in the case of limited training sets and statistical feature extraction approaches. For example, NWFE exhibits better results when compared to traditional unsupervised techniques such as PCA or ICA. However, DAFE (not included in the tables) exhibited quite poor results.
The low performance obtained by DAFE should therefore be attributed to the very small size of the training set and to the fact that the land-cover classes can be spectrally very close (as in the case of the AVIRIS Indian Pines scene), thus making it very difficult to separate them by using spectral means and covariance matrices. Moreover, the importance of integrating the additional information provided by the training samples is strictly connected with the nature of the considered approach. This can be noticed when comparing the MTMF versus the CMTMF chains. In the former case, the best results are generally provided by the supervised approach (MTMFsup), since the supervised strategy for extracting spectral endmembers using the OSP approach benefits from the reduction of outliers and pixels with extreme reflectance values, which negatively affect this endmember extraction algorithm.

Table 3.2: OA and AA (in percentage) obtained by the considered classification system for the AVIRIS Indian Pines and AVIRIS Kennedy Space Center scenes using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses), and the best classification result across all methods in each experiment is highlighted in bold typeface.

Overall accuracy (OA):
                 AVIRIS Indian Pines            AVIRIS Kennedy Space Center
Features         5% Training    15% Training    1% Training    5% Training
Original info    75.78% (202)   84.49% (202)    72.26% (176)   85.50% (176)
PCA              77.25% (20)    83.86% (20)     75.54% (5)     86.64% (10)
ICA              76.84% (20)    83.52% (20)     73.88% (10)    86.36% (10)
MNF              86.67% (10)    91.35% (10)     78.09% (15)    90.12% (15)
MTMFunsup        84.90% (10)    89.50% (10)     76.48% (10)    88.61% (15)
CMTMFunsup       87.18% (30)    91.61% (25)     80.34% (35)    90.20% (45)
FCunsup          74.57% (30)    79.45% (25)     64.71% (10)    77.55% (30)
NWFE             79.76% (10)    86.11% (10)     73.17% (10)    85.66% (10)
MTMFsup          85.96% (10)    90.28% (10)     77.78% (10)    89.24% (15)
MTMFavg          86.24% (10)    91.00% (10)     74.87% (10)    86.61% (20)
CMTMFsup         85.08% (10)    90.19% (20)     76.48% (10)    88.74% (15)

Average accuracy (AA):
                 AVIRIS Indian Pines            AVIRIS Kennedy Space Center
Features         5% Training    15% Training    1% Training    5% Training
Original info    66.37% (202)   79.52% (202)    67.10% (176)   81.95% (176)
PCA              70.59% (15)    80.37% (15)     68.92% (5)     82.37% (10)
ICA              70.03% (15)    80.03% (15)     65.92% (10)    81.85% (10)
MNF              83.31% (10)    89.04% (10)     79.23% (10)    87.69% (15)
MTMFunsup        80.12% (10)    86.65% (10)     69.13% (10)    85.48% (15)
CMTMFunsup       82.17% (20)    89.55% (20)     74.32% (35)    87.72% (45)
FCunsup          69.49% (10)    76.10% (10)     59.49% (30)    73.01% (30)
NWFE             72.46% (5)     78.80% (5)      64.26% (5)     81.31% (10)
MTMFsup          82.57% (10)    87.76% (10)     70.37% (10)    86.79% (15)
MTMFavg          82.31% (10)    89.16% (10)     67.11% (15)    83.39% (20)
CMTMFsup         83.34% (10)    89.47% (10)     70.28% (15)    86.19% (15)
In the latter case, the best results are generally provided by the unsupervised approach (CMTMFunsup) due to the fact that, when trying to identify clusters in a very small training set, several problems appear, such as the bad conditioning of matrices when computing the inverse (in the k-means clustering step) or the eventual selection of very similar clusters, leading to redundant information in class prototyping, which ultimately affects the subsequent partial unmixing step and the obtained classification performance. In addition to the aforementioned observations, we emphasize that the supervised version derives the endmembers (via clustering) from a limited training set, while the unsupervised version derives them from the whole hyperspectral image. The former approach has the advantage of lower computational complexity, as the search for endmembers is only conducted in the small training set, but this comes at the expense of reduced modeling accuracy, as expected. Although in the previous chapter we developed MTMFavg in the hope of addressing these problems, our experimental results indicate that the CMTMF techniques in general, and CMTMFunsup in particular (an unsupervised approach, as opposed to MTMFavg), perform a better job in characterizing the sub-pixel information prior to classification of hyperspectral data.

Table 3.3: OA and AA (in percentage) obtained by the considered classification system for the AVIRIS Salinas Valley and ROSIS Pavia University scenes using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses), and the best classification result across all methods in each experiment is highlighted in bold typeface.

Overall accuracy (OA):
                 AVIRIS Salinas Valley          ROSIS Pavia University
Features         2% Training    5% Training     50 pixels      Standard
Original info    88.39% (186)   90.66% (186)    84.09% (103)   80.99% (103)
PCA              91.93% (10)    93.55% (10)     81.65% (15)    81.81% (10)
ICA              91.72% (20)    93.33% (20)     81.39% (15)    81.44% (10)
MNF              93.71% (15)    94.90% (15)     83.52% (5)     77.77% (10)
MTMFunsup        93.27% (15)    94.38% (15)     83.16% (5)     75.56% (10)
CMTMFunsup       92.83% (30)    94.47% (35)     86.83% (20)    84.25% (15)
FCunsup          88.87% (30)    90.29% (30)     66.57% (25)    70.07% (30)
NWFE             92.28% (10)    93.47% (10)     81.39% (15)    78.56% (15)
MTMFsup          92.67% (15)    94.49% (15)     81.41% (5)     75.21% (5)
MTMFavg          93.42% (15)    94.67% (15)     84.91% (10)    83.58% (10)
CMTMFsup         92.63% (30)    93.95% (25)     85.34% (10)    81.48% (15)

Average accuracy (AA):
                 AVIRIS Salinas Valley          ROSIS Pavia University
Features         2% Training    5% Training     50 pixels      Standard
Original info    92.29% (186)   94.46% (186)    87.78% (103)   88.28% (103)
PCA              95.48% (10)    96.77% (20)     84.78% (15)    86.97% (10)
ICA              95.33% (10)    96.62% (20)     84.63% (15)    86.92% (10)
MNF              96.60% (15)    97.48% (20)     87.74% (5)     86.63% (10)
MTMFunsup        96.27% (15)    97.10% (15)     87.70% (5)     86.23% (10)
CMTMFunsup       96.18% (30)    97.23% (35)     88.14% (20)    89.98% (20)
FCunsup          88.96% (30)    91.54% (25)     72.87% (25)    76.75% (30)
NWFE             96.09% (10)    96.91% (10)     84.46% (15)    86.68% (20)
MTMFsup          95.78% (10)    97.22% (20)     86.16% (5)     86.20% (10)
MTMFavg          96.37% (15)    97.30% (15)     88.25% (5)     88.08% (10)
CMTMFsup         95.39% (30)    96.73% (25)     87.72% (10)    88.90% (15)
Finally, it is also worth noting the good performance achieved in all experiments by MNF, another unsupervised feature extraction strategy. Figs. 3.6, 3.7 and 3.8 show the results obtained in some of the experiments. An arising question at this point is whether there is any advantage in using unmixing chains versus the MNF transform. Since both feature extraction methods are unsupervised, with similar computational complexity and leading to similar classification results, it is not clear from the context whether there exists any advantage in using an unmixing-based technique over a well-known statistical method such as MNF. In order to address this issue, Fig. 3.9 shows the first 9 components extracted by MNF from the ROSIS Pavia University image. These components are ordered in terms of SNR, with the first component providing the maximum amount of information. Here, noise can be clearly appreciated in the last three components. In turn, Fig. 3.10 shows the components extracted from the same image by the CMTMFunsup technique. The components are arranged in no specific order, as spectral unmixing assigns the same priority to each endmember when deriving the associated abundance map. As shown by Fig. 3.10, the components provided by the unmixing-based technique can be interpreted in a physical manner (as the abundances of each spectral constituent in the scene) and, most importantly, these components can be related to the ground-truth classes in Fig. 3.5(b). This suggests that unmixing-based chains can provide an alternative strategy to classic feature extraction chains such as MNF, with three main differences:

1. Unmixing-based feature extraction techniques incorporate information about mixed pixels, which are the dominant type of pixel in hyperspectral images. Quite the opposite, standard feature extraction techniques such as MNF do not account for the pure/mixed nature of the pixels in hyperspectral data, disregarding a source of information that could be useful for the final classification.

2. The components provided by unmixing-based feature extraction techniques can be interpreted as the abundances of spectral constituents in the scene, while the components provided by other classic feature extraction techniques such as MNF do not necessarily have any physical meaning.

3. Unmixing-based feature extraction techniques do not penalize classes which are not relevant in terms of variance or SNR, while some classic feature extraction techniques such as MNF relegate variations of less significant size to low-order components. If such low-order components are not preserved, small classes may be affected.

An additional aspect resulting from our experiments is that unmixing-based chains allow for a natural integration of the spatial information available in the original hyperspectral image (through the clustering strategy designed for endmember extraction). Although the aforementioned aspects may offer important advantages in hyperspectral data classification, the truth is that our comparative assessment (conducted in terms of OA and AA using four representative hyperspectral images) only indicates a moderate improvement (or comparable performance) of the best unmixing-guided feature extraction method (CMTMFunsup) with regard to the best statistical feature extraction method (MNF) reported in our experiments. This leads us to believe that further improvements to the integration of the information provided by spectral unmixing into the classification process are possible.
With this in mind, we anticipate significant advances in the integration of spectral unmixing and classification of hyperspectral data in future developments.

3.6 Final observations and future directions

In this chapter, we have investigated the advantages that can be gained by including information about spectral mixing at sub-pixel levels in the feature extraction stage that is usually conducted prior to hyperspectral image classification. For this purpose, we have developed a new unmixing-based feature extraction technique that combines the spatial and the spectral information through a combination of unsupervised clustering and partial spectral unmixing. We have compared our newly developed technique (which can be applied in both unsupervised and supervised fashion) with other classic and unmixing-based techniques for feature extraction. Our detailed quantitative and comparative assessment has been conducted using four representative hyperspectral images collected by two different instruments (AVIRIS and ROSIS) over a variety of test sites, and in the framework of supervised classification scenarios dominated by the limited availability of training samples. Our experimental results indicate that the unsupervised version of our newly developed technique provides components which are physically meaningful and significant from a spatial point of view, resulting in good classification accuracies (without penalizing very small classes) when compared to the other feature extraction techniques tested in this chapter. In turn, since our analysis scenarios are dominated by very limited training sets, we have experimentally observed that, in this context, the use of supervised feature extraction techniques can lead to lower classification accuracies, as the information considered for projecting the data into a lower-dimensional space is not representative of the thematic classes of the image.

Future developments will include an investigation of additional techniques for feature extraction from a spectral unmixing point of view, in order to fully substantiate the advantages that can be gained at the feature extraction stage by including additional information about mixed pixels (which are predominant in hyperspectral images) prior to classification. Another research line deserving future attention is the development of automatic procedures to determine the optimal number of features to be extracted by each tested method. While several methods for estimating the intrinsic dimensionality of hyperspectral images exist, the number of features suitable for classification purposes depends on each particular method and, in the case of supervised feature extraction methods, on the available training samples. Although we have investigated performance over a suitable range of extracted features, the automatic determination of the optimal number of features for each method should be investigated in future work for practical reasons. Finally, future work should also consider nonlinear feature extraction methods, such as kernel PCA [70], in addition to the linear feature extraction methods considered (a minimal usage sketch follows).
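As a pointer to this last research line, and purely as an illustration, the snippet below shows how a nonlinear feature extractor could be plugged into the experimental setting described above; scikit-learn's KernelPCA is assumed here as one possible off-the-shelf implementation, and the parameter values and data are placeholders rather than tuned choices.

    import numpy as np
    from sklearn.decomposition import KernelPCA

    # pixels: (n_pixels, n_bands) matrix of spectral vectors (placeholder data).
    pixels = np.random.rand(1000, 103)

    # RBF kernel PCA; n_components plays the same role as the number of
    # extracted features swept in the experiments of this chapter.
    kpca = KernelPCA(n_components=15, kernel="rbf", gamma=1.0 / pixels.shape[1])
    features = kpca.fit_transform(pixels)   # nonlinear features for classification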
Figure 3.6: Classification results for the AVIRIS Indian Pines scene (obtained using an SVM classifier with Gaussian kernel, trained with 5% of the available samples). Maps shown: ground truth, PCA, ICA, MNF, MTMFunsup, CMTMFunsup, NWFE, MTMFsup, MTMFavg and CMTMFsup.

Figure 3.7: Classification results for the AVIRIS Salinas Valley scene (obtained using an SVM classifier with Gaussian kernel, trained with 2% of the available samples). Maps shown: ground truth, PCA, ICA, MNF, MTMFunsup, CMTMFunsup, NWFE, MTMFsup, MTMFavg and CMTMFsup.

Figure 3.8: Classification results for the ROSIS Pavia University scene (obtained using an SVM classifier with Gaussian kernel, trained with 50 pixels of each available ground-truth class). Maps shown: ground truth, PCA, ICA, MNF, MTMFunsup, CMTMFunsup, NWFE, MTMFsup, MTMFavg and CMTMFsup.

Figure 3.9: Components extracted by MNF from the ROSIS Pavia University scene (ordered from left to right in terms of amount of information).

Figure 3.10: Components extracted by the CMTMFunsup feature extraction technique from the ROSIS Pavia University scene (in no specific order).

Chapter 4

Semi-Supervised Self-Learning for Hyperspectral Image Classification

4.1 Summary

As shown in previous chapters, supervised hyperspectral image classification is a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way. This observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification. The main assumption of such techniques is that new (unlabeled) training samples can be obtained from a (limited) set of available labeled samples without significant effort/cost. In this chapter, we develop a new approach for semi-supervised learning which adapts available active learning methods (in which a trained expert actively selects unlabeled samples) to a self-learning framework in which the machine learning algorithm itself selects the most useful and informative unlabeled samples for classification purposes. In this way, the labels of the selected pixels are estimated by the classifier itself, with the advantage that no extra cost is required for labeling the selected pixels using this machine-machine framework when compared with traditional machine-human active learning. The proposed approach is illustrated with two different classifiers: the MLR and a probabilistic pixel-wise SVM. Our experimental results with real hyperspectral images collected by the AVIRIS and ROSIS instruments indicate that the use of self-learning represents an effective and promising strategy in the context of hyperspectral image classification.6

6 Part of this chapter has been published in: I. Dopido, J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias and J. A. Benediktsson, "Semi-Supervised Self-Learning for Hyperspectral Image Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 4032-4044, July 2013 [JCR(2012)=3.467].

4.2 Introduction

Remotely sensed hyperspectral image classification [14] takes advantage of the detailed information contained in each pixel (vector) of the hyperspectral image to generate thematic maps from detailed spectral signatures.
A relevant challenge for supervised classification techniques (which assume prior knowledge in the form of class labels for different spectral signatures) is the limited availability of labeled training samples, since their collection generally involves expensive ground campaigns [71]. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way. This observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification. The main assumption of such techniques is that new (unlabeled) training samples can be obtained from a (limited) set of available labeled samples without significant effort/cost [72].

The area of semi-supervised learning has experienced a significant evolution in terms of the adopted models, which comprise complex generative models [73, 74, 75], self-learning models [76, 77], multi-view learning models [78, 79], TSVMs [40, 41], and graph-based methods [80]. A survey of semi-supervised learning algorithms is available in [17]. Most of these algorithms use some type of regularization which encourages the notion that "similar" features should be associated with the same class. The effect of such regularization is to push the boundaries between classes towards regions with low data density [81], where the usual strategy first associates the vertices of a graph to the complete set of samples and then builds the regularizer depending on variables defined on the vertices. This trend has been successfully adopted in several recent remote sensing image classification studies. For instance, in [58] TSVMs are used to gradually search for a reliable separating hyperplane (in the kernel space) with a transductive process that incorporates both labeled and unlabeled samples in the training phase. In [82], a semi-supervised method is presented that exploits the wealth of unlabeled samples in the image, and naturally gives relative importance to the labeled ones through a graph-based methodology. In [83], kernels combining spectral-spatial information are constructed by applying spatial smoothing over the original hyperspectral data and then using composite kernels in graph-based classifiers. In [84], a semi-supervised SVM is presented that exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. In [85, 86], a new semi-supervised approach is presented that exploits unlabeled training samples (selected by means of an active selection strategy based on the entropy of the samples). Here, unlabeled samples are used to improve the estimation of the class distributions, and the obtained classification is refined by using a spatial multi-level logistic prior. In [87], a novel context-sensitive semi-supervised SVM is presented that exploits the contextual information of the pixels belonging to the neighborhood system of each training sample in the learning phase, in order to improve robustness to possibly mislabeled training patterns.
In [88], two semi-supervised one-class (SVM-based) approaches are presented in which the information provided by unlabeled samples present in the scene is used to improve classification accuracy and alleviate the problem of free-parameter selection. The first approach models the data marginal distribution with the graph Laplacian built with both labeled and unlabeled samples. The second approach is a modification of the SVM cost function that penalizes more heavily the errors made when classifying samples of the target class. In [89], a new method to combine labeled and unlabeled pixels to increase classification reliability and accuracy, thus addressing the sample selection bias problem, is presented and discussed. In [90], an SVM is trained with the linear combination of two kernels: a base kernel working only with labeled examples is deformed by a likelihood kernel encoding similarities between labeled and unlabeled examples, and the result is applied in the context of urban hyperspectral image classification. In [91], similar concepts to those addressed before are adopted using a neural network as the baseline classifier. In [92], a semi-automatic procedure to generate land cover maps from remote sensing images using active queries is presented and discussed.

In contrast to supervised classification, the aforementioned semi-supervised algorithms generally assume that a limited number of labeled samples are available a priori, and then enlarge the training set using unlabeled samples, thus allowing these approaches to address ill-posed problems. However, in order for this strategy to work, several requirements need to be met. First and foremost, the new (unlabeled) samples should be generated without significant cost/effort. Second, the number of unlabeled samples required for the semi-supervised classifier to perform properly should not be too high, in order to avoid increasing computational complexity in the classification stage. In other words, as the number of unlabeled samples increases, it may become computationally impractical for the classifier to properly exploit all the available training samples. Further, if the unlabeled samples are not properly selected, they may confuse the classifier, thus introducing significant divergence or even reducing the classification accuracy obtained with the initial set of labeled samples. In order to address these issues, it is very important that the most highly informative unlabeled samples be identified in computationally efficient fashion, so that significant improvements in classification performance can be observed without the need to use a very high number of unlabeled samples. In this chapter, we evaluate the feasibility of adapting available active learning techniques (in which a trained expert actively selects unlabeled samples) to a self-learning framework in which the machine learning algorithm itself selects the most useful unlabeled samples for classification purposes, with the ultimate goal of systematically achieving noticeable improvements in classification results with regard to those found by randomly selected training sets of the same size.
In the literature, active learning techniques have mainly been exploited in a supervised context, i.e., a given supervised classifier is trained with the most representative training samples, selected after a (machine-human) interaction process in which the samples are actively chosen according to some criterion based on the considered classifier, and the labels of those samples are then assigned by a trained expert in fully supervised fashion [32, 35, 86, 93, 94, 95]. In this supervised context, samples with high uncertainty are generally preferred, as they are usually more informative. At the same time, since the samples are labeled by a human expert, high confidence can be expected in the class label assignments. As a result, classic (supervised) active learning generally focuses on samples with high confidence at the human level and high uncertainty at the machine level. In turn, we adapt standard active learning methods to a self-learning scenario. The main idea is to obtain new (unlabeled) samples using machine-machine interaction instead of human supervision. Our first (machine) level, similar to the human level in classic (supervised) active learning, is used to infer a set of candidate unlabeled samples with high confidence. In our second (machine) level, similar to the machine level in supervised active learning, the machine learning algorithm itself automatically selects the samples with highest uncertainty from the obtained candidate set. As a result, in our proposed approach the classifier replaces the human expert. In other words, here we propose a novel two-step semi-supervised self-learning approach:

1. The first step infers a candidate set using a self-learning strategy based on the available (labeled and unlabeled) training samples. Here, a spatial neighborhood criterion is used to derive new candidate samples as those which are spatially adjacent to the available (labeled) samples.

2. The second step automatically selects (and labels) new samples from the candidate pool by assuming that those pixels which are spatially adjacent to a given class can be labeled with high confidence as belonging to the same class.

As a result, our proposed strategy relies on two main assumptions. The first assumption (global) is that training samples having the same spectral structure likely belong to the same class. The second assumption (local) is that spatially neighboring pixels likely belong to the same class. Our proposed approach thus naturally integrates the spatial and the spectral information in the semi-supervised classification process.

The remainder of the chapter is organized as follows. Section 4.3 describes the proposed approach for semi-supervised self-learning. We illustrate the proposed approach with two probabilistic classifiers, the MLR and a probabilistic pixel-wise SVM, both of which are shown to achieve significant improvements in classification accuracy resulting from their combination with the proposed semi-supervised self-learning approach. Section 4.4 reports classification results using two real hyperspectral images collected by the AVIRIS [20] and ROSIS [49] imaging spectrometers. Finally, section 4.5 concludes with some remarks and hints at plausible future research lines.

4.3 Proposed approach

First, we briefly define the notation used in this chapter.
Let K ≡ {1, ..., K} denote a set of K class labels, S ≡ {1, ..., n} a set of integers indexing the n pixels of an image, x ≡ (x1, ..., xn) ∈ R^{d×n} an image of d-dimensional feature vectors, y ≡ (y1, ..., yn) an image of labels, Dl ≡ {(yl1, xl1), ..., (yln, xln)} a set of labeled samples, ln the number of labeled training samples, Yl ≡ {yl1, ..., yln} the set of labels in Dl, Xl ≡ {xl1, ..., xln} the set of feature vectors in Dl, Du ≡ {Xu, Yu} a set of unlabeled samples, Xu ≡ {xu1, ..., xun} the set of unlabeled feature vectors in Du, Yu ≡ {yu1, ..., yun} the set of labels associated with Xu, and un the number of unlabeled samples. With this notation in mind, the proposed semi-supervised self-learning approach consists of two main ingredients, semi-supervised learning and self-learning, which are described next.

4.3.1 Semi-supervised learning

For the semi-supervised part of our approach, we use two different probabilistic classifiers [96] to model the class posterior density. The first one is the MLR, which is formally given by [30]:

    p(yi = k | xi, ω) = exp(ω^(k)T h(xi)) / Σ_{k=1}^{K} exp(ω^(k)T h(xi)),    (4.1)

where h(x) = [h1(x), ..., hl(x)]^T is a vector of l fixed functions of the input, often termed features; ω^(k) are the regressors for class k, and ω = [ω^(1)T, ..., ω^(K)T]^T. Notice that the function h may be linear, i.e., h(xi) = [1, xi,1, ..., xi,d]^T, where xi,j is the j-th component of xi; or nonlinear, i.e., h(xi) = [1, K_{xi,x1}, ..., K_{xi,xl}]^T, where K_{xi,xj} = K(xi, xj) and K(·,·) is some symmetric kernel function. Kernels have been largely used because they tend to improve data separability in the transformed space. We use an RBF kernel, K(xi, xj) = exp(−‖xi − xj‖² / 2σ²), which is widely used in hyperspectral image classification [16]. We selected this kernel (after extensive experimentation with other kernels, including linear and polynomial kernels) because we empirically observed that it provided the best results. From now on, d denotes the dimension of h(x). Under the present setup, learning the class densities amounts to estimating the logistic regressors. Following the work in [29, 39], we compute ω as the MAP estimate:

    ω̂ = arg max_ω  ℓ(ω) + log p(ω),    (4.2)

where p(ω) ∝ exp(−λ‖ω‖₁) is a Laplacian prior promoting sparsity, and λ is a regularization parameter controlling the degree of sparseness of ω̂ [29, 39]. In our previous work [29], it was shown that the parameter λ is rather insensitive to the use of different datasets, and that there are many suboptimal values for this parameter which lead to very accurate estimation of ω. In our experiments, we set λ = 0.001, as we have empirically found that this setting provides very good performance [97]. Finally, ℓ(ω) is the log-likelihood function over the training samples Dl+u ≡ Dl + Du, given by:

    ℓ(ω) ≡ Σ_{i=1}^{ln+un} log p(yi | xi, ω).    (4.3)

As shown by Eq. (4.3), labeled and unlabeled samples are integrated to learn the regressors ω. The considered semi-supervised approach belongs to the family of self-learning approaches, where the training set Dl+u is incremented under the following criterion. Let DN(i) ≡ {(ŷi1, xi1), ..., (ŷin, xin)} be the set of neighboring samples of (yi, xi) for i ∈ {l1, ..., ln, u1, ..., un}, where in is the number of samples in DN(i) and ŷij is the MAP estimate from the MLR classifier, with ij ∈ {i1, ..., in}. If ŷij = yi, we increment the unlabeled training set by adding (ŷij, xij), i.e., Du = {Du, (ŷij, xij)} (a toy implementation of this increment rule is sketched below).
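The following minimal sketch illustrates this increment rule on a label image. The MAP estimates are assumed to come from the current posterior probabilities of the classifier, first-order (4-connected) spatial neighbors are used, and the array layout and all function and variable names are our own illustrative assumptions.

    import numpy as np

    def increment_training_set(prob_map, train_mask, train_labels):
        # prob_map:     (rows, cols, K) current class posteriors p(y|x, w).
        # train_mask:   (rows, cols) boolean, True where a (labeled or already
        #               self-labeled) training sample sits.
        # train_labels: (rows, cols) integer label y_i of each training sample.
        # Returns the coordinates and labels of the newly added samples.
        y_hat = prob_map.argmax(axis=2)      # MAP estimate for every pixel
        rows, cols = train_mask.shape
        new_coords, new_labels = [], []
        for i, j in zip(*np.nonzero(train_mask)):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # first-order neighbors
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and not train_mask[ni, nj]:
                    # add the neighbor only if its predicted label agrees with
                    # the label of the training sample it is attached to
                    # (duplicate additions are not filtered, for brevity)
                    if y_hat[ni, nj] == train_labels[i, j]:
                        new_coords.append((ni, nj))
                        new_labels.append(y_hat[ni, nj])
        return new_coords, new_labels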
This increment is reasonable due to the following considerations. First, from a global viewpoint, samples which have the same spectral structure likely belong to the same class. Second, from a local viewpoint, it is very likely that two neighboring pixels also belong to the same class. Therefore, the newly included samples are reliable for learning the classifier. In this chapter, we run an iterative scheme to increment the training set, as this strategy can refine the estimates and enlarge the neighborhood set, such that the pool of potential unlabeled training samples is increased.

It is important to mention that problem (4.2), although convex, is very difficult to solve because the term ℓ(ω) is non-quadratic and the term log p(ω) is non-smooth. The SMLR algorithm presented in [39] solves this problem with O((d(K − 1))³) complexity. However, most hyperspectral data sets are beyond the reach of this algorithm, as their analysis becomes computationally impractical when the number of classes increases. In order to address this issue, we take advantage of the LORSAL algorithm [28], which allows replacing a difficult non-smooth convex problem with a sequence of quadratic plus diagonal l2-l1 problems with a practical complexity of O(d²(K − 1)). Compared with the figure of O((d(K − 1))³) for the SMLR algorithm, the complexity reduction by a factor of d(K − 1)² is quite significant [28, 29].

Finally, we have also used an alternative probabilistic classifier for the semi-supervised learning part of our methodology: the probabilistic SVM in [98, 40]. Other probabilistic classifiers could be used, but we have selected the SVM as a possible alternative to the MLR since this classifier is already widely used to analyze hyperspectral data [58, 82], while the MLR has only recently emerged as a feasible technique for this purpose. It should be noted that standard SVMs do not provide probability estimates for the individual classes. In order to obtain these estimates, pairwise coupling of binary probabilistic estimates is applied [98, 99], an approach that has already been used for hyperspectral classification [100]; a usage sketch is given below.
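In practice, such per-class probability estimates are readily available in standard toolboxes. The sketch below, which is only one possible realization and not necessarily the implementation used in our experiments, relies on scikit-learn's SVC with probability=True; this option combines binary (Platt-style) probability estimates via pairwise coupling, in the spirit of [98, 99]. The data arrays and parameter values are placeholders.

    import numpy as np
    from sklearn.svm import SVC

    # X_train, y_train: labeled spectra and their labels; X_all: all image pixels.
    X_train = np.random.rand(160, 103)
    y_train = np.random.randint(0, 9, size=160)
    X_all = np.random.rand(5000, 103)

    # RBF kernel SVM; probability=True enables pairwise-coupled probability
    # estimates so the classifier can drive the self-learning machinery.
    svm = SVC(kernel="rbf", C=100.0, gamma=0.5, probability=True).fit(X_train, y_train)
    prob = svm.predict_proba(X_all)   # (n_pixels, K) class posteriors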
4.3.2 Self-learning

The proposed semi-supervised self-learning approach is based on two steps. In the first step, a candidate set (based on labeled and unlabeled samples) is inferred using a self-learning strategy based on spatial information, so that high confidence can be expected in the class labels of the obtained candidate set. This is similar to human interaction in classic (supervised) active learning, in which the class labels are known and given by an expert. In the second step, we run standard active learning algorithms on the previously derived candidate set, adapted to a self-learning scenario, in order to automatically (and intelligently) select the most informative samples from the candidate set. Here, the goal is to find the samples with highest uncertainty. As a result, in the proposed semi-supervised self-learning scheme our aim is to select the most informative samples without the need for human supervision. The class labels of the newly selected unlabeled training samples are predicted by the considered semi-supervised algorithm, as described in subsection 4.3.1. Let Dc be the newly generated unlabeled training set at each iteration, which meets the criteria of the considered semi-supervised algorithm. Notice that the self-learning step in the proposed approach leads to high confidence in the class labels of the newly generated set Dc. Now we can run standard active learning algorithms over Dc to find the most informative set Du, i.e., samples with high uncertainty, such that Du ⊆ Dc. Since we use discriminative classifiers and a self-learning strategy for the semi-supervised algorithm, algorithms which focus on the boundaries between the classes are preferred. In our study, we use four different techniques to evaluate the proposed approach [90]: 1) MS; 2) BT; 3) MBT; and 4) nEQB, in addition to random selection (RS), in which the new samples are randomly selected from the candidate set. In the following we briefly outline each method (a toy implementation of the BT heuristic and the RS baseline is sketched at the end of this subsection; for a more detailed description of these approaches, we refer to [32, 86]):

• The MS technique [32] samples the candidates lying within the margin by computing their distance to the hyperplane separating the classes. In other words, MS minimizes the distance of the sample to the optimal separating hyperplane defined for each class in a one-against-all setting for multiclass problems.

• The BT algorithm [21] relies on the smallest difference between the posterior probabilities for each sample. In a multi-class setting, the algorithm can be applied (independently of the number of classes available) by calculating the difference between the two highest probabilities. As a result, the algorithm finds the samples minimizing the distance between the first two most probable classes. In previous work [29], it has been shown that the BT criterion generally focuses on the boundaries comprising many samples, possibly disregarding boundaries with fewer samples.

• The MBT scheme [29] was originally proposed to include more diversity in the sampling process as compared to the BT approach. It finds the samples maximizing the probability of the largest class for each individual class. This method takes into account all the class boundaries by conducting the sampling in cyclic fashion, making sure that MBT does not get trapped in any single class, whereas BT could be trapped in a single (complex) boundary.

• The nEQB approach [35] is a form of committee-based sampling that quantifies the uncertainty of a pixel by considering a committee of learners. Each member of the committee exploits a different hypothesis about the classification problem and consequently labels the pixels in the pool of candidates. The algorithm then selects the samples showing maximal disagreement between the different classification models in the committee. Specifically, the nEQB approach uses bagging to build the committee and entropy maximization as the multiclass heuristic, which provides a measure that is then normalized in order to bound it with respect to the number of classes predicted by the committee and avoid hot spots of the uncertainty value in regions where several classes overlap. The version of nEQB used in this work is the one implemented in the toolbox available at http://code.google.com/p/altoolbox.

At this point, it is important to emphasize that the aforementioned sampling algorithms are used to intelligently select the most useful candidate samples based on the available probabilistic information. As a result, spatial information is not directly addressed by these methods, but by the strategy adopted to generate the pool of candidate samples. Since spatial information is the main criterion adopted in this stage, there is a risk that the initial pool of candidate samples may smooth out broad areas in the scene.
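For reference, the BT heuristic (and the RS baseline) can be stated in a few lines of code, as sketched below; MS and nEQB are omitted because they additionally require SVM decision values and a committee of classifiers, respectively. The matrix prob is assumed to hold the class posteriors of the candidate set Dc, and all names are our own.

    import numpy as np

    def breaking_ties(prob, n_select):
        # BT: rank candidates by the difference between their two highest
        # class posteriors; the smallest gap indicates maximal ambiguity
        # between the two most probable classes.
        s = np.sort(prob, axis=1)
        gap = s[:, -1] - s[:, -2]
        return np.argsort(gap)[:n_select]

    def random_selection(prob, n_select, seed=None):
        # RS baseline: ignore the posteriors and sample uniformly at random.
        rng = np.random.default_rng(seed)
        return rng.choice(prob.shape[0], size=n_select, replace=False)

    # Example: select the 20 most ambiguous candidates from Dc.
    prob = np.random.dirichlet(np.ones(9), size=200)   # placeholder posteriors
    picked = breaking_ties(prob, n_select=20)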
Regarding this risk, we emphasize that our proposed method for generating the pool of initial candidates is not exclusively spatial, as we use the probabilistic information provided by spectral-based classifiers (such as the MLR or the probabilistic SVM) in order to assess the similarity between the previously selected samples and the new candidates. Hence, as we have experimentally observed, no significant smoothing effects occur in broad areas, and good initial candidates are generally selected. It is also worth noting that we use two classifiers with probabilistic output that are well suited to the aforementioned algorithms (the MLR and the probabilistic SVM). However, the proposed approach can be adapted to any other probabilistic classifier.

For illustrative purposes, Fig. 4.1 shows how spatial information can be adopted as a reasonable criterion to select unlabeled samples and prevent labeling errors in a semi-supervised classification process using a probabilistic classifier. As Fig. 4.1 shows, we use an iterative process to achieve the final classification results. First, we use a probabilistic classifier (the MLR or the probabilistic SVM) to produce a global classification map which contains the probability of each pixel belonging to each class in the considered hyperspectral image. Based on a local similarity assumption, we identify the neighbors of the labeled training samples (using first-order spatial connectivity) and then compute the candidate set Dc by analyzing the spectral similarity of the spatial neighbors with regard to the original labeled samples. This is done by analyzing the probabilistic output associated with each neighboring sample. In this way, the candidate set Dc is obtained based on spectral and spatial information, and its samples are highly reliable. At the same time, it is expected that there may be redundant information in Dc. In other words, some of the samples in the candidate set may not be useful for training the classifier, as they may be too similar to the original labeled samples. This could introduce difficulties from the viewpoint of computational complexity. Therefore, after Dc is obtained, we run active learning algorithms on the candidate set in order to automatically select the most informative unlabeled training samples. Since the active learning algorithms are based on the available probabilistic information, they are adapted to a self-learning scenario and used to intelligently reduce possibly existing redundancies in the candidate set, thus obtaining a highly informative pool of training samples which ultimately contains only the most relevant samples for classification purposes. The newly obtained labeled and unlabeled training samples are finally used to retrain the classifier. The procedure is repeated in iterative fashion until a convergence criterion is met, for example, until a certain number of unlabeled training samples is obtained.

Figure 4.1: A graphical example illustrating how spatial information can be used as a criterion for semi-supervised self-learning in hyperspectral image classification.

4.4 Experimental results

In this section, two real hyperspectral images are used to evaluate the proposed approach for semi-supervised self-learning. In our experiments with the MLR and SVM classifiers, we apply the Gaussian RBF kernel to a normalized version of the considered hyperspectral data set.8

8 The normalization is simply given by xi := xi / √(Σ_i ‖xi‖₂), for i = 1, ..., n, where xi is a spectral vector.
We reiterate that the Gaussian RBF kernel was selected after extensive experimentation with other kernels. In all cases, the reported figures of OA, AA, the kappa statistic, and individual class accuracies are obtained by averaging the results of 10 independent Monte Carlo runs with respect to the labeled training set Dl, drawn from the ground-truth image, where the remaining samples are used for validation purposes. Finally, the optimal parameters C (the parameter controlling the amount of penalty during the SVM optimization [40]) and σ (the spread of the Gaussian RBF kernel) were chosen by 10-fold cross-validation, and are updated at each iteration.

In order to illustrate the good performance of the proposed approach, we use very small labeled training sets on purpose. As a result, the main difficulties that our proposed approach should circumvent can be summarized as follows. First and foremost, it is very difficult for supervised algorithms to provide good classification results when very little information is available about the class distribution. Poor generalization is also a risk when estimating class boundaries in scenarios dominated by limited training samples. Since our approach is semi-supervised, we take advantage of unlabeled samples in order to improve classification accuracy. However, if the number of labeled samples l is very small, increasing the number of unlabeled samples u could bias the learning process.

In order to analyze the aforementioned issues and provide a quantitative evaluation of our proposed approach with regard to the optimal case in which true active learning methods (i.e., those relying on knowledge of the true labels of the selected samples) are used, we have implemented the following validation framework. Let Dur be a set of unlabeled samples for which true labels are available. These samples are included in the ground truth associated with the hyperspectral image, but are not used in the set of labeled samples initially available to the classifier. In order to evaluate the effectiveness of the proposed approach, we can label the samples in Dur using their true (ground-truth) labels instead of estimating the labels with our proposed approach. Clearly, these samples will be favored over those selected by our proposed method, which makes use of estimated labels; but it is interesting to quantify such an advantage (the lower it is, the better for our method). Following this rationale, the optimal case is that in which most samples in Du have true labels available, meaning that Dur contains most of the unlabeled samples in Du. In our experiments, we denote by lr the number of unlabeled samples for which a true label is available in the ground truth associated with the considered hyperspectral image. If lr = 0, the labels of all unlabeled samples are estimated by our proposed approach. If lr = ur, true labels are available for all the samples in Dur. Using this strategy, we can quantify the deviation of our proposed approach from the optimal case in which true labels are available for the selected samples. Typically, true labels will only be available for part of the samples, as the considered hyperspectral data sets do not contain ground-truth information for all pixels (the labeling rule used in our experiments is sketched below).
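A minimal sketch of this labeling rule, under our own naming conventions, is given below: whenever a selected pixel has a reference label in the ground truth (and the optimal case is being simulated), the true label is used and counted towards lr; otherwise, the label estimated by the classifier is kept.

    import numpy as np

    def assign_labels(selected, estimated, ground_truth, use_true_labels):
        # selected:     list of (row, col) pixels chosen from the candidate set.
        # estimated:    (rows, cols) labels predicted by the classifier.
        # ground_truth: (rows, cols) reference labels; 0 is assumed to mark
        #               pixels for which no ground truth exists.
        # Returns the labels used for training and lr, the count of true labels.
        labels, lr = [], 0
        for i, j in selected:
            if use_true_labels and ground_truth[i, j] > 0:
                labels.append(ground_truth[i, j])   # true label available
                lr += 1
            else:
                labels.append(estimated[i, j])      # self-estimated label (lr = 0 case)
        return np.array(labels), lr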
In such cases, the optimal case comprises both true (whenever available) and estimated labels; the value of lr is given in all experiments.

The remainder of this section is organized as follows. In subsection 4.4.1, we describe the experiments conducted using the first data set: AVIRIS Indian Pines [101]. Subsection 4.4.2 then describes experiments using a second data set: ROSIS Pavia University [102]. In all cases, the results obtained by the supervised versions of the considered classifiers are also reported for comparative purposes.

4.4.1 Experiments with the AVIRIS Indian Pines data set

In the first experiment, we evaluated the impact of the number of unlabeled samples on the classification performance achieved by the two considered probabilistic classifiers using the AVIRIS Indian Pines data set in Fig. 2.3(a), described in subsection 2.4.1.1. Fig. 4.2 shows the OA as a function of the number of unlabeled samples obtained by the MLR and probabilistic SVM classifiers, respectively. The plots in Fig. 4.2, which were generated using estimated labels only, reveal clear advantages of using unlabeled samples in the proposed semi-supervised self-learning approach when compared with the supervised algorithm alone. In all cases, the proposed strategy significantly outperforms the corresponding supervised algorithm, and the increase in performance becomes more relevant as the number of unlabeled samples increases. These unlabeled samples are automatically selected by the proposed approach, and represent no cost in terms of data collection or human supervision, which are key aspects for self-learning. In Fig. 4.2 it can also be seen that using intelligent training sample selection algorithms such as MS, BT, MBT or nEQB greatly improved the obtained accuracies in comparison with simple random selection (RS). The results in Fig. 4.2 also reveal that BT outperformed the other strategies in most cases, with MBT providing lower classification accuracies than BT. This is expected, as the candidate set Dc is more relevant when the samples are obtained from the class boundaries. Finally, it can also be observed that the MLR always performed better than the probabilistic SVM in terms of classification accuracy.

In order to show the classification results in more detail, Tables 4.1 to 4.4 report the OA, AA, individual classification accuracies (in percentage) and the kappa statistic obtained by the supervised MLR and probabilistic SVM (trained using only 10 labeled samples per class) and by the proposed approach (based on the same classifiers) using the four considered sample selection algorithms (executed using 30 iterations), in comparison with the optimal case for the same algorithms, in which true labels are used whenever available in the ground truth. In all cases, we report the value of lr to provide an indication of the number of true versus estimated labels used in the experiments.

Figure 4.2: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR and probabilistic SVM classifiers, with ln = 80, 160 and 240 labeled samples. Estimated labels were used in all experiments, i.e., lr = 0.
Table 4.1: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                Supervised     MS (lr = 0)    MS (lr = 683)  BT (lr = 0)    BT (lr = 668)
Alfalfa (54)                   83.64±5.12     84.55±6.10     86.82±5.00     85.00±6.43     84.77±5.87
Corn-Notill (1434)             48.38±6.54     71.64±6.05     75.23±6.07     72.88±4.58     74.23±4.32
Corn-Min (834)                 47.65±7.33     66.36±12.63    72.73±12.55    64.60±12.79    72.28±11.97
Corn (234)                     70.63±9.43     85.76±8.13     85.49±5.74     87.54±5.86     88.04±4.53
Grass-Pasture (497)            75.42±7.35     85.50±4.93     87.37±7.43     85.48±5.32     88.67±5.57
Grass-Trees (747)              86.01±4.61     96.54±1.17     96.65±1.21     95.97±2.02     97.06±1.17
Grass-Pasture-Mowed (26)       88.12±6.88     93.75±6.62     87.50±5.89     93.75±5.47     86.88±8.56
Hay-Windrowed (489)            88.89±5.41     97.45±0.82     97.43±0.89     98.27±0.55     98.16±0.64
Oats (20)                      98.00±4.22     96.00±11.35    95.00±10.80    97.00±11.35    96.00±6.99
Soybeans-Notill (968)          58.68±9.18     80.87±7.17     83.39±7.99     83.36±7.39     86.03±5.47
Soybeans-Min (2468)            44.85±10.85    72.51±4.70     74.49±7.29     70.14±5.28     72.76±5.72
Soybeans-Clean (614)           52.50±9.91     80.88±10.40    85.02±7.99     82.04±9.54     86.61±6.53
Wheat (212)                    98.76±1.57     99.21±0.33     99.26±0.42     99.16±0.41     99.31±0.71
Woods (1294)                   75.63±9.38     92.40±3.41     93.23±3.76     94.21±5.14     94.07±2.80
Bldg-Grass-Tree-Drives (380)   50.84±7.65     66.70±7.56     65.62±6.12     67.38±11.11    68.86±7.84
Stone-Steel-Towers (95)        79.88±8.22     82.94±7.91     84.12±10.90    80.94±7.75     83.29±9.79
OA                             60.12±3.08     80.00±1.09     82.14±5.88     80.04±1.28     82.28±6.12
AA                             71.74±1.54     84.57±1.03     85.58±3.60     84.86±1.53     86.06±3.86
kappa                          55.43±3.20     77.31±1.26     79.74±6.50     77.39±1.45     79.93±6.79
Table 4.2: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random selection (RS) case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                MBT (lr = 0)   MBT (lr = 646) nEQB (lr = 0)  nEQB (lr = 603) RS (lr = 0)    RS (lr = 747)
Alfalfa (54)                   87.27±2.92     89.09±3.18     82.50±3.40     81.14±4.92      79.55±4.48     80.23±5.87
Corn-Notill (1434)             72.23±3.86     72.16±5.00     77.96±4.56     73.62±3.16      60.25±7.97     61.84±9.02
Corn-Min (834)                 63.86±10.46    68.50±8.56     64.82±11.64    69.14±10.11     53.39±8.47     53.18±6.63
Corn (234)                     92.23±2.45     90.67±6.48     86.38±6.30     80.40±13.18     66.29±16.34    71.74±12.94
Grass-Pasture (497)            87.08±6.30     89.45±5.96     79.49±8.35     83.78±7.28      81.79±5.15     83.59±6.71
Grass-Trees (747)              96.53±1.23     97.08±1.77     91.37±5.16     93.31±2.93      94.02±2.75     94.12±2.96
Grass-Pasture-Mowed (26)       89.38±7.25     90.63±5.31     90.63±4.42     88.12±9.97      85.00±6.72     86.25±5.74
Hay-Windrowed (489)            98.77±0.39     98.60±0.61     99.19±0.33     96.43±1.75      96.74±1.33     96.35±1.38
Oats (20)                      99.00±3.16     99.00±3.16     97.00±6.75     96.00±6.99      99.00±4.22     98.00±4.22
Soybeans-Notill (968)          79.84±7.40     83.25±5.37     82.00±8.82     81.86±6.29      67.47±11.43    65.50±11.99
Soybeans-Min (2468)            62.58±8.20     65.36±5.96     68.04±5.60     69.29±5.43      50.81±12.98    54.02±8.23
Soybeans-Clean (614)           85.45±8.62     85.12±9.42     83.77±10.90    87.28±6.05      61.79±12.36    65.71±11.30
Wheat (212)                    99.60±0.31     99.31±0.35     98.96±0.28     97.77±0.85      99.55±0.28     99.50±0.33
Woods (1294)                   94.81±3.74     93.78±3.95     86.45±10.15    82.32±7.40      88.86±6.18     89.55±6.78
Bldg-Grass-Tree-Drives (380)   66.89±7.02     67.51±7.20     78.30±12.87    72.73±7.75      55.38±8.20     54.16±9.98
Stone-Steel-Towers (95)        91.06±3.19     90.82±3.91     79.53±5.74     85.06±10.23     77.53±8.55     78.00±7.73
OA                             78.34±2.11     79.68±5.28     79.02±1.53     79.64±4.88      68.01±3.04     69.28±2.63
AA                             85.41±1.12     86.27±3.84     84.15±1.24     83.64±3.05      76.09±1.76     76.98±1.46
kappa                          75.59±2.29     77.08±5.85     76.31±1.66     76.85±5.40      64.01±3.30     65.39±2.86
Table 4.3: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                Supervised     MS (lr = 0)    MS (lr = 695)  BT (lr = 0)    BT (lr = 717)
Alfalfa (54)                   79.77±12.70    75.23±8.67     65.23±11.19    84.32±3.78     84.77±3.72
Corn-Notill (1434)             32.32±14.21    63.90±13.67    77.46±1.89     62.97±15.49    76.54±3.16
Corn-Min (834)                 37.17±19.56    56.70±25.76    80.24±3.09     58.12±24.62    76.58±4.23
Corn (234)                     68.62±10.32    87.95±3.29     89.24±1.73     82.10±13.80    86.38±3.52
Grass-Pasture (497)            77.19±7.29     87.54±7.09     91.21±3.01     89.16±6.02     93.37±1.35
Grass-Trees (747)              65.36±14.50    93.96±2.75     91.90±2.82     95.29±2.62     94.02±2.53
Grass-Pasture-Mowed (26)       90.63±6.75     90.00±7.34     93.75±2.95     92.50±4.93     95.00±3.95
Hay-Windrowed (489)            78.06±8.12     95.80±1.75     97.70±0.60     97.89±0.89     98.10±0.46
Oats (20)                      97.00±6.75     93.00±9.49     100.00         93.00±6.75     99.00±3.16
Soybeans-Notill (968)          49.42±18.23    80.96±7.68     88.68±3.02     82.03±8.88     91.39±2.14
Soybeans-Min (2468)            33.90±12.83    65.50±12.51    65.98±2.15     63.36±15.50    68.60±2.36
Soybeans-Clean (614)           43.31±12.88    77.90±10.32    90.79±2.09     81.42±11.08    91.42±1.24
Wheat (212)                    93.61±3.96     98.37±1.07     97.82±1.40     98.66±0.81     97.52±1.34
Woods (1294)                   72.39±15.02    89.24±6.07     93.90±1.92     92.94±4.58     97.34±0.40
Bldg-Grass-Tree-Drives (380)   47.84±14.90    68.11±14.08    64.95±5.97     66.81±16.28    61.97±3.04
Stone-Steel-Towers (95)        86.35±10.26    96.35±4.72     93.53±3.65     93.18±5.62     90.82±3.79
OA                             50.61±5.34     75.87±3.44     81.82±7.54     76.23±5.40     82.91±0.75
AA                             65.93±2.99     82.53±2.03     86.40±4.47     83.36±2.15     87.68±0.67
kappa                          45.14±5.35     72.76±3.76     79.49±8.26     73.18±5.81     80.71±0.83
Table 4.4: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random selection (RS) case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                MBT (lr = 0)   MBT (lr = 649) nEQB (lr = 0)  nEQB (lr = 701) RS (lr = 0)    RS (lr = 740)
Alfalfa (54)                   89.77±3.08     85.91±0.96     80.00±12.21    55.45±7.74      82.05±7.68     66.14±7.98
Corn-Notill (1434)             51.33±19.49    59.70±2.85     60.72±17.53    75.67±2.12      44.56±18.39    55.32±3.61
Corn-Min (834)                 55.98±22.21    72.34±2.15     55.42±22.33    77.97±1.64      43.28±25.34    61.77±6.22
Corn (234)                     81.03±13.28    84.06±2.72     86.38±4.02     86.34±4.26      72.50±13.19    85.49±2.64
Grass-Pasture (497)            88.17±6.40     93.24±1.23     82.40±6.03     90.60±2.99      85.73±5.77     89.45±2.47
Grass-Trees (747)              90.39±4.96     88.66±2.22     87.72±7.29     92.29±2.42      88.36±5.99     82.63±4.95
Grass-Pasture-Mowed (26)       90.00±4.37     93.75±2.95     89.38±6.62     93.13±1.98      87.50±8.33     93.13±1.98
Hay-Windrowed (489)            98.52±1.19     98.27±0.43     93.26±3.95     97.93±1.38      93.49±4.39     97.24±0.67
Oats (20)                      95.00±12.69    100.00         98.00±4.22     97.00±4.83      95.00±7.07     100.00
Soybeans-Notill (968)          72.13±24.41    87.21±2.60     71.34±27.13    85.75±2.73      65.10±18.05    84.38±3.66
Soybeans-Min (2468)            50.16±12.02    53.59±5.69     58.33±23.25    62.12±2.40      50.44±15.80    44.10±13.02
Soybeans-Clean (614)           63.00±17.91    84.39±7.02     76.71±13.10    92.04±1.71      52.91±8.92     61.94±11.52
Wheat (212)                    98.22±2.40     99.01±0.52     97.28±0.91     97.48±1.00      97.38±1.51     97.62±0.45
Woods (1294)                   92.10±6.25     97.81±0.55     77.73±10.45    90.73±2.72      89.36±6.60     96.94±0.74
Bldg-Grass-Tree-Drives (380)   65.46±8.72     58.51±4.37     72.54±12.16    64.86±5.76      42.35±13.44    40.00±7.62
Stone-Steel-Towers (95)        88.35±9.87     83.18±2.29     94.47±5.82     87.41±4.11      90.35±4.95     84.35±2.54
OA                             68.66±5.35     75.26±1.39     70.47±5.24     79.69±0.62      63.59±5.59     68.40±2.85
AA                             79.35±2.16     83.73±0.79     80.10±2.43     84.17±0.65      73.77±2.18     77.53±0.96
kappa                          64.90±5.75     72.39±1.49     66.79±5.65     77.14±0.67      59.13±5.68     64.73±2.99

Figure 4.3: Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the AVIRIS Indian Pines data set using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0. Maps shown: supervised (60.12%), MS (80.00%), BT (80.04%), MBT (78.34%), nEQB (79.02%) and RS (68.01%).

It is noticeable that, by including unlabeled samples, the classification results are significantly improved in all cases. Furthermore, it can be observed that the MLR classifier is more robust than the probabilistic SVM in our framework. For example, with un = 750 and BT sampling, only a 2.24% difference in classification accuracy can be observed between the implementation using only estimated labels and the optimal case in which both true and estimated labels are considered. For the probabilistic SVM classifier, however, the difference is 6.67%. Similar observations can be made for the other sampling algorithms considered in our experiments.
Figure 4.5: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR classifier with BT sampling and 5 labeled samples per class (80 samples in total). Two cases are displayed: one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0), and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).

For illustrative purposes, Fig. 4.5 analyzes the convergence of our proposed approach by plotting the obtained classification accuracies for the AVIRIS Indian Pines scene as a function of the number of unlabeled samples, using only 5 labeled samples per class (80 labeled samples in total) for the MLR classifier with the BT sampling approach. In the figure, we report the case in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and also the optimal case in which true labels are used whenever possible (i.e., lr = ur). As can be seen in Fig. 4.5, the proposed approach achieved good performance when compared with the optimal case, with a difference of about 5% in classification accuracy when 3500 training samples were used.

Finally, Figs. 4.3 and 4.4 respectively show some of the classification maps obtained by the MLR and probabilistic SVM classifiers for the AVIRIS Indian Pines scene. These classification maps correspond to one of the 10 Monte Carlo runs that were averaged in order to generate the classification scores reported in Tables 4.1 to 4.4. The advantages obtained by adopting a semi-supervised learning approach with regard to the corresponding supervised case can be clearly appreciated in the classification maps displayed in Figs. 4.3 and 4.4, which also report the classification OA obtained by each method in parentheses.

4.4.2 Experiments with the ROSIS Pavia University data set

In this subsection, we perform a set of experiments to evaluate the proposed approach using the ROSIS Pavia University data set in Fig. 3.5, described in subsection 3.4.2. This problem represents a very challenging classification scenario dominated by complex urban classes and nested regions. First, Fig. 4.6 shows how the OA increases as the number of unlabeled samples increases, indicating again clear advantages of using unlabeled samples in the proposed semi-supervised self-learning approach in comparison with the supervised case. In this experiment, the four considered sample selection approaches (MS, BT, MBT and nEQB) perform similarly, and slightly better than simple random selection. For instance, when ln = 45 labeled samples were used, the performance increase observed after including un = 700 unlabeled samples with regard to the supervised case was 13.93% (for MS), 13.86% (for BT), 10.27% (for MBT) and 9.56% (for nEQB). These results confirm our intuition that the proposed semi-supervised self-learning approach can greatly assist in improving the results obtained by different supervised classifiers based on limited training samples.
Furthermore, Tables 4.5 to 4.8 report the OA, AA, individual classification accuracies (in percentage) and the kappa statistic obtained using only 10 labeled samples per class (ln = 90 samples in total) and un = 700 unlabeled samples for the semi-supervised cases, in comparison with the optimal case, in which true labels are used whenever available in the ground truth. In all cases, we provide the value of lr to give an indication of the number of true versus estimated labels used in the experiments. It can be observed from Tables 4.5 to 4.8 that the proposed approach is quite robust, as it achieved classification results which are very similar to those found in the optimal case. For example, using the BT sampling algorithm, the proposed approach obtained an OA of 83.73%, which is almost the same as the OA of 84.07% obtained by the optimal case using true labels whenever possible. This observation is confirmed by Fig. 4.9, which plots the classification accuracy obtained (as a function of the number of unlabeled samples) for a case in which 100 labeled training samples per class were used (900 samples in total) for the MLR classifier with the BT sampling approach. In the figure, we report the case in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and also the optimal case in which true labels are used whenever possible (i.e., lr = ur). Although in this experiment the number of initial labeled samples is significant, it is remarkable that the results obtained by the proposed approach using only estimated labels are almost the same as those obtained with the optimal version using true labels, which means that the unlabeled training samples estimated by the proposed approach are highly reliable in this experiment.

Table 4.5: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS Pavia University hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                Supervised     MS (lr = 0)    MS (lr = 443)  BT (lr = 0)    BT (lr = 356)
Asphalt (6631)                 64.05±7.34     74.57±7.48     75.41±6.01     72.62±4.97     74.72±6.14
Meadows (18649)                63.15±7.27     80.71±5.71     83.92±2.84     83.33±4.49     84.62±2.24
Gravel (2099)                  66.28±9.21     80.05±9.35     80.33±8.86     82.07±9.31     81.09±9.00
Trees (3064)                   84.74±11.11    84.88±9.97     85.47±8.66     88.07±8.87     83.32±9.45
Metal Sheets (1345)            98.64±0.60     99.49±0.44     98.68±1.24     99.29±0.36     99.32±0.47
Bare Soil (5029)               69.54±8.79     89.61±3.22     89.74±4.15     89.59±3.86     88.93±4.62
Bitumen (1330)                 87.70±3.31     95.29±1.66     93.93±2.18     96.17±0.99     95.39±2.02
Self-Blocking Bricks (3682)    73.22±7.57     82.19±7.02     81.38±5.06     80.99±7.09     80.48±4.46
Shadow (947)                   98.44±1.91     98.90±2.56     97.88±3.33     99.12±1.79     98.60±1.86
OA                             69.25±3.75     82.63±2.55     84.08±0.98     83.73±1.86     84.07±1.52
AA                             78.42±1.75     87.30±1.28     87.41±0.76     87.92±1.13     87.39±1.25
kappa                          61.69±4.01     77.78±3.08     79.50±1.14     79.12±2.23     79.45±1.90
Table 4.6: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS Pavia University hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random selection (RS) case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                MBT (lr = 0)   MBT (lr = 412) nEQB (lr = 0)  nEQB (lr = 365) RS (lr = 0)    RS (lr = 558)
Asphalt (6631)                 71.43±4.75     71.54±4.57     72.91±7.37     72.40±7.63      66.40±7.55     68.85±6.03
Meadows (18649)                77.35±3.56     80.57±4.67     74.08±6.95     81.18±4.75      76.23±6.93     81.01±4.62
Gravel (2099)                  79.24±9.19     81.55±7.65     81.86±7.50     82.28±7.59      73.44±7.55     77.21±10.77
Trees (3064)                   94.41±3.58     88.45±6.82     91.46±4.05     85.64±8.83      82.97±9.16     85.04±5.01
Metal Sheets (1345)            99.77±0.21     99.70±0.29     98.79±0.82     98.85±0.82      99.03±0.48     98.87±0.54
Bare Soil (5029)               82.45±6.58     86.31±4.13     71.29±6.26     82.99±5.29      76.84±11.58    82.57±8.73
Bitumen (1330)                 96.53±1.18     96.52±1.17     85.39±7.60     90.26±5.56      92.07±3.52     93.27±3.79
Self-Blocking Bricks (3682)    82.87±6.76     77.83±7.48     79.29±9.17     80.16±8.40      76.08±7.85     75.74±10.34
Shadow (947)                   98.98±1.88     99.30±0.49     99.88±0.15     99.55±0.72      98.85±2.28     99.52±0.32
OA                             80.59±1.38     81.72±1.96     77.33±3.80     81.55±1.54      76.81±3.38     80.30±2.54
AA                             87.00±0.77     86.86±0.73     83.88±2.30     85.92±0.96      82.43±1.60     84.68±1.39
kappa                          75.44±1.61     76.70±2.27     71.27±4.53     76.36±1.74      70.45±3.86     74.75±3.03

Table 4.7: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS Pavia University hyperspectral data set, using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). Standard deviations are also reported for each test.

Class (samples)                Supervised     MS (lr = 0)    MS (lr = 454)  BT (lr = 0)    BT (lr = 382)
Asphalt (6631)                 60.43±8.23     75.71±12.63    76.25±9.46     74.38±7.89     72.82±8.00
Meadows (18649)                54.36±9.43     68.35±7.10     69.95±6.72     79.57±8.28     78.96±9.21
Gravel (2099)                  62.23±10.33    75.72±14.03    75.30±11.18    80.01±9.72     80.25±8.33
Trees (3064)                   90.75±7.19     88.77±9.31     88.94±5.97     85.14±8.41     87.80±7.00
Metal Sheets (1345)            96.68±5.68     99.91±0.08     99.90±0.11     99.84±0.10     99.83±0.12
Bare Soil (5029)               62.74±19.59    87.47±4.81     88.08±5.22     88.60±3.92     90.26±2.66
Bitumen (1330)                 89.90±5.14     92.47±4.12     93.21±3.22     94.38±3.55     95.67±1.97
Self-Blocking Bricks (3682)    66.50±8.44     71.64±18.83    75.52±9.44     80.89±8.04     80.39±8.07
Shadow (947)                   99.26±1.62     99.77±0.19     99.73±0.52     97.98±2.74     98.54±1.43
OA                             63.68±4.97     76.27±4.68     77.47±3.26     81.85±4.44     81.95±4.68
AA                             75.76±3.74     84.42±2.22     85.21±1.47     86.75±1.55     87.17±1.45
kappa                          55.48±5.55     70.40±5.26     71.79±3.71     76.89±5.19     76.94±5.42
Table 4.8: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total, 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

Class (samples)              MBT (lr=0)     MBT (lr=324)  nEQB (lr=0)   nEQB (lr=337)  RS (lr=0)     RS (lr=557)
Asphalt (6631)               72.27±3.13     70.68±3.72    70.16±8.34    70.01±9.05     61.14±7.06    61.52±5.37
Meadows (18649)              63.53±11.57    64.61±12.56   66.16±13.17   66.62±7.29     62.35±12.01   65.95±12.93
Gravel (2099)                72.58±12.90    75.14±9.40    80.82±9.35    80.05±9.60     70.97±12.39   70.61±10.49
Trees (3064)                 92.31±7.43     92.01±5.25    89.52±7.01    89.43±7.24     89.90±5.20    85.15±7.60
Metal Sheets (1345)          99.55±0.33     99.64±0.34    99.86±0.10    99.86±0.10     99.71±0.12    99.69±0.19
Bare Soil (5029)             77.89±12.67    78.95±9.39    73.03±10.42   76.96±7.93     75.01±14.21   73.05±23.65
Bitumen (1330)               94.84±1.39     95.56±1.67    90.33±3.92    90.79±3.22     92.93±4.66    92.97±3.67
Self-Blocking Bricks (3682)  74.95±24.57    81.00±7.17    72.01±5.71    71.82±7.16     70.04±12.33   72.23±13.42
Shadow (947)                 97.56±2.68     99.11±2.16    99.90±0.01    99.88±0.14     99.31±1.47    99.77±0.26
OA                           72.90±5.42     73.93±4.95    73.61±3.89    77.02±5.87     69.63±5.25    70.88±5.20
AA                           82.83±2.43     84.08±1.46    82.42±1.98    82.82±2.03     80.15±2.92    80.11±3.46
kappa                        66.60±5.85     67.86±5.40    66.44±6.26    67.08±4.40     62.46±5.57    63.70±5.70

Figure 4.9: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR classifier with BT sampling by using 100 labeled samples per class (in total, 900 samples). Two cases are displayed: the one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).

For illustrative purposes, Figs. 4.7 and 4.8 respectively show some of the classification maps obtained by the MLR and probabilistic SVM classifiers for the ROSIS Pavia University data set, which correspond to one of the 10 Monte-Carlo runs that were averaged in order to generate the classification scores reported in Tables 4.5 to 4.8.

4.5 Summary and future directions

In this chapter, we have developed a new approach for semi-supervised classification of hyperspectral images in which unlabeled samples are intelligently selected using a self-learning approach. Specifically, we automatically select the most informative unlabeled training samples with the ultimate goal of improving classification results obtained using randomly selected training samples. In our semi-supervised context, the labels of the selected training samples are estimated by the classifier itself, with the advantage that no extra cost is required for labeling the selected samples when compared to classic (supervised) active learning; a minimal sketch of this loop is given below.
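The following sketch illustrates, under stated assumptions, the self-learning loop just summarized: a probabilistic classifier (any object exposing fit/predict_proba, standing in for the MLR or probabilistic SVM) repeatedly selects the most uncertain candidates via breaking ties and labels them itself. It is a conceptual outline, not the exact implementation used in this chapter; all names are illustrative.

import numpy as np

def self_learning(clf, X_lab, y_lab, X_cand, n_iters, per_iter=1):
    # X_lab/y_lab: initial labeled set; X_cand: candidate pool inferred
    # from the spatial neighborhood of already-labeled pixels.
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = np.arange(len(X_cand))
    for _ in range(n_iters):
        clf.fit(X_train, y_train)
        P = clf.predict_proba(X_cand[pool])        # class probabilities
        # Breaking ties (BT): the smallest gap between the two largest
        # class probabilities marks the most informative sample.
        top2 = np.sort(P, axis=1)[:, -2:]
        gap = top2[:, 1] - top2[:, 0]
        picked = np.argsort(gap)[:per_iter]
        # Key point of self-learning: the label is estimated by the
        # classifier itself (lr = 0), not queried from a human oracle.
        est = np.argmax(P[picked], axis=1)
        X_train = np.vstack([X_train, X_cand[pool[picked]]])
        y_train = np.concatenate([y_train, est])
        pool = np.delete(pool, picked)             # remove from the pool
    return clf.fit(X_train, y_train)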
Our experimental results, conducted using two different classifiers (MLR and probabilistic SVM), indicate that the proposed approach can greatly increase the classification accuracies obtained in the supervised case through the incorporation of unlabeled samples, which can be obtained with very little cost and effort. The obtained results have been compared to the optimal case in which true labels are used, and the differences observed when using the samples estimated by our proposed approach were always quite small. This is a good quantitative indicator of the performance achieved by our proposed approach, which has been illustrated using two hyperspectral scenes collected by different instruments.

In future work, we are planning on combining the proposed approach with other probabilistic classifiers. We are also considering the use of expectation-maximization as a form of self-learning [17]. Although in this chapter we focused our experiments on hyperspectral data, the proposed approach can also be applied to other types of remote sensing data, such as multispectral data sets. In fact, since the dimensionality of the considered hyperspectral data sets is quite high, the proposed approach could greatly benefit from the use of feature extraction/selection methods prior to classification in order to make the proposed approach less sensitive to the Hughes effect [4] and to the possibly very limited initial availability of training samples. This research topic also deserves future attention. Another interesting future research line is to adapt our proposed sample selection strategy (which is based on the selection of individual pixels) to the selection and labeling of spatial sub-regions or boxes within the image, which could be beneficial in certain applications. Finally, another important research topic deserving future attention is the inclusion of a cost associated with the labels generated by the proposed algorithm. This may allow a better evaluation of the training samples actively selected by our proposed approach.

Figure 4.4: Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the AVIRIS Indian Pines data set by using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0. Map OAs: Supervised (50.61%), MS (75.87%), BT (76.23%), MBT (68.66%), nEQB (70.47%), RS (63.51%).

Figure 4.6: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the probabilistic SVM (left) and MLR (right) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0. Panels correspond to ln = 135, ln = 90 and ln = 45 for each classifier.

Figure 4.7: Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0). Map OAs: Supervised (69.25%), MS (82.63%), BT (83.73%), MBT (80.59%), nEQB (77.33%), RS (76.81%).
Figure 4.8: Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0). Map OAs: Supervised (63.68%), MS (76.27%), BT (81.85%), MBT (72.90%), nEQB (77.02%), RS (69.63%).

Chapter 5

A New Hybrid Strategy Combining Semi-Supervised Classification and Spectral Unmixing

5.1 Summary

Spectral unmixing and classification have been widely used in the recent literature to analyze remotely sensed data. However, few strategies have combined these two approaches in the analysis of hyperspectral data. In this chapter, we propose a new hybrid strategy for semi-supervised classification of hyperspectral data which exploits spectral unmixing and classification concepts (already discussed in previous chapters) in synergistic fashion. During the process, active learning techniques are used to select the most informative unlabeled samples in the pool of candidates, thus reducing the computational cost of the process by including only the most informative unlabeled samples. Here, we integrate a well-established discriminative probabilistic classifier (the MLR) with different spectral unmixing chains, thus bridging the gap between unmixing and classification. The effectiveness of the proposed method is evaluated using two real hyperspectral images.9

9 Part of this chapter has been published in: I. Dopido, J. Li, A. Plaza and P. Gamba, Semi-Supervised Classification of Urban Hyperspectral Data Using Spectral Unmixing Concepts, IEEE Urban Remote Sensing Event (JURSE 2013), Sao Paulo, Brazil, 2013, and I. Dopido, J. Li, P. Gamba and A. Plaza, Semi-Supervised Classification of Hyperspectral Data Using Spectral Unmixing Concepts, Tyrrhenian Workshop 2012 on Advances in Radar and Remote Sensing, Naples, Italy, 2012. Also, we are currently working towards the preparation of a journal contribution based on this chapter, to be submitted to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

5.2 Introduction

Spectral unmixing and classification are two active areas of research in hyperspectral data interpretation. On the one hand, spectral unmixing is a fast-growing area in which many algorithms have been recently developed to retrieve pure spectral components (endmembers) and determine their abundance fractions in mixed pixels, which dominate hyperspectral images [7]. On the other hand, supervised hyperspectral image classification is a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way [71]. As indicated in the previous chapter, this observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification. In remote sensing image classification, it is usually difficult or expensive to obtain a sufficient number of training samples with which to develop robust classifiers. Hurdles such as the Hughes phenomenon arise as the data dimensionality increases. These difficulties have fostered the development of new algorithms.
Supervised classifiers, such as the SVM or MLR, excel at using labeled information, and exhibit state-of-the-art performance when dealing with hyperspectral problems [16, 103]. However, when limited labeled information is available, the supervised case is troublesome, as the probability distribution of the image cannot be properly derived and poor generalization is also a risk. Based on these observations, a recent trend, namely semi-supervised learning, which integrates labeled and unlabeled samples to learn the classifiers, has been widely studied in the literature for hyperspectral classification problems [82, 83, 84, 86, 104, 105]. This approach was extensively discussed in the previous chapter of the thesis. As mentioned there, semi-supervised learning, first known as co-training [74, 106], has evolved into more complex generative models [73, 75], self-learning models [76, 77], multi-view learning models [78, 79], TSVMs [40, 41], and graph-based methods [80, 107]. We refer to [17] for a literature review. Most semi-supervised learning algorithms use some type of regularization which encourages the notion that "similar" features belong to the same class. The effect of this regularization is to push the boundaries between classes towards regions of low data density [81]. A rather usual way of building such a regularizer is to associate the vertices of a graph with the complete set of samples and then build the regularizer depending on variables defined on the vertices (a minimal sketch of this construction is given at the end of this section). This trend has been successfully adopted in several remote sensing image classification studies [44, 58, 82, 84, 86, 108].

The aforementioned semi-supervised algorithms generally assume that a limited number of labeled samples are available a priori, and then enlarge the training set using unlabeled samples, thus allowing these approaches to address ill-posed problems. However, in order for this strategy to be successful, several requirements need to be satisfied. First and foremost, the new (unlabeled) samples should be generated without significant cost/effort. Second, the number of unlabeled samples required for the semi-supervised classifier to perform properly should not be too high, in order to avoid increasing the computational complexity of the classification stage. In other words, as the number of unlabeled samples increases, it may become computationally infeasible for the classifier to properly exploit all the available training samples. Further, if the unlabeled samples are not properly selected, they may confuse the classifier, thus introducing significant divergence or even reducing the classification accuracy obtained with the initial set of labeled samples. In order to address these issues, it is very important that the most highly informative unlabeled samples are identified in computationally efficient fashion, so that significant improvements in classification performance can be observed without the need to use a high number of unlabeled samples.

In this chapter, we develop a new approach to perform semi-supervised classification of hyperspectral images by exploiting the information retrieved with spectral unmixing. Our main goal is to synergize two of the most widely used approaches to interpret hyperspectral data into a unified framework which uses active learning techniques for automatically selecting unlabeled samples in semi-supervised fashion. Specifically, we use active learning to select highly informative unlabeled training samples in order to enlarge the initial (possibly very limited) set of labeled samples and perform semi-supervised classification based on the information provided by a well-established discriminative classifier (MLR [30]) and different spectral unmixing chains [24, 34].
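As an aside, here is a minimal sketch of the graph-based regularizer mentioned above: samples become graph vertices, edges connect spectrally similar pixels, and the quadratic form f'Lf is small when the classifier output f varies slowly over the graph (i.e., "similar" features receive the same label). This is purely illustrative; the parameter choices (k, sigma) are assumptions, not values used in this thesis.

import numpy as np

def graph_laplacian(X, k=5, sigma=1.0):
    # Build a kNN affinity graph over the samples in X (one row per sample).
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                 # k nearest neighbors
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                                # symmetrize
    return np.diag(W.sum(1)) - W                          # L = D - W

def smoothness_penalty(f, L):
    # f holds one (soft) class score per vertex; f' L f equals
    # 0.5 * sum_ij W_ij (f_i - f_j)^2, the usual graph regularizer.
    return float(f @ L @ f)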
5.3 Proposed approach

The proposed approach consists of three main ingredients: semi-supervised learning (already described in the previous chapter), spectral unmixing (for which we specifically describe the unmixing chains considered in this section), and active learning techniques, which are specifically described before introducing our method.

5.3.1 Considered spectral unmixing chains

Several spectral unmixing chains were described in chapters 2 and 3 of this thesis. These chains are based on the well-known linear mixture model [7] presented in subsection 3.3.1 of this document. In this section, we outline the specific spectral unmixing chains that will be used for the experiments in this chapter (some of these developments are described in more detail in [24]); a sketch of one representative chain (strategy 3) is given at the end of this subsection:

1. FCLSU-based unmixing (hereinafter, strategy 1), which first assumes that the labeled samples are made up of spectrally pure constituents (endmembers) and then calculates their abundances by means of the FCLSU method, providing a set of fractional abundance maps (one per labeled class) as shown by Fig. 5.1.

2. MTMF-based unmixing (hereinafter, strategy 2), which also assumes that the labeled samples are made up of spectrally pure constituents (endmembers) but now calculates their abundances by means of the MTMF method [33], thus providing a set of fractional abundance maps (one per labeled class) as shown by Fig. 5.2.

3. Unsupervised clustering followed by FCLSU (hereinafter, strategy 3), which is intended to solve the problems exhibited by endmember extraction algorithms, which are sensitive to outliers and pixels with extreme values of reflectance. By using an unsupervised clustering method such as k-means on the original image, the endmembers extracted (from class centers) are expected to be more spatially significant. Then, FCLSU is conducted using the resulting endmembers as shown by Fig. 5.3.

4. Unsupervised clustering followed by MTMF (hereinafter, strategy 4), which is exactly the same as the previous strategy (strategy 3) but this time the MTMF method is conducted instead of FCLSU, using the resulting endmembers after k-means clustering. This chain is described in Fig. 3.1, subsection 3.3.2.

Figure 5.1: Flowchart of the unmixing-based chain designated as strategy 1.

Figure 5.2: Flowchart of the unmixing-based chain designated as strategy 2.

Figure 5.3: Flowchart of the unmixing-based chain designated as strategy 3.
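As an illustration, the following sketch outlines strategy 3 under stated assumptions: k-means cluster centers are taken as endmembers and per-pixel FCLSU abundances are then estimated, with the sum-to-one constraint enforced softly through a heavily weighted row of ones, a common way of approximating FCLSU [19]. The function name and the delta weight are illustrative, not the exact implementation evaluated in this chapter.

import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import nnls

def strategy3_features(X, n_endmembers, delta=1e3):
    # X: (n_pixels, n_bands) hyperspectral data, one spectrum per row.
    # Step 1: k-means cluster centers act as (spatially significant) endmembers.
    centers, _ = kmeans2(X, n_endmembers, minit='++')
    E = centers.T                                   # (n_bands, n_endmembers)
    # Step 2: per-pixel non-negative least squares on an augmented system;
    # the heavily weighted row of ones softly enforces sum-to-one (FCLSU).
    A_aug = np.vstack([delta * np.ones((1, E.shape[1])), E])
    abundances = np.empty((X.shape[0], E.shape[1]))
    for i, x in enumerate(X):
        b_aug = np.concatenate([[delta], x])
        abundances[i], _ = nnls(A_aug, b_aug)       # abundances >= 0
    return abundances                               # one feature per endmember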
5.3.2 Proposed hybrid strategy

Let x ≡ (x1, . . . , xn) ∈ R^{d×n} be a set of d-dimensional feature vectors, and let y ≡ (y1, . . . , yn) be a set of labels. With this notation in mind, the proposed hybrid classification method can be simply described as follows:

p̂i(yi = k | xi) = α f1(yi = k | xi) + (1 − α) f2(yi = k | xi),   (5.1)

where p̂i(·) is the joint estimate for the kth class, i.e., yi = k, obtained by the classification and unmixing methods given observation xi; p̂i(·) will serve as the indicator, i.e., the probability, for the semi-supervised active learning. In this work, function f1(·) is the probability obtained by the classification algorithm, i.e., the MLR classifier, and function f2(·) is the abundance fraction obtained by any of the spectral unmixing chains presented in subsection 5.3.1. The balance between the classification probabilities and the abundance fractions is controlled by parameter α, where 0 ≤ α ≤ 1. As shown in (5.1), if α = 1, only classification probabilities are considered by the proposed strategy, which leads to the semi-supervised learning strategy presented in the previous chapter. On the other hand, if α = 0, only spectral unmixing is taken into account. Therefore, by tuning α to a value between 0 and 1, we can adjust the relative impact of the classification and unmixing methods. Moreover, by introducing parameter α, the proposed hybrid strategy takes advantage of both classification and unmixing, such that the newly selected unlabeled samples are more informative than those selected from classification or unmixing methods alone. In the following subsection, we explain in more detail how the unlabeled samples are selected using active learning concepts.

5.3.3 Active learning

In order to train the hybrid classifier described in the previous subsection, active learning techniques are used to improve the selection of unlabeled samples for semi-supervised learning. In the process, the candidate set for the active learning process (based on the available labeled and unlabeled samples) is inferred using spatial information (specifically, by applying a first-order spatial neighborhood to the available samples), so that high confidence can be expected in the class labels of the obtained candidate set. This idea, which was presented in the previous chapter devoted to self-learning, is similar to human interaction in supervised active learning, where the class labels are known and given by an expert. In a second step, we run active learning to select the most informative samples from the candidate set. This is similar to the machine interaction level in supervised active learning, where in both cases the goal is to find the samples with higher uncertainty. Due to the fact that we use a discriminative classifier (MLR) and spectral unmixing techniques, active learning algorithms which focus on the boundaries between the classes (which are often dominated by mixed pixels) are preferred. In this way, we can combine the properties of the probabilistic MLR classifier and spectral unmixing concepts to find the most suitable (complex) unlabeled samples for improving the classification results through the selected active learning strategy. It should be noted that many active learning techniques are available in the literature [90]; several of these techniques were inter-compared in the previous chapter. In this chapter, we use only the BT method [21] to evaluate the proposed approach; this method is described in detail in the previous chapter. A sketch combining (5.1) with BT-based selection is given below.
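The sketch below shows, under stated assumptions, how (5.1) and BT-based selection fit together: MLR probabilities (f1) and abundance fractions (f2) are blended with weight α, and the pixels whose two largest blended scores are closest are selected. The array names and the per-pixel normalization of the abundances are illustrative (FCLSU abundances already sum to one, whereas MTMF outputs need not).

import numpy as np

def hybrid_scores(probs, abund, alpha=0.75):
    # probs, abund: hypothetical (n_pixels, n_classes) arrays; in the
    # proposed strategy there is one abundance map per labeled class.
    # Normalize abundances per pixel so f2 behaves like a probability.
    f2 = abund / np.clip(abund.sum(axis=1, keepdims=True), 1e-12, None)
    return alpha * probs + (1.0 - alpha) * f2       # equation (5.1)

def breaking_ties(scores, n_select):
    top2 = np.sort(scores, axis=1)[:, -2:]
    gap = top2[:, 1] - top2[:, 0]                   # small gap = informative
    return np.argsort(gap)[:n_select]               # indices of chosen pixels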
5.4 Experimental results

In this section, we evaluate the new methodology presented in this chapter using two different hyperspectral images: AVIRIS Indian Pines, described in subsection 2.4.1.1 [109], and ROSIS Pavia University, described in subsection 3.4.2 [110]. In our experiments with the MLR classifier, we apply the Gaussian RBF kernel to a normalized version of the considered hyperspectral data sets. In all cases, the reported figures of OA, AA, kappa statistic, and individual class accuracies are obtained by averaging the results obtained after conducting 10 independent Monte Carlo runs with respect to the labeled training set selected from the ground-truth data, where the remaining samples are used for validation purposes.

In order to illustrate the good performance of the proposed approach, we use very small labeled training sets on purpose. As a result, the main difficulties that our proposed approach should circumvent can be summarized as follows. First and foremost, it is very difficult for supervised algorithms to provide good classification results when very little information is available about the class distribution. Poor generalization is also a risk when estimating class boundaries in scenarios dominated by limited training samples. Since our approach is semi-supervised, we take advantage of unlabeled samples in order to improve classification accuracy. However, if the number of labeled samples is very small, increasing the number of unlabeled samples could bias the learning process. This effect is explored in the remainder of this section, which is organized as follows. In subsection 5.4.1, we study the impact of parameter α (which balances unmixing and classification in our proposed hybrid strategy) on the final results. Subsections 5.4.2 and 5.4.3 respectively describe the experiments conducted with the AVIRIS Indian Pines and ROSIS Pavia University scenes. In all cases, the results obtained by the supervised version of the considered classifier are also reported for comparative purposes.

5.4.1 Balance between classification and unmixing

In this set of experiments, we evaluate the impact of parameter α, which controls the relative weight of classification and unmixing in the proposed hybrid classifier. Here, the semi-supervised classifier is trained using only 5 labeled samples per class, using different spectral unmixing chains and testing the following values for parameter α: {1, 0.75, 0.5, 0.25, 0}. In all cases, we execute 300 iterations to actively select 300 unlabeled samples. After an extensive set of experiments with the two considered hyperspectral scenes, Tables 5.1 (AVIRIS Indian Pines) and 5.2 (ROSIS Pavia University) reveal that a good compromise value for parameter α is 0.75, which means that classification generally needs more weight than unmixing in order to obtain the best analysis results from our proposed hybrid classifier. This is expected, since the information provided by classification is indeed very important, but it can be refined by also including unmixing information in the process. The results in this experiment confirm our intuition that the joint exploitation of classification and unmixing provides advantages over the use of either technique alone, at least in the framework of semi-supervised classification using limited training samples. The simple selection procedure behind these tables is sketched below.

Table 5.1: OA [%] obtained for different values of parameter α in the analysis of the AVIRIS Indian Pines hyperspectral data set with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.

Strategy (ite.)     α=1.0   α=0.75  α=0.50  α=0.25  α=0.00
Strategy 1 (300)    65.25   69.10   55.60   55.47   52.01
Strategy 2 (300)    65.25   68.89   66.45   68.76   58.90
Strategy 3 (300)    65.25   69.83   51.90   59.23   52.64
Strategy 4 (300)    65.25   70.45   66.34   60.50   60.71

Table 5.2: OA [%] obtained for different values of parameter α in the analysis of the ROSIS Pavia University scene with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.

Strategy (ite.)     α=1.0   α=0.75  α=0.50  α=0.25  α=0.00
Strategy 1 (300)    75.48   78.05   68.20   72.39   64.41
Strategy 2 (300)    75.48   79.39   78.80   77.76   71.42
Strategy 3 (300)    75.48   79.53   65.10   66.88   64.82
Strategy 4 (300)    75.48   79.53   73.58   71.08   69.11
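The procedure behind Tables 5.1 and 5.2 amounts to a simple grid search, sketched here; run_hybrid_classifier is a hypothetical stand-in for one complete semi-supervised run (5 labeled samples per class, 300 actively selected unlabeled samples) returning the OA.

def select_alpha(run_hybrid_classifier, candidates=(1.0, 0.75, 0.5, 0.25, 0.0)):
    # Evaluate each candidate weight and keep the one with the highest OA [%].
    results = {a: run_hybrid_classifier(alpha=a) for a in candidates}
    best = max(results, key=results.get)
    return best, results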
5.4.2 Results for AVIRIS Indian Pines

In this section, the proposed approach is evaluated using the AVIRIS Indian Pines data set described in Fig. 2.3. We consider different numbers of labeled samples per class: 5, 10 and 15. Table 5.3 shows the OA, AA and the kappa statistic obtained by the supervised strategy based on the MLR classifier and by the proposed semi-supervised approach, using two different strategies for active learning: RS and BT (both executed using 300 iterations to actively select 300 unlabeled samples). Table 5.3 also reports the classification results obtained by the hybrid classification-unmixing approach with the MLR classifier and the four spectral unmixing chains (with α = 0.75). It should be noted that the selected classification scenario is a very challenging one. For instance, when 5 labeled samples are used per class, only 80 labeled samples in total are assumed to be available as the initial condition for the considered classifier, which is much lower than the number of spectral bands in the scene. As we can observe in Table 5.3, the inclusion of unlabeled samples significantly improved the classification results in all cases. If we compare the supervised case with the semi-supervised techniques, we can observe that the unlabeled samples always help to significantly improve the accuracy results. When the proposed strategy (combining classification and spectral unmixing) was used, a significant improvement was observed over the supervised case. As shown by Table 5.3, the classification accuracies increase as the number of labeled training samples increases. This is expected, since the uncertainty of the classifier boundaries decreases as more labeled samples are used in the supervised case. In Fig. 5.4, we evaluate the impact of the number of unlabeled samples on the classification performance achieved by the considered probabilistic classifiers. Specifically, we plot the OAs obtained by the supervised MLR (trained using only 5, 10 and 15 labeled samples per class) and by the proposed approach (based on the same classifier plus spectral unmixing) using the
four considered strategies for including unmixing information, as a function of the number of unlabeled samples. For the active learning part, we considered two strategies: RS and BT. Again, we can observe in Fig. 5.4 how the unlabeled samples help to improve the accuracy of the obtained results. For illustrative purposes, Fig. 5.5 shows some of the classification maps obtained for the AVIRIS Indian Pines scene.

Table 5.3: OA, AA [%], and kappa statistic obtained using different classifiers when applied to the AVIRIS Indian Pines hyperspectral data set. The standard deviation is also reported in each case.

5 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      51.78±2.51    65.25±5.76   60.03±2.91   69.10±2.66   68.89±4.13   69.83±1.78   70.45±1.78
AA      63.82±2.69    69.98±2.91   66.31±2.93   72.73±2.24   70.81±2.24   71.37±2.49   72.32±2.20
kappa   46.26±2.74    60.65±6.05   54.81±3.19   64.82±2.94   64.54±4.31   65.58±2.08   66.25±2.16

10 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      60.12±3.08    68.51±5.16   65.59±2.94   70.91±4.87   74.91±2.10   75.05±1.68   75.29±2.40
AA      71.74±1.54    75.66±2.33   73.49±1.83   77.13±2.70   78.21±2.36   78.44±1.96   79.05±2.00
kappa   55.43±3.20    64.40±5.56   61.17±3.16   67.08±5.40   70.69±2.40   71.60±1.91   71.84±2.75

15 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      66.20±1.99    70.46±3.16   70.29±1.93   73.57±3.83   74.04±2.26   76.89±2.56   76.68±2.04
AA      77.39±1.06    79.47±1.81   78.76±1.09   80.86±1.85   79.76±1.26   81.90±1.46   81.94±1.63
kappa   62.09±2.13    66.65±3.38   66.41±2.13   70.16±4.26   70.51±2.49   73.71±2.85   73.53±2.22

Figure 5.4: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set by different classifiers, with panels (a) 5, (b) 10 and (c) 15 labeled samples per class. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.

Figure 5.5: Classification maps and OAs (in parentheses) obtained after applying different classifiers to the AVIRIS Indian Pines data set. In all cases the number of labeled samples per class was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300. Map OAs: Supervised (60.12%), BT (68.51%), RS (65.59%), Strategy 1 (70.91%), Strategy 2 (74.91%), Strategy 3 (75.05%), Strategy 4 (75.29%).
These classification maps correspond to one of the 10 Monte-Carlo runs that were averaged in order to generate the classification scores reported in Table 5.3. The advantages obtained by adopting a semi-supervised learning approach which combines classification and unmixing concepts can be clearly appreciated in the classification maps displayed in Fig. 5.5, which also reports the classification OAs obtained for each method in parentheses.

5.4.3 Results for ROSIS Pavia University

The second data set used in this experiment is ROSIS Pavia University, described in Fig. 3.5. Table 5.4 shows the OA, AA and the kappa statistic obtained by the supervised strategy (trained with 5, 10 and 15 labeled samples per class) and by the proposed approach (based on the MLR classifier and different spectral unmixing approaches), using two different strategies for active learning: RS and BT (both executed using 300 iterations to actively select 300 unlabeled samples). Several conclusions can be drawn from the results reported in Table 5.4. The use of unlabeled samples provides advantages with respect to the supervised algorithm alone. In all cases, the proposed unmixing strategies significantly outperform the corresponding supervised algorithm, and the increase in performance becomes more relevant as the number of unlabeled samples increases. These unlabeled samples are automatically selected by the proposed approach, and represent no cost in terms of data collection or human supervision. In Fig. 5.6 we can observe how the accuracy results improve as the number of unlabeled samples increases. For instance, in the case with 10 labeled samples per class (see Table 5.4), the supervised approach obtained an OA of 69.25%. When the proposed strategy (combining classification and spectral unmixing) was used, the classification accuracy improved to an OA of 83.93% (Strategy 4), which represents a significant improvement over the supervised case. For illustrative purposes, Fig. 5.7 shows some of the classification maps obtained in our experiments. These maps correspond to one of the 10 Monte-Carlo runs that were averaged in order to generate the classification scores reported in Table 5.4. The advantages obtained by adopting a semi-supervised learning approach which combines classification and unmixing concepts can be clearly appreciated in the classification maps displayed in Fig. 5.7, which also reports the classification OAs obtained for each method in parentheses.

5.5 Summary and future directions

In this chapter, we have presented a new hybrid technique which incorporates the information provided by spectral unmixing concepts into the classification process. In the validation of the method, we have considered four different unmixing-based feature extraction chains and used a limited number of training samples. Unlabeled samples are selected using active learning techniques in order to identify the most informative ones.
The effectiveness of the proposed approach has been illustrated using two representative hyperspectral images collected by the AVIRIS and ROSIS sensors over a variety of test sites. The experimental results obtained indicate that the combination of spectral unmixing and semi-supervised classification leads to a powerful new framework for hyperspectral data interpretation.

Table 5.4: OA, AA [%], and kappa statistic obtained using the MLR classifier when applied to the ROSIS Pavia University hyperspectral data set. The standard deviation is also reported in each case.

5 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      63.56±4.63    75.48±4.63   71.80±2.16   78.05±3.20   79.09±1.94   79.53±2.12   79.10±2.34
AA      72.93±2.08    79.33±2.62   75.45±1.86   79.62±2.45   79.68±3.63   79.97±2.43   80.29±2.67
kappa   54.78±4.45    68.30±5.38   63.50±2.12   71.54±3.69   72.69±2.73   73.05±2.63   72.48±3.03

10 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      69.25±3.75    80.70±3.07   75.72±2.19   82.86±2.30   83.31±2.04   84.14±1.97   83.93±1.73
AA      78.42±1.75    82.81±1.55   80.52±1.52   83.48±1.40   83.51±1.95   84.48±1.04   84.83±1.53
kappa   61.69±4.01    74.87±3.75   68.95±2.54   77.68±2.78   78.06±2.55   79.23±2.37   78.97±2.07

15 labeled samples per class
        Supervised    BT           RS           Strategy 1   Strategy 2   Strategy 3   Strategy 4
OA      72.34±2.22    81.00±5.75   76.88±2.08   83.54±2.47   83.47±2.18   84.83±3.21   85.25±2.59
AA      80.01±2.29    83.35±2.82   81.61±2.46   83.62±2.43   83.60±2.34   85.54±2.08   85.48±1.22
kappa   65.21±2.23    75.44±3.80   70.39±2.09   78.49±2.85   78.56±2.68   80.15±3.84   80.63±3.11

Figure 5.6: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set by different classifiers, with panels (a) 5, (b) 10 and (c) 15 labeled samples per class. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.

Figure 5.7: Classification maps and OAs (in parentheses) obtained after applying different classifiers to the ROSIS Pavia University data set. In all cases the number of labeled samples per class was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300. Map OAs: Supervised (69.25%), BT (80.70%), RS (75.72%), Strategy 1 (82.86%), Strategy 2 (77.92%), Strategy 3 (84.14%), Strategy 4 (83.93%).
In the future, we will explore additional strategies to generate unlabeled samples through active learning, and we will also consider additional probabilistic classifiers that can be easily integrated into the proposed framework.

Chapter 6

Conclusions and Future Research Lines

Spectral unmixing and classification of hyperspectral data have been the main topics addressed in this thesis work. These concepts have rarely been studied together, although they exhibit complementary properties that can offer several advantages when they are applied to hyperspectral image analysis.

• On the one hand, classification is a challenging topic. It has typically been conducted using supervised and semi-supervised techniques, but this process encounters several problems due to the structure of the hyperspectral data and the limited number of available training samples.

• On the other hand, spectral unmixing allows one to analyze the properties of each pixel, including additional information about the characterization of mixed pixels in hyperspectral data. This information can be effective for the classification process, since it provides a kind of soft information that can properly complement the hard output generally provided by classification techniques.

These reasons have motivated us to focus this thesis work on the development of new, efficient hyperspectral techniques able to combine concepts of spectral unmixing and classification. As a result, a main goal of this thesis is to include detailed information about mixed pixels in the classification process. For this purpose, we have exhaustively studied several processing techniques for joint hyperspectral unmixing and classification. One of the most challenging aspects addressed in this thesis is the high-dimensional nature of hyperspectral data; the first two chapters of this thesis focus on how to provide the most suitable input features for the classification process and, particularly, on the possible role of spectral unmixing techniques in this task. In our proposal, the input of the classifier is replaced by the abundance maps derived by different spectral unmixing chains in order to include additional information about mixed pixels (a toy sketch of this idea is given below). At the same time, the computational cost of the classification is also significantly reduced, because the number of pure materials is usually lower than the number of spectral bands in the original hyperspectral data. We conclude that there are several potential advantages resulting from the use of abundance fractions as input features for classification purposes: in the first place, they supply information about mixed pixels in hyperspectral data; in the second place, each abundance map can be physically explained as the proportion of each pure material in the data; and in the third place, the use of abundance fractions as input features does not penalize very small classes. Several experiments have been conducted studying different techniques to compute the abundance maps. The effectiveness of such a strategy could be appreciated after including spectral unmixing concepts prior to classification of hyperspectral data.
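As a toy illustration of this feature-extraction idea (not the exact pipeline of chapters 2 and 3), the sketch below replaces the classifier input by per-pixel abundances; extract_abundances stands for any of the considered unmixing chains (e.g., the strategy 3 sketch given earlier), and the logistic regression is an assumption standing in for the MLR.

from sklearn.linear_model import LogisticRegression

def classify_on_abundances(extract_abundances, X, train_idx, y_train):
    # The feature dimensionality drops from n_bands to n_endmembers.
    A = extract_abundances(X)            # (n_pixels, n_endmembers) features
    clf = LogisticRegression(max_iter=1000).fit(A[train_idx], y_train)
    return clf.predict(A)                # one class label per pixel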
Another important aspect of the thesis is how to increase the set of training samples for semi-supervised classification. This is a very important task, due to the limited availability of training samples, which introduces many problems in supervised classification. Active learning concepts have been used to develop new self-learning strategies in which the classifier itself selects the most useful unlabeled samples without the need for human interaction. Our semi-supervised approach also includes spatial information as a criterion to select the new samples. In order to retain informative samples, several active learning approaches have been used. In this work, we have used two different probabilistic classifiers to test the presented approach: the MLR and a probabilistic SVM. The proposed framework avoids the need for a large number of training samples in advance, while at the same time it allows for the intelligent generation of unlabeled samples. The effectiveness of the presented framework is illustrated using real hyperspectral data sets. The results obtained reveal that the proposed approach can provide good classification results with very limited labeled samples.

A final contribution of the present thesis work is the joint consideration of techniques for spectral unmixing and classification. This has been done by injecting spectral unmixing information into the semi-supervised classification process based on self-learning concepts, which we also developed as part of this thesis work. In this case, it is important to define the relative weight given to unmixing with regard to classification and vice versa. Several experiments have been performed in order to analyze this issue. Our conclusion is that there are several potential advantages in jointly considering spectral unmixing and classification, which are apparent in the classification accuracies obtained by the proposed semi-supervised hybrid framework.

Summarizing, the innovative contributions presented in this thesis work are related to the joint exploitation of classification and spectral unmixing concepts, and to the intelligent generation of unlabeled samples for semi-supervised learning (called self-learning in this thesis work). Several strategies have been developed for the combination of spectral unmixing and classification under different scenarios (unmixing prior to classification, joint unmixing and classification based on self-learning, etc.). To the best of our knowledge, this study represents one of the first efforts in the literature to synergistically exploit two analysis techniques (unmixing and classification) that have traditionally been exploited independently in hyperspectral image analysis. In this regard, the connections and possible bridges between both techniques represent another unique contribution of this thesis.

As future work, we are planning on developing computationally efficient implementations of the new techniques presented in this thesis using high-performance computing architectures, such as clusters of computers (possibly with specialized hardware accelerators such as graphics processing units). We are also planning on testing the presented developments on large-scale data repositories, in order to facilitate the processing of larger volumes of data than those reported in this work. In this case, domain adaptation techniques will certainly be needed. Although the results presented in this thesis are focused on a few hyperspectral scenes only (due to the reliable ground-truth and reference information available for those scenes), the extrapolation of these techniques to larger data collections will allow a more detailed assessment of the requirements and benefits of applying the presented approaches in practical scenarios.
Appendix A

Publications

The results of this thesis work have been published in several international journal papers, book chapters and peer-reviewed international conference papers. The candidate is the first author of 3 JCR journal papers and 11 peer-reviewed conference papers directly related to this thesis work. The candidate has been a pre-doctoral researcher in the Hyperspectral Computing Laboratory (HyperComp), Department of Technology of Computers and Communications, University of Extremadura, Spain. Below, we provide a description of the publications achieved by the candidate, along with a short description of the journal or workshop where they were presented.

A.1 International journal papers.

1. I. Dopido, M. Zortea, A. Villa, A. Plaza and P. Gamba. Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images. IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 4, pp. 760-764, July 2011 [JCR(2011)=1.560]. This paper was published in the journal IEEE Geoscience and Remote Sensing Letters, which is one of the main journals in the remote sensing category of JCR. It is also in the second quartile of the electrical and electronic engineering category of JCR. The paper explores the use of spectral unmixing for feature extraction prior to classification of hyperspectral data, and constitutes the basis of the second chapter of this thesis. This paper was selected as one of the finalists for the Best Paper Award of the IEEE Geoscience and Remote Sensing Letters in 2011.

2. I. Dopido, A. Villa, A. Plaza and P. Gamba. A Quantitative and Comparative Assessment of Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 421-435, April 2012 [JCR(2012)=2.874]. This paper was published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, which is a very important journal in the first quartile of the remote sensing and electrical and electronic engineering areas of JCR. The paper provides an experimental comparison of different unmixing chains prior to supervised classification of hyperspectral data, and constitutes the basis for the third chapter of the thesis.

3. I. Dopido, J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias and J. A. Benediktsson. Semi-Supervised Self-Learning for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 4032-4044, July 2013 [JCR(2012)=3.467]. This paper was published in the IEEE Transactions on Geoscience and Remote Sensing, which is a top scholarly journal in the field of remote sensing. The paper develops the new semi-supervised self-learning framework for hyperspectral classification which constitutes the basis of the fourth chapter of this thesis.

A.2 Peer-reviewed international conference papers.

1. I. Dopido, P. Gamba and A. Plaza. Spectral Unmixing-Based Post-Processing for Hyperspectral Image Classification. This work was presented as an oral contribution at the IEEE Workshop on Hyperspectral Image and Signal Processing (WHISPERS) held in Gainesville, Florida, in 2013. WHISPERS is one of the most important international workshops specializing in hyperspectral remote sensing.
This paper explores the possible use of spectral unmixing as a post-processing strategy to improve the classification results provided by supervised and semi-supervised techniques for hyperspectral image classification.

2. I. Dopido, J. Li, A. Plaza and P. Gamba. Semi-Supervised Classification of Urban Hyperspectral Data Using Spectral Unmixing Concepts. This work was presented as a poster at the IEEE Urban Remote Sensing Joint Event (JURSE) held in Sao Paulo, Brazil, in 2013. JURSE is one of the most important international workshops specializing in urban remote sensing. The paper explores the application of the semi-supervised hybrid strategy described in the fifth chapter of the thesis to urban hyperspectral data.

3. I. Dopido, J. Li, P. Gamba and A. Plaza. Semi-Supervised Classification of Hyperspectral Data Using Spectral Unmixing Concepts. This paper was presented at the Tyrrhenian Workshop 2012 on Advances in Radar and Remote Sensing, held in Naples, Italy, in 2012. The paper presents the semi-supervised hybrid framework for joint unmixing and classification that constitutes the basis of the fifth chapter of the present thesis work.

4. I. Dopido, J. Li, J. Bioucas-Dias and A. Plaza. A New Semi-Supervised Approach for Hyperspectral Image Classification with Different Active Learning Strategies. This work was presented as an oral contribution at the IEEE Workshop on Hyperspectral Image and Signal Processing (WHISPERS) held in Shanghai, China, in 2012. The paper presents a preliminary version of the methodology presented in the fourth chapter of the thesis, focusing on the role of different active learning techniques in the selection of informative unlabeled samples for semi-supervised self-learning.

5. I. Dopido, J. Li, A. Plaza and J. Bioucas-Dias. Semi-Supervised Active Learning for Urban Hyperspectral Image Classification. This work was presented as an oral contribution at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) held in Munich, Germany, in 2012. IGARSS is the most important international conference in the remote sensing field. The paper describes the concept of semi-supervised active learning explored in the fourth chapter of the thesis and, particularly, its application to urban data, as the paper was invited to a special session devoted to this topic.

6. I. Dopido, J. Li and A. Plaza. Semi-Supervised Active Learning Approach for Hyperspectral Image Classification: Application to Multinomial Logistic Regression and Support Vector Machines. This work was presented as an oral contribution at the SPIE Optics and Photonics conference, a very important event held yearly in San Diego, USA. The paper explores the impact of using different probabilistic classifiers in the design of the semi-supervised self-learning method presented in the fourth chapter of this thesis.

7. I. Dopido, A. Villa, A. Plaza and P. Gamba. A Comparative Assessment of Several Processing Chains for Hyperspectral Image Classification: What Features to Use? This work was presented as an oral contribution at the IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) held in Lisbon, Portugal, in 2011. The candidate was a member of the organizing committee of this important international workshop in the Lisbon edition.
The paper explores the use of different spectral unmixing chains prior to classification of remotely sensed data using supervised techniques, which is presented in detail in the third chapter of the thesis.

8. I. Dopido and A. Plaza. Unmixing Prior to Supervised Classification of Urban Hyperspectral Images. This work was presented as an oral contribution at the IEEE Urban Remote Sensing Joint Event (JURSE) held in Munich, Germany, in 2011. The paper explores the use of unmixing prior to supervised classification in the context of urban areas, as described in the second chapter of the thesis.

9. I. Dopido, A. Villa and A. Plaza. Unsupervised Clustering and Spectral Unmixing for Feature Extraction Prior to Supervised Classification of Hyperspectral Images. This work was presented as an oral contribution at the SPIE Optics and Photonics conference, a very important event held yearly in San Diego, USA. The paper describes the use of unsupervised clustering as a mechanism to refine the development of unmixing-based chains for feature extraction prior to supervised classification of hyperspectral data. This technique is one of the unmixing chains compared in the third chapter of the present thesis.

10. M. Rojas, I. Dopido, A. Plaza and P. Gamba. Comparison of Support Vector Machine-Based Processing Chains for Hyperspectral Image Classification. This work was presented as an oral contribution at the SPIE Optics and Photonics conference, a very important event held yearly in San Diego, USA. The paper explores the use of different feature extraction approaches prior to supervised classification of hyperspectral scenes using the SVM classifier. The methods described in this contribution are compared in the second chapter of this thesis with the newly developed unmixing-based feature extraction methods presented in the same chapter.

11. A. Plaza, J. Plaza, I. Dopido, G. Martin, M. D. Iordache and S. Sanchez. New Hyperspectral Unmixing Techniques in the Framework of the Earth Observation Optical Data Calibration and Information Extraction (EODIX) Project. This paper, presented as a poster at the 3rd International Symposium on Recent Advances in Quantitative Remote Sensing (RAQS) held in Valencia, Spain, in 2010, summarizes some of the advances in supervised classification using spatial and spectral information developed in the framework of the HYPERCOMP/EODIX project, which has partially supported the thesis work of the candidate.

Bibliography

[1] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, and G. Trianni. Recent advances in techniques for hyperspectral image processing. Remote Sensing of Environment, 113:110–122, 2009. [Cited in pag. 1]
[2] C. I. Chang. Hyperspectral imaging: techniques for spectral detection and classification. Kluwer Academic/Plenum Publishers: New York, 2003. [Cited in pags. 2, 11, 30 and 31]
[3] C. I. Chang. Recent advances in hyperspectral signal and image processing. John Wiley & Sons: New York, 2007. [Cited in pag. 2]
[4] J. A. Richards and X. Jia. Remote sensing digital image analysis: an introduction. Springer, 2006. [Cited in pags. 2, 3, 5, 28 and 72]
[5] G. Shaw and D. Manolakis. Signal processing for hyperspectral image exploitation. IEEE Signal Processing Magazine, 19(1):12–16, 2002. [Cited in pag. 2]
[6] L. Bruzzone and D. F. Prieto. Automatic analysis of the difference image for unsupervised change detection. IEEE Transactions on Geoscience and Remote Sensing, 38(3):1171–1182, 2000.
[Cited in pag. 2]
[7] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, 2012. [Cited in pags. 2, 77 and 79]
[8] N. Keshava and J. F. Mustard. Spectral unmixing. IEEE Signal Processing Magazine, 19(1):44–57, 2002. [Cited in pags. 2, 3, 11, 13, 28 and 31]
[9] A. Plaza, P. Martinez, R. Perez, and J. Plaza. A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 42(3):650–663, 2004. [Cited in pags. 2 and 15]
[10] Q. Du, N. Raksuntorn, N. H. Younan, and R. L. King. End-member extraction for hyperspectral image analysis. Applied Optics, 47(28):77–84, 2008. [Cited in pag. 2]
[11] N. Keshava. A survey of spectral unmixing algorithms. Lincoln Laboratory Journal, 14(1):55–78, 2003. [Cited in pag. 2]
[12] M. Parente and A. Plaza. Survey of geometric and statistical unmixing algorithms for hyperspectral images. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS'10), pages 1–4, 2010. [Cited in pag. 2]
[13] A. Plaza, Q. Du, J. M. Bioucas-Dias, X. Jia, and F. Kruse. Foreword to the special issue on spectral unmixing of remotely sensed data. IEEE Transactions on Geoscience and Remote Sensing, 49(11):4103–4110, 2011. [Cited in pag. 2]
[14] D. A. Landgrebe. Signal theory methods in multispectral remote sensing. John Wiley & Sons: New York, 2003. [Cited in pags. 3, 11, 21, 28, 38, 39 and 49]
[15] F. Melgani and L. Bruzzone. Classification of hyperspectral remote-sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8):1778–1790, 2004. [Cited in pags. 3, 11 and 14]
[16] G. Camps-Valls and L. Bruzzone. Kernel-based methods for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 43(6):1351–1362, 2005. [Cited in pags. 3, 11, 28, 53 and 78]
[17] X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005. [Cited in pags. 4, 50, 72 and 78]
[18] R. G. Congalton and K. Green. Assessing the accuracy of remotely sensed data: principles and practices. CRC Press, 2008. [Cited in pag. 11]
[19] D. C. Heinz and C. I. Chang. Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 39(3):529–545, 2001. [Cited in pags. 11, 14, 19, 30 and 38]
[20] R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, J. A. Chippendale, B. J. Faust, B. E. Pavri, C. J. Chovit, M. Solis, M. R. Olah, and O. Williams. Imaging spectroscopy and the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). Remote Sensing of Environment, 65(3):227–248, 1998. [Cited in pags. 11, 29 and 52]
[21] T. Luo, K. Kramer, D. B. Goldgof, S. Samson, A. Remsen, T. S. Hopkins, and D. Cohn. Active learning to recognize multiple types of plankton. Journal of Machine Learning Research, 3:589–613, 2005. [Cited in pags. 11, 54 and 82]
[22] Q. Du, H. Ren, and C. I. Chang. A comparative study for orthogonal subspace projection and constrained energy minimization. IEEE Transactions on Geoscience and Remote Sensing, 41(6):1525–1529, 2003. [Cited in pags. 11 and 17]
[23] I. Dopido, A. Villa, and A. Plaza.
[23] I. Dopido, A. Villa, and A. Plaza. Unsupervised clustering and spectral unmixing for feature extraction prior to supervised classification of hyperspectral images. In SPIE Optics and Photonics, Satellite Data Compression, Communication, and Processing Conference, 2011. [Cited in pags. 11, 32 and 33]
[24] I. Dopido, A. Villa, A. Plaza, and P. Gamba. A quantitative and comparative assessment of unmixing-based feature extraction techniques for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):421–435, 2012. [Cited in pags. 11 and 79]
[25] J. M. Bioucas-Dias and J. M. P. Nascimento. Hyperspectral subspace identification. IEEE Transactions on Geoscience and Remote Sensing, 46(8):2435–2445, 2008. [Cited in pags. 11, 15 and 31]
[26] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994. [Cited in pags. 11, 14 and 28]
[27] J. F. Cardoso. High-order contrasts for independent component analysis. Neural Computation, 11(1):157–192, 1999. [Cited in pags. 11 and 14]
[28] J. M. Bioucas-Dias and M. Figueiredo. Logistic regression via variable splitting and augmented Lagrangian tools. Technical report, Instituto Superior Técnico, TULisbon, 2009. [Cited in pags. 11 and 53]
[29] J. Li, J. M. Bioucas-Dias, and A. Plaza. Hyperspectral image segmentation using a new Bayesian approach with active learning. IEEE Transactions on Geoscience and Remote Sensing, 49(10):3947–3960, 2011. [Cited in pags. 11, 53 and 54]
[30] D. Böhning. Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics, 44(1):197–200, 1992. [Cited in pags. 11, 52 and 79]
[31] A. A. Green, M. Berman, P. Switzer, and M. D. Craig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26(1):65–74, 1988. [Cited in pags. 11, 14 and 28]
[32] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Muñoz-Mari. A survey of active learning algorithms for supervised remote sensing image classification. IEEE Journal of Selected Topics in Signal Processing, 5(3):606–617, 2011. [Cited in pags. 11, 51 and 54]
[33] J. Boardman. Leveraging the high dimensionality of AVIRIS data for improved subpixel target unmixing and rejection of false positives: mixture tuned matched filtering. In Proceedings of the 5th JPL Geoscience Workshop, pages 55–56, 1998. [Cited in pags. 11, 17, 28, 30, 31 and 79]
[34] I. Dopido, M. Zortea, A. Villa, A. Plaza, and P. Gamba. Unmixing prior to supervised classification of remotely sensed hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 8(4):760–764, 2011. [Cited in pags. 11, 28, 31, 37, 38, 39, 40 and 79]
[35] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery. Active learning methods for remote sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 47(7):2218–2232, 2009. [Cited in pags. 11, 51 and 55]
[36] J. C. Harsanyi and C. I. Chang. Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach. IEEE Transactions on Geoscience and Remote Sensing, 32(4):779–785, 1994. [Cited in pags. 11, 15 and 38]
[37] J. A. Richards. Analysis of remotely sensed data: the formative decades and the future. IEEE Transactions on Geoscience and Remote Sensing, 43(3):422–432, 2005. [Cited in pags. 11, 13, 14 and 28]
[38] F. Dell’Acqua, P. Gamba, A. Ferrari, J. A. Palmason, and J. A. Benediktsson. Exploiting spectral and spatial information in hyperspectral urban data with high resolution. IEEE Geoscience and Remote Sensing Letters, 1(4):322–326, 2004. [Cited in pags. 11, 14 and 29]
[39] B. Krishnapuram, L. Carin, M. Figueiredo, and A. Hartemink. Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):957–968, 2005. [Cited in pags. 11 and 53]
[40] V. N. Vapnik. Statistical learning theory. John Wiley, New York, 1998. [Cited in pags. 11, 50, 53, 57 and 78]
[41] T. Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, ICML'99, pages 200–209, 1999. [Cited in pags. 11, 50 and 78]
[42] J. M. P. Nascimento and J. M. Bioucas-Dias. Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4):898–910, 2005. [Cited in pags. 11 and 15]
[43] Q. Du and C. I. Chang. Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 42(3):608–619, 2004. [Cited in pags. 11, 15 and 31]
[44] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni. Recent advances in techniques for hyperspectral image processing. Remote Sensing of Environment, 113:110–122, 2009. [Cited in pags. 13, 14, 28 and 78]
[45] J. B. Adams, M. O. Smith, and P. E. Johnson. Spectral mixture modeling: a new analysis of rock and soil types at the Viking Lander 1 site. Journal of Geophysical Research, 91(B8):8098–8112, 1986. [Cited in pag. 14]
[46] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla. Composite kernels for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters, 3(1):93–97, 2006. [Cited in pags. 14 and 28]
[47] A. Plaza, P. Martinez, J. Plaza, and R. Perez. Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations. IEEE Transactions on Geoscience and Remote Sensing, 43(3):466–479, 2005. [Cited in pags. 14 and 28]
[48] M. Rojas, I. Dopido, A. Plaza, and P. Gamba. Comparison of support vector machine-based processing chains for hyperspectral image classification. In SPIE Optics and Photonics, Satellite Data Compression, Communication, and Processing Conference, 2010. [Cited in pags. 14, 28 and 38]
[49] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and Remote Sensing, 43(3):480–491, 2005. [Cited in pags. 14 and 52]
[50] B. Luo and J. Chanussot. Hyperspectral image classification based on spectral and geometrical features. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, pages 1–6, 2009. [Cited in pag. 14]
[51] B. Luo and J. Chanussot. Unsupervised classification of hyperspectral images by using linear unmixing algorithm. In Proceedings of the IEEE International Conference on Image Processing, pages 2877–2880, 2009. [Cited in pags. 14 and 37]
[52] M. E. Winter. N-FINDR: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. In Proceedings of SPIE Imaging Spectrometry V, volume 3753, pages 266–277, 1999. [Cited in pag. 15]
[53] M. Zortea and A. Plaza. Spatial preprocessing for endmember extraction. IEEE Transactions on Geoscience and Remote Sensing, 47(8):2679–2693, 2009. [Cited in pag. 16]
[54] I. Dopido and A. Plaza. Unmixing prior to supervised classification of urban hyperspectral images. In 6th IEEE GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas (JURSE'11), pages 97–100, 2011. [Cited in pag. 17]
[55] G. M. Foody. Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering and Remote Sensing, 70(5):627–634, 2004. [Cited in pags. 21 and 40]
[56] G. F. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63, 1968. [Cited in pag. 28]
[57] K. Fukunaga. Introduction to statistical pattern recognition. CA: Academic Press, 1990. [Cited in pag. 28]
[58] L. Bruzzone, M. Chi, and M. Marconcini. A novel transductive SVM for the semisupervised classification of remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 44(11):3363–3373, 2006. [Cited in pags. 28, 50, 54 and 78]
[59] L. O. Jimenez and D. A. Landgrebe. Supervised classification in high dimensional space: geometrical, statistical and asymptotical properties of multivariate data. IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 28(1):39–54, 1998. [Cited in pag. 28]
[60] Q. Jackson and D. A. Landgrebe. An adaptive classifier design for high dimensional data analysis with a limited training data set. IEEE Transactions on Geoscience and Remote Sensing, 39(12):2664–2679, 2001. [Cited in pag. 28]
[61] I. Dopido, A. Villa, A. Plaza, and P. Gamba. A comparative assessment of several processing chains for hyperspectral image classification: what features to use? In Proceedings of the IEEE/GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS'11), pages 1–4, 2011. [Cited in pag. 28]
[62] C. I. Chang, J. M. Liu, B. C. Chieu, H. Ren, C. M. Wang, C. S. Lo, P. C. Chung, C. W. Yang, and D. J. Ma. Generalized constrained energy minimization approach to subpixel target detection for multispectral imagery. Optical Engineering, 39(5):1275–1281, 2000. [Cited in pags. 28 and 31]
[63] J. A. Hartigan and M. A. Wong. Algorithm AS 136: a k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108, 1979. [Cited in pag. 31]
[64] L. Wang and X. Jia. Integration of soft and hard classifications using extended support vector machines. IEEE Geoscience and Remote Sensing Letters, 6(3):543–547, 2009. [Cited in pag. 37]
[65] A. Villa, J. Chanussot, J. A. Benediktsson, and C. Jutten. Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution. IEEE Journal of Selected Topics in Signal Processing, 5(3):521–533, 2011. [Cited in pag. 37]
[66] F. A. Mianji and Y. Zhang. SVM-based unmixing-to-classification conversion for hyperspectral abundance quantification. IEEE Transactions on Geoscience and Remote Sensing, 49(11):4318–4327, 2011. [Cited in pag. 37]
[67] J. C. Bezdek. Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York, 1981. [Cited in pag. 38]
[68] B. Mojaradi, H. Abrishami-Moghaddam, M. J. V. Zoej, and R. P. W. Duin. Dimensionality reduction of hyperspectral data via spectral feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 47(7):2091–2105, 2009. [Cited in pag. 39]
[69] S. Garcia, A. Fernandez, J. Luengo, and F. Herrera. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Information Sciences, 180(10):2044–2064, 2010. [Cited in pag. 40]
[70] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998. [Cited in pag. 44]
[71] F. Bovolo, L. Bruzzone, and L. Carlin. A novel technique for subpixel image classification based on support vector machine. IEEE Transactions on Image Processing, 19(11):2983–2999, 2010. [Cited in pags. 50 and 78]
[72] B. M. Shahshahani and D. A. Landgrebe. The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Transactions on Geoscience and Remote Sensing, 32(5):1087–1095, 1994. [Cited in pag. 50]
[73] S. Baluja. Probabilistic modeling for face orientation discrimination: learning from labeled and unlabeled data. In Neural Information Processing Systems (NIPS'98), 1998. [Cited in pags. 50 and 78]
[74] T. Mitchell. The role of unlabeled data in supervised learning. In Proceedings of the Sixth International Colloquium on Cognitive Science, pages 2–11, 1999. [Cited in pags. 50 and 78]
[75] A. Fujino, N. Ueda, and K. Saito. A hybrid generative/discriminative approach to semi-supervised classifier design. In AAAI'05 Proceedings of the 20th National Conference on Artificial Intelligence, volume 20, page 764, 2005. [Cited in pags. 50 and 78]
[76] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, ACL'95, pages 189–196, 1995. [Cited in pags. 50 and 78]
[77] C. Rosenberg, M. Hebert, and H. Schneiderman. Semi-supervised self-training of object detection models. In Seventh IEEE Workshop on Applications of Computer Vision, 2005. [Cited in pags. 50 and 78]
[78] V. R. De Sa. Learning classification with unlabeled data, 1994. [Cited in pags. 50 and 78]
[79] U. Brefeld, T. Gartner, T. Scheffer, and S. Wrobel. Efficient co-regularised least squares regression. In Proceedings of the 23rd International Conference on Machine Learning, ICML'06, pages 137–144, 2006. [Cited in pags. 50 and 78]
[80] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML'01, pages 19–26, 2001. [Cited in pags. 50 and 78]
[81] O. Chapelle, M. Chi, and A. Zien. A continuation method for semi-supervised SVMs. In Proceedings of the 23rd International Conference on Machine Learning, pages 185–192. ACM Press, 2006. [Cited in pags. 50 and 78]
[82] G. Camps-Valls, T. Bandos, and D. Zhou. Semi-supervised graph-based hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 45(10):3044–3054, 2007. [Cited in pags. 50, 54 and 78]
[83] S. Velasco-Forero and V. Manian. Improving hyperspectral image classification using spatial preprocessing. IEEE Geoscience and Remote Sensing Letters, 6(2):297–301, 2009. [Cited in pags. 50 and 78]
[84] D. Tuia and G. Camps-Valls. Semisupervised remote sensing image classification with cluster kernels. IEEE Geoscience and Remote Sensing Letters, 6(2):224–228, 2009. [Cited in pags. 50 and 78]
[85] J. Li, J. M. Bioucas-Dias, and A. Plaza. Semi-supervised hyperspectral classification. In First IEEE GRSS Workshop on Hyperspectral Image and Signal Processing, 2009. [Cited in pag. 50]
[86] J. Li, J. M. Bioucas-Dias, and A. Plaza. Semi-supervised hyperspectral image segmentation using multinomial logistic regression with active learning. IEEE Transactions on Geoscience and Remote Sensing, 48(11):4085–4098, 2010. [Cited in pags. 50, 51, 54 and 78]
[87] L. Bruzzone and C. Persello. A novel context-sensitive semisupervised SVM classifier robust to mislabeled training samples. IEEE Transactions on Geoscience and Remote Sensing, 47(7):2142–2154, 2009. [Cited in pag. 50]
[88] J. Muñoz-Mari, F. Bovolo, L. Gomez-Chova, L. Bruzzone, and G. Camps-Valls. Semisupervised one-class support vector machines for classification of remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 48(8):3188–3197, 2010. [Cited in pag. 50]
[89] L. Gomez-Chova, G. Camps-Valls, L. Bruzzone, and J. Calpe-Maravilla. Mean map kernel methods for semisupervised cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 48(1):207–220, 2010. [Cited in pag. 50]
[90] D. Tuia and G. Camps-Valls. Urban image classification with semisupervised multiscale cluster kernels. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 4(1):65–74, 2011. [Cited in pags. 50, 54 and 82]
[91] F. Ratle, G. Camps-Valls, and J. Weston. Semisupervised neural networks for efficient hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 48(5):2271–2282, 2010. [Cited in pag. 50]
[92] J. Muñoz-Mari, D. Tuia, and G. Camps-Valls. Semisupervised classification of remote sensing images with active queries. IEEE Transactions on Geoscience and Remote Sensing, 50(10):3751–3763, 2012. [Cited in pag. 51]
[93] S. Rajan, J. Ghosh, and M. M. Crawford. An active learning approach to hyperspectral data classification. IEEE Transactions on Geoscience and Remote Sensing, 46(4):1231–1242, 2008. [Cited in pag. 51]
[94] W. Di and M. M. Crawford. Active learning via multi-view and local proximity co-regularization for hyperspectral image classification. IEEE Journal of Selected Topics in Signal Processing, 5(3):618–628, 2011. [Cited in pag. 51]
[95] S. Patra and L. Bruzzone. A batch-mode active learning technique based on multiple uncertainty for SVM classifier. IEEE Geoscience and Remote Sensing Letters, 9(3):497–501, 2012. [Cited in pag. 51]
[96] I. Dopido, J. Li, and A. Plaza. Semi-supervised active learning approach for hyperspectral image classification: Application to multinomial logistic regression and support vector machines. In SPIE Optics and Photonics, Satellite Data Compression, Communication, and Processing Conference, 2012. [Cited in pag. 52]
[97] J. Li, J. M. Bioucas-Dias, and A. Plaza. Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 50(3):809–823, 2012. [Cited in pag. 53]
[98] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, volume 10, pages 61–74. MIT Press, 2000. [Cited in pags. 53 and 54]
[99] T. F. Wu, C. J. Lin, and R. C. Weng. Probability estimates for multiclass classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, 2004. [Cited in pag. 54]
[100] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson. SVM- and MRF-based method for accurate classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 7(4):736–740, 2010. [Cited in pag. 54]
[101] I. Dopido, J. Li, P. Gamba, and A. Plaza. A new semi-supervised approach for hyperspectral image classification with different active learning strategies. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS'12), 2012. [Cited in pag. 58]
[102] I. Dopido, J. Li, A. Plaza, and J. M. Bioucas-Dias. Semi-supervised active learning for urban hyperspectral image classification. In IEEE Geoscience and Remote Sensing Symposium (IGARSS'12), pages 1586–1589, 2012. [Cited in pag. 58]
[103] J. Borges, J. M. Bioucas-Dias, and A. Marçal. Evaluation of Bayesian hyperspectral imaging segmentation with a discriminative class learning. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, pages 3810–3813, 2007. [Cited in pag. 78]
[104] J. Li, J. M. Bioucas-Dias, and A. Plaza. Supervised hyperspectral image segmentation using active learning. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS'10), pages 1–4, 2010. [Cited in pag. 78]
[105] J. Li, J. M. Bioucas-Dias, and A. Plaza. Exploiting spatial information in semi-supervised hyperspectral image segmentation. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS'10), pages 1–4, 2010. [Cited in pag. 78]
[106] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100, 1998. [Cited in pag. 78]
[107] X. Zhu, Z. Ghahramani, T. Jaakkola, and M. Ii. Semi-supervised learning with graphs. Technical Report CMU-CALD-02-106, Carnegie Mellon University, 2005. [Cited in pag. 78]
[108] Y. Zhong, L. Zhang, B. Huang, and P. Li. An unsupervised artificial immune classifier for multi/hyperspectral remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 44(2):420–431, 2006. [Cited in pag. 78]
[109] I. Dopido, J. Li, P. Gamba, and A. Plaza. Semi-supervised classification of hyperspectral data using spectral unmixing concepts. In Advances in Radar and Remote Sensing (TyWRRS), pages 353–358, 2012. [Cited in pag. 82]
[110] I. Dopido, J. Li, A. Plaza, and P. Gamba. Semi-supervised classification of urban hyperspectral data using spectral unmixing concepts. In 8th IEEE GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas (JURSE'13), pages 186–189, 2013. [Cited in pag. 82]