Remote Sensing of Environment 114 (2010) 1155–1166
Journal homepage: www.elsevier.com/locate/rse

Combining vegetation indices, constrained ordination and fuzzy classification for mapping semi-natural vegetation units from hyperspectral imagery

Jens Oldeland a,b,⁎, Wouter Dorigo c, Lena Lieckfeld a,b, Arko Lucieer d, Norbert Jürgens a

a Biocentre Klein Flottbek and Botanical Garden, University of Hamburg, Ohnhorststr. 18, 22609, Hamburg, Germany
b German Aerospace Center, 82203 Oberpfaffenhofen, Germany
c Institute of Photogrammetry and Remote Sensing, University of Technology, Gusshausstrasse 27-29, 1040 Vienna, Austria
d School of Geography and Environmental Studies, University of Tasmania, Private Bag 76, Hobart 7001, Tasmania, Australia

Article info

Article history: Received 19 August 2009; received in revised form 4 January 2010; accepted 9 January 2010

Keywords: Cluster analysis; Redundancy analysis; Multivariate; Supervised fuzzy c-means; Semiarid; Rangeland; Namibia; Imaging spectroscopy

Abstract

Vegetation mapping of plant communities at fine spatial scales is increasingly supported by remote sensing technology. However, combining ecological ground truth information and remote sensing datasets for mapping approaches is complicated by the complexity of ecological datasets. In this study, we present a new approach that uses high spatial resolution hyperspectral datasets to map vegetation units of a semiarid rangeland in Central Namibia. Field vegetation surveys provide the input to the workflow presented in this study. The collected data were classified by hierarchical cluster analysis into seven vegetation units that reflect different ecological states occurring in the study area.
Spectral indices covering vegetation and soil characteristics were calculated from hyperspectral remote sensing imagery and used as environmental variables in a constrained ordination by applying redundancy analysis (RDA). The resulting statistical relationships between vegetation data and spectral indices were transferred into images of ordination axes, which were subsequently used in a supervised fuzzy c-means classification approach relying on a k-NN distance metric. Membership images for each vegetation unit as well as a confusion image of the classification result allowed a sound ecological interpretation of the resulting hard classification map. Classification results were validated with two independent reference datasets. For an internal and external validation dataset, overall accuracy reached 98% and 64% with kappa values of 0.98 and 0.53, respectively. Critical steps during the mapping workflow were highlighted and compared with similar mapping approaches.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Vegetation mapping aims to accurately identify the distribution of different types of vegetation in a defined area. The resulting maps can be seen as a baseline inventory to assist natural resource or conservation management and land use planning. Depending on the scale and geographical context, vegetation can be described by its physiognomical–ecological characteristics leading to so-called formations such as grassland, shrubland or forest. These descriptions are based on dominant life forms and the main vegetation structure and can be found in many land cover descriptions suitable for coarse spatial resolutions (McDermid et al., 2005). On the other hand, floristically defined plant communities, based on e.g. diagnostic and differential plant species, are often used for vegetation mapping (Chytrý & Tichý, 2003).
The plant community based mapping approach is mainly used on a local or regional scale and yields species lists for all existing plant communities, giving more precise information on plant diversity and conservation status (Amarnath et al., 2003; Van Rooyen et al., 2008). In both cases, field surveys for vegetation mapping are cost- and labor-intensive. Especially in remote areas like the polar regions or many arid ecosystems, ground based mapping becomes logistically more challenging.

⁎ Corresponding author. University of Hamburg, Biocentre Klein Flottbek and Botanical Garden, Ohnhorststr. 18, 22609, Hamburg, Germany. Tel.: +49 40 42816 407; fax: +49 40 42816 539. E-mail address: Oldeland@botanik.uni-hamburg.de (J. Oldeland). doi:10.1016/j.rse.2010.01.003

In the last decades, remote sensing has significantly contributed to vegetation mapping of remote areas and to mapping structurally defined vegetation units on global, regional, and local extents (Cihlar, 2000; Gamon et al., 2004; McDermid et al., 2005). Extensive vegetation surveys allow the combination of floristically defined plant communities with satellite data in order to map the spatial distribution of vegetation (Aragon & Oesterheld, 2008; Zak & Cabido, 2002). Over smaller extents, airborne sensors have been used successfully for the mapping of floristically defined vegetation units (Lewis, 2002; Schmidt & Skidmore, 2003; Thomas et al., 2003). These studies used hyperspectral systems with an increased spectral resolution. Hyperspectral sensors measure a large number of spectral bands, which provide a near-continuous spectrum covering a large range of wavelengths from the visible and near infrared (VNIR) to the shortwave infrared (SWIR). While the VNIR region provides information specifically on leaf pigments and vegetation structure, bands in the SWIR region are known to enhance characterization of vegetation, especially in semiarid areas, by providing detailed information on woody components and water content of the vegetation (Asner & Heidebrecht, 2002; He et al., 2006; Lucas et al., 2008; Ustin et al., 2004).

Many studies applying hyperspectral data have used information on the difference in reflectance values in single or combined bands (Liesenberg et al., 2007; Lucas et al., 2008). This approach suffers from two problems. Firstly, reflectance values of single bands often perform poorly when used for discriminating vegetation classes with similar species composition (Thomas et al., 2003). Secondly, a high correlation between multiple bands can lead to erroneous results when classification techniques depending on regression analysis, such as linear discriminant analysis, are applied (Hansen & Schjoerring, 2003).

Information from different parts of the measured spectrum is often combined to form what is called a spectral vegetation index (VI). The spectral bands used to form the VI are selected and combined in a way to enhance spectral features related to the variable of interest while reducing undesired effects caused by variations in soil reflectance, sun and view geometry, atmospheric composition, and other leaf or canopy properties (Dorigo et al., 2007). The normalized difference vegetation index (NDVI) has become a standard remote sensing product for ecological applications (Pettorelli et al., 2005) and it has been widely applied for discriminating and interpreting mapped vegetation units (Hong et al., 2004; Rahman & Gamon, 2004). However, only a few studies incorporated other spectral indices for vegetation mapping, and these studies mainly used coarse multispectral satellite data (de la Cueva, 2008; Hong et al., 2004).
Indices specifically designed for hyperspectral remote sensing data (hereafter called “hyperspectral indices”) take advantage of the detailed narrow-band information or the large number of contiguous bands provided by such data. While many hyperspectral indices only use the bands in the VNIR some also make use of the SWIR region. From the plethora of available spectral indices (Treitz & Howarth, 1999; Ustin et al., 2005) many have not been tested for vegetation mapping. Two main strategies can be identified in remote sensing methods for vegetation mapping: The first strategy involves classification of the spectral information, either on a per-pixel or on a sub-pixel basis. The traditional approach is to use supervised classification of remote sensing data based on a priori knowledge of land cover. Maximum likelihood classifiers are commonly used for multispectral data whereas the spectral angle mapper is a frequently used method for classifying hyperspectral data (Richards & Xiuping, 2006). Both classifiers lead to a vegetation map consisting of hard boundaries. Yet, for representing vegetation in seminatural landscapes, where ecotones are important landscape structures (Arnot & Fisher, 2007), a continuous or fuzzy interpretation of vegetation becomes increasingly important (Foody, 1992; Lees, 2006; Lucieer, 2006; Moraczewski, 1993; Schmidtlein & Sassin, 2004). Fuzzy classification techniques have been recognized as a suitable tool to map (semi-) natural vegetation units because they allow a soft overlap of several hard classes (Foody, 1992; Lu & Weng, 2007; Lucieer, 2006). Despite their great potential to map and identify continuous natural vegetation, supervised fuzzy classification algorithms are not frequently employed for mapping vegetation units. 
The second strategy comprises multivariate techniques, such as Canonical Correspondence Analysis (CCA) (Ter Braak, 1987) or redundancy analysis (RDA) (van den Wollenberg, 1977), that create a relationship between detailed quantitative information on vegetation, e.g. species composition, vegetation cover or other structural parameters, and spectral information (Brook & Kenkel, 2002; Thomas et al., 2003). So far, this strategy has been less frequently used for vegetation mapping than image classification, although examples are increasingly found in recent literature (de la Cueva, 2008; Dobrowski et al., 2008; Jensen & Azofeifa, 2006; Malik & Husain, 2008; Yue et al., 2008). The strength of the multivariate approach is that it uses the full information on species composition by simultaneously relating each single recorded species to the data matrix on spectral information, such as sensor bands or spectral indices. This leads to an ordination space where the ordination axes reflect the statistical relationship between species and spectral information, putting species with a similar relationship to the indices in order along the axes.

There are different views on how vegetation should be represented in the multivariate approach and how it should be related to spectral information. On the one hand, cluster analysis of vegetation datasets allows finding discrete units based on floristic data. These discrete groups are easy to handle and can be used in further analysis of spectral data (Lewis, 1998; Thomas et al., 2003). On the other hand, vegetation can be interpreted as a continuum consisting of transitions between plant communities. Ordination techniques such as Nonmetric Multidimensional Scaling (NMDS) or Detrended Correspondence Analysis (DCA) arrange vegetation data along indirect floristic gradients displayed by ordination axes, which can be used for further analysis (Schmidtlein & Sassin, 2004; Schmidtlein et al., 2007).
Several authors have combined cluster analysis with constrained ordination techniques such as Canonical Correspondence Analysis (CCA) using spectral bands or principal components of the satellite image as constraining variables (Armitage et al., 2000; Dirnböck et al., 2003; Ohmann & Gregory, 2002; Thomas et al., 2003). CCA assumes that species have an optimum along an environmental gradient resulting in a hump-shaped (unimodal) response (Jongman et al., 1995). Therefore, CCA calculates Gaussian canonical regressions, i.e. using polynomials for each explanatory variable, where the species–environment correlation is based on weighted averages using the Chi-square metric. Another constrained ordination technique similar to CCA is redundancy analysis (RDA) (van den Wollenberg, 1977), which relies on multiple linear regressions calculated from the weighted sums of the Euclidean distances between two matrices. The assumed linear relationship between species and environment implies a monotonic increase or decrease of species abundance or occurrence along an environmental gradient. Depending on the dataset and the underlying assumptions of the study aim, it can be useful to check whether RDA or CCA is the better choice. In spite of the possibilities to use spectral indices as explanatory variables in constrained ordination, only a few studies combined indices other than NDVI in RDA or CCA to create relationships between ground-checked vegetation units and canopy properties measured by spectral indices (de la Cueva, 2008; Goodin et al., 2004).

In this study, we present an approach for combining hyperspectral remote sensing data with field survey information on plant species composition and plant cover in order to produce a map of floristically defined vegetation units. We apply the method to a dwarf shrub savannah in Central Namibia at a spatial resolution of 5 m over an area of 19.5 km².
We aim at developing high-resolution vegetation maps based on the relationship between classified field observation data and a set of hyperspectral vegetation and soil indices, established by a constrained ordination technique. Two independent test datasets are used for validation. The potential and shortcomings of our methodology are critically discussed with regard to other approaches.

2. Material and methods

2.1. Study area

The study area comprises 19.5 km² of gently undulating rangelands northwest of the town of Rehoboth, Namibia (23° 7′ 13.08″ S, 16° 53′ 47.40″ E). The climate is semiarid, with 250 mm annual rainfall and a mean annual temperature of 20 °C. Vegetation is mainly a dwarf shrub savannah but is heavily modified by land use. The in situ field data were sampled on two farms with contrasting management strategies. The first farm, Narais, applies an extensive grazing strategy with mainly cattle in a camp-rotation system. The second farm, Duruchaus, is intensively grazed with sheep and goats. Azonal vegetation occurs in and around clay pans, and some thickets dominated by Acacia mellifera on rocky red soils occur mainly on Duruchaus.

2.2. Methodological overview

Our mapping approach consists of five separate steps. The first two steps comprise vegetation sampling and identification of vegetation units as well as acquisition and processing of airborne hyperspectral remote sensing data. This is followed by the calculation of a set of spectral indices that were already applied in other studies for semiarid savannahs. Constrained ordination allows vegetation samples to be related to spectral indices and allows for a derivation of ordination axes reflecting their statistical relationship. Ordination axes that show a good statistical fit with the spectral indices were converted to a set of ordination images.
Finally, the new dataset is classified into quantitative vegetation distribution maps for each single vegetation unit using a supervised fuzzy classification technique. The whole procedure is explained in greater detail in the following sections.

2.3. Field sampling

Field sampling was carried out in April 2007 on Narais and Duruchaus in the Rehoboth district, Central Namibia. A total of 89 vegetation plots were placed by a preferential sampling design relying mainly on image interpretation and accessibility from roads or farm tracks. This was done in order to maximize the variation in detectable vegetation units according to the remotely sensed image while minimizing sampling effort. To ensure spatial compatibility between ground data and pixel resolution we applied a plot size of 25 m × 25 m following Justice and Townshend (1981), who provided a formula (Eq. (1)) to calculate the optimal size of vegetation plots in relation to the pixel size and the geometric accuracy of the imagery. A is the area to be sampled, in this case 625 m², pixel size (P) was 5 m and geometric accuracy of the image (G) was set to 2 pixels following Brogaard and Ólafsdóttir (1997):

$$A = \left(P\,(1 + 2G)\right)^{2} \qquad (1)$$

At each vegetation plot, all vascular plant species were recorded and their abundance was estimated visually as percentage cover. Vegetation data were entered into a vegetation database (Muche et al., 2009) allowing querying and linkage to a GIS. For further analysis, abundance information was extracted from the vegetation database and stored in plot-by-species data matrices that would serve as the response matrix in the constrained ordination (Section 2.7). During the extraction from the database, species with fewer than three occurrences were removed in order to avoid distortion of the cluster analysis due to rare species (Cao & Larsen, 2001; Marchant, 2002).
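As a quick numerical check of Eq. (1), the following minimal Python sketch evaluates the plot area for the values reported in this section (P = 5 m, G = 2 pixels). The function name plot_area is illustrative and not part of the original workflow.

```python
def plot_area(pixel_size_m: float, geometric_accuracy_px: float) -> float:
    """Area to be sampled, A = (P * (1 + 2G))**2, following Eq. (1)."""
    side_length_m = pixel_size_m * (1 + 2 * geometric_accuracy_px)
    return side_length_m ** 2


# Values reported in the text: P = 5 m pixel size, G = 2 pixels
area_m2 = plot_area(5.0, 2.0)
side_m = area_m2 ** 0.5
print(area_m2, side_m)  # 625.0 25.0
```

The result reproduces the 25 m × 25 m plots (625 m²) used in the field survey.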
For a subset of the study area, the biodiversity network BIOTA-AFRICA (www.biota-africa.org) provided a comparable dataset of 41 vegetation plots of 10 m × 10 m, which were used for accuracy assessment as an external validation dataset.

2.4. Image data and processing

During a flight campaign in October 2005 a hyperspectral image was taken using the HyMap airborne imaging spectrometer (Cocks et al., 1998). The image has a spatial resolution of 5 m × 5 m and covers 126 bands with a 10 nm bandwidth in the wavelength range from approximately 450 nm to 2500 nm. The image was orthorectified using the PARGE software (Schläpfer & Richter, 2002) in combination with 15 differential GPS measurements (accuracy ∼0.5 m) from the BIOTA-AFRICA network. Errors of the rectified image were less than 1 pixel (<5.0 m) in x- and y-directions. ATCOR-4 (Richter & Schläpfer, 2002) was used for vicarious calibration and for the removal of atmospheric effects. For the vicarious calibration, spectroradiometric measurements were taken with a portable Fieldspec PRO FR spectrometer (Analytical Spectral Devices, Inc.) at four homogeneous dark and bright bare soil targets and converted into reflectance units using a Spectralon™ panel as white reference. Depending on wavelength, the deviation of ground measured reflectance and HyMap reflectance obtained after atmospheric correction varied between 1 and 4% absolute reflectance units.

2.5. Cluster analysis

Since redundancy analysis was planned as the ordination method, we chose Euclidean distance as the distance measure for cluster analysis, so that clustering and ordination rely on the same distance measure. RDA extends PCA to a constrained ordination technique by allowing multiple regressions of two matrices: one dependent matrix and one explanatory matrix (van den Wollenberg, 1977). Hence, RDA also relies on the Euclidean distance.
Vegetation data sets are commonly characterized by a high number of zeros, for which Euclidean distance is not an appropriate distance measure. Therefore, the abundance data were Hellinger-transformed (Rao, 1995), i.e. each abundance value was divided by its row (plot) total and the square root was taken, so that Euclidean distances between transformed plots equal Hellinger distances. Legendre and Gallagher (2001) showed that abundance data analysed in Euclidean space after Hellinger transformation give better results than the Chi-square metric or similar approaches. An agglomerative hierarchical cluster analysis was performed using Euclidean distance and Ward's minimum-variance clustering algorithm (Fielding, 2007). It is important to check cluster structure for validity because cluster analysis tends to find groups even if there is no clear group structure (Fielding, 2007). Validity of each resulting cluster diagram was assessed using the cophenetic correlation coefficient (rc), which is a widely used measure for comparing the deviance of a cluster from the original dissimilarity matrix (Sokal & Rohlf, 1962). According to McGarigal et al. (2000), a value of rc = 0.75 or higher is a good representation of the original distance matrix used for cluster analysis. We applied a second quality measure of clustering structure, the Agglomerative Coefficient, which is defined as the average height of the mergers in a dendrogram and is a dimensionless number between zero and one, with values closer to one indicating a better structuring (Kaufman & Rousseeuw, 1990).

It is often a difficult and subjective choice at which level of clustering the most ecologically meaningful solution can be found. For the interpretation of the optimal level of clustering, two complementary methods were used, with the restriction that no groups smaller than five plots should be produced in order to allow a sound statistical analysis. First, analysis of similarity (ANOSIM), developed by Clarke (1993), is a nonparametric method for analysing group separability.
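The Hellinger transformation applied before clustering can be illustrated with a minimal Python sketch. It assumes the usual definition from Legendre and Gallagher (2001), the square root of each abundance divided by its plot total. The cover values and function names are hypothetical, and the original analysis was carried out in R rather than with this code.

```python
import math


def hellinger_transform(abundance):
    """Return sqrt(y_ij / row_total_i) for every plot (row) and species (column)."""
    transformed = []
    for row in abundance:
        total = sum(row)
        if total == 0:
            # Empty plot: keep zeros instead of dividing by zero
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical percentage cover: 3 plots x 4 species
cover = [
    [10.0, 0.0, 30.0, 0.0],   # plot 1
    [20.0, 0.0, 60.0, 0.0],   # plot 2: same composition as plot 1, denser cover
    [0.0, 5.0, 0.0, 15.0],    # plot 3: shares no species with plots 1 and 2
]
h = hellinger_transform(cover)
d12 = euclidean(h[0], h[1])
d13 = euclidean(h[0], h[2])
```

In this toy example, plots with identical relative composition end up at distance zero, while plots that share no species reach the maximum distance of √2, however many species are jointly absent.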
ANOSIM compares the difference of mean ranks between groups and within groups and yields a measure called R (not to be confused with a correlation measure). R usually ranges from 0 to 1, with values larger than 0.75 indicating a good separation. Second, Indicator Species Analysis (ISA), developed by Dufrene and Legendre (1997), produces an indicator value for each species, which is a measure of how well a species is restricted to a certain cluster. It also calculates the sum of all probability values, which reflects the amount of indicator species found. ANOSIM reports the strength of group separability whereas ISA allows an ecological interpretation of the classes. Results from both methods were used to verify and describe the communities derived at cluster levels from two to twenty. Finally, each vegetation unit resulting from the chosen cluster solution was characterized in terms of dominant species and through a general description of its structure. All analyses were performed using the packages cluster (Maechler et al., 2005), vegan (Oksanen et al., 2008) and labdsv (Roberts, 2007) in the statistical environment R 2.8.1 (R Development Core Team, 2008).

2.6. Spectral indices

A review of the literature on hyperspectral indices that are potentially suitable for characterizing the biophysical conditions of semiarid rangelands resulted in 30 vegetation and soil indices. With this set of indices, variations in all relevant canopy variables (e.g. pigments, canopy structure, canopy water content, woody parts, litter, and soil background) were covered. The indices were calculated in ENVI 4.2 (RSI 2005) using AS-Toolbox (Dorigo et al., 2006). We removed indices iteratively from the dataset until all pairwise correlations between the remaining indices were below Pearson's r = ±0.75. Variance inflation factor analysis (VIF) (Zuur et al., 2007) was applied to further reduce the list of indices.
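The correlation screening of the candidate indices can be sketched as follows. The text does not state which index of a correlated pair was removed, so the rule used here (drop the later-listed index of the most strongly correlated pair and repeat until all |r| ≤ 0.75) is an assumption, as are the index names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_indices(table, threshold=0.75):
    """Greedy screening (assumed rule): drop one index of the most strongly
    correlated pair until no pair exceeds the threshold in absolute value."""
    kept = list(table)
    while True:
        worst = None  # (|r|, name of the index to drop)
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                r = abs(pearson(table[kept[i]], table[kept[j]]))
                if r > threshold and (worst is None or r > worst[0]):
                    worst = (r, kept[j])
        if worst is None:
            return kept
        kept.remove(worst[1])


# Hypothetical index values extracted at six plots
indices = {
    "index_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "index_B": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],  # nearly collinear with index_A
    "index_C": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
selected = screen_indices(indices)
print(selected)  # ['index_A', 'index_C']
```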
According to Montgomery et al. (2001), VIF values larger than five indicate multicollinearity in a regression analysis. Using this threshold, a total set of eight indices was selected (Table 1). The eight selected indices cover a wide range of vegetation and soil characteristics. DGVI1, DGVI2, NDLI and CAI were specifically designed for hyperspectral sensors and already applied in savannah landscapes (Chen et al., 1998; He et al., 2006; Miura et al., 2003; Nagler et al., 2003; Serrano et al., 2002); the ratios sensitive to variations in soil clay and iron content were developed for narrow bandwidths, which are usually supported by hyperspectral images (Dorigo et al., 2006). Since vegetation plots are represented as center coordinates of the plots in a GIS, we applied a 5 × 5 pixel median filter to each index to calculate a single value representing the area of each vegetation plot. For each vegetation plot, values of spectral indices were extracted from the images and stored in a plot-by-index data matrix, which subsequently served as the predictor matrix in the constrained ordination.

2.7. Constrained ordination

We analyzed the gradient length of the first axis of a Detrended Correspondence Analysis (DCA) of the vegetation data, which is a common way to interpret compositional gradients in species datasets, in order to choose suitable ordination methods (McCune et al., 2002). In DCA, a detrending and non-linear rescaling of ordination axes is performed so that axes are rescaled in units of average standard deviation (SD) of species turnover, where an axis value larger than 4 SD indicates a complete turnover in species composition (Legendre & Legendre, 1998). For the first axis, we found a gradient length of 3.1 SD, indicating an intermediate compositional gradient.
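Extracting one value per plot with a 5 × 5 pixel median window (25 m × 25 m at 5 m resolution, matching the plot size) might look like the sketch below. The function name, the toy image and the clipping of the window at image edges are assumptions made for illustration.

```python
from statistics import median


def plot_value(image, row, col, half_window=2):
    """Median of the (2 * half_window + 1)^2 window centred on (row, col).
    The window is clipped at the image border (an assumption)."""
    rows = range(max(0, row - half_window), min(len(image), row + half_window + 1))
    cols = range(max(0, col - half_window), min(len(image[0]), col + half_window + 1))
    return median(image[r][c] for r in rows for c in cols)


# Hypothetical 7 x 7 index image with one anomalous pixel at the plot centre
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
image[3][3] = 999.0

centre_value = plot_value(image, 3, 3)   # full 5 x 5 window
corner_value = plot_value(image, 0, 0)   # window clipped to 3 x 3
print(centre_value, corner_value)  # 25.0 8.0
```

The median keeps the single anomalous centre pixel from dominating the plot value, which is one plausible reason for preferring it over a mean window.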
Given this intermediate gradient length, we chose redundancy analysis (RDA), which is known to be an adequate ordination method for short to intermediate compositional gradients assuming linear species responses along the environmental gradients (Legendre & Legendre, 1998). In RDA, the abundance and composition of species in the plot-by-species matrix is constrained by the values of the spectral indices in the plot-by-index matrix such that each ordination axis represents a linear relationship, i.e. a multiple regression model, between response (species) and all predictor variables (spectral indices). For a general evaluation of ordination results the amount of total variance explained by each axis is interpreted as a measure of ordination success. Ordination diagrams were created using standard scaling and linear constraints (LC-scores) and were visually inspected for group separation along the axes as well as for vector length and direction. Vectors represent the biplot scores for each predictor variable; their length and direction reflect their importance relative to the ordination axes. Significance of the ordination was assessed by calculating a goodness of fit test (using function as.mlm.rda in vegan package, Oksanen et al., 2008) yielding R2 and p-values for the relationships between indices and ordination axes and between vegetation units and ordination axes.

Table 1
Final set of spectral indices used in the analysis.

Nr. | Index | Full name
1 | CARI | Chlorophyll Absorption in Reflectance Index
2 | LCI | Leaf Chlorophyll Index
3 | DGVI1 | First-order derivative green vegetation index
4 | DGVI2 | Second-order derivative green vegetation index
5 | NDLI | Normalized Difference Lignin Index
6 | CAI | Cellulose Absorption Index
7 | CLAY | Clay ratio
8 | IRON | Iron ratio

2.8. Calculating ordination maps

Since RDA inherently produces linear combinations of predictor variables to calculate distances in the ordination space, it is possible to use regression coefficients and spectral indices to calculate maps of ordination axes.
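A minimal sketch of how an ordination axis image could be computed as a per-pixel linear combination of index images is given below. The coefficient values, index means and tiny 2 × 2 images are hypothetical, and centring each index on its training mean is an assumption about how the regression coefficients were applied.

```python
def axis_image(index_images, coefficients, means):
    """Per-pixel axis score: sum over indices k of b_k * (x_k - mean_k)."""
    names = list(coefficients)
    n_rows = len(index_images[names[0]])
    n_cols = len(index_images[names[0]][0])
    return [
        [
            sum(coefficients[k] * (index_images[k][r][c] - means[k]) for k in names)
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Hypothetical 2 x 2 index images for two of the selected indices
index_images = {
    "CLAY": [[1.0, 2.0], [3.0, 4.0]],
    "CAI": [[0.0, 1.0], [1.0, 0.0]],
}
# Hypothetical regression coefficients of one RDA axis and training-plot means
coefficients = {"CLAY": 2.0, "CAI": -1.0}
means = {"CLAY": 2.5, "CAI": 0.5}

axis_1 = axis_image(index_images, coefficients, means)
print(axis_1)  # [[-2.5, -1.5], [0.5, 3.5]]
```

Repeating the calculation with the coefficients of each retained axis would give one image layer per ordination axis.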
Applying this calculation to each axis produces a new dataset, with one image layer per ordination axis. In order to interpret the ordination images we compared the relative position of the vegetation units along the ordination axes in the ordination diagrams, e.g. we looked for the relative position of unit 4 on ordination axis one, and used the regression statistics of the redundancy analysis to evaluate the fit of the relation between the axes, indices and vegetation units.

2.9. Fuzzy classification

In order to finally extract continuous vegetation unit maps from the ordination images we applied a supervised fuzzy c-means classifier (SFCM) to estimate the abundance of each vegetation unit per pixel (Lucieer, 2006; Zhang & Foody, 2001). For the vegetation units, regions of interest (ROI) were created that covered the area of each vegetation plot (625 m²). The ensemble of pixels was divided into a training and a validation dataset for each vegetation unit. The degree to which a sample belongs to a class is expressed by a continuous membership value that ranges between 0.0 and 1.0, where 1.0 indicates perfect similarity with a class cluster. The fuzziness component, which determines the amount of overlap allowed, was set to 2.0 following various authors (Burrough et al., 2000; Lucieer, 2006). For classification we used the non-parametric k-Nearest Neighbor (k-NN) distance metric within the SFCM algorithm following Lucieer (2006) and building onto Zhang and Foody's (2001) Euclidean SFCM algorithm. The k-NN algorithm searches the feature space for the k nearest pixels within the training sample, whose field data vectors are known, applying a distance measure defined in feature space (Franco-Lopez et al., 2001; Katila & Tomppo, 2001). The k-NN algorithm does not make any assumptions about the statistical distribution of the training pixels, which is advantageous in our situation where the number of pixels available for training is limited. Following Lucieer (2008), we used k = 5 nearest neighbors.
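For orientation, the sketch below computes fuzzy memberships with the standard fuzzy c-means membership formula for fixed class centres and a fuzziness of 2.0, then derives the hard class and the ratio of the second-largest to the largest membership. It is a simplified Euclidean, class-centre variant and not the k-NN based SFCM used in this study; the class centres and the pixel position in ordination space are hypothetical.

```python
import math


def memberships(pixel, centres, fuzziness=2.0):
    """Fuzzy c-means memberships of one pixel for fixed class centres:
    mu_c = d_c^(-2/(m-1)) / sum_j d_j^(-2/(m-1))."""
    dists = {name: math.dist(pixel, centre) for name, centre in centres.items()}
    exact = [name for name, d in dists.items() if d == 0.0]
    if exact:
        # Pixel coincides with one or more centres: share full membership
        return {name: (1.0 / len(exact) if name in exact else 0.0) for name in dists}
    power = 2.0 / (fuzziness - 1.0)
    weights = {name: d ** -power for name, d in dists.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def confusion_index(mu):
    """Ratio of the second-largest to the largest membership value."""
    ranked = sorted(mu.values(), reverse=True)
    return ranked[1] / ranked[0]


# Hypothetical class centres on two ordination axes
centres = {"unit_3": (0.0, 0.0), "unit_4": (4.0, 0.0), "unit_7": (0.0, 4.0)}
pixel = (1.0, 0.0)

mu = memberships(pixel, centres)
hard_class = max(mu, key=mu.get)   # defuzzified label
ci = confusion_index(mu)           # close to 0: clear, close to 1: confused
```

With a fuzziness of 2.0 the memberships are proportional to the inverse squared distances, so a pixel three times farther from the second-nearest centre than from the nearest one gets a confusion ratio of 1/9.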
The SFCM algorithm produces a fuzzy classification of the ordination images resulting in three types of output. First, it computes a membership image for each vegetation unit indicating the percent membership of each pixel. Second, it produces a defuzzified hard classification image from the membership images based on maximum membership values. Third, it calculates an image of the Confusion Index (CI), which summarizes the confusion of class assignment in each pixel. The CI is the ratio of the second maximum membership to the maximum membership for each pixel. High values indicate a high classification uncertainty.

Table 1 (continued)

Nr. | Index | Feature | Reference
1 | CARI | Chlorophyll | Kim et al. (1994)
2 | LCI | Chlorophyll | Datt et al. (2003)
3 | DGVI1 | Greenness | Chen et al. (1998)
4 | DGVI2 | Greenness | Chen et al. (1998)
5 | NDLI | Lignin | Serrano et al. (2002)
6 | CAI | Litter | Daughtry (2001)
7 | CLAY | Soil | Dorigo et al. (2006)
8 | IRON | Soil | Dorigo et al. (2006)

2.10. Accuracy assessment

Two datasets were available for assessing the accuracy of the classification result. First, the result was validated with the pixels from the ROIs that were not used to train the k-NN classifier; we refer to this as the internal validation dataset. Second, an independent dataset was provided by the biodiversity network BIOTA-AFRICA (www.biota-africa.org). Vegetation plots of the independent dataset were assigned labels of already classified vegetation units according to species composition and abundance. Not all vegetation units were present in the BIOTA-AFRICA dataset, leaving units four and six empty. The quality of the hard classification vegetation unit map was assessed using a confusion matrix.

Table 2
Overview of vegetation units that were derived by cluster analysis. Characteristic species for each type are sorted by average abundance in the cluster. The number of plots (n) shows cluster size.
Type | n | Characteristic species | Description of the vegetation unit
1 | 20 | Monechma genistifolium, Pentzia calva, Geigeria ornativa | Open dwarf shrub with sparse cover, mainly Monechma genistifolium on calcareous rocky soils
2 | 5 | Stipagrostis ciliata, Felicia clavipilosa | Grass and shrub vegetation on outcrops and deeply incised rocky drainage lines
3 | 28 | Stipagrostis obtusa, Monechma genistifolium, Melolobium microphyllum | Sparse grassland and open patches, mainly Stipagrostis obtusa, only few dwarf shrubs
4 | 10 | Acacia mellifera, Albizia anthelminthica, Stipagrostis uniplumis | Woody acacia shrub on shallow red soils
5 | 10 | Leucosphaera bainsii, Aizoon schellenbergii, Enneapogon desvauxii | Dwarf shrub savanna with many dwarf shrubs, and perennial grasses on dark biological soil crusts
6 | 7 | Fingerhuthia africana, Aizoon giessii, Melhania virescens | Grassland with mainly Fingerhuthia africana and few dwarf shrubs on rocky siliceous soil
7 | 9 | Panicum lanipes, Eragrostis rotifer, Rhigozum trichotomum, Acacia hebeclada | Shrub vegetation at the border of clay pans and shallow drainage lines with clay soils; grasses and herbs in center of pans

Finally, the overall accuracy of classification (OA) and kappa values were calculated for each dataset and for subsequent numbers of ordination axes.

3. Results

3.1. Cluster analysis

The quality of the produced dendrogram, i.e. cluster structure as measured by the Agglomerative Coefficient, was 0.86 whereas the cophenetic correlation coefficient reached a value of rc = 0.75, both indicating a considerable amount of structure in the dendrogram. The comparison of ANOSIM and ISA for each level of clustering from two to twenty suggested either a level of three or seven clusters. Since the cluster resolution is desired to be as detailed as possible to classify the vegetation into meaningful vegetation units, seven clusters were chosen as the appropriate size for further analysis.
Table 2 gives an overview of the ecological description of the vegetation units based on constancy of species and relative abundance per group.

3.2. Constrained ordination

A total of 34% of the variance in the species data was explained by the eight constraining axes. The first three axes explained 12%, 9% and 5%, respectively. A simplified ordination diagram of the first and second RDA axes is shown in Fig. 1. To improve interpretability, ellipses showing 95% confidence levels were drawn around the group centroids. In the ordination diagram, the vegetation units derived by cluster analysis form distinct clusters separated by spectral indices along the first two ordination axes. The first axis mainly represents a gradient of vegetation cover along which the different vegetation units line up. The structurally more complex unit four is positively related to CLAY, DGVI1 and DGVI2, whereas all other units are negatively related to those indices along the first axis. The second axis is a gradient of chlorophyll concentration, mainly spanned by CARI and LCI. CARI is positively correlated with unit 3 and LCI is positively correlated with vegetation unit 5. The third axis (Fig. 1) separates vegetation units 7, 2 and 1 by different dry matter components represented by NDLI and CAI. The latter is strongly positively related to vegetation unit 7. NDLI is strongly negatively related to units 2 and 7. The fourth axis separates units with high values for CARI and CLAY (3, 4, 5, and 6) from groups 7, 2 and 1, which have low values for CARI and CLAY.

The significance of the relationships between spectral indices, vegetation units and the ordination axes obtained by RDA is given in Table 3. Spectral indices and vegetation units show high R2 values and low p-values up to the fifth ordination axis. From ordination axis six to eight, R2 values stay below 0.4 and the overall significance was slightly lower, with some p-values above 0.001, but the relationships were still significant at the 5% level.
A high significance for a spectral index along an axis indicates a strong relationship with that axis, whereas a high significance for a vegetation unit reveals a good separation of vegetation units along that specific axis. For example, vegetation unit one and two cannot be separated well from the other groups along the first axis (Table 3) but there are high significances on the third ordination axis displaying that this axis is better suited to separate vegetation units 1 and 2 from the others. 3.3. Fuzzy classification The supervised fuzzy c-means classification using the k-NN classifier produced three sets of images: class membership images for each vegetation unit (Fig. 2a–g), a confusion image (Fig. 2h) and a hard classification image (Fig. 3). The number of axes leading to the best classification results was assessed by comparing overall accuracy and kappa values obtained for the two validation data sets using different numbers of axes (Table 4). Starting from the two axes that explained most variation we iteratively added the other axes until all eight axes were included. For the internal validation dataset, best overall accuracy and kappa values were found for a total of six and eight axes. The independent BIOTA-AFRICA dataset achieved best values with only five axes, yet the values for the eight axes solution were only slightly less accurate. Based on the results obtained for both validation sets, the eight axes solution was chosen. Membership images, hard classification and confusion image were checked visually for credibility in order to better interpret classification results, e.g. high values in the confusion image (Fig. 2h) indicate overlapping classes, i.e. transition zones and mixed units, whereas low values show purer classes. The error matrices for the eight axis solution are shown in Tables 5 and 6 for the internal and independent validation dataset respectively. 4. Discussion 4.1. 
Vegetation maps

We were able to map seven vegetation units of a rangeland area in Central Namibia with high accuracy but a relatively low sampling effort (number of vegetation plots = 89). The mapped vegetation units comprise the main plant communities and their ecological conditions for an area of around 19.5 km2 (Table 2). The membership images of vegetation units 1, 4, and 5 (Fig. 2a, d, and e) and the hard classification image show a sharp transition in the center of the study area. This is caused by a fence line separating the two farms and is typical for South African rangelands, indicating contrasting management strategies (Todd & Hoffman, 1999).

Validation of the hard classification map (Fig. 3) with the internal validation dataset shows a very good performance, reaching a kappa value of 0.98 and an overall accuracy of 98%. The confusion image (Fig. 2h) indicates that vegetation units 4 and 7 have only little confusion with other classes. This can be explained by the high proportion of larger shrub species, mainly A. mellifera or A. hebeclada, in both vegetation units, which makes them more distinct due to higher values in the DGVI indices. The other vegetation units show higher degrees of fuzziness. The error matrix of the internal validation dataset (Table 5) shows that vegetation units 1 and 3 are slightly confused, while the other vegetation units were classified 100% correctly. This confusion is caused by an overlap of high membership values for units 1 and 3 (Fig. 2a, c). This overlap is a good example of the fuzziness of the classification. The difference between the two vegetation units is mainly a difference in stoniness and in the cover fraction of grass and shrubs, whereas species composition is largely the same. Both vegetation units are principally found on the southern farm and are mostly adjacent, which increases the possibility of overlap.

Fig. 1. Simplified ordination diagram of the RDA analysis. Ellipsoids show 95% interval of vegetation units, which are represented by number codes. Vectors show direction and importance of spectral indices. The first two constrained RDA axes (left) explain 21%, and RDA axes three and four (right) explain an additional 9% of the variation in species composition through spectral indices.

The independent dataset shows a moderate performance with a kappa value of 0.53 and an overall accuracy of 64%. In the classification results, some confusion again exists between vegetation units 1 and 3 (Table 6), but also between vegetation units 1 and 2, as well as between 2 and 5 (Fig. 2a, b, c and e). The reasons for the less accurate classification results based on the external dataset are various. Firstly, the independent dataset was not collected for the specific purpose of vegetation mapping but for biodiversity monitoring. This means that vegetation plots are always placed in the center of selected hectares, belonging to a grid of 1 × 1 km with a mesh width of 100 m, according to the sampling scheme of the monitoring project (Jürgens, 1998). Thus, vegetation plots are restricted to center points of hectares and might be in the vicinity of different vegetation units. This seems to be the main cause for the confusion of the classification. Secondly, there were only few vegetation plots in total (n = 41) and plot size was smaller than in our design (10 m × 10 m versus our 25 m × 25 m), capturing fewer species. Thirdly, even with the same sampling approach, different survey teams usually yield different estimates of percent cover or differ in species identification (Kercher et al., 2003), which can lead to different classifications, making it more difficult to compare vegetation datasets between different projects.
Finally, two vegetation units did not fall into the sampling scheme of the external dataset and vegetation unit two is underrepresented with only one vegetation plot.

On a comparable scale, Thomas et al. (2003) mapped boreal peatlands in Canada using 600 vegetation plots of 1 m2, by which they were able to distinguish up to nine types of fen vegetation. By applying a maximum likelihood classification to a hyperspectral image, they obtained maximum kappa values between 0.32 and 0.55. The moderate performance was explained by the low spectral separability of the produced vegetation classes, which could also be due to the very small plot size and the small total area (600 m2) sampled. In fact, the hypothesis that plant communities can be clearly separated by their spectral features is based on the assumption that boundaries between plant communities are hard. This might be acceptable for intensively used agricultural or urban landscapes with crisp boundaries between land cover types, but in semi-natural savannah systems transitions between plant communities are gradual. Thus, from the vast amount of image classification algorithms available (see Lu and Weng (2007) for an overview of classification algorithms), supervised fuzzy classification approaches seem to be most promising for natural landscapes. Foody (1992, 1996) pioneered the application of fuzzy classification of vegetation with remote sensing data. Malik and Husain (2006) also used a supervised fuzzy classification to discriminate between four plant communities and five land cover classes on a subset of a SPOT XS scene covering 6 km2 of a valley in Pakistan. They reached overall accuracies between 65% and 72%, yet also reported the problem of spectral separability between vegetation classes. Lucieer (2006) used supervised fuzzy c-means classification applying the Mahalanobis distance metric on IKONOS panchromatic bands for classifying sub-Antarctic vegetation. Lucieer did not use floristically defined vegetation units but seven rather broad categories (four non-vegetated), which led to an overall accuracy of 73% and a kappa value of 0.69. In comparison with the above mentioned studies, our approach yielded equal or even better results depending on the dataset used for validation.

Table 3
Significance of relationships between spectral indices, vegetation units and ordination axes. Model statistics are listed per axis; individual significance codes refer to the spectral indices (CARI, LCI, DGVI1, DGVI2, NDLI, CAI, CLAY, IRON) and the vegetation units (1–7).

Axis | Indices R2 | Indices F-value | Indices p-value | Units R2 | Units F-value | Units p-value
RDA1 | 0.6323 | 17.41 | <0.001 | 0.9027 | 126.7 | <0.001
RDA2 | 0.6305 | 17.28 | <0.001 | 0.6716 | 27.95 | <0.001
RDA3 | 0.6209 | 16.58 | <0.001 | 0.7651 | 44.52 | <0.001
RDA4 | 0.5326 | 11.54 | <0.001 | 0.6206 | 22.36 | <0.001
RDA5 | 0.4206 | 7.35 | <0.001 | 0.4873 | 12.99 | <0.001
RDA6 | 0.3217 | 4.802 | <0.001 | 0.2643 | 4.91 | <0.01
RDA7 | 0.2604 | 3.564 | <0.05 | 0.3226 | 6.509 | <0.001
RDA8 | 0.1779 | 2.192 | <0.05 | 0.1656 | 2.713 | <0.05

Signif. codes: <0.001 = ***; 0.01 = **; 0.05 = *; 0.1 = '.'

Fig. 2. Resulting membership images derived by the supervised fuzzy c-means classification. Vegetation units 1–7 are displayed from a–g. Bright pixels indicate high membership values, dark pixels indicate low values. Confusion image (h) shows areas of high confusion in white and more pure classes in black.

Table 4
Effect of number of axes on overall accuracy (OA) and kappa values for the internal and the independent validation datasets.

Axes | Internal OA | Internal kappa | Independent OA | Independent kappa
1 | 39.02 | 0.24 | 38.16 | 0.19
2 | 64.06 | 0.57 | 43.59 | 0.25
3 | 86.01 | 0.83 | 52.52 | 0.38
4 | 93.29 | 0.92 | 56.52 | 0.43
5 | 94.54 | 0.93 | 64.00 | 0.53
6 | 98.18 | 0.98 | 61.56 | 0.49
7 | 97.90 | 0.98 | 58.70 | 0.46
8 | 98.18 | 0.98 | 63.82 | 0.52

Table 5
Error matrix for eight axes solution using k-NN algorithm on internal validation dataset. Values represent percent of pixels classified into class.

Unit | Pixels | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total
Unclassified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
1 | 116 | 93.1 | 0 | 0 | 0 | 0 | 0 | 0 | 15.1
2 | 31 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 4.34
3 | 195 | 1.72 | 0 | 97.44 | 0 | 0 | 0 | 0 | 26.85
4 | 110 | 0.86 | 0 | 0 | 100 | 0 | 0 | 0 | 15.52
5 | 84 | 4.31 | 0 | 2.05 | 0 | 100 | 0 | 0 | 13.01
6 | 71 | 0 | 0 | 0.51 | 0 | 0 | 100 | 0 | 10.07
7 | 108 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 15.1
Total | 715 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100

Overall accuracy = 98.18%; kappa = 0.98.

Fig. 3. Classified vegetation map based on a supervised fuzzy c-means classification result; hard class labels according to identified vegetation units are assigned based on maximum membership values in each pixel. The colors represent different vegetation units, see Table 2 for explanation.

4.2. Constrained ordination using spectral indices

Vegetation mapping usually relies on the classification of remotely sensed images. In our case, classification was done on the basis of images of ordination axes, which reflect the relationship between vegetation units and spectral indices. Visualization was possible since the RDA produces a linear combination of predictor variables, as in multiple regression, which can be combined with images of spectral indices. The ordination diagram showed a clear separation of vegetation units by the constraining spectral indices (Fig. 1), where the spectral indices enhanced the ordination, i.e. explained 34% of the overall variation in the species data. The constrained axes show only low eigenvalues, yet this is due to the large number of unconstrained axes available to explain the variation, i.e. one for each species (n = 79).
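The idea that constrained axes are linear combinations of the spectral indices, which can then be applied pixel by pixel to index images, can be sketched as follows. This is a minimal NumPy sketch of RDA computed as a principal component analysis of the fitted values of a multivariate regression. All arrays are synthetic and the function name `rda_axis_coefficients` is hypothetical, so it should not be read as the software workflow used in this study.

```python
import numpy as np

def rda_axis_coefficients(species, indices):
    """Return index-to-axis coefficients, index means and constrained variance fraction."""
    Y = species - species.mean(axis=0)          # centered species matrix
    x_mean = indices.mean(axis=0)
    X = indices - x_mean                        # centered constraining variables
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # multivariate regression Y ~ X
    Y_fit = X @ B                               # fitted species values
    # PCA of the fitted values gives the constrained (canonical) axes.
    _, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    n_axes = min(X.shape[1], np.linalg.matrix_rank(Y_fit))
    coef = B @ Vt[:n_axes].T                    # axis score = (x - x_mean) @ coef
    explained = (s[:n_axes] ** 2).sum() / (Y ** 2).sum()
    return coef, x_mean, explained

# Synthetic plot-level data (assumed sizes, for illustration only).
rng = np.random.default_rng(1)
n_plots, n_species, n_indices = 89, 12, 3
index_values = rng.normal(size=(n_plots, n_indices))
loadings = rng.normal(size=(n_indices, n_species))
species_cover = index_values @ loadings + rng.normal(scale=2.0, size=(n_plots, n_species))

coef, x_mean, explained = rda_axis_coefficients(species_cover, index_values)

# Apply the same linear combination to every pixel of the index images.
index_images = rng.normal(size=(n_indices, 50, 60))       # (index, rows, cols)
pixels = index_images.reshape(n_indices, -1).T - x_mean   # (pixels, index)
axis_images = (pixels @ coef).T.reshape(-1, 50, 60)       # (axis, rows, cols)

print(f"constrained variance fraction: {explained:.2f}")
print("axis image stack shape:", axis_images.shape)
```

Because the axis scores depend only on the index values and the fitted coefficients, the same coefficients can be evaluated for pixels that were never sampled in the field, which is what turns an ordination into a set of images.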
The regression results clearly show that the overall quality of the ordination is good up to the fifth or sixth ordination axis (Table 3). The highest R2 value achieved was R2 = 0.63 for the regression of indices and ordination axes and R2 = 0.92 for the regression of vegetation units and ordination axes. Nevertheless, using all ordination axes produced the best classification results, which means that information with a low significance can also improve the overall classification outcome. In comparison, Thomas et al. (2003) used a CCA and were able to explain a total of 44% of the variation in the species data using spectral bands, but their highest R2 was 0.47. Brook and Kenkel (2002) also applied an RDA on scores of four ordination axes derived by a correspondence analysis of the vegetation data and Landsat TM channels 3, 4, 5 and 7. They were able to explain 47% of the variation in the species data and interpreted the relationship between spectral reflectance and vegetation mainly as a structural rather than a floristic (species composition) effect.

Table 6
Error matrix for eight axes solution using k-NN algorithm on independent validation dataset. Values represent percent of pixels classified into class. No vegetation plots from the independent dataset fit into classes four and six, leaving them empty.

Unit | Pixels | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total
Unclassified | 0 | 0 | 0 | 0 | – | 0 | – | 0 | 0
1 | 68 | 97.06 | 42.86 | 35.00 | – | 4.13 | – | 0 | 35.88
2 | 21 | 0 | 0 | 0 | – | 13.22 | – | 0 | 4.71
3 | 121 | 2.94 | 57.14 | 51.67 | – | 6.61 | – | 0 | 24.71
4 | – | – | – | – | – | – | – | – | –
5 | 131 | 0 | 0 | 4.17 | – | 65.29 | – | 0 | 24.71
6 | – | – | – | – | – | – | – | – | –
7 | 10 | 0 | 0 | 9.17 | – | 10.74 | – | 100 | 10
Total | 351 | 100 | 100 | 100 | – | 100 | – | 100 | 100

Overall accuracy = 63.82%; kappa = 0.52.
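The overall accuracy and kappa values reported with the error matrices can be recomputed from pixel counts. The sketch below uses counts back-calculated from the column percentages and pixel totals of Table 5. This back-calculation (rounding to whole pixels) and the function name `accuracy_and_kappa` are assumptions of the example rather than part of the original analysis.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square matrix of pixel counts."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Chance agreement from the products of row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    return observed, (observed - expected) / (1.0 - expected)

# Rows: classified unit 1-7; columns: reference unit 1-7 (internal validation).
# Counts are reconstructed from the percentages in Table 5 (assumed rounding).
table5_counts = np.array([
    [108,  0,   0,   0,  0,  0,   0],
    [  0, 31,   0,   0,  0,  0,   0],
    [  2,  0, 190,   0,  0,  0,   0],
    [  1,  0,   0, 110,  0,  0,   0],
    [  5,  0,   4,   0, 84,  0,   0],
    [  0,  0,   1,   0,  0, 71,   0],
    [  0,  0,   0,   0,  0,  0, 108],
])

oa, kappa = accuracy_and_kappa(table5_counts)
print(f"overall accuracy = {100 * oa:.2f}%, kappa = {kappa:.2f}")
```

For these counts the sketch gives an overall accuracy of 98.18% (702 of 715 pixels on the diagonal) and a kappa of 0.98, in line with the values reported for the internal validation dataset.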
As mentioned earlier, RDA and CCA differ in the underlying species response model (Austin, 1987; McCune et al., 2002), hence the question whether RDA or CCA is the more appropriate multivariate analysis depends on the ecological gradients covered within the dataset and the aim of the analysis. In our case, an approach based on CCA would have resulted in a total variance explained in 24% and less interpretable ordination diagrams, meaning that RDA leads to more meaningful results. The application of hyperspectral indices in multivariate analysis of relationships between vegetation and spectral information seems to be a logical step as the use of vegetation indices makes the approach more robust to uncertainties in atmospheric correction and changing illumination and observation conditions compared to the direct inclusion of spectral bands. Moreover, the use of vegetation indices reduces the large amount of highly collinear data to a reasonable amount of less correlated features that can be directly linked to vegetation properties relevant for the observed canopies. However, we found only one study that used spectral indices other than NDVI in a multivariate analysis, namely the three tasseled cap indices brightness, greenness and wetness derived from Landsat ETM+ data (de la Cueva, 2008). Hyperspectral signatures of vegetation canopies are rich in information (Ustin et al., 2004) and many vegetation indices are available to exploit this information. The explanatory power of spectral indices is quite high as these indices are strongly related to certain aspects of canopy information, such as dry matter, chlorophyll and other plant pigments, water, nitrogen, or cellulose (He et al., 2006; Treitz & Howarth, 1999; Ustin et al., 2004). In this study, the vegetation indices DGVI1 and DGVI2 were highly correlated with the first ordination axis which can be interpreted as an increase in vegetation cover but also in vegetation height. 
Unit 4, consisting of woody acacia thicket, was positively correlated with both indices, whereas the sparse vegetation of unit 3 was negatively correlated. This correlation indicates that DGVI1 and DGVI2 are sensitive to an increase in vegetation cover in semiarid dwarf shrub savannah as well. Miura et al. (2003) showed this for a Cerrado savannah in Brazil. However, vegetation on Namibian rangelands is much sparser than Cerrado vegetation. The LCI was positively related to unit 5, which was identified on the northern farmland (Fig. 3). Here, the vegetation composition differs strongly from that in units 1 and 3, which are both negatively correlated with LCI. The vegetation in unit 5 is dominated by dense stands of the dwarf shrub Leucosphaera bainsii (Table 2). However, it is hard to establish causality with LCI here because there is a considerable amount of dark biological soil crusts, which might also contribute to this relationship. The dry matter, or litter, indices CAI and NDLI both point at unit 7, which shows a considerable amount of dry biomass originating from dry grass material from the previous year.

Regarding the soil indices, it is notable that the clay index is positively correlated with axes three and four and not with the first axis, as might be inferred from the ordination diagram. The iron index is related to iron-induced absorptions in the NIR, which lead to the reddish coloring of the soil with increasing iron content. Yet, a clear interpretation remains difficult. The iron index is negatively related to ordination axis two, helping to separate unit 5 from the other units. Its most significant contribution is found on axis six (Table 3), where it separates the red soils of unit 4 from all other groups. It is important to note that the set of eight indices selected in this study samples equally from the VNIR and SWIR regions, each stressing different properties of vegetation (Asner & Heidebrecht, 2002; Nagler et al., 2003).
Transferring the approach presented here to other study areas should include a sound selection of spectral indices appropriate for the studied ecosystem and the applied spectral sensor. Hyperspectral indices provide information on a wide range of canopy related variables and seem suitable to be used by ecologists as variables in multivariate ecosystem experiments or for vegetation mapping approaches relying on multivariate relationships between spectral and compositional data. 4.3. Pitfalls of cluster analysis A source of uncertainty that might influence classification results lies in the means of cluster analysis which is the complicated task of structuring data by grouping objects according to their similarity. Complicated, because there are many subjective choices to make, such as the choice of the overall clustering strategy, e.g. hierarchical or partitioning, an appropriate distance measure, and the clustering algorithm (Fielding, 2007). The result is heavily dependent on the origin and quality of the data used, as well as the available expert knowledge of the analyst. Ecological datasets in particular require special transformations in order to be applicable with cluster analysis or ordination techniques (Legendre & Gallagher, 2001). For example, when dealing with remotely sensed data, it seems more realistic to apply a quantitative distance measure in order to take species abundance information into consideration. Distance measures based on presence–absence, like the frequently used Jaccard or Sörensen Index, can lead to a very different result (Legendre & Legendre, 1998), making interpretation of vegetation classifications in relation with spectral data more difficult. In other words, if a species occurs only with 1% cover, and is transformed to presence, then the plot is treated in the same way as it would have been when the species had shown a cover of 100%. 
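The effect described here can be illustrated with a small numerical sketch. The two plots and their cover values are invented for illustration; Jaccard stands in for a presence-absence measure and Bray–Curtis for a quantitative one, which is an assumption of the example rather than a statement about the measure used in this study.

```python
import numpy as np
from scipy.spatial.distance import braycurtis, jaccard

# Percent cover of three species in two hypothetical plots.
plot_a = np.array([1.0, 60.0, 10.0])    # species 1 present with only 1% cover
plot_b = np.array([100.0, 60.0, 10.0])  # species 1 dominant with 100% cover

# Presence-absence transformation discards all abundance information.
presence_a = plot_a > 0
presence_b = plot_b > 0

d_presence = jaccard(presence_a, presence_b)   # dissimilarity on presence-absence
d_abundance = braycurtis(plot_a, plot_b)       # dissimilarity on cover values

print(f"Jaccard (presence-absence): {d_presence:.2f}")
print(f"Bray-Curtis (abundance):    {d_abundance:.2f}")
```

In this toy case the Jaccard dissimilarity is 0 while the Bray–Curtis dissimilarity is 99/241, or about 0.41, so the presence-absence measure treats the plot with 1% cover and the plot with 100% cover of the first species as identical.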
This is an unrealistic assumption in a vegetation mapping context based on spectral properties of dominant plant canopies. Yet, the greatest challenge is the interpretation of a resulting dendrogram. Although methods exist for checking its quality and deciding up to which level the cluster can be interpreted (Fielding, 2007; Pillar, 1999), these methods are rarely reported in the literature (Aho et al., 2008). We applied a thorough assessment of cluster structure using two measures of cluster quality, the cophenetic correlation coefficient and the Agglomerative Coefficient, which reached high values indicating that the clustering is based on a highly structured dataset. Furthermore, before trying to test for spectral separability between vegetation units, one should verify that vegetation units are already separated as much as possible. We proposed the combination of two separability measures ANOSIM and ISA for choosing an optimal level of clustering. This turned out to be an efficient way of interpreting group separation. We are not aware of any other vegetation mapping study using cluster analysis and remote sensing data reporting one of the above mentioned quality checks. Dendrograms with low structure quality produced by cluster analysis might be the main cause for the low spectral separability found in other studies (Malik & Husain, 2006; Thomas et al., 2003). Interestingly, the method most frequently applied to delineate vegetation units in remote sensing studies is the Two-way Indicator Species Analysis (TWINSPAN), a polythetic divisive clustering algorithm developed in vegetation science by Hill (1979), for examples see Malik and Husain (2006), Peel et al. (2007), Ravan et al. (1995), and Thomas et al. (2003). 
This method has been criticized in ecological literature mainly for two reasons. First, it mainly detects one large gradient due to its statistical restriction by using correspondence analysis to span a floristic gradient (Belbin & Mcdonald, 1993; Kent, 2006). McCune et al. (2002) suggested that this method should not be applied at all, except in situations where there is a known large one-dimensional gradient in the dataset. This can be the case when the floristic gradient reflects the major gradient in vegetation cover (Nilsen et al., 1999). Second, TWINSPAN uses the Chi-square metric, which gives high weight to species with low abundance (Faith et al., 1987). For delineating vegetation units based on abundance of dominant species this is not necessarily useful. Several modifications of the TWINSPAN algorithm are reported in the literature, e.g. Roleček et al. (2009) have improved TWINSPAN classifications by incorporating a measure of cluster heterogeneity into the algorithm, available in the free software package JUICE (Tichy, 2002). We suggest using either an improved version of TWINSPAN or alternative clustering techniques since they might contribute to a better discrimination between vegetation units and help to avoid the shortcomings of the original method.

4.4. Problems during field sampling

A sound sampling design is most important because success is only possible when the baseline data is a realistic representation of the study area. In this study, the decision for a preferential sampling driven by cost and time factors led to a satisfying result. However, a more sophisticated design, such as a systematic or stratified random sampling that additionally minimizes the effects of spatial autocorrelation, could be a more efficient approach for vegetation mapping studies (Fortin et al., 1989; McGwire et al., 1993; Stohlgren, 2007). As Nilsen et al. (1999) already pointed out, a sound sampling design matched to the sensor specifications is required to avoid error propagation and should be taken into consideration during the planning phase. For example, the extent of the image affects the number of plots needed to capture the variety of vegetation units (Marignani et al., 2007). Sensor resolution, i.e. pixel size, predetermines a reasonable plot size (Nagendra & Rocchini, 2008; Rahman et al., 2003), which is known to affect different properties of species data in vegetation classification, such as species constancy and number of species (Dengler, 2009b; Dengler et al., 2009). We have chosen a plot size of 25 m × 25 m, which sufficiently covered a pixel size of 5 m × 5 m and was appropriate for the homogeneous vegetation and level of geometric correction of the HyMap image. In addition to plot size, plot shape, e.g. quadratic, rectangular, circular or hexagonal shape, can affect those vegetation parameters (Dengler, 2009a; Stohlgren, 2007). As stressed by Stohlgren (2007), quadratic vegetation plots facilitate the comparison of sampled vegetation data with pixel information of remotely sensed images. However, rectangular plots, e.g. 20 m × 50 m, have been identified to be more appropriate for relating spectral data to biodiversity measurements since they are able to catch a wider range of ecological gradients than quadratic shaped vegetation plots (Oldeland et al., 2009).

5. Conclusion

Remote sensing approaches for vegetation mapping using multivariate analyses have been increasingly applied over the last years. These approaches combine detailed ground data from ecological field surveys with remotely sensed data, showing great potential in the field of fine scale vegetation mapping (Alexander & Millington, 2000). In this study, we extended the multivariate approach for vegetation mapping by connecting field data and spectral indices with different multivariate analysis techniques. First, hierarchical cluster analysis was applied to delineate meaningful vegetation units. This is a crucial step in the methodology since all following analyses are built on proper group identification. Checking dendrograms for structure and quality is therefore a necessary step. Second, ordination of species data constrained by hyperspectral indices leads to images representing the statistical relationship between vegetation units and spectral data. Here, the ability to relate spectral indices with vegetation data allows for a good interpretation of the spectral properties of each vegetation unit. The power of hyperspectral indices for multivariate ecological applications is still relatively untouched. In our opinion, there is a great potential for the communities of remote sensing and ecological scientists to use these types of predictor variables for improving vegetation mapping approaches based on multivariate relationships. Finally, we used a supervised fuzzy classification technique to create abundance maps for each vegetation unit as well as a hard classification image suitable for conservation management or landscape planning. Considering fuzziness is especially important in semi-natural landscapes where transitions between different plant communities are often gradual. As Kerr and Ostrovsky (2003) have stated, ecologists have begun to recognize the potential of remotely sensed data. Conversely, one could state that the remote sensing community should follow by recognizing the potential of ecological datasets and methods and be aware of the potential pitfalls during field sampling and ecological data analysis in order to produce accurate vegetation maps.

Acknowledgements

We thank the farm owners of Narais and Duruchaus for providing access to their rangelands, Dirk Wesuls for helping with plant identification and interpretations of the vegetation classification, our colleagues at the DLR for the assistance during pre-processing, Jari Oksanen for comments on ordination, the BIOTA-AFRICA project for providing infrastructure and the external dataset and finally the Helmholtz-EOS PhD Programme for funding this research project. We also thank the handling editor and two anonymous reviewers for significantly improving the manuscript.

References

Aho, K., Roberts, D. W., & Weaver, T. (2008). Using geometric and non-geometric internal evaluators to compare eight vegetation classification methods. Journal of Vegetation Science, 19, 549−562.
Alexander, R., & Millington, A. C. (Eds.). (2000). Vegetation mapping: From patch to planet. Chichester: John Wiley and Sons Ltd.
Amarnath, G., Murthy, M. S. R., Britto, S. J., Rajashekar, G., & Dutt, C. B. S. (2003). Diagnostic analysis of conservation zones using remote sensing and GIS techniques in wet evergreen forests of the Western Ghats – An ecological hotspot, Tamil Nadu, India. Biodiversity and Conservation, 12, 2331−2359.
Aragon, R., & Oesterheld, M. (2008). Linking vegetation heterogeneity and functional attributes of temperate grasslands through remote sensing. Applied Vegetation Science, 11, 117−130.
Armitage, R. P., Weaver, R. E., & Kent, M. (2000). Remote sensing of semi-natural upland vegetation: The relationship between species composition and spectral response. In R. Alexander, & A. C. Millington (Eds.), Vegetation mapping: From patch to planet (pp. 83−102). Chichester: John Wiley and Sons Ltd.
Arnot, C., & Fisher, P. (2007). Mapping the ecotone with fuzzy sets. In A. Morris, & S.
Kokhan (Eds.), Geographic uncertainty in environmental security (pp. 19−32). Dordrecht, The Netherlands Springer. Asner, G. P., & Heidebrecht, K. B. (2002). Spectral unmixing of vegetation, soil and dry carbon cover in arid regions: Comparing multispectral and hyperspectral observations. International Journal of Remote Sensing, 23, 3939−3958. Austin, M. P. (1987). Models for the analysis of species response to environmental gradients. Vegetatio, 69, 35−45. Belbin, L., & Mcdonald, C. (1993). Comparing three classification strategies for use in ecology. Journal of Vegetation Science, 4, 341−348. Brogaard, S., & Ólafsdóttir, R. (1997). Ground-truths or ground-lies? Environmental sampling for remote sensing application exemplified by vegetation cover data. Lund Electronic Reports in Physical Geography, 1, 1−15. Brook, R. K., & Kenkel, N. C. (2002). A multivariate approach to vegetation mapping of Manitoba's Hudson Bay lowlands. International Journal of Remote Sensing, 23, 4761−4776. Burrough, P. A., van Gaans, P. F. M., & MacMillan, R. A. (2000). High-resolution landform classification using fuzzy k-means. Fuzzy Sets and Systems, 113, 37−52. Cao, Y., & Larsen, D. P. (2001). Rare species in multivariate analysis for bioassessment: Some considerations. Journal of the North American Benthological Society, 20, 144−153. Chen, Z. K., Elvidge, C. D., & Groeneveld, D. P. (1998). Monitoring seasonal dynamics of arid land vegetation using AVIRIS data. Remote Sensing of Environment, 65, 255−266. Chytrý, M., & Tichý, L. (2003). Diagnostic, constant and dominant species of vegetation classes and alliances of the Czech Republic: A statistical revision. Folia facultatis scientiarum naturalium Universitatis Masarykianae Brunensis. Biologia, 108, 1−231. Cihlar, J. (2000). Land cover mapping of large areas from satellites: Status and research priorities. International Journal of Remote Sensing, 21, 1093−1114. Clarke, K. R. (1993). 
Non-parametric multivariate analyses of changes in community structure. Australian Journal of Ecology, 18, 117−143. Cocks, T., Jenssen, R., Stewart, A., Wilson, I., & Shields, T. (1998). The HyMap hyperspectral sensor: The system, calibration and performance. 1st EARSEL Workshop on Imaging Spectroscopy (pp. 7). Zürich. Datt, B., McVicar, T. R., Van Niel, T. G., Jupp, D. L. B., & Pearlman, J. S. (2003). Preprocessing EO-1 Hyperion hyperspectral data to support the application of agricultural indexes. IEEE Transactions on Geoscience and Remote Sensing, 41, 1246−1259. J. Oldeland et al. / Remote Sensing of Environment 114 (2010) 1155–1166 Daughtry, C. S. T. (2001). Discriminating crop residues from soil by shortwave infrared reflectance. Agroclimatology, 93, 125−131. de la Cueva, A. V. (2008). Structural attributes of three forest types in central Spain and Landsat ETM+ information evaluated with redundancy analysis. International Journal of Remote Sensing, 29, 5657−5676. Dengler, J. (2009). A flexible multi-scale approach for standardised recording of plant species richness patterns. Ecological Indicators, 9, 1169−1178. Dengler, J. (2009). Which function describes the species–area relationship best? A review and empirical evaluation. Journal of Biogeography, 36, 728−744. Dengler, J., Löbel, S., & Dolnik, C. (2009). Species constancy depends on plot size — A problem for vegetation classification and how it can be solved. Journal of Vegetation Science, 20, 754−766. Dirnböck, T., Dullinger, S., Gottfried, M., Ginzler, C., & Grabherr, G. (2003). Mapping alpine vegetation based on image analysis, topographic variables and Canonical Correspondence Analysis. Applied Vegetation Science, 6, 85−96. Dobrowski, S. Z., Safford, H. D., Cheng, Y. B., & Ustin, S. L. (2008). Mapping mountain vegetation using species distribution modeling, image-based texture analysis, and object-based classification. Applied Vegetation Science, 11, 499−508. Dorigo, W., Bachmann, M., & Heldens, W. 
(2006). AS Toolbox & Processing of field spectra — User's manual. Wessling, Germany German Aerospace Center (DLR). Dorigo, W. A., Zurita-Milla, R., de Wit, A. J. W., Brazile, J., Singh, R., & Schaepman, M. E. (2007). A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. International Journal of Applied Earth Observation and Geoinformation, 9, 165−193. Dufrene, M., & Legendre, P. (1997). Species assemblages and indicator species: The need for a flexible asymmetrical approach. Ecological Monographs, 67, 345−366. Faith, D. P., Minchin, P. R., & Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio, 69, 57−68. Fielding, A. H. (2007). Cluster and classification techniques for the biosciences. Cambridge, U.K. Cambridge University Press. Foody, G. M. (1992). A fuzzy sets approach to the representation of vegetation continua from remotely sensed data — An example from lowland heath. Photogrammetric Engineering and Remote Sensing, 58, 221−225. Foody, G. M. (1996). Approaches for the production and evaluation of fuzzy land cover classifications from remotely-sensed data. International Journal of Remote Sensing, 17, 1317−1340. Fortin, M. J., Drapeau, P., & Legendre, P. (1989). Spatial auto-correlation and sampling design in plant ecology. Vegetatio, 83, 209−222. Franco-Lopez, H., Ek, A. R., & Bauer, M. E. (2001). Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method. Remote Sensing of Environment, 77, 251−274. Gamon, J. A., Huemmrich, K. F., Peddle, D. R., Chen, J., Fuentes, D., Hall, F. G., et al. (2004). Remote sensing in BOREAS: Lessons learned. Remote Sensing of Environment, 89, 139−162. Goodin, D. G., Gao, J., & Hutchinson, J. M. S. (2004). Seasonal, topographic and burn frequency effects on biophysical/spectral reflectance relationships in tallgrass prairie. International Journal of Remote Sensing, 25, 5429−5445. 
Hansen, P. M., & Schjoerring, J. K. (2003). Reflectance measurement of canopy biomass and nitrogen status in wheat crops using normalized difference vegetation indices and partial least squares regression. Remote Sensing of Environment, 86, 542−553. He, Y., Guo, X. L., & Wilmshurst, J. (2006). Studying mixed grassland ecosystems I: Suitable hyperspectral vegetation indices. Canadian Journal of Remote Sensing, 32, 98−107. Hill, M. O. (1979). TWINSPAN. A FORTRAN program for arranging multivariate data in an ordered two-way table by classification of the individuals and their attributes. New York: Microcomputer power. Hong, S. K., Kim, S., Cho, K. H., Kim, J. E., Kang, S., & Lee, D. (2004). Ecotope mapping for landscape ecological assessment of habitat and ecosystem. Ecological Research, 19, 130−139. Jensen, O. C., & Azofeifa, G. A. S. (2006). Satellite-derived ecosystems classification: Image segmentation by ecological region for improved classification accuracy, a boreal case study. International Journal of Remote Sensing, 27, 233−251. Jongman, R. H. G., ter Braak, C. J. F., & van Tongeren, O. F. R. (1995). Data analysis in landscape and community ecology. Cambridge Cambridge University Press. Jürgens, N. (1998). Biodiversity monitoring transect analysis. In M. W. Barthlott, & M. Gutmann (Eds.), Biodiversitätsforschung in Deutschland. Potentiale und Perspektiven (pp. 1−73). Bad Neuenahr-Ahrweiler. Justice, C. O., & Townshend, J. G. (1981). Integrating ground data with remote sensing. In J. G. Townshend (Ed.), Terrain analysis and remote sensing (pp. 232). London Allen and Unwin. Katila, M., & Tomppo, E. (2001). Selecting estimation parameters for the Finnish multisource National Forest Inventory. Remote Sensing of Environment, 76, 16−32. Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York Wiley. Kent, M. (2006). Numerical classification and ordination methods in biogeography. 
Progress in Physical Geography, 30, 399−408. Kercher, S. M., Frieswyk, C. B., & Zedler, J. B. (2003). Effects of sampling teams and estimation methods on the assessment of plant cover. Journal of Vegetation Science, 14, 899−906. Kerr, J. T., & Ostrovsky, M. (2003). From space to species: Ecological applications for remote sensing. Trends in Ecology and Evolution, 18, 299−305. Kim, M. S., Daughtry, C. S. T., Chappelle, E. W., McMurtrey, J. E., III, & Walthall, C. L. (1994). The use of high spectral resolution bands for estimating absorbed photosynthetically active radiation (APAR). Proceedings of the 6th Symposium on Physical Measurements and Signatures in Remote Sensing (pp. 299−306). France Val D'Isere. Lees, B. (2006). The spatial analysis of spectral data — Extracting the neglected data. Applied GIS, 2, 14.11−14.13. 1165 Legendre, P., & Gallagher, E. D. (2001). Ecologically meaningful transformations for ordination of species data. Oecologia, 129, 271−280. Legendre, L., & Legendre, P. (1998). Numerical ecology. Elsevier. Lewis, M. (1998). Numeric classification as an aid to spectral mapping of vegetation communities. Plant Ecology, 136, 133−149. Lewis, M. (2002). Mapping arid vegetation associations with HyMap imagery. IEEE Transactions on Geoscience and Remote Sensing, 5, 2805−2807. Liesenberg, V., Galvao, L. S., & Ponzoni, F. J. (2007). Variations in reflectance with seasonality and viewing geometry: Implications for classification of Brazilian savanna physiognomies with MISR/Terra data. Remote Sensing of Environment, 107, 276−286. Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, 28, 823−870. Lucas, R., Bunting, P., Paterson, M., & Chisholm, L. (2008). Classification of Australian forest communities using aerial photography, CASI and HyMap data. Remote Sensing of Environment, 112, 2088−2103. Lucieer, A. (2006). 
Fuzzy classification of sub-Antarctic vegetation on Heard Island based on high-resolution satellite imagery. Geoscience and Remote Sensing Symposium, 2006. IGARSS 2006. IEEE International Conference on (pp. 2777−2780). Lucieer, A. (2008). Mapping grazed vegetation communities on MacQuarie Island using a binary ensemble classifier. 14th Australian Remote Sensing and Photogrammety Conference (ARSPC). Darwin SSI. Maechler, M., Rousseeuw, P., Struyf, A., & Hubert, M. (2005). Cluster analysis basics and extensions. Version 1.12.0 http://cran.r-project.org/web/packages/cluster/index.html. Malik, R. N., & Husain, S. Z. (2006). Spatial distribution of ecological communities using remotely sensed data. Pakistan Journal of Botany, 38, 571−582. Malik, R. N., & Husain, S. Z. (2008). Linking remote sensing and ecological vegetation communities: A multivariate approach. Pakistan Journal of Botany, 40, 337−349. Marchant, R. (2002). Do rare species have any place in multivariate analysis for bioassessment? Journal of the North American Benthological Society, 21, 311−313. Marignani, M., Del Vico, E., & Maccherini, S. (2007). Spatial scale and sampling size affect the concordance between remotely sensed information and plant community discrimination in restoration monitoring. Biodiversity and Conservation, 16, 3851−3861. McCune, B., Grace, J. B., & Urban, D. L. (2002). Analysis of ecological communities. MJM Design. McDermid, G. J., Franklin, S. E., & LeDrew, E. F. (2005). Remote sensing for large-area habitat mapping. Progress in Physical Geography, 29, 449−474. McGarigal, K., Cushman, S. A., & Stafford, S. (2000). Multivariate statistics for wildlife and ecology research. New York Springer. McGwire, K., Friedl, M., & Estes, J. E. (1993). Spatial structure, sampling design and scale in remotely-sensed imagery of a California savanna woodland. International Journal of Remote Sensing, 14, 2137−2164. Miura, T., Huete, A. R., Ferreira, L. G., & Sano, E. E. (2003). 
Discrimination and biophysical characterization of Cerrado physiognomies with EO-1 hyperspectral Hyperion. XI Simpósio Brasileiro de Sensoriamento Remoto — SBSR (pp. 1077−1082). Belo Horizonte INPE. Montgomery, D. C., Peck, E., & Vinning, G. G. (2001). Introduction to linear regression analysis. New York Wiley and Sons. Moraczewski, I. R. (1993). Fuzzy-Logic for phytosociology.1. Syntaxa as vague concepts. Vegetatio, 106, 1−11. Muche, G., Jürgens, N., Finckh, M., & Schmiedel, U. (2009). BIOTABase — Software for monitoring of biodiversity and environmental data. Hamburg, Germany http://www. biota-africa.org. Nagendra, H., & Rocchini, D. (2008). High resolution satellite imagery for tropical biodiversity studies: The devil is in the detail. Biodiversity and Conservation, 17, 3431−3442. Nagler, P. L., Inoue, Y., Glenn, E. P., Russ, A. L., & Daughtry, C. S. T. (2003). Cellulose absorption index (CAI) to quantify mixed soil–plant litter scenes. Remote Sensing of Environment, 87, 310−325. Nilsen, L., Elvebakk, A., Brossard, T., & Joly, D. (1999). Mapping and analysing arctic vegetation: Evaluating a method coupling numerical classification of vegetation data with SPOT satellite data in a probability model. International Journal of Remote Sensing, 20, 2947−2977. Ohmann, J. L., & Gregory, M. J. (2002). Predictive mapping of forest composition and structure with direct gradient analysis and nearest-neighbor imputation in coastal Oregon, USA. Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere, 32, 725−741. Oksanen, J., Kindt, R., Legendre, P., O'Hara, R. B., Simpson, G. L., Sólymos, P., Stevens, M. H. H., & Wagner, H. (2008). Vegan: Community ecology package http://cran.r-project.org/, http://vegan.r-forge.r-project.org/. Oldeland, J., Wesuls, D., Rocchini, D., Schmidt, M., & Jürgens, N. (2009). Does using species abundance data improve estimates of species diversity from remotely sensed spectral heterogeneity? 
Ecological Indicators, doi:10.1016/j.ecolind.2009.1007.1012. Peel, M. J. S., Kruger, J. M., & S., M. (2007). Woody vegetation of a mosaic of protected areas adjacent to the Kruger National Park, South Africa. Journal of Vegetation Science, 18, 807−814. Pettorelli, N., Vik, J. O., Mysterud, A., Gaillard, J. M., Tucker, C. J., & Stenseth, N. C. (2005). Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends in Ecology & Evolution, 20, 503−510. Pillar, V. D. (1999). How sharp are classifications? Ecology, 80, 2508−2516. Rahman, A. F., & Gamon, J. A. (2004). Detecting biophysical properties of a semi-arid grassland and distinguishing burned from unburned areas with hyperspectral reflectance. Journal of Arid Environments, 58, 597−610. Rahman, A. F., Gamon, J. A., Sims, D. A., & Schmidts, M. (2003). Optimum pixel size for hyperspectral studies of ecosystem function in southern California chaparral and grassland. Remote Sensing of Environment, 84, 192−207. 1166 J. Oldeland et al. / Remote Sensing of Environment 114 (2010) 1155–1166 Rao, C. R. (1995). A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiio, 19, 23−63. Ravan, S. A., Roy, P. S., & Sharma, C. M. (1995). Space remote sensing for spatial vegetation characterization. Journal of Biosciences, 20, 427−438. R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria http://www.r-project.or. Richards, J. A., & Xiuping, J. (2006). Remote sensing digital image analysis: An introduction. Berlin, Germany Springer. Richter, R., & Schläpfer, D. (2002). Geo-atmospheric processing of airborne imaging spectrometry data, Part 2: Atmospheric/topographic correction. International Journal of Remote Sensing, 23, 2631−2649. Roberts, D. W. (2007). 
labdsv: Ordination and multivariate analysis for ecology.R package version 1.3-1 http://ecology.msu.montana.edu/labdsv/R. Roleček, J., Tichy, L., Zelený, D., & Chytry, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. Journal of Vegetation Science, 20, 596−602. RSI (2005). ENVI version 4.2 — The environment for visualizing images. Boulder, CO Research Systems Inc. Schläpfer, D., & Richter, R. (2002). Geo-atmospheric processing of airborne imaging spectrometry data. Part 1: parametric orthorectification. International Journal of Remote Sensing, 23, 2609−2630. Schmidt, K. S., & Skidmore, A. K. (2003). Spectral discrimination of vegetation types in a coastal wetland. Remote Sensing of Environment, 85, 92−108. Schmidtlein, S., & Sassin, J. (2004). Mapping of continuous floristic gradients in grasslands using hyperspectral imagery. Remote Sensing of Environment, 92, 126−138. Schmidtlein, S., Zimmermann, P., Schupferling, R., & Weiss, C. (2007). Mapping the floristic continuum: Ordination space position estimated from imaging spectroscopy. Journal of Vegetation Science, 18, 131−140. Serrano, L., Penuelas, J., & Ustin, S. L. (2002). Remote sensing of nitrogen and lignin in Mediterranean vegetation from AVIRIS data: Decomposing biochemical from structural signals. Remote Sensing of Environment, 81, 355−364. Sokal, R. R., & Rohlf, F. T. (1962). The comparison of dendrograms by objective methods. Taxon, 11, 33−40. Stohlgren, T. J. (2007). Measuring plant diversity. USA Oxford University Press Inc. Ter Braak, C. J. F. (1987). The analysis of vegetation–environment relationships by canonical correspondence analysis. Vegetatio, 69, 69−77. Thomas, V., Treitz, P., Jelinski, D., Miller, J., Lafleur, P., & McCaughey, J. H. (2003). Image classification of a northern peatland complex using spectral and plant community data. Remote Sensing of Environment, 84, 83−99. Tichy, L. (2002). JUICE, software for vegetation classification. 
Journal of Vegetation Science, 13, 451−453. Todd, S. W., & Hoffman, M. T. (1999). A fence-line contrast reveals effects of heavy grazing on plant diversity and community composition in Namaqualand, South Africa. Plant Ecology, 142, 169−178. Treitz, P. M., & Howarth, P. J. (1999). Hyperspectral remote sensing for estimating biophysical parameters of forest ecosystems. Progress in Physical Geography, 23, 359−390. Ustin, S. L., Jacquemoud, S., Palacios-Oruetal, A., Li, L., & Whiting, M. L. (2005). Remote sensing based assesment of biophysical indicators. RGLDD — Remote Sensing and Geoinformation Processing in the Assessment and Monitoring of Land Degradation and Desertification (pp. 2−3). Remote Sensing Department, University of Trier, Germany. Ustin, S. L., Roberts, D. A., Gamon, J. A., Asner, G. P., & Green, R. O. (2004). Using imaging spectroscopy to study ecosystem processes and properties. BioScience, 54, 523−534. van den Wollenberg, A. L. (1977). Redundancy analysis. An alternative for canonical correlation analysis. Psychometrika, 42, 207−219. Van Rooyen, N., Van Rooyen, M. W., Bothma, J. D. P., & Van Den Berg, H. M. (2008). Landscapes in the Kalahari Gemsbok National Park, South Africa.Kodoe, 50, 99−112 www.sanparks.org. Yue, Y., Wang, K., Zhang, B., Chen, Z., Jiao, Q., Liu, B., et al. (2008). Exploring the relationship between vegetation spectra and eco-geo-environmental conditions in karst region. Southwest China. Environmental Monitoring and Assessment, doi:10.1007/s10661-0080665-z. Zak, M. R., & Cabido, M. (2002). Spatial patterns of the Chaco vegetation of central Argentina: Integration of remote sensing and phytosociology. Applied Vegetation Science, 5, 213−226. Zhang, J., & Foody, G. M. (2001). Fully-fuzzy supervised classification of sub-urban land cover from remotely sensed imagery: Statistical and artificial neural network approaches. International Journal of Remote Sensing, 22, 615−628. Zuur, A. F., Ieno, E. N., & Smith, G. M. (2007). 
Analysing ecological data. Berlin Springer.