Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains

Torsten Rohlfing,a,* Robert Brandt,b,c Randolf Menzel,b and Calvin R. Maurer Jr.a

a Image Guidance Laboratories, Department of Neurosurgery, Stanford University, Stanford, CA 94305-5327, USA
b Institut für Neurobiologie, Freie Universität Berlin, Berlin, Germany
c Indeed – Visual Concepts GmbH, Berlin, Germany

NeuroImage 21 (2004) 1428–1442. Received 17 July 2003; revised 3 November 2003; accepted 4 November 2003. doi:10.1016/j.neuroimage.2003.11.010. © 2004 Elsevier Inc. All rights reserved.

Abstract

This paper evaluates strategies for atlas selection in atlas-based segmentation of three-dimensional biomedical images. Segmentation by intensity-based nonrigid registration to atlas images is applied to confocal microscopy images acquired from the brains of 20 bees. This paper evaluates and compares four different approaches for atlas image selection: registration to an individual atlas image (IND), registration to an average-shape atlas image (AVG), registration to the most similar image from a database of individual atlas images (SIM), and registration to all images from a database of individual atlas images with subsequent multi-classifier decision fusion (MUL). The MUL strategy is a novel application of multi-classifier techniques, which are common in pattern recognition, to atlas-based segmentation. For each atlas selection strategy, the segmentation performance of the algorithm was quantified by the similarity index (SI) between the automatic segmentation result and a manually generated gold standard. The best segmentation accuracy was achieved using the MUL paradigm, which resulted in a mean similarity index value between manual and automatic segmentation of 0.86 (AVG, 0.84; SIM, 0.82; IND, 0.81). The superiority of the MUL strategy over the other three methods is statistically significant (two-sided paired t test, P < 0.001). Both the MUL and AVG strategies performed better than the best possible SIM and IND strategies with optimal a posteriori atlas selection (mean similarity index for optimal SIM, 0.83; for optimal IND, 0.81). Our findings show that atlas selection is an important issue in atlas-based segmentation and that, in particular, multi-classifier techniques can substantially increase the segmentation accuracy.

Keywords: Atlas-based segmentation; Atlas selection; Nonrigid image registration; Bee brain; Confocal microscopy imaging

* Corresponding author. Image Guidance Laboratories, Department of Neurosurgery, Stanford University, MC 5327, Room S-012, 300 Pasteur Drive, Stanford, CA 94305-5327. Fax: +1-650-724-4846. E-mail address: rohlfing@stanford.edu (T. Rohlfing).

Introduction

Segmentation of biomedical images, that is, the assignment of a tissue classification or label to each image voxel, remains for many applications a largely manual or at best semiautomatic task involving substantial user interaction.
One promising approach to perform fully automatic segmentation of an image from an unknown subject is to compute an anatomically correct coordinate transformation (registration) between the image and an already segmented atlas image (Baillard et al., 2001; Collins et al., 1995; Crum et al., 2001; Dawant et al., 1999; Gee et al., 1993; Hartmann et al., 1999; Iosifescu et al., 1997; Miller et al., 1993). The more accurately the registration transformation maps the atlas onto the image to be segmented, the more accurate the result of the segmentation. There is typically considerable inter-individual variability in the shapes of anatomical structures in the brains of humans and animals, and thus an effective registration-based segmentation method requires a registration algorithm with a large number of parameters or degrees of freedom (i.e., a nonrigid registration algorithm).

Many different nonrigid registration methods have been used for atlas-based segmentation. Most previously reported approaches used an optical flow registration algorithm (Baillard et al., 2001; Dawant et al., 1999; Hartmann et al., 1999) or fluid registration (Christensen et al., 1996; Crum et al., 2001). Both types of algorithms effectively compute the deformation between image and atlas based on local intensity gradients. Miller et al. (1993) used an algorithm with an elastic deformable solid model, which was later extended by Christensen and Johnson (2001) to preserve consistency between forward and backward transformations and to accept an initialization using a landmark-based nonrigid transformation.

Very little attention, however, has been paid to the influence of the atlas image on the result of the atlas-based segmentation. The majority of published works use a single segmented individual image, usually randomly selected, as the atlas (Baillard et al., 2001; Dawant et al., 1999; Hartmann et al., 1999; Iosifescu et al., 1997). Often, the criteria used for atlas selection are not even mentioned. More than a single atlas is used by Thompson and Toga (1997), who use a database of atlases to generate a probabilistic segmentation of a new subject. Recently, atlases generated by averaging multiple subjects from a population have become increasingly popular. For the human heart, an average atlas derived from cardiac MR images by Rao et al. (2003) has been used for atlas-based segmentation (Lorenzo-Valdes et al., 2002). Similarly, an average atlas of the lung has been derived from CT images by Li et al. (2003).

The present paper explicitly focuses on atlas selection and compares different atlas selection strategies. In particular, we compute the accuracies of segmentations generated using (1) a single individual atlas, (2) an average-shape atlas, (3) the best individual atlas from a database, and (4) all atlases from a database, combined using multi-classifier decision fusion. All atlas-based segmentations are evaluated by computing their accuracy with respect to a manually generated gold standard segmentation.

The registration-based segmentation technique is applied to confocal microscopy images acquired from the brains of 20 bees (Fig. 1). Various artifacts in the images complicate the successful application of the optical flow and fluid registration methods commonly used for atlas-based segmentation.
Substantial intensity variations are caused by a combination of variability in the dissection, fixation, and staining process; temporal laser power fluctuations; and the spatial distribution of the fluorescently labeled synapse proteins. Spurious image edges are furthermore introduced by tiled image acquisition and subsequent merging (see Imaging for a brief summary of the imaging process). For the above reasons, we have chosen to apply a nonrigid registration algorithm by Rueckert et al. (1999) that we have found to be reliable and efficient in previous applications (Rohlfing and Maurer, 2003; Rohlfing et al., 2003a,b). The algorithm uses a B-spline-based free-form deformation as the transformation model and a global image similarity measure with a penalty term to constrain the deformation to be smooth. A brief review of the registration method is provided in Image registration algorithm.

This work is part of a larger project that aims to quantify the anatomy of the honeybee brain. In this project, the atlas-based segmentation is intended to facilitate volumetric measurements of certain brain compartments during development. We are also developing and applying nonrigid registration algorithms using free-form deformations and information-theoretic similarity measures to create a reference atlas of the honeybee brain, which is used to integrate functional and structural data coming from different individuals (Brandt et al., submitted for publication; Rohlfing et al., 2001). Thus, we hope to use these registration methods to integrate into the atlases a variety of neurons, including optic and olfactory neurons, which are imaged after single cell injections. The atlases and shape models will be compared to look for gross volume and shape differences, and to compare the densities and types of projections. This methodology is also useful for medical image processing, and we are currently applying it to the construction of statistical shape models of human bones from CT images.

Table 1. The 22 anatomical structures of the bee brain that are labeled in this study, with assigned abbreviations.

Abbreviation   Anatomical structure           Abbreviation   Anatomical structure
PL-SOG         protocerebral lobes            r-medBR        right medial basal ring
CB             central body                   l-latBR        left lateral basal ring
l-Med          left medulla                   r-latBR        right lateral basal ring
r-Med          right medulla                  l-medColl      left medial collar
l-Lob          left lobula                    r-medColl      right medial collar
r-Lob          right lobula                   l-latColl      left lateral collar
l-AL           left antennal lobe             r-latColl      right lateral collar
r-AL           right antennal lobe            l-medLip       left medial lip
l-vMB          left ventral mushroom body     r-medLip       right medial lip
r-vMB          right ventral mushroom body    l-latLip       left lateral lip
l-medBR        left medial basal ring         r-latLip       right lateral lip

Materials and methods

Imaging

Fig. 1. Example of a bee brain confocal microscopy image (top) and the corresponding label image as defined by manual segmentation (bottom). Every gray level in the label image represents a different anatomical structure. Due to limitations of reproduction, different gray levels may look alike. The correspondence between anatomical structures and abbreviations is listed in Table 1. Note that two structures, the left and right antennal lobes (l-AL and r-AL), are not visible in this slice.

For this study, 20 brains from adult, foraging honeybees served as subjects. Staining followed an adapted immunohistological protocol.
Dissected and fixated brains were treated with antisera raised against synapse proteins (nc46, SYNORF1) (Klagges et al., 1996; Reichmuth et al., 1995) and fluorescently labeled using a Cy3-conjugated secondary antibody. After dehydration and clearing, the specimens were mounted in double-sided custom slides. Whole mounts were imaged with a confocal laser scanning microscope (Leica TCS 4D) using a Leica HC PL APO 10×/0.4 dry lens. The fluorescence was excited with the 568-nm line of an ArKr laser, detected using a 590-nm long-pass filter, and quantized with a resolution of 8 bits. Due to the size of the dissected and embedded brains, the specimens were scanned sequentially in 2 × 3 partially overlapping single scans, each using 512 × 512 pixels laterally. Stacks were combined and resampled laterally to half of the original dimensions, so that the final image volume contained 84–114 slices (sections) with a thickness of 8 μm, and each slice had 610–749 pixels in the x direction and 379–496 pixels in the y direction with a pixel size of 3.8 μm.

Subsequently, a gold standard for the automatic segmentation was created by manually tracing the neuropil areas of interest on each slice. This task was performed with the Amira 3-D scientific visualization and data analysis package (ZIB, Berlin, Germany; Indeed – Visual Concepts GmbH, Berlin, Germany; TGS Inc., San Diego, CA). We distinguished 22 major compartments of the bee brain, 20 of which occur bilaterally, one on either brain hemisphere. The paired structures we labeled were the medulla; the lobula; the antennal lobe; the ventral mushroom body consisting of peduncle, α- and β-lobe; and the medial and lateral lip, collar, and basal ring neuropil. The unpaired structures we considered were the central body with its upper and lower division and the protocerebral lobes including the subesophageal ganglion (see also Mobbs, 1985). An example of a confocal microscopy image and the corresponding label image is shown in Fig. 1. The abbreviations for all anatomical structures are listed in Table 1.

The manual segmentation was performed by two experts, each of whom segmented a different subset of the available bee brains. Repeated segmentation of the same individual by several experts was not feasible due to the large amount of data and limited resources. There is therefore no problem-specific estimate of the inter-observer segmentation variability for human experts, and no information regarding the accuracy of the gold standard. However, the segmentation problem was posed in a way that facilitates human segmentation, for example, by not separating substructures that cannot be visually distinguished. The protocerebral lobes and the subesophageal ganglion, for example, are treated as one structure for the purpose of segmentation.

Image registration algorithm

Atlas-based segmentation requires the computation of an accurate coordinate transformation between the image to be segmented and an already segmented atlas image. An initial alignment of both images is first achieved using an affine registration method with 9 degrees of freedom (DOFs). The method we use is an implementation of the technique for rigid and affine registration described by Studholme et al. (1997). It uses normalized mutual information (NMI) as the similarity measure (Studholme et al., 1999). In the first step, this method is employed directly to find an initial affine transformation that captures the global displacement between both images. This transformation is then used as the initial estimate for the nonrigid registration.
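For illustration, the NMI similarity measure can be estimated from a joint intensity histogram of the two images. The following minimal NumPy sketch shows the measure of Studholme et al. (1999); it is not the implementation used in this study, and the bin count is an arbitrary choice.

```python
import numpy as np

def normalized_mutual_information(image_a, image_b, bins=64):
    """NMI = (H(A) + H(B)) / H(A, B), estimated from a joint histogram.

    Illustrative sketch only; the number of histogram bins is arbitrary.
    """
    joint, _, _ = np.histogram2d(image_a.ravel(), image_b.ravel(), bins=bins)
    p_joint = joint / joint.sum()          # joint intensity probabilities
    p_a = p_joint.sum(axis=1)              # marginal distribution of image A
    p_b = p_joint.sum(axis=0)              # marginal distribution of image B

    def entropy(p):
        p = p[p > 0]                       # ignore empty bins
        return -np.sum(p * np.log(p))

    return (entropy(p_a) + entropy(p_b)) / entropy(p_joint)
```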
The nonrigid algorithm is a modified implementation of the technique introduced by Rueckert et al. (1999). It uses the same NMI similarity measure as the rigid registration. However, a different optimization technique is used to address the high dimensionality of the search space in the nonrigid case. Using adaptive grid refinement (Rohlfing and Maurer, 2001) and a parallel multiprocessor implementation (Rohlfing and Maurer, 2003), we are able to keep computation times within reasonable bounds.

The confocal microscopy images in the present study suffer from substantial intensity variations, which frequently cause problems for intensity-based registration methods. The same anatomical structure may be imaged with near-constant intensity in one subject, but cover a large range of intensities in an image from another subject. The nonrigid registration algorithm has a tendency to align homogeneous substructures in one image with homogeneous structures in the other image. In the presence of the aforementioned intensity distribution differences, the result is a mapping of unrelated image regions, producing a grossly incorrect coordinate transformation (Fig. 2). Analogously to Rueckert et al. (1999), we incorporate a regularizing penalty term in addition to the NMI similarity measure to constrain the deformation of the coordinate space, drive the registration process in areas of inconsistent image intensities, and thus prevent grossly incorrect deformations. An illustrative example is shown in Fig. 2.

In detail, we force the deformation to be smooth by adding a biharmonic penalty term, which is based on the energy of a thin plate of metal that is subjected to bending deformations (Bookstein, 1989; Wahba, 1990). The penalty term is composed of second-order derivatives of the deformation, integrated over the domain D of the transformation T:

$$E_{\mathrm{constraint}} = \int_D \left[\left(\frac{\partial^2 T}{\partial x^2}\right)^{\!2} + \left(\frac{\partial^2 T}{\partial y^2}\right)^{\!2} + \left(\frac{\partial^2 T}{\partial z^2}\right)^{\!2} + 2\left(\frac{\partial^2 T}{\partial x\,\partial y}\right)^{\!2} + 2\left(\frac{\partial^2 T}{\partial y\,\partial z}\right)^{\!2} + 2\left(\frac{\partial^2 T}{\partial z\,\partial x}\right)^{\!2}\right] d\mathbf{x}. \tag{1}$$

With the constraint term incorporated, the total optimization function becomes a weighted sum of the data-dependent image similarity and the regularization constraint term:

$$E_{\mathrm{total}} = (1 - w)\,E_{\mathrm{NMI}} + w\,E_{\mathrm{constraint}}. \tag{2}$$

An important issue with constrained nonrigid registration methods is the relative weighting of the image similarity measure and the deformation constraint penalty term in the cost function. Since the two terms represent fundamentally unrelated quantities, there is no obvious way to determine the correct weight w a priori. It would also be desirable in many cases to choose different weights for different images, or even for different regions within the same image, making w a function of location. Since there is no formal way to determine the weight globally, there is also no solution to the harder problem of selecting the weight locally. In the present study, a single global weight (w = 0.1) was chosen for all individuals by repeating the registrations with different values and choosing the one that produced the highest median segmentation accuracy. Importantly, the segmentation results turned out to be relatively insensitive to the value of the weight over a fairly wide range of values. Both properties of the relationship between registration accuracy and smoothness constraint weight, the peak at w = 0.1 and the relative insensitivity to the value of the weight, are illustrated in Fig. 3.
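To make the penalty term concrete, the following sketch evaluates a finite-difference approximation of Eq. (1) on a dense displacement field and combines it with an image similarity value as in Eq. (2). This is a conceptual illustration under the assumption of a voxel-wise displacement field; it is not the B-spline control-point formulation used by the actual algorithm.

```python
import numpy as np

def bending_energy(displacement, spacing=(1.0, 1.0, 1.0)):
    """Finite-difference approximation of Eq. (1) for a displacement field
    of shape (X, Y, Z, 3). Conceptual sketch, not the B-spline form."""
    energy = 0.0
    for c in range(3):                               # each vector component of T
        first = np.gradient(displacement[..., c], *spacing)
        for i in range(3):
            second = np.gradient(first[i], *spacing)
            for j in range(3):
                # Summing over all ordered pairs (i, j) counts each mixed
                # second derivative twice, reproducing the factor 2 in Eq. (1).
                energy += np.sum(second[j] ** 2)
    return energy * np.prod(spacing)                 # approximate the integral

def total_cost(e_nmi, e_constraint, w=0.1):
    """Weighted combination of similarity and smoothness terms as in Eq. (2)."""
    return (1.0 - w) * e_nmi + w * e_constraint
```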
Fig. 2. Illustration of the registration process and of the importance of constraining the nonrigid registration. These microscopy images are magnified in the area of the left lobula (Fig. 1 shows a complete image). In the first step of the registration process, the floating image is globally aligned (b) to the reference image (a) using a rigid registration algorithm. The rigid transformation is then used as the initial estimate for the nonrigid registration. In the reference image (a), the lobula appears substantially darker on the lateral side (ellipse), while in the floating image (b) from another bee, the lobula has a more homogeneous intensity. An unconstrained intensity-based nonrigid registration (c) computes a grossly incorrect deformation (arrows). A constrained nonrigid registration (d) does not have this problem.

Fig. 3. Percentage of correctly labeled voxels (median over all segmented images) vs. the smoothness constraint weight used for nonrigid registration. Note that, to visually separate the plotted lines, the vertical axis only covers the top 10% of the range, between 90% and 100%. For a description of the four segmentation methods, see Atlas selection strategies.

Atlas selection strategies

A major point of this paper is the investigation of the influence of the choice of the reference atlas on the outcome of the registration-based segmentation. In particular, we evaluate four different strategies, which are described in detail below. We refer to the already segmented image as the atlas image and to the image to be segmented as the raw image. The coordinates of the atlas image are mapped by way of nonrigid registration onto those of the raw image and thereby provide a segmentation of the latter. In the context of nonrigid registration, the atlas image is thus deformed while the raw image remains fixed; in the common terminology of image registration, the atlas image acts as the floating image and the raw image as the reference image.

For the present study, 20 manually segmented confocal microscopy images were available as candidates for use as both raw and atlas images. For each registration-based segmentation performed in this study, one of the images was used as the raw image. This image is automatically segmented using the registration-based method; its manual segmentation is used only for validation (see Validation study design). The remaining 19 images were then available as atlas images. We refer to each of these atlas images as an individual atlas, since it corresponds to an actual individual subject. In addition, we consider below an average atlas, which is a segmented image generated by averaging the shapes of a population of subjects (Rohlfing et al., 2001).

We evaluate registration-based segmentation results obtained using four different choices of the atlas image(s):

Individual atlas image (IND). One of the 20 manually segmented images was chosen to serve as an individual atlas and registered to each of the 19 remaining images. The choice of the atlas was based on visual assessment of image quality and intensity uniformity.
To compare segmentation results for the same set of raw images for all strategies and to avoid potential bias of the validation, the image used as the atlas for this paradigm is excluded from the evaluation of the three other strategies described below; thus, a total of 19 eligible raw images is used for validation of all methods.

Average-shape atlas image (AVG). An average-shape atlas image is registered to each of the 19 eligible raw images. The average-shape atlas image was generated from all 20 individual atlases using a method outlined in Appendix A. The motivation for using an average atlas is that, since it represents the average shape of a population, it should require less deformation than a randomly selected individual atlas when nonrigidly registered to a given raw image. Thus, it might provide higher segmentation accuracy.

Most similar atlas image from a database (SIM). Each of the eligible raw images is registered to the remaining 19 atlas images. The most "similar" atlas image out of these 19 is then used as the actual atlas image for segmentation. In Appendix B, we compare four different criteria for selecting the most similar atlas image. Based on the results described there, the criterion chosen was the value of NMI after nonrigid registration.

All atlas images from a database with multi-classifier decision fusion (MUL). Each of the eligible raw images is registered nonrigidly to the remaining 19 atlas images. Each registration produces a segmentation, for a total of 19 segmentations per raw image. All segmentations are in the coordinate system of the raw image and can easily be combined into a final segmentation by assigning to each voxel the label that received the most "votes" from the individual atlases (Rohlfing et al., 2001). This technique is equivalent to decision fusion using the "Vote Rule" in a multi-classifier system (Xu et al., 1992). In more detail, we apply partial volume interpolation (PVI) as described by Maes et al. (1997) to interpolate labels in the deformed atlas images. The classifications from all atlases are then combined using the "Sum Rule," which is generally considered to be superior to the Vote Rule (Kittler and Alkoot, 2003; Kittler et al., 1998).

Table 2 provides an overview of the conceptual differences between the four atlas selection strategies.

Table 2. Overview of atlas selection strategies.

Strategy   No. of atlases per raw image   Type of atlas   Assignment of atlas to raw image
IND        one                            individual      fixed
SIM        one                            individual      variable
AVG        one                            average         fixed
MUL        multiple                       individual      fixed

The four strategies evaluated in this paper can be categorized according to the number of atlases used per raw image (one or multiple), the type of atlas used (individual or average), and the assignment of atlases to raw images (fixed, i.e., the same atlas(es) for all raw images, or variable, i.e., a different atlas image selected for each raw image). See text for details.

Validation study design

For every registration, the registration-based segmentation is compared with the manual segmentation. As a measure of segmentation quality, we compute the similarity index (SI) (Zijdenbos et al., 1994). For a structure s, the SI is defined as

$$\mathrm{SI}^{(s)} = \frac{2\left|V_{\mathrm{manual}}^{(s)} \cap V_{\mathrm{atlas}}^{(s)}\right|}{\left|V_{\mathrm{manual}}^{(s)}\right| + \left|V_{\mathrm{atlas}}^{(s)}\right|}, \tag{3}$$

where V_manual^(s) and V_atlas^(s) denote the sets of voxels labeled as belonging to structure s by manual and atlas-based segmentation, respectively. For perfect mutual overlap of manual and atlas-based segmentation, the SI has a value of 1. Lesser overlap results in smaller values of SI. No overlap between the segmentations results in an SI value of 0. Note that the exact value of SI for a segmentation error of one voxel, for example, depends on properties of the segmented object, such as its volume and its shape characteristics.
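As an illustration of the two computations just described, the following NumPy sketch combines deformed atlas label images by simple per-voxel voting (the Vote Rule) and evaluates one structure with the SI of Eq. (3). It is a minimal sketch and not the implementation used here; in particular, the study's final MUL results additionally use partial volume interpolation and the Sum Rule.

```python
import numpy as np

def fuse_labels_by_vote(atlas_labels):
    """Combine label volumes from several deformed atlases by majority vote.

    `atlas_labels` is a sequence of integer label images that have already
    been resampled into the coordinate system of the raw image. Simple Vote
    Rule fusion; the paper's final MUL results use PVI and the Sum Rule.
    """
    stack = np.stack(atlas_labels)                    # (n_atlases, X, Y, Z)
    n_labels = int(stack.max()) + 1
    votes = np.zeros((n_labels,) + stack.shape[1:], dtype=np.uint8)
    for label in range(n_labels):
        votes[label] = (stack == label).sum(axis=0)   # votes per voxel for label
    return votes.argmax(axis=0)                       # most frequent label wins

def similarity_index(auto_mask, manual_mask):
    """SI of Eq. (3) for one structure, given two binary voxel masks."""
    overlap = np.logical_and(auto_mask, manual_mask).sum()
    return 2.0 * overlap / (auto_mask.sum() + manual_mask.sum())
```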
Results

Comparison of atlas selection strategies

To enable comparison of the different atlas selection strategies, Fig. 4 shows the percentage of the segmentations using each method that achieved an SI value greater than given thresholds between 0.70 and 0.95. It is easy to see that the MUL paradigm produced segmentation results superior to those produced by the other methods. The MUL strategy achieved SI values of 0.7 or higher for 97% of all segmentations. The AVG paradigm produced SI values that are consistently lower than MUL but, with the exception of the 0.7 threshold, also consistently higher than both IND and SIM. Finally, the SIM strategy produced slightly better results than the IND strategy. The mean SI values of registration-based segmentations produced by the four atlas selection strategies are: MUL, 0.86; AVG, 0.84; SIM, 0.82; IND, 0.81. Each of these values is the mean of 418 segmentation SI values, one for each of 22 anatomical structures in each of 19 segmented brains.

Fig. 4. Percentage of registration-based segmentations with similarity index SI better than a given threshold, plotted by atlas selection strategy.

These observations are supported by statistical tests that were performed as follows. First, the mean SI value over all anatomical structures was computed for each segmented raw image. This was done because all anatomical objects in one raw image are segmented using the same nonrigid registration transformation, and thus the SI values for all structures in one segmentation cannot be considered statistically independent. Then the four sets (one for each atlas selection strategy) of 19 mean SI values (one for each segmented image) were compared using two-tailed paired t tests for all combinations of atlas selection strategies. The results are listed in Table 3. Segmentations produced using the MUL paradigm have SI values that are significantly better than segmentations generated using the other three paradigms.

Table 3. Results of paired t tests between the atlas selection strategies with respect to SI values over all 19 segmented brains.

        IND              SIM              AVG              MUL
IND     –                NS               − (P < 0.01)     − (P < 0.001)
SIM     NS               –                NS               − (P < 0.001)
AVG     + (P < 0.01)     NS               –                − (P < 0.001)
MUL     + (P < 0.001)    + (P < 0.001)    + (P < 0.001)    –

The row strategies are compared to the column strategies. A table entry "+" denotes statistically significant superiority of the former over the latter; "−" denotes inferiority; "NS" denotes a statistically insignificant difference (P > 0.05). For significant differences, the respective confidence levels are given in parentheses.
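The paired comparison described above can be sketched along the following lines (illustrative only; the use of SciPy and the dictionary layout of the per-brain SI values are assumptions, not the tool actually used for the analysis):

```python
from itertools import combinations
from scipy import stats

def compare_strategies(mean_si_per_brain, alpha=0.05):
    """Two-tailed paired t tests between atlas selection strategies.

    `mean_si_per_brain` maps a strategy name ("IND", "SIM", "AVG", "MUL") to
    its per-brain mean SI values (19 values in this study); averaging over
    structures first avoids treating the 22 structures of one brain as
    independent samples.
    """
    for a, b in combinations(mean_si_per_brain, 2):
        t, p = stats.ttest_rel(mean_si_per_brain[a], mean_si_per_brain[b])
        verdict = "significant" if p < alpha else "NS"
        print(f"{a} vs. {b}: t = {t:.2f}, P = {p:.4g} ({verdict})")
```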
Upper and lower boundaries for IND and SIM strategies

Given the ground truth segmentations, we computed the upper and lower boundaries over all possible atlas choices for the IND and SIM strategies. The results are shown in Fig. 5. The meanings of the columns are as follows: "Best IND" represents the best possible result that can be achieved by using a single fixed individual atlas to segment all raw images. "Worst IND" is the opposite, that is, the worst possible result with a single fixed atlas. Likewise, "Best SIM" is the result achieved using the best possible choice of atlas for each raw image, where we allow the use of different atlases for different raw images. Similarly, "Worst SIM" is the worst possible outcome of such a strategy. Appendix B provides more details on the computation of the performance bounds, in particular "Best SIM." The center columns of Fig. 5, labeled "IND" and "SIM," show the results achieved using the actual IND and SIM strategies. For IND, this is the atlas selected based on visual assessment, and for SIM it is the most similar atlas based on the NMI similarity measure after nonrigid registration.

Fig. 5. Upper and lower accuracy boundaries of all possible atlas selections for the IND and SIM strategies.

As Fig. 5 shows, the actual IND strategy achieves results in between "Worst IND" and "Best IND." Likewise, the performance of the actual SIM strategy is within the bounds provided by "Worst SIM" and "Best SIM." The figure also shows that both IND and SIM perform close to their respective upper boundaries, indicating that the criteria used for atlas selection within each strategy are reasonable. Finally, it is interesting to note that the results of "Worst SIM" are worse than those of "Worst IND," while those of "Best SIM" are better than those of "Best IND." This is easily explained by the observation that any given IND strategy is, in fact, a special case of a general SIM strategy. The SIM strategy therefore, in theory, provides more freedom of choice to improve the result (or to make it worse, when looking for the lower quality bound).

We note that the MUL and AVG atlas selection strategies produce segmentation accuracies superior even to selection of the best possible atlas based on knowledge of the a posteriori SI values (Fig. 6). This strategy, which is obviously only available in a validation study with known ground truth, represents the upper limit of segmentation accuracy with a single individual atlas image. In other words, in our study, any strategy that selects a single individual atlas, even if one allows a different atlas to be used for each raw image, will produce results inferior to those achieved using a combination of multiple individual atlases or an average atlas.

Fig. 6. Upper and lower accuracy boundaries for the IND and SIM strategies compared to the AVG and MUL strategies.

Overall segmentation quality

Fig. 7 shows typical atlas-based segmentations of two frontal (axial) slices of one bee brain. Atlas-based segmentation produced results that are generally within two voxels of the manually assigned segmentation and often within one voxel. This is illustrated by the segmentation error images, in which voxels with different labels assigned by automatic and manual segmentation are shown in black. There are occasionally a few isolated areas where the differences between the automatic and manual segmentations are more substantial.

The structures most and least accurately segmented (with the MUL method) are shown in Figs. 8 and 9, respectively. For spatial orientation purposes, these structures are also marked in a three-dimensional rendering of a segmented bee brain in Fig. 10. The most accurately segmented structure (SI = 0.97) was a left lobula (Fig. 8). It was bright and well delineated. This structure was therefore easy to register correctly to the atlas images. The manual and automatic segmentations differ by no more than one voxel in most regions, which is nicely shown by the difference images.
The least accurately segmented structure (SI = 0.55) was a right medial lip (Fig. 9). This structure was hard to segment due to its complex shape and faint boundaries. Furthermore, the right medial lip has the shape of a torus in most individuals. In the subject shown in Fig. 9, however, the torus is not closed. This deviation from the majority of atlases represents an additional challenge for the registration-based segmentation. Note that the large area of disagreement between automatic and manual segmentation in the horizontal slices (bottom row of Fig. 9) was mostly due to an out-of-plane misalignment along the edge of the segmented structure.

Fig. 7. Results of segmentation using nonrigid image registration (MUL atlas selection paradigm). The two columns show frontal (i.e., axial) images at two different slice locations. Top row: microscopy images. Center row: overlays of segmentation contours (shown in white) after nonrigid image registration. To clearly show the contours, the dynamic range of the underlying microscopy image was reduced. Bottom row: segmentation error images. Voxels with different labels assigned by manual and automatic segmentation are shown in black.

Fig. 8. Most accurately segmented structure (out of 418): left lobula (SI = 0.97). Columns from left to right: microscopy image, contour from manual segmentation, contour from automatic segmentation (MUL paradigm), difference image between manual and automatic segmentation, and perspective surface rendering of the isolated structure. The white pixels in the difference image show where manual and automatic segmentation disagree. Rows from top to bottom: frontal, sagittal, and horizontal slice through the left lobula. The scale bar in the top left image represents 100 μm, or 25 voxels in the x and y directions (12.5 voxels in z).

Fig. 9. Least accurately segmented structure (out of 418): right medial lip (SI = 0.55). See Fig. 8 for row and column descriptions.

Fig. 10. Three-dimensional surface rendering of an individual segmented bee brain with marked structures corresponding to those shown in Fig. 8 (left lobula), Fig. 9 (right medial lip), and Fig. 14 (right lateral basal ring). Note the considerable shape difference between the near-spherical lobula and the torus-like medial lip.

Similarity index vs. object size and shape

To appreciate the SI values computed in this study and to compare them with other published values, we investigated the dependence of SI values on object size. We performed a numerical simulation in which discretely sampled spheres of various radii were dilated by one or two voxels and the SI values between the original and dilated spheres were computed. The resulting SI values are plotted vs. object radius in Fig. 11. It is also easy to derive a closed-form expression for the continuous case. The SI between two concentric spheres, one with radius R and the other dilated by d, that is, with a radius of R + d, is

$$\mathrm{SI} = \frac{2(R/d)^3}{2(R/d)^3 + 3(R/d)^2 + 3(R/d) + 1}. \tag{4}$$

The SI values for the discrete and continuous cases are almost identical (Fig. 11). The SI value between a sphere and a concentric dilated sphere approximates the SI value for a segmentation error consisting of a uniform-thickness misclassification on the perimeter of a spherical object. Inspection of Fig. 11 and Eq. (4) shows that SI depends strongly on object size and is smaller for smaller objects. A one-voxel-thick misclassification on the perimeter of a spherical object with a radius of 50 voxels has an SI value of 0.97, but for a radius of 10 voxels the SI value is only 0.86.
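The numerical simulation and the closed form of Eq. (4) can be reproduced with a few lines of NumPy/SciPy. This is an independent sketch, not the authors' original code; the margin around the sphere is an arbitrary choice.

```python
import numpy as np
from scipy import ndimage

def sphere_mask(radius, margin=3):
    """Discretely sampled sphere of the given radius (in voxels)."""
    size = 2 * (radius + margin) + 1
    grid = np.indices((size, size, size)) - (radius + margin)
    return np.sum(grid ** 2, axis=0) <= radius ** 2

def similarity_index(mask_a, mask_b):
    """SI of Eq. (3) for two binary masks."""
    overlap = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * overlap / (mask_a.sum() + mask_b.sum())

def si_spheres_continuous(R, d):
    """Closed-form SI of Eq. (4) for concentric spheres of radius R and R + d."""
    r = R / d
    return 2 * r**3 / (2 * r**3 + 3 * r**2 + 3 * r + 1)

# Compare the discrete simulation with Eq. (4) for a one-voxel dilation.
for radius in (10, 25, 50):
    sphere = sphere_mask(radius)
    dilated = ndimage.binary_dilation(sphere)     # dilate by one voxel
    print(radius, similarity_index(sphere, dilated), si_spheres_continuous(radius, 1.0))
```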
Fig. 11. Dependence of SI values on size for spherical objects. Discretely sampled spheres of various radii were dilated by one or two voxels and the SI values between the original and dilated spheres were computed. The SI value between a sphere and a concentric dilated sphere approximates the SI value for a segmentation error consisting of a uniform-thickness misclassification on the perimeter of a spherical object. The squares show SI values computed from a discrete numerical simulation of dilation by one voxel; the solid line shows SI values for the corresponding continuous case (Eq. (4)). The triangles show SI values computed from a discrete numerical simulation of dilation by two voxels; the broken line shows SI values for the corresponding continuous case. Note that while the units on the horizontal axis are voxels for the discrete case, they are arbitrary units for the continuous case.

In Fig. 12, the average volumes of the anatomical structures in the bee brain images under consideration are shown together with the segmentation accuracies achieved for them using the MUL paradigm. It is easy to see that the larger a structure, the more accurately it was segmented by the atlas-based segmentation. This confirms the theoretical treatment above and illustrates the varying bias of the SI metric when segmenting structures of different sizes.

Structure volume is not the only geometric characteristic that causes a bias of SI values; structure shape is also important. A characteristic shape parameter that is relevant in this context is the surface-to-volume ratio (SVR). In the discrete case, this ratio can be determined by computing the fraction q of voxels on the surface of the structure relative to its total number of voxels. Given q, we can, for example, compute the SI between a structure of volume V and the same structure after erosion by one voxel, which corresponds to a misclassification of all surface voxels, as

$$\mathrm{SI}_q = \frac{2V(1 - q)}{V + (1 - q)V} = \frac{1 - q}{1 - q/2}. \tag{5}$$

In Fig. 13, this formula is plotted in comparison to the actual values in our study resulting from segmentation using the MUL strategy. For most structures, the worst segmentation over all individuals is near the simulated one-voxel erosion. The average values over all individuals are consistently better than the one-voxel-erosion line. In general, the larger the value of the SVR q, the lower the value of the SI metric. This is consistent with the prediction of the theoretical treatment above, which suggests that better SI values are easier to achieve on structures with a smaller SVR. Most importantly, Fig. 13 shows that a substantial fraction of the structures in the bee brain have SVRs near 0.5, which means that, for the purpose of interpreting SI values, they cannot be treated as spherical objects by considering their size alone. In total, 389 out of 418 structures (93%) were segmented with an SI value better than the theoretical one-voxel-erosion threshold determined from their respective SVR. When applying the above criteria to segmentations of individual structures, we also find the theoretical predictions confirmed.
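A small sketch, under the same assumptions as above (binary voxel masks, surface voxels defined by a one-voxel erosion), shows how the SVR q and the erosion-based SI bound of Eq. (5) can be computed for a given structure:

```python
import numpy as np
from scipy import ndimage

def surface_to_volume_ratio(mask):
    """Fraction q of voxels lying on the surface of a binary structure,
    where surface voxels are those removed by a one-voxel erosion."""
    eroded = ndimage.binary_erosion(mask)
    return (mask & ~eroded).sum() / mask.sum()

def si_after_one_voxel_erosion(q):
    """SI of Eq. (5) between a structure and its one-voxel erosion."""
    return (1.0 - q) / (1.0 - q / 2.0)
```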
Fig. 12. Volumes of anatomical structures and corresponding segmentation accuracies. The gray bars show the volumes (in numbers of voxels) of the 22 anatomical structures, averaged over the 20 subjects in the present study. The black vertical lines represent the ranges of SI values achieved by the automatic segmentation (MUL paradigm) over all subjects. The diamonds show the respective medians over all subjects.

The structure that was most accurately segmented in our study, shown in Fig. 8, was fairly large (122,000 voxels) and near-spherical (SVR q = 0.17). There was therefore no substantial negative bias from volume and shape, resulting in a high SI value of 0.97. The least accurately segmented structure, shown in Fig. 9, on the other hand, was rather small (29,000 voxels) and torus-shaped (SVR q = 0.51), resulting in a strong negative bias of the SI measure. An example of a structure segmented with SI = 0.70, a right lateral basal ring, is shown in Fig. 14. The volume of this structure was 16,000 voxels with SVR q = 0.58. This example illustrates that, for small structures with a complex shape, an SI value of 0.70, achieved or exceeded for 97% of all structures using the MUL strategy, still indicates satisfactory segmentation accuracy.

Fig. 13. SI values vs. surface-to-volume ratio. The curved lines show the numerical simulations of erosion by 1 voxel and 1/2 voxel, respectively, according to Eq. (5). Each dot represents a single segmented structure from one image. In addition, the mean SI value over all subjects (using the MUL strategy) is marked for each structure.

Fig. 14. Example of a structure segmented with accuracy SI = 0.70: right lateral basal ring. See Fig. 8 for row and column descriptions.

Computational performance

We did not perform a detailed formal analysis of the computation times required to obtain the nonrigid image registrations as part of the present study. Our computing resources were very heterogeneous, so there was no meaningful way of comparing computation times between the four approaches, which differ only in the choice of the segmented reference atlas. Nonetheless, we can summarize the approximate execution time for a single nonrigid image registration on the various computers we used in this study. The time to complete one registration is approximately 3 h on a several-year-old workstation (a Sun workstation with a single UltraSparc II processor at a clock speed between 300 and 440 MHz) and less than an hour on a current PC (about 20 min with a 3.0-GHz Intel Pentium 4 processor). We have implemented a parallel version of the nonrigid image registration algorithm that takes advantage of shared-memory multiprocessor computer architectures using multi-threaded programming (Rohlfing and Maurer, 2003). The time to complete one registration is approximately 10 min on a two-processor PC. Using 48 processors of a 128-processor SGI Origin 3800 supercomputer (MIPS R12K processors running at 400 MHz), the computation time per registration is about 1 min (Rohlfing and Maurer, 2003).

Discussion

The results presented in this paper indicate that the accuracy of atlas-based image segmentation is strongly influenced by the strategy employed for selection of the atlas image(s). The MUL strategy, which is a novel application of multi-classifier techniques to atlas-based segmentation, produced segmentations that are significantly better than those produced by the other three choices of single atlas images.
This finding confirms the belief in the pattern recognition community that multiple-classifier systems are generally superior to single classifiers (see, e.g., Xu et al., 1992), provided that their misclassifications are somewhat independent of each other. More importantly, therefore, our results also demonstrate that multiple classifiers, generated in a straightforward way by using multiple atlases, can be sufficiently independent in real-world applications for decision fusion to benefit from their complementary behavior.

The AVG paradigm produced the second best segmentations. This may be related to the observation that, for most raw images, registration to the average atlas required smaller deformations than registration to individual atlas images (Fig. 15). The IND and SIM approaches produced almost identical results, both clearly inferior to those of the MUL and AVG paradigms. It is somewhat surprising that the SIM paradigm, that is, segmentation using the most "similar" individual atlas image, performed so poorly. It is important to note that this is not an effect of the criterion used to select the most similar atlas image, but rather a general weakness of the approach itself. As we showed by using a posteriori SI values to select the best possible of the 19 atlases for each raw image ("Best SIM" results), at least in our study, no strategy that uses a single individual atlas could outperform the MUL and AVG paradigms.

Fig. 15. Comparison of deformation magnitudes between subjects vs. between a subject and the average-shape atlas. The diamonds show the average deformation (over all foreground voxels) in micrometers when registering the respective raw image to the average-shape atlas. The vertical lines show the range of average deformations when registering the respective raw image to the remaining 19 individual atlas images. The boxes show the 25th and 75th percentiles of the respective distributions.

It needs to be pointed out that the average-shape atlas was generated from the same individuals that were segmented in the present study. As a result, there is most likely some bias in the evaluation of the AVG paradigm. However, the use of an average-shape atlas necessarily assumes that it is possible to create a meaningful shape average of a population. If a population is sufficiently homogeneous to allow generation of a stable, meaningful average, then we have good reason to believe that the results of our study apply to an independently generated average, since the difference between an independent and a dependent average shape can be expected to be small.

While it produces better segmentations, the obvious disadvantage of the MUL approach relative to the three other strategies we investigated is that it requires the computationally expensive nonrigid registration algorithm to be applied to many atlas images instead of just one. However, because our implementation of the nonrigid registration algorithm is relatively fast (compared with other published algorithms), because registrations with different atlas images can be run independently on different computers, and because our implementation takes full advantage of multiprocessor computer systems (Rohlfing and Maurer, 2003), computation time is not too serious an issue.

The automatically generated segmentations are generally very good, but they are not yet sufficiently good to completely replace manual segmentation. The registration-based segmentations could, however, be manually refined.
Our method could also be used as a first step to initialize a subsequent segmentation method such as a deformable model (McInerney and Terzopoulos, 1996), active contours, or level sets (Malladi et al., 1995). The suitability for this purpose is illustrated by the fact that the automatic segmentations are generally within two voxels of the manual segmentations, and often within one voxel. Thus, although registration-based segmentation is not perfect, it is extremely well suited for generating initial solutions that can then be refined by other, more locally operating techniques. Also, up to one voxel of error can be due to interpolation error. A possible solution to this problem is to use splines to construct better models of the structures of interest.

However, in comparison with results published by others, the segmentation accuracies reported in this paper are already very encouraging. For example, Dawant et al. (1999) reported mean SI values of 0.96 for segmentation of the human brain from MR images, and mean SI values of only 0.85 for segmentation of smaller brain structures such as the caudate. The mean SI value of segmentations produced using the MUL method in this study is 0.86. Given the small size and complex shape of most of the structures in the bee brains considered here, this is comparable to the values reported by Dawant et al., and it supports the visual observation (Fig. 7) that the automatic segmentations of many structures in the present study differ from the manual segmentations on average by approximately one voxel. In fact, Zijdenbos et al. (1994) state that "SI > 0.7 indicates excellent agreement" between two segmentations. This criterion (SI > 0.7) is satisfied by the vast majority (97%) of all contours generated by our segmentations using the MUL method (Fig. 4).

How do our results translate to other segmentation problems, for example, segmenting structures of the human brain? In several ways, the image quality of confocal microscopy images is inferior to that of clinical MR images due to imaging and technical limitations such as the tiled acquisition. On the other hand, the bee brain has a less complex shape than, for example, the human cortex, and therefore poses fewer problems for the mathematical treatment of the coordinate transformation. The problem of segmenting confocal microscopy images of the bee brain is thus at the same time both harder and easier than segmenting a human brain. Overall, we believe that both problems are sufficiently similar for our results to be relevant. However, applying the evaluation of atlas selection strategies to human brain data constitutes an important next step in our work.

Another interesting question is what influence the atlas selection has in the presence of abnormal data. A fundamental problem with atlas-based segmentation methods is their inability to segment objects that are not present in the atlas, for example, tumors in clinical images. When using a single atlas, it may or may not be from the correct population. The AVG strategy has limitations, as the average atlas could either be generated for one population, thus being inappropriate for another, or cover several populations, thus being unspecific for each of them. The strategies most likely to be successful are the SIM and MUL strategies, since the database of atlases used for both methods can easily be built to contain multiple samples of each population.
As we showed recently (Rohlfing et al., 2003b), the accuracy of segmentation with multiple atlases can be further improved by applying more sophisticated methods for combining the individual segmentations. These include, but are not limited to, extensions of an expectation-maximization method for estimating expert performance parameters originally proposed by Warfield et al. (2002). The general idea of such methods is to estimate the accuracy of each individual segmentation. Using these estimates, the atlases believed to be more accurate can then be assigned a higher weight in the decision fusion. When using atlases from different populations, this would, at least in theory, automatically select the appropriate atlases for a given subject and disregard the inappropriate ones. Testing the effectiveness of this approach in practice will be another interesting direction for future work in the field.

Conclusion

The accuracy of atlas-based image segmentation can be substantially improved over that obtained using a single individual atlas. This paper presented two promising approaches: the use of an average-shape atlas, and especially the application of multi-classifier techniques based on integrating multiple segmentations generated using different independent individual atlases.

Acknowledgments

Torsten Rohlfing was supported by the National Science Foundation under Grant No. EIA-0104114. Robert Brandt was supported by the BMBF under Grant No. 0310961. Most computations were performed on an SGI Origin 3800 supercomputer in the Stanford University Bio-X core facility for Biomedical Computation. Some computations were performed on a dual-processor PC with AMD Athlon processors courtesy of Dolf Pfefferbaum at SRI International, Menlo Park, CA. The authors thank Andreas Steege and Charlotte Kaps for manually tracing the microscopy images. The authors also thank the anonymous reviewers for their numerous helpful comments and suggestions, which we feel have substantially improved this paper.

Appendix A. Generation of the average-shape atlas

The average-shape atlas used for segmentation with the AVG paradigm was generated from all 20 bee brains using a method suggested by Ashburner (2000) and applied to the bee brain by Rohlfing et al. (2001). The algorithm is a simple iterative procedure that, unlike other methods, does not require inverting nonrigid coordinate transformations. The first iteration selects one arbitrary individual image as a reference and registers each of the remaining images to this reference using an affine transformation. Using these transformations, an average image is computed. In the second iteration, all individuals, including the initial reference, are registered to the average image by nonrigid transformations. A new average image is generated using the new transformations and used as the reference for the following registration iteration. The procedure is repeated until convergence. Because the first iteration of the algorithm uses an affine registration only, the shape of the arbitrarily chosen reference image does not predetermine the shape of the resulting final average image. There is admittedly little strict mathematical foundation for the algorithm, but for the purpose of this paper, we are only interested in a rather operational definition of "average shape": the average-shape atlas should minimize the overall deformations required to match all individuals of the population to it.
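The procedure just described can be summarized schematically as follows. The registration and resampling routines (`affine_register`, `nonrigid_register`, `apply_transform`) are hypothetical placeholders for an actual registration toolkit, and a fixed iteration count stands in for the convergence test.

```python
import numpy as np

def build_average_atlas(images, affine_register, nonrigid_register,
                        apply_transform, n_iterations=5):
    """Schematic version of the iterative shape averaging of Appendix A.

    `affine_register(floating, reference)` and `nonrigid_register(floating,
    reference)` are assumed to return coordinate transformations, and
    `apply_transform(image, transform)` to resample an image; all three are
    hypothetical placeholders, not functions of any particular library.
    """
    # Iteration 1: affine registration to an arbitrary individual reference,
    # so that the reference's shape does not predetermine the final average.
    reference = images[0]
    warped = [reference] + [
        apply_transform(img, affine_register(img, reference)) for img in images[1:]
    ]
    average = np.mean(warped, axis=0)

    # Subsequent iterations: nonrigid registration of all individuals
    # (including the initial reference) to the current average image.
    for _ in range(n_iterations):                  # stands in for "until convergence"
        warped = [apply_transform(img, nonrigid_register(img, average))
                  for img in images]
        average = np.mean(warped, axis=0)
    return average
```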
For the average-shape atlas generated as described above and used in the present study, Fig. 15 illustrates that the differences between a raw image and an individual atlas are, in fact, on average substantially larger than the differences between a raw image and the average atlas. Most raw images are more similar in shape to the average-shape atlas than to any (or at least the majority) of the remaining 19 individual atlas images.

Appendix B. Criteria for selecting the most similar atlas

We compare four different criteria for selecting the individual atlas image that is "most similar" to a given raw image, that is, the atlas that is expected to provide the best segmentation accuracy out of all individual atlases for this particular raw image. These criteria are:

Value of NMI after affine registration (NMIaffine). The similarity between the raw image and each individual atlas image is quantified by the final value of the NMI image similarity measure after completing the affine registration. The atlas image with the highest NMI value after registration is selected and used for segmentation of the respective raw image. This criterion requires only an affine registration to be computed between the raw image and each of the individual atlases. It is therefore considerably less computationally expensive than the remaining three criteria described below.

Value of NMI after nonrigid registration (NMInonrigid). This criterion compares the NMI image similarities between the raw image and all individual atlases after nonrigid registration. Again, the atlas with the highest NMI value after registration is selected and used for segmentation.

Average deformation of the atlas over all voxels (DEFavg). After nonrigid registration, the magnitude of the deformation between the raw image and each individual atlas is computed and averaged over all voxels. The atlas with the smallest average deformation is selected and used for segmentation. Whereas the two criteria above are based on intensity similarity, this criterion is based on geometric (i.e., shape) similarity.

Maximum deformation of the atlas over all voxels (DEFmax). This criterion is identical to the previous one, except that it uses the maximum deformation over all voxels rather than the average. This criterion pays more attention to outliers; the idea is that atlases that match well overall may still be substantially inaccurate in some regions.

Segmentations were generated for each of the criteria above. The accuracy of a segmentation was computed as the SI between the segmentation and the manual gold standard. Fig. 16 shows a graph of the percentages of structures segmented with varying levels of accuracy. For comparison, this graph includes results achieved when using the best atlas according to the a posteriori SI values for each raw image (leftmost column, Best SIM).
In other words, this column shows the best possible result that can be achieved using only a single individual atlas, where the selection of this atlas is governed by knowledge of the resulting segmentation accuracy (SI value). Obviously, this is not a strategy that could be applied in practice. However, it provides an upper bound for the segmentation accuracy that can be achieved using a single individual atlas image, albeit a different one for each raw image, chosen from the database of individual atlas images.

Fig. 16. Plot of percentages of structures segmented with accuracy equal to or better than given SI thresholds. Each column represents one criterion for choosing the most "similar" individual atlas for a given raw image. The stacked bars show the percentages of structures that were segmented with SI better than 0.95 through 0.70, from bottom to top. For comparison, the leftmost column shows the results when the atlas with the best a posteriori segmentation result is used for each raw image. This is the upper bound for the accuracy achievable with any criterion for selecting the most similar atlas.

Among the four criteria that do not depend on a posteriori accuracy evaluation, the NMI image similarity after nonrigid registration performed slightly better than the others. It was therefore selected as the criterion used for the SIM atlas selection strategy in this paper. We note that the selection of the most similar atlas based on the nonrigidly registered raw image and atlas depends on the registration method used, as well as, to some extent, on its parameterization. This is desirable, since the best-matching atlas may in fact be different depending on the exact registration technique. However, it is our experience that the atlas selection is fairly stable. Specifically, when using different smoothness constraint weights (see Fig. 3), we found that in 8 out of 20 cases the atlas selected based on the final value of NMI after nonrigid registration was identical for all six constraint weights. In another five cases, there was only one constraint weight value for which a different atlas would have been selected.

References

Ashburner, J., 2000. Computational Neuroanatomy. PhD thesis, University College London.
Baillard, C., Hellier, P., Barillot, C., 2001. Segmentation of brain 3D MR images using level sets and dense registration. Med. Image Anal. 5 (3), 185–194.
Bookstein, F.L., 1989. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11 (6), 567–585.
Brandt, R., Rohlfing, T., Steege, A., Westerhoff, M., Menzel, R., 2004. An average three-dimensional atlas of the honeybee brain based on confocal images of 20 subjects. J. Comp. Neurol. (submitted for publication).
Christensen, G.E., Johnson, H.J., 2001. Consistent image registration. IEEE Trans. Med. Imag. 20 (7), 568–582.
Christensen, G.E., Rabbitt, R.D., Miller, M.I., 1996. Deformable templates using large deformation kinematics. IEEE Trans. Image Process. 5 (10), 1435–1447.
Collins, D.L., Holmes, C.J., Peters, T.M., Evans, A.C., 1995. Automatic 3-D model-based neuroanatomical segmentation. Hum. Brain Mapp. 3 (3), 190–208.
Crum, W.R., Scahill, R.I., Fox, N.C., 2001. Automated hippocampal segmentation by regional fluid registration of serial MRI: validation and application in Alzheimer's disease. NeuroImage 13 (5), 847–855.
Dawant, B.M., Hartmann, S.L., Thirion, J.P., Maes, F., Vandermeulen, D., Demaerel, P., 1999. Automatic 3-D segmentation of internal structures of the head in MR images using a combination of similarity and free-form transformations: Part I, methodology and validation on normal subjects. IEEE Trans. Med. Imag. 18 (10), 909–916.
Gee, J.C., Reivich, M., Bajcsy, R., 1993. Elastically deforming a three-dimensional atlas to match anatomical brain images. J. Comput. Assist. Tomogr. 17 (2), 225–236.