Color and Geometrical Structure in Images
Applications in microscopy

Jan-Mark Geusebroek

This book was typeset by the author using LaTeX2e.

Cover: Victory Boogie Woogie, by Piet Mondriaan, 1942–1944, oil painting with pieces of plastic and paper. Reproduction and permission for printing kindly provided by Gemeentemuseum Den Haag.

Copyright © 2000 by Jan-Mark Geusebroek. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the author (geusebroek@science.uva.nl).

ISBN 90-5776-057-6

Color and Geometrical Structure in Images: Applications in microscopy. Academic dissertation (Academisch Proefschrift) for the degree of doctor at the Universiteit van Amsterdam, under the authority of the Rector Magnificus, prof. dr J. J. M. Franse, to be defended in public before a committee appointed by the College voor Promoties, in the Aula of the University, on Thursday 23 November 2000 at 12:00, by Jan-Mark Geusebroek, born in Amsterdam.

Promotion committee:
Prof. dr ir A. W. M. Smeulders
Dr H. Geerts
Prof. dr J. J. Koenderink
Prof. dr G. D. Finlayson
Prof. dr ir L. van Vliet
Prof. dr ir C. A. Grimbergen
Prof. dr ir F. C. A. Groen
Prof. dr P. van Emde Boas

Faculty: Natuurwetenschappen, Wiskunde & Informatica (Science, Mathematics & Computer Science), Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

The investigations described in this thesis were carried out at the Janssen Research Foundation, Beerse, Belgium, and the study was supported by the Janssen Research Foundation. The work was carried out at the Intelligent Sensory Information Systems group, within the graduate school ASCI (Advanced School for Computing and Imaging). ASCI dissertation series number 54.

Contents

1 Introduction
  1.1 Part I: Color
  1.2 Part II: Geometrical Structure

2 Color and Scale
  2.1 Color and Observation Scale
    2.1.1 The Spectral Structure of Color
    2.1.2 The Spatial Structure of Color
  2.2 Colorimetric Analysis of the Gaussian Color Model
  2.3 Conclusion

3 A Physical Basis for Color Constancy
  3.1 Color Image Formation Model
    3.1.1 Color Formation for Reflection of Light
    3.1.2 Color Formation for Transmission of Light
    3.1.3 Special Cases
  3.2 Illumination Invariant Properties of Object Reflectance or Transmittance
  3.3 Experiments
    3.3.1 Overview
    3.3.2 Small-Band Experiment
    3.3.3 Broad-Band Experiment
    3.3.4 Colorimetric Experiment
  3.4 Discussion

4 Measurement of Color Invariants
  4.1 Color Image Formation Model
  4.2 Determination of Color Invariants
    4.2.1 Invariants for White but Uneven Illumination
    4.2.2 Invariants for White but Uneven Illumination and Matte, Dull Surfaces
    4.2.3 Invariants for White, Uniform Illumination and Matte, Dull Surfaces
    4.2.4 Invariants for Colored but Uneven Illumination
    4.2.5 Invariants for a Uniform Object
    4.2.6 Summary of Color Invariants
    4.2.7 Geometrical Color Invariants in Two Dimensions
  4.3 Measurement of Color Invariants
    4.3.1 Measurement of Geometrical Color Invariants
    4.3.2 Discriminative Power for RGB Recording
    4.3.3 Evaluation of Scene Geometry Invariance
    4.3.4 Localization Accuracy for the Geometrical Color Invariants
  4.4 Conclusion

5 Robust Autofocusing in Microscopy
  5.1 Material and Methods
    5.1.1 The Focus Score
    5.1.2 Measurement of the Focus Curve
    5.1.3 Sampling the Focus Curve
    5.1.4 Large, Flat Preparations
    5.1.5 Preparation and Image Acquisition
    5.1.6 Evaluation of Performance for High NA
  5.2 Results
    5.2.1 Autofocus Performance Evaluation
    5.2.2 Evaluation of Performance for High NA
    5.2.3 Comparison of Performance with Small Derivative Filters
    5.2.4 General Observations
  5.3 Discussion

6 Segmentation of Tissue Architecture by Distance Graph Matching
  6.1 Materials and Methods
    6.1.1 Hippocampal Tissue Preparation
    6.1.2 Image Acquisition and Software
    6.1.3 K-Nearest Neighbor Graph
    6.1.4 Distance Graph Matching
    6.1.5 Distance Graph Comparison
    6.1.6 Cost Functions
    6.1.7 Evaluation of Robustness on Simulated Point Patterns
    6.1.8 Algorithm Robustness Evaluation
    6.1.9 Robustness for Scale Measure
    6.1.10 Cell Detection
    6.1.11 Hippocampal CA Region Segmentation
  6.2 Results
    6.2.1 Algorithm Robustness Evaluation
    6.2.2 Robustness for Scale Measure
    6.2.3 Hippocampal CA Region Segmentation
  6.3 Discussion
  6.4 Appendix: Dynamic Programming Solution for String Matching

7 A Minimum Cost Approach for Segmenting Networks of Lines
  7.1 Network Extraction Algorithm
    7.1.1 Vertex Detection
    7.1.2 Line Point Detection
    7.1.3 Line Tracing
    7.1.4 Graph Extraction
    7.1.5 Edge Saliency and Basin Coverage
    7.1.6 Thresholding the Saliency Hierarchy
    7.1.7 Overview
    7.1.8 Error Analysis
  7.2 Illustrations
    7.2.1 Heart Tissue Segmentation
    7.2.2 Neurite Tracing
    7.2.3 Crack Detection
    7.2.4 Directional Line Detection
  7.3 Conclusion

8 Discussion
  8.1 Color
  8.2 Geometrical Structure
  8.3 General Conclusion

Samenvatting (Summary in Dutch)
Chapter 1

Introduction

When looking at Victory Boogie Woogie, by the Dutch painter Piet Mondrian, the yellow blocks appear jumpy and unstable, as if they move [33]. As the painting hangs firmly fixed to the wall, the visual effect arises within the brain as it processes the incoming visual information. In fact, a visual scene entering the brain is fed into three subsystems [24, 34]. One subsystem segments the scene into parts by apparent color contrast; it gives the ability to see the various colored patches as different entities. A second subsystem provides us with the color of the parts; it is used for identifying the patches based on their color. The third subsystem localizes objects in the world: it tells us where the patches are in the scene. In contrast to the other two, this latter system is color blind, judging the scene on intensity variations only. Cooperation between the first subsystem, segmenting the different colored parts, and the latter subsystem, localizing the different patches, results in ambiguity when the intensity of neighboring color patches is similar. This phenomenon is at work in Victory Boogie Woogie for the yellow stripes on a white background, as described by Livingstone [33].

Apart from the color appearance of the blocks, Mondrian arranged the blocks to form a pattern of perpendicular lines. This visual arrangement is sifted out by the third, monochromatic subsystem, which extracts the spatial organization of the scene. The lines are effectuated by an intensity contrast with the background. The yellow stripes have no such contrast, yet lines appear, as the gaps are filled in by the brain. In Victory Boogie Woogie, Mondrian thus combined local color contrast and the geometrical arrangement of details to stimulate a visual sensation in the brain.
Like Victory Boogie Woogie, this thesis deals with both color and spatial structure. Part I describes the spatial interaction between colors. Color is discussed in its physical environment of light. Consequently, the physics of light reflection is included in the human subsystem dealing with shape extraction. Part II describes the quantification of geometrical structure, specifically applied to microscopy, although some of the concepts may have a broader application span. Tissue at the microscopical level often exhibits a regular pattern, and automatic extraction of such arrangements is considered, aiming at drug screening for pharmaceutical research. The two parts are mostly separated from one another, as is the case for perception. Using the parts in conjunction in future research may yield synergy in color image processing.

1.1 Part I: Color

Color seems to be an inalienable property of objects. It is the orange that has that color. However, the heart of the matter is quite different. Human perception actively assigns colors to an observed scene; there is a discrepancy between the physics of light and color as signified by the brain.

One undeniable fact is that color perception is bootstrapped by a physical cause: it results from light falling onto the eye. Objects in the world respond to daylight by reflecting different parts of the incoming light spectrum differently. This specific reflection component mainly determines the color appearance of the object. Another fact is that color perception results from experience. We assign the color of an orange its label as we have learned by experience, and we are capable of doing so by a biological mechanism. Experience has led to the assignment of names to colors. It would have given language no advantage to label colors if we could not compare them with memory. A last contribution to color as we know it is evolution, which has shaped the actual mechanism of color vision. Evolution, by which a species adapts to its environment, has driven the use of color in perception.

Color is one of the main cues for segmenting objects in a scene. The difference in color between oranges and the green leaves that obscure them allows for easy detection of the fruit. Color has a high identification power for an object: orange things in a tree are clearly not lemons, although the shape is similar. Color in combination with shading provides a clue for depth perception, hence for the geometry of an object. For monochromatic vision, such clues are highly ambiguous. Hence, color perception gives primates an advantage over monochromatic species.

In terms of physics, daylight is reflected by an object and reaches the eye. It is the reflectance ratio over the wavelengths of radiant energy that is an object property; hence the reflection function of an orange indeed is a physical characteristic of the fruit. However, the amount of radiant energy falling onto the retina depends on both the reflectance function and the light source illuminating the object. Still, we observe an orange to be orange in sunlight or by candlelight, independent of shadow, frontal illumination, or oblique illumination. All these variables influence the energy distribution as it enters the eye, the variability being imposed by the physical laws of light reflection. Human color vision has adapted to include these physical laws, due to which we neglect the scene-induced variations.

Observation of color by the human visual system proceeds by absorption of light by three different receptors.
The pigment sensitivities extend over a broad range of wavelengths, but they can be characterized by the spectral region for which sensitivity is maximal. The maximum absorption of the pigments is at short, middle, and long wavelengths, for which reason the receptors are named blue, green, and red cones. Before transmission of the image to the brain, the outputs of the receptors are combined in two ways. First, the outputs of the three cone types at each position on the retina are combined to represent three opponent color axes. One axis describes intensity, the black to white colors; a second axis yields yellow to blue colors, and a third axis results in red to green colors. These combinations are known as opponent colors, first described by Hering [22]. A second combination yields the comparison of neighboring opponent color responses. Such a spatial comparison is performed within circular areas, called receptive fields. The opponent colors are spatially compared, yielding black–white, yellow–blue, and red–green receptive fields. Receptive fields are found in primates at different sizes and for different opponent pathways [4, 5, 10, 36, 57]. The existence of receptive fields implies that color is a spatial property. Hence, color perception is the result of contrast between opponent spectral responses.

In computer vision, as opposed to machine vision [48], one would like to mimic human color perception to analyze and interpret color images. For example, in biological and medical research, tissue sections are evaluated by light microscopy. These sections are stained with standard methods, often over one hundred years old, and especially designed to be discriminated by the human eye. Hence, segmentation of classically stained preparations can best be based on human color perception. However, the understanding of the human visual system has not yet reached the level needed to describe the world with the mathematical precision of computer vision [53].

A color image may be defined as a coherent measurement of the spatio-spectral energy density reflected by a scene. Hence, a color image is the result of the interaction between light source and scene, observed by a measurement probe. From a computer vision perspective, this definition raises two fundamental problems. First, how to combine spectral measurements and spatial structure? Common practice in color image processing is to use the color information without considering the spatial coherence [11, 14, 16, 17, 37, 42, 45]. Early attempts to include spatial scale in color observation are the work by Land [31] and the work of Di Zenzo [58] and Cumani [3]. Applications of color receptive fields in computer vision include [15, 19, 40, 50, 55]. Although these methods intuitively capture the structure of color images, no foundation for color observation is available. A solid physical basis for combining color information with spatial resolution would solve the fundamental problem of how to probe the spatio-spectral influx of information-bearing energy.

A second fundamental question is how to integrate the physical laws of light reflection into color measurement. Modeling the physical process of color image formation provides a clue to the object-specific parameters [6, 25, 28, 29, 30, 39, 43, 46, 56, 59]. The question boils down to deriving the invariant properties of color vision [1, 9, 13, 16, 17, 20, 46].
With invariance we mean a property f of object t which receives value f (t) regardless unwanted conditions W in the appearance of t [47]. For human color vision, the group of disturbing conditions W 0 are categorized by shadow, highlights, light source, and scene geometry. Scene geometry is determined by the number of light sources, light source directions, viewing direction, and object shape. The invariant class W 0 is referred to as photometric invariance. For observation of images, geometric invariance is of importance [12, 18, 26, 32, 49]. The group of spatial disturbing conditions is given by translation, rotation, and observation scale. Since the human eye projects the three-dimensional world onto a two-dimensional image, the group may be extended with projection invariance. Both photometric and geometric invariance are required for a color vision system to reduce the complexity intrinsic to color images. In this thesis, these two fundamental questions are considered, aiming at robust measurement of color invariants. Here, color invariance represents the combined photometric and geometric invariance class. The aim is to describe the local structure of color images in a systematic, irreducible, and complete sense. The problem is approached from a measurement theoretic viewpoint, by using aperture functions as given by linear scale-space theory. Robust color measurement is achieved by selecting the appropriate scale for the aperture function. Conventional scale-space theory observes the world without imposing a priori information. As a result, the spatial operators defined in scale-space theory are translation, rotation, and scale invariant. More importantly, classical scale-space apertures introduce no spurious details due to the measurement process [26, 27]. In Chapter 2, we use the general scale-space assumptions to formulate a theory of color measurement [7]. However, our visual system is the result of evolution. When concerned with color, evolution is guided by the physical laws of light reflection, imposing the effects of shadows, shading, and highlights [29, 30, 56]. Hence, human color perception is constrained by the physical laws of light. Chapter 3 describes the physics of color image formation, and makes a connection between color invariance derived from physics and color constancy as characteristic for human color perception. In Chapter 4 the physical laws for color image formation are exploited to derive a complete, irreducible system of color invariants. 1.2 Part II: Geometrical Structure The second part of this thesis is concerned with the extraction of geometrical arrangement of local structure. The processes of cell differentiation and apoptosis in growing tissue result in the clustering of cells forming functional parts [2, 8, 23]. Often, these functional parts exhibit a regular structure, which is the result of cell division and specialization. The minimization of occupied space, a natural constraint imposed by 1.2. Part II: Geometrical Structure 5 gravity [51], yields dense packing of cells into somewhat regular arrays and henceforth they lead to regularly shaped cell patterns. The geometrical arrangement of structures in tissues may reveal differences between physiological and pathological conditions. Classical light microscopy is often used to observe tissue structure. The tissue of interest is cut into the necessary thin slices to observe the structures by transmission of light. 
Contrast is added to the slices by staining procedures, resulting in the highlighting of structures against a uniform background. The chemical state of cells is quantified by color analysis after staining. Tissue architecture is analyzed by the spatial arrangement of cells, neurites, blood vessels, fibers, and other cellular objects.

The regularity of cell aggregates in tissues does not imply that the quintessence of the arrangement can easily be captured in an algorithm. Biological variety causes clusters to be irregular. Observation by light microscopy demands the extraction of sliced samples from the tissue. The deformation caused by cutting the three-dimensional structure into two-dimensional transections again results in spatial distortion of cluster regularity. These distortions impose high demands on the robustness of the algorithm.

We consider the fundamental problem of geometric structure: how to capture the arrangement of local structures? For example, a tissue may be considered, at a very naive level, as a cluster of cells. Hence, a cell may be considered a local marker, whereas the arrangement of cells is characteristic for the tissue. Such arrangements impose a grammar of local structures. Graph morphology is the basic tool to describe these grammars [21, 35, 38, 41, 44, 52, 54].

Chapter 6 describes the extraction of architectures by example structures. For a regular architecture, a small sample of the geometric arrangement captures the essential information for automatic extraction. This fact is exploited in the segmentation of tissue architecture in histological preparations. The method is validated by comparing algorithm performance with the performance of an expert. Chapter 7 presents an algorithm for the extraction of line networks from local image structure. A network is given by knots and their interconnections. The extraction of knots and line points yields a localized description of the network. A graph-based method is applied to the extraction of cardiac myocytes from heart muscle sections.

To derive tissue architecture related parameters, as described in Chapter 6 and Chapter 7, the tissue needs to be digitized into the computer. For tissue sections, often large compared to the microscope field of view, automatic acquisition involves a scanning process. During scanning, the microscope needs to be focused when the tissue surface is not planar, as is often the case. Since sufficiently accurate autofocusing methods are not available, the second part of this thesis starts with Chapter 5, describing a robust method for focusing preparations in scanning light microscopy.

Bibliography

[1] E. Angelopoulou, S. Lee, and R. Bajcsy. Spectral gradient: A material descriptor invariant to geometry and incident illumination. In Proceedings of the Seventh IEEE International Conference on Computer Vision, pages 861–867. IEEE Computer Society, 1999.

[2] R. Chandebois. Cell sociology: A way of reconsidering the current concepts of morphogenesis. Acta Bioth., 25:71–102, 1976.

[3] A. Cumani. Edge detection in multispectral images. CVGIP: Graphical Models and Image Processing, 53(1):40–51, 1991.

[4] D. M. Dacey and B. B. Lee. The "blue-on" opponent pathway in primate retina originates from a distinct bistratified ganglion cell type. Nature, 367:731–735, 1994.

[5] D. M. Dacey, B. B. Lee, D. K. Stafford, J. Pokorney, and V. C. Smith. Horizontal cells of the primate retina: Cone specificity without spectral opponency. Science, 271:656–659, 1996.
[6] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance and texture of real world surfaces. ACM Trans. Graphics, 18:1–34, 1999.

[7] A. Dev and R. van den Boomgaard. Color and scale: The spatial structure of color images. Technical report, ISIS institute, Department of Computer Science, University of Amsterdam, Amsterdam, The Netherlands, 1999.

[8] K. J. Dormer. Fundamental Tissue Geometry for Biologists. Cambridge Univ. Press, London, 1980.

[9] M. D'Zmura and P. Lennie. Mechanisms of color constancy. J. Opt. Soc. Am. A, 3(10):1662–1672, 1986.

[10] S. Engel, X. Zhang, and B. Wandell. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388:68–71, 1997.

[11] G. D. Finlayson. Color in perspective. IEEE Trans. Pattern Anal. Machine Intell., 18(10):1034–1038, 1996.

[12] L. Florack. Image Structure. Kluwer Academic Publishers, Dordrecht, 1997.

[13] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.

[14] B. V. Funt and G. D. Finlayson. Color constant color indexing. IEEE Trans. Pattern Anal. Machine Intell., 17(5):522–529, 1995.

[15] C. Garbay, G. Brugal, and C. Choquet. Application of colored image analysis to bone marrow cell recognition. Analyt. Quantit. Cytol., 3:272–280, 1981.

[16] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.

[17] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec., 32:453–464, 1999.

[18] L. J. Van Gool, T. Moons, E. J. Pauwels, and A. Oosterlinck. Vision and Lie's approach to invariance. Image Vision Comput., 13(4):259–277, 1995.

[19] D. Hall, V. Colin de Verdière, and J. L. Crowley. Object recognition using coloured receptive fields. In Proceedings Sixth European Conference on Computer Vision (ECCV), volume 1, pages 164–177, LNCS 1842. Springer-Verlag, 26 June–1 July, 2000.

[20] G. Healey and A. Jain. Retrieving multispectral satellite images using physics-based invariant representations. IEEE Trans. Pattern Anal. Machine Intell., 18:842–848, 1996.

[21] H. J. A. M. Heijmans, P. Nacken, A. Toet, and L. Vincent. Graph morphology. J. Visual Communication Image Representation, 3:24–38, 1992.

[22] E. Hering. Outlines of a Theory of the Light Sense. Harvard University Press, Cambridge, MA, 1964.

[23] H. Honda. Geometrical models for cells in tissues. Int. Rev. Cytol., 81:191–248, 1983.

[24] D. H. Hubel. Eye, Brain, and Vision. Scientific American Library, New York, NY, 1988.

[25] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley, New York, NY, 1975.

[26] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.

[27] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern., 63:291–297, 1990.

[28] J. J. Koenderink and A. J. van Doorn. Illuminance texture due to surface mesostructure. J. Opt. Soc. Am. A, 13:452–463, 1996.

[29] P. Kubelka. New contribution to the optics of intensely light-scattering materials. Part I. J. Opt. Soc. Am., 38(5):448–457, 1948.

[30] P. Kubelka and F. Munk. Ein Beitrag zur Optik der Farbanstriche. Z. Techn. Physik, 12:593, 1931.

[31] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.

[32] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[33] M. Livingstone. Art, illusion and the visual system. Sci. Am., 258:78–85, 1988.
[34] M. Livingstone and D. Hubel. Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240:740–749, 1988.

[35] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï diagrams and parametrization of the spatial relationships. J. Theor. Biol., 154:359–369, 1992.

[36] R. H. Masland. Unscrambling color vision. Science, 271:616–617, 1996.

[37] B. A. Maxwell and S. A. Shafer. Physics-based segmentation of complex objects using multiple hypotheses of image formation. Comput. Vision Image Understanding, 65(2):269–295, 1997.

[38] F. Meyer. Skeleton and perceptual graphs. Signal Processing, 16:335–363, 1989.

[39] K. D. Mielenz, K. L. Eckerle, R. P. Madden, and J. Reader. New reference spectrophotometer. Appl. Optics, 12(7):1630–1641, 1973.

[40] M. Mirmehdi and M. Petrou. Segmentation of color textures. IEEE Trans. Pattern Anal. Machine Intell., 22(2):142–159, 2000.

[41] A. Mojsilović, J. Kovačević, J. Hu, R. J. Safranek, and S. K. Ganapathy. Matching and retrieval based on the vocabulary and grammar of color patterns. IEEE Trans. Image Processing, 9(1):38–54, 2000.

[42] S. K. Nayar and R. M. Bolle. Computing reflectance ratios from an image. Pat. Rec., 26:1529–1542, 1993.

[43] M. Oren and S. K. Nayar. Generalization of the Lambertian model and implications for machine vision. Int. J. Computer Vision, 14:227–251, 1995.

[44] J. Palmari, C. Dussert, Y. Berthois, C. Penel, and P. M. Martin. Distribution of estrogen receptor heterogeneity in growing MCF–7 cells measured by quantitative microscopy. Cytometry, 27:26–35, 1997.

[45] G. Sapiro. Color and illuminant voting. IEEE Trans. Pattern Anal. Machine Intell., 21(11):1210–1215, 1999.

[46] S. A. Shafer. Using color to separate reflection components. Color Res. Appl., 10(4):210–218, 1985.

[47] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content based image retrieval at the end of the early years. Submitted to IEEE Trans. Pattern Anal. Machine Intell.

[48] H. M. G. Stokman. Robust Photometric Invariance in Machine Color Vision. PhD thesis, University of Amsterdam, Amsterdam, The Netherlands, 2000.

[49] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[50] B. Thai and G. Healey. Modeling and classifying symmetries using a multiscale opponent color representation. IEEE Trans. Pattern Anal. Machine Intell., 20(11):1224–1235, 1998.

[51] D. W. Thompson. On Growth and Form. Cambridge University Press, London, England, 1971.

[52] A. Toet. Hierarchical clustering through morphological graph transformation. Pat. Rec. Let., 12:391–399, 1991.

[53] D. Travis. Effective Color Displays, Theory and Practice. Academic Press, 1991.

[54] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–388, 1989.

[55] S. G. Wolf, R. Ginosar, and Y. Y. Zeevi. Spatio-chromatic model for colour image processing. In Proceedings 12th IAPR International Conference on Pattern Recognition, volume 1, pages 599–601. IEEE, October 9–13, 1994.

[56] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.

[57] R. A. Young. The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-weighting profiles. Technical Report GMR-4920, General Motors Research Center, Warren, MI, 1985.

[58] S. Di Zenzo. A note on the gradient of a multi-image. Comput. Vision Graphics Image Processing, 33:116–125, 1986.
[59] R. Zhou, E. H. Hammond, and D. L. Parker. A multiple wavelength algorithm in color image analysis and its applications in stain decomposition in microscopy images. Med. Phys., 23(12):1977–1986, 1996.

Part I: Color

Chapter 2

Color and Scale

This chapter appeared as "The Spatial Structure of Color Images" in the proceedings of the Sixth European Conference on Computer Vision, vol. 1, pp. 331–341, 2000.

"Lightness and color are field phenomena, not point phenomena." – Edwin H. Land

There has been a recent revival in the analysis of color in computer vision. This is mainly due to the common knowledge that more visual information leads to easier interpretation of the visual scene. A color image is easier to segment than a grey-valued image, since some edges are only visible in the color domain and will not be detected in the grey-valued image. An area of large interest is searching for particular objects in images and image databases, for which color is a feature with a wide range of data values and hence a high potential for discriminability. Color can thus be seen as an additional cue in image interpretation. Moreover, color can be used to extract object reflectance robust to a change in imaging conditions [4, 5, 14, 15]. Therefore, color features are well suited for the description of an object.

Colors are only defined in terms of human observation. The modern analysis of color started in colorimetry, where the spectral content of tri-chromatic stimuli is matched by a human observer, resulting in the well-known XYZ color matching functions [17]. However, from the pioneering work of Land [13] we know that a perceived color does not directly correspond to the spectral content of the stimulus; there is no one-to-one mapping of spectral content to perceived color. For example, a colorimetry purist will not consider brown to be a color, but computer vision practitioners would like to be able to define brown in an image when searching on colors. Hence, it is not only the spectral energy distribution that codes color information, but also the spatial configuration of colors.

We aim at a physical basis for the local interpretation of color images. Common image processing sense tells us that the grey-value of a particular pixel is not a meaningful entity. The value 42 by itself tells us little about the meaning of the pixel in its environment. It is the local spatial structure of an image that has a close geometrical interpretation [10]. Yet representing the spatial structure of a color image is an unsolved problem. The theory of scale-space [10, 16] adheres to the fact that observation and scale are intertwined; a measurement is performed at a certain resolution. Differentiation is one of the fundamental operations in image processing, and one which is nicely defined in the context of scale-space [3]. In this chapter we discuss how to represent color as a scalar field embedded in a scale-space paradigm. As a consequence, the differential geometry framework is extended to the domain of color images. We demonstrate color invariant edge detectors which are robust to shadow and highlight boundaries.

The chapter is organized as follows. Section 2.1 considers the embedding of color in the scale-space paradigm. In section 2.2 we derive estimators for the parameters in the scale-space model, and give optimal values for these parameters. The resulting sensitivity curves are colorimetrically compared with human color vision.
2.1 Color and Observation Scale

A spatio-spectral energy distribution is only measurable at a certain spatial resolution and a certain spectral bandwidth. Hence, physically realizable measurements inherently imply integration over the spectral and spatial dimensions. The integration reduces the infinite-dimensional Hilbert space of spectra at an infinitesimally small spatial neighborhood to a limited number of measurements. As suggested by Koenderink [11], general aperture functions, i.e. Gaussians and their derivatives, may be used to probe the spatio-spectral energy distribution. We emphasize that no essentially new color model is proposed here, but rather a theory of color measurement. The specific choice of color representation is irrelevant for our purpose. For convenience we first concentrate on the spectral dimension; later on we show the extension to the spatial domain.

2.1.1 The Spectral Structure of Color

From scale-space theory we know how to probe a function at a certain scale: the probe should have a Gaussian shape in order to prevent the creation of extra detail in the function when observed at a higher scale (lower resolution) [10]. As suggested by Koenderink [11], we can probe the spectrum with a Gaussian. In this section, we consider the Gaussian as a general probe for the measurement of spatio-spectral differential quotients.

Formally, let E(λ) be the energy distribution of the incident light, where λ denotes wavelength, and let G(λ; λ0, σλ) be the Gaussian at spectral scale σλ positioned at λ0. The spectral energy distribution may be approximated by a Taylor expansion at λ0,

E(\lambda) = E^{\lambda_0} + \lambda E_\lambda^{\lambda_0} + \frac{1}{2} \lambda^2 E_{\lambda\lambda}^{\lambda_0} + \ldots   (2.1)

Measurement of the spectral energy distribution with a Gaussian aperture yields a weighted integration over the spectrum. The observed energy in the Gaussian color model, at infinitely small spatial resolution, approaches in second order

\hat{E}^{\sigma_\lambda}(\lambda) = \hat{E}^{\lambda_0, \sigma_\lambda} + \lambda \hat{E}_\lambda^{\lambda_0, \sigma_\lambda} + \frac{1}{2} \lambda^2 \hat{E}_{\lambda\lambda}^{\lambda_0, \sigma_\lambda} + \ldots   (2.2)

where

\hat{E}^{\lambda_0, \sigma_\lambda} = \int E(\lambda)\, G(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda   (2.3)

measures the spectral intensity,

\hat{E}_\lambda^{\lambda_0, \sigma_\lambda} = \int E(\lambda)\, G_\lambda(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda   (2.4)

measures the first order spectral derivative, and

\hat{E}_{\lambda\lambda}^{\lambda_0, \sigma_\lambda} = \int E(\lambda)\, G_{\lambda\lambda}(\lambda; \lambda_0, \sigma_\lambda)\, d\lambda   (2.5)

measures the second order spectral derivative. Further, G_λ and G_λλ denote derivatives of the Gaussian with respect to λ. Note that, throughout the thesis, we assume scale-normalized Gaussian derivatives to probe the spectral energy distribution.

Definition 1 (Gaussian Color Model). The Gaussian color model measures the coefficients Ê^{λ0,σλ}, Ê_λ^{λ0,σλ}, Ê_λλ^{λ0,σλ}, … of the Taylor expansion of the Gaussian-weighted spectral energy distribution at λ0 and scale σλ.

One might be tempted to consider the higher (larger than two) order structure of the smoothed spectrum. However, the subspace spanned by the human visual system is of dimension 3, and hence higher order spectral structure cannot be observed by the human visual system.
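The measurement of Definition 1 is straightforward to carry out numerically. The following is a minimal sketch, assuming a regularly sampled spectrum; the values λ0 = 520 nm and σλ = 55 nm anticipate the optimum derived in section 2.2, and the sampling grid is an arbitrary illustrative choice.

```python
import numpy as np

def spectral_apertures(lam, lam0=520.0, sigma=55.0):
    """Gaussian aperture G and its scale-normalized first and second
    wavelength derivatives, the probes of eqs. (2.3)-(2.5)."""
    u = (lam - lam0) / sigma
    g = np.exp(-0.5 * u ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    g1 = -u * g               # sigma   * dG/dlambda
    g2 = (u ** 2 - 1.0) * g   # sigma^2 * d^2G/dlambda^2
    return g, g1, g2

def gaussian_color_model(lam, E, lam0=520.0, sigma=55.0):
    """Weighted integration of a sampled spectrum E(lambda) against the
    three apertures, yielding (E_hat, E_hat_lambda, E_hat_lambdalambda)."""
    g, g1, g2 = spectral_apertures(lam, lam0, sigma)
    dlam = lam[1] - lam[0]    # assumes a regular wavelength grid
    return tuple(np.sum(E * k) * dlam for k in (g, g1, g2))

# A flat (equal-energy white) spectrum yields approximately (1, 0, 0):
# nonzero intensity, but no first or second order spectral structure.
lam = np.arange(380.0, 781.0, 1.0)
print(gaussian_color_model(lam, np.ones_like(lam)))
```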
[Figure 2.1: The probes for spatial color consist of probing the product of the spatial and the spectral space with a Gaussian aperture.]

2.1.2 The Spatial Structure of Color

Introduction of spatial extent in the Gaussian color model yields a local Taylor expansion at wavelength λ0 and position x0. Each measurement of a spatio-spectral energy distribution has a spatial as well as a spectral resolution. The measurement is obtained by probing an energy density volume in a three-dimensional spatio-spectral space, where the size of the probe is determined by the observation scales σλ and σx, see fig. 2.1. It is directly clear that we do not consider spatial scale and spectral scale separately, but actually probe an energy density volume in the 3D spatio-spectral space, where the "size" of the volume is specified by the observation scales.

We can describe the observed spatio-spectral energy density Ê(λ, ~x) of light as a Taylor series for which the coefficients are given by the energy convolved with Gaussian derivatives:

\hat{E}(\lambda, \vec{x}) = \hat{E} + \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix}^T \begin{pmatrix} \hat{E}_{\vec{x}} \\ \hat{E}_{\lambda} \end{pmatrix} + \frac{1}{2} \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix}^T \begin{pmatrix} \hat{E}_{\vec{x}\vec{x}} & \hat{E}_{\vec{x}\lambda} \\ \hat{E}_{\lambda\vec{x}} & \hat{E}_{\lambda\lambda} \end{pmatrix} \begin{pmatrix} \vec{x} \\ \lambda \end{pmatrix} + \ldots   (2.6)

where

\hat{E}_{\vec{x}^i \lambda^j}(\lambda, \vec{x}) = E(\lambda, \vec{x}) * G_{\vec{x}^i \lambda^j}(\lambda, \vec{x}; \sigma_\lambda, \sigma_x).   (2.7)

Here, G_{x^i λ^j}(λ, ~x; σλ, σx) are the spatio-spectral probes, or color receptive fields. The coefficients of the Taylor expansion of Ê(λ, ~x) represent the local image structure completely. Truncation of the Taylor expansion results in an approximate representation, optimal in least squares sense.

For human vision, it is known that the Taylor expansion is spectrally truncated at second order [8]. Hence, higher order derivatives do not affect color as observed by the human visual system. Therefore, three receptive field families should be considered: the luminance receptive fields as known from luminance scale-space [12], extended with a yellow–blue receptive field family measuring the first order spectral derivative, and a red–green receptive field family probing the second order spectral derivative. For human vision, the Taylor expansion for luminance is spatially truncated at fourth order [18].

2.2 Colorimetric Analysis of the Gaussian Color Model

The eye projects the infinite-dimensional spectral density function onto a 3D 'color' space. Not every 3D subspace of the Hilbert space of spectra equals the subspace that nature has chosen. Any subspace we create with an artificial color model should be reasonably close, in some metrical sense, to the spectral subspace spanned by the human visual system.

Formally, the infinite-dimensional spectrum e is projected onto a 3D space c by c = A^T e, where A^T = (X Y Z) represents the color matching matrix. The subspace in which c resides is defined by the color matching functions A^T. The range \mathcal{R}(A^T) defines which spectral distributions e can be reached from c, and the nullspace \aleph(A^T) defines which spectra e cannot be observed in c. Since any spectrum e = e_\mathcal{R} + e_\aleph is decomposed into a part that resides in \mathcal{R}(A^T) and a part that resides in \aleph(A^T), we define:

Definition 2. The observable part of the spectrum equals e_\mathcal{R} = \Pi_\mathcal{R}\, e, where \Pi_\mathcal{R} is the projection onto the range of the human color matching functions A^T.

Definition 3. The non-observable (or metameric black) part of the spectrum equals e_\aleph = \Pi_\aleph\, e, where \Pi_\aleph is the projection onto the nullspace of the human color matching functions A^T.

The projection onto the range \mathcal{R}(A^T) is given by [1]

\Pi_\mathcal{R} : A^T \mapsto \mathcal{R}(A^T) = A (A^T A)^{-1} A^T   (2.8)

and the projection onto the nullspace by

\Pi_\aleph : A^T \mapsto \aleph(A^T) = I - A (A^T A)^{-1} A^T = \Pi_\mathcal{R}^\perp.   (2.9)

Any spectral probe B^T that has the same range as A^T is said to be colorimetric with A^T, and hence differs only by an affine transformation. An important property of the range projector \Pi_\mathcal{R} is that it uniquely specifies the subspace.
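Definitions 2 and 3 translate directly into numerical projection operators. The following is a minimal sketch of eqs. (2.8) and (2.9); note that the random matrix below merely stands in for the actual sampled XYZ color matching functions.

```python
import numpy as np

def range_nullspace_projectors(A):
    """Projectors onto the range and nullspace of the color matching
    functions A^T (eqs. 2.8 and 2.9); A has one column per sensitivity."""
    P_range = A @ np.linalg.inv(A.T @ A) @ A.T   # Pi_R
    P_null = np.eye(A.shape[0]) - P_range        # Pi_aleph = Pi_R-perp
    return P_range, P_null

# Decompose a spectrum into its observable and metameric black parts.
# A random (n x 3) matrix stands in for sampled XYZ matching functions.
rng = np.random.default_rng(0)
A = rng.standard_normal((81, 3))      # e.g. 380-780 nm sampled at 5 nm
e = rng.standard_normal(81)           # an arbitrary spectrum
P_range, P_null = range_nullspace_projectors(A)
e_obs, e_black = P_range @ e, P_null @ e
assert np.allclose(e_obs + e_black, e)
assert np.allclose(A.T @ e_black, 0)  # metameric black gives no response
```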
Thus, we can rephrase the previous statement as:

Proposition 4. The human color space is uniquely defined by \mathcal{R}(A^T). Any color model B^T is colorimetric with A^T if and only if \mathcal{R}(A^T) = \mathcal{R}(B^T).

In this way we can tell whether a certain color model is colorimetric with the human visual system. Naturally, this is a formal definition; it is not well suited for a measurement approach, where the color subspaces are measured with a given precision. A definition of the difference between subspaces is given by [7, Section 2.6.3]:

Proposition 5. The largest principal angle θ between color subspaces given by their color matching functions A^T and B^T equals

\sin \theta(A^T, B^T) = \left\| \mathcal{R}(A^T) - \mathcal{R}(B^T) \right\|_2.

Up to this point we have established expressions describing the similarity between different subspaces. We are now in a position to compare the subspace of the Gaussian color model with that of the human visual system, by using the XYZ color matching functions. Hence, the parameters of the Gaussian color model may be optimized to capture a spectral subspace similar to the one spanned by human vision, see fig. 2.2. Let the Gaussian color matching functions be given by

B(\lambda_0, \sigma_\lambda)^T = \left[\, G(\lambda; \lambda_0, \sigma_\lambda) \;\; G_\lambda(\lambda; \lambda_0, \sigma_\lambda) \;\; G_{\lambda\lambda}(\lambda; \lambda_0, \sigma_\lambda) \,\right].

We have two degrees of freedom in positioning the subspace of the Gaussian color model: the position λ0 and scale σλ of the Gaussian. We wish to find the optimal subspace, i.e. the one that minimizes the largest principal angle between the subspaces:

\arg\min_{\lambda_0, \sigma_\lambda} \left\| \mathcal{R}(A^T) - \mathcal{R}\!\left(B(\lambda_0, \sigma_\lambda)^T\right) \right\|_2.

An approximate solution is obtained for λ0 = 520 nm and σλ = 55 nm. The corresponding angles between the principal axes of the Gaussian sensitivities and the 1931 and 1964 CIE standard observers are given in tab. 2.1. Figure 2.3 shows the different sensitivities, together with the optimal (least squares) transform from the XYZ sensitivities to the Gaussian basis, given by

\begin{pmatrix} \hat{E} \\ \hat{E}_\lambda \\ \hat{E}_{\lambda\lambda} \end{pmatrix} = \begin{pmatrix} -0.48 & 1.2 & 0.28 \\ 0.48 & 0 & -0.4 \\ 1.18 & -1.3 & 0 \end{pmatrix} \begin{pmatrix} \hat{X} \\ \hat{Y} \\ \hat{Z} \end{pmatrix}.   (2.10)

Since the transformed sensitivities are a linear (affine) transformation of the original XYZ sensitivities, the transformation is colorimetric with human vision. The transform is close to the Hering basis for color vision [8], for which the yellow–blue pathway indeed is found in the visual system of primates [2].

[Figure 2.2: Cohen's fundamental matrix \mathcal{R} for the CIE 1964 standard observer (a), and for the Gaussian color model (λ0 = 520 nm, σλ = 55 nm) (b).]

An RGB camera approximates the CIE 1931 XYZ basis for colorimetry by the linear transform [9]

\begin{pmatrix} \hat{X} \\ \hat{Y} \\ \hat{Z} \end{pmatrix} = \begin{pmatrix} 0.62 & 0.11 & 0.19 \\ 0.3 & 0.56 & 0.05 \\ -0.01 & 0.03 & 1.11 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.   (2.11)

The best linear transform from XYZ values to the Gaussian color model is given by (eq. 2.10). Hence, the product of (eq. 2.11) and (eq. 2.10) gives the desired implementation of the Gaussian color model in RGB terms,

\begin{pmatrix} \hat{E} \\ \hat{E}_\lambda \\ \hat{E}_{\lambda\lambda} \end{pmatrix} = \begin{pmatrix} 0.06 & 0.63 & 0.27 \\ 0.3 & 0.04 & -0.35 \\ 0.34 & -0.6 & 0.17 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.   (2.12)

A better approximation to the Gaussian color model may be obtained for known camera sensitivities. Figure 2.4 shows an example image and its Gaussian color model components.
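In code, (eq. 2.12) amounts to a single 3 × 3 matrix applied per pixel. A minimal sketch follows, keeping in mind that the matrix is itself an approximation and that known camera sensitivities would give a better one.

```python
import numpy as np

# Gaussian color model in RGB terms (eq. 2.12): the product of the
# RGB-to-XYZ approximation (eq. 2.11) and the XYZ-to-Gaussian
# transform (eq. 2.10).
RGB_TO_E = np.array([[0.06,  0.63,  0.27],
                     [0.30,  0.04, -0.35],
                     [0.34, -0.60,  0.17]])

def rgb_to_gaussian_color_model(img):
    """Map an (h, w, 3) RGB image to its components E_hat, E_hat_lambda,
    and E_hat_lambdalambda, as shown for the example image of fig. 2.4."""
    out = img @ RGB_TO_E.T
    return out[..., 0], out[..., 1], out[..., 2]

# Usage: a pure red pixel yields a positive (reddish) E_hat_lambdalambda.
E, El, Ell = rgb_to_gaussian_color_model(np.array([[[1.0, 0.0, 0.0]]]))
```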
Table 2.1: Angles between the principal axes for various color systems. For determining the optimal values λ0, σλ, the largest angle θ1 is minimized. The distance between the Gaussian sensitivities for the optimal values λ0 = 520 nm, σλ = 55 nm and the different CIE colorimetric systems is comparable. Note that the difference between the CIE systems themselves is 9.8°.

          Gauss – XYZ 1931   Gauss – XYZ 1964   XYZ 1931 – 1964
  θ1            26°               23.5°               9.8°
  θ2            21.5°             17.5°               3.9°
  θ3            3°                3°                  1°

[Figure 2.3: The Gaussian sensitivities at λ0 = 520 nm and σλ = 55 nm (a), the CIE 1964 XYZ sensitivities (b), and the best linear transformation from (b) to the Gaussian basis (c). Note the correspondence between the transformed sensitivities and the Gaussian color model.]

[Figure 2.4: The example image (a) and its color components Ê (b), Êλ (c), and Êλλ (d), respectively. Note that for the color component Êλ achromaticity is shown in grey, negative bluish values are shown in dark, and positive yellowish values in light. Further, for Êλλ achromaticity is shown in grey, negative greenish in dark, and positive reddish in light.]
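The angles in Table 2.1 follow from Proposition 5 by taking the spectral norm of the difference of the two range projectors. A minimal sketch follows; reproducing the table itself would require the tabulated CIE matching functions, so two Gaussian bases with different (λ0, σλ) are compared instead.

```python
import numpy as np

def largest_principal_angle(A, B):
    """sin(theta) between the subspaces spanned by the columns of A and
    B: the spectral norm of the projector difference (Proposition 5)."""
    proj = lambda M: M @ np.linalg.inv(M.T @ M) @ M.T
    return np.linalg.norm(proj(A) - proj(B), 2)

def gaussian_basis(lam, lam0, sigma):
    """Columns G, G_lambda, G_lambdalambda at position lam0, scale sigma."""
    u = (lam - lam0) / sigma
    g = np.exp(-0.5 * u ** 2)
    return np.column_stack((g, -u * g, (u ** 2 - 1.0) * g))

lam = np.arange(380.0, 781.0, 5.0)
s = largest_principal_angle(gaussian_basis(lam, 520.0, 55.0),
                            gaussian_basis(lam, 500.0, 40.0))
theta = np.degrees(np.arcsin(min(1.0, s)))   # clip rounding noise
```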
2.3 Conclusion

We have established the measurement of spatial color information from RGB images, based on the Gaussian scale-space paradigm. We have shown that the formation of color images yields a spatio-spectral integration process at a certain spatial and spectral resolution. Hence, measurement of color images implies probing a three-dimensional energy density at a spatial scale σx and spectral scale σλ. The Gaussian aperture may be used to probe the spatio-spectral energy distribution.

We have achieved a spatial color model, founded in physics as well as in measurement science. The parameters of the Gaussian color model have been estimated such that a spectral subspace similar to that of human vision is captured. The Gaussian color model solves the fundamental problem of color and scale by integrating the spatial and the color information. The model measures the coefficients of the Taylor expansion of the spatio-spectral energy distribution. Hence, the Gaussian color model describes the local structure of color images. As a consequence, the differential geometry framework is extended to the domain of color images. Spatial differentiation of expressions derived from the Gaussian color model is inherently well-posed, in contrast with the often ad hoc methods for detection of hue edges and other color edge detectors. Application areas include physics-based vision [5], image database searches [6], and object tracking.

Bibliography

[1] J. B. Cohen and W. E. Kappauff. Color mixture and fundamental metamer: Theory, algebra, geometry, application. Am. J. Psych., 98:171–259, 1985.

[2] D. M. Dacey and B. B. Lee. The "blue-on" opponent pathway in primate retina originates from a distinct bistratified ganglion cell type. Nature, 367:731–735, 1994.

[3] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Cartesian differential invariants in scale-space. Journal of Mathematical Imaging and Vision, 3(4):327–348, 1993.

[4] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.

[5] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec., 32:453–464, 1999.

[6] T. Gevers and A. W. M. Smeulders. Content-based image retrieval by viewpoint-invariant image indexing. Image Vision Comput., 17(7):475–488, 1999.

[7] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins Press Ltd., London, 1996.

[8] E. Hering. Outlines of a Theory of the Light Sense. Harvard University Press, Cambridge, MA, 1964.

[9] ITU-R Recommendation BT.709. Basic parameter values for the HDTV standard for the studio and for international programme exchange. Technical Report BT.709 [formerly CCIR Rec. 709], ITU, 1211 Geneva 20, Switzerland, 1990.

[10] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.

[11] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.

[12] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern., 63:291–297, 1990.

[13] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.

[14] K. D. Mielenz, K. L. Eckerle, R. P. Madden, and J. Reader. New reference spectrophotometer. Appl. Optics, 12(7):1630–1641, 1973.

[15] S. A. Shafer. Using color to separate reflection components. Color Res. Appl., 10(4):210–218, 1985.

[16] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[17] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.

[18] R. A. Young. The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-weighting profiles. Technical Report GMR-4920, General Motors Research Center, Warren, MI, 1985.

Chapter 3

A Physical Basis for Color Constancy

Part of this work has appeared in the proceedings of the Second International Conference on Scale-Space Theory in Computer Vision, 1999, pp. 459–464.

"As organisms grew more intricate, their sense organs multiplied and became both more complex and more delicate. More messages of greater variety were received from and about the external environment. Along with that (whether as cause or effect we cannot tell), there developed an increasing complexity of the nervous system, the living instrument that interpreted and stored the data collected by the sense organs." – Isaac Asimov

A well-known property of human vision, known as color constancy, is the ability to correct for color deviations caused by a difference in illumination. Although the effect is a long-standing research topic [13, 15, 21], the mechanism involved is only partly resolved. A common approach to investigating color constant behavior is by psychophysical experiments [1, 13, 14]. Despite the exact nature of such experiments, there are intrinsic difficulties in explaining the experimental results. For relatively simple experiments, the results may not explain in enough detail the mechanism underlying color constancy. For example, in [14] the same stimulus patch, either illuminated by the test illuminant or by the reference illuminant, was presented to the left and right eye. The subject was asked to match the appearance of the color under the reference illuminant to the color under the test illuminant. As discussed by the authors, the experiment is synthetic in that the visual scene lacks a third dimension. Although the results correspond to their predictions, they are unable to prove their theory on natural scenes, the scenes where shadow plays an important role. On the other hand, for complex experiments, with inherently a large number of variables involved, the results do not describe color constancy in isolation from other perceptual mechanisms. In [1], a more natural scene is used, in that objects were placed in the experimentation room. The observer judged the appearance of a test patch mounted on the far wall of the room, and was asked to vary the chromaticity of the test patch so that it appeared achromatic.
The color constancy reported is excellent, but the experiments could not be interpreted in enough detail to explain the results. Hence, a fundamental problem in experimental colorimetry is that the complex experimental environment necessary to examine color constancy makes it hard to draw conclusions.

An alternative approach to revealing the mechanisms involved in color constancy is to consider the spectral image formation. Modeling the physical process of spectral image formation provides insight into the effect of the different parameters on object reflectance [2, 3, 4, 5, 6, 19]. In this chapter, we aim at a physical basis for color constancy rather than a psychophysical one. Object reflectance is well modeled by Shafer [20], based on the older Kubelka-Munk theory [11, 12]. The Kubelka-Munk theory models the reflected and transmitted spectrum of a colored layer, based on a material-dependent scattering and absorption function, under the assumption that light is isotropically scattered within the material. The theory has proven successful for a wide variety of materials and applications [8, 22], and unites spectral color formation for reflecting as well as transparent materials into one photometric model. Therefore, the Kubelka-Munk theory is well suited for determining material properties from color measurements. In Chapter 4, the use of the Kubelka-Munk model is demonstrated for the measurement of object reflectance from color images, under various general assumptions regarding the imaging conditions. In this chapter, we concentrate on color constant measurement of object color under both reflection and transmission of light.

When considering the estimation of material properties on the basis of local measurements, differential equations constitute a natural framework to describe the physical process of image formation. A well-known technique from scale-space theory [10] is the convolution of a signal with a derivative of the Gaussian kernel to obtain the derivative of the signal. The Gaussian function regularizes the underlying distribution, resulting in robustness against noise. The standard deviation σ of the Gaussian determines the observation scale. Introduction of wavelength into the scale-space paradigm, as suggested by Koenderink [9], leads to a spatio-spectral family of Gaussian aperture functions. These color receptive fields were introduced in Chapter 2 as the Gaussian color model. The Gaussian color model provides a physical basis, compatible with colorimetry, for the measurement of color constant object properties.

The color constancy problem is often posed as retrieving the unknown illuminant from a given scene [2, 3, 14, 19]. Different from that approach, features invariant to a change in illuminant can be developed [4, 5, 6]. In this chapter, we focus on differential expressions which are robust to a change in illumination color. The performance of these color invariants is demonstrated by experiments on spectral data. Additionally, robustness against changes in the imaging conditions, such as camera viewpoint, illumination direction, and object geometry, is achieved, as demonstrated in Chapter 4.

The organization of the chapter is as follows. In section 3.1, color image formation is modeled by means of the Kubelka-Munk theory. Invariant differential expressions which meet the given constraints are derived from the model in section 3.2.
Application of the Gaussian color model described in Chapter 2 implies measurement of the derived spatio-spectral differential invariants. Section 3.3 describes the experimental setup and color constancy results for the proposed method, compared to well-known methods from the literature. Finally, a confrontation between physics-based and perception-based color constancy is given in section 3.4.

3.1 Color Image Formation Model

In this section, image formation is modeled by means of the Kubelka-Munk theory [8, 22] for colorant layers. Under the assumption that light within the material is isotropically scattered, the material layer may be characterized by a wavelength-dependent scattering coefficient and absorption coefficient. The class of materials for which the theory is useful ranges from dyed paper and textiles, opaque plastics, and paint films, up to enamel and dental silicate cements [8]. The model may be applied to both reflecting and transparent materials.

3.1.1 Color Formation for Reflection of Light

Consider a homogeneously colored material patch of uniform thickness d and infinitesimal area, characterized by its absorption coefficient k(λ) and scattering coefficient s(λ). When illuminated by incident light with spectral distribution e(λ), light scattering within the material causes diffuse body reflection (fig. 3.1), while Fresnel interface reflectance occurs at the surface boundaries. When the thickness of the layer is such that a further increase in thickness does not affect the reflected color, Fresnel reflectance at the back surface may be neglected. The incident light is partly reflected at the front surface; the remainder enters the material, is isotropically scattered, and in part again passes the front-surface boundary. The reflected spectrum in the viewing direction ~v, ignoring secondary scattering after internal boundary reflection, is given by [8, 22]

E_R(\lambda) = e(\lambda) \left(1 - \rho_f(\lambda, \vec{n}, \vec{s}, \vec{v})\right)^2 R_\infty(\lambda) + e(\lambda)\, \rho_f(\lambda, \vec{n}, \vec{s}, \vec{v})   (3.1)

where ~n is the surface patch normal, ~s the direction of the illumination source, and ρf the Fresnel front surface reflectance coefficient in the viewing direction.

[Figure 3.1: Illustration of the photometric model. The object, refractive index n2, is illuminated by e(λ) (medium refractive index n1), and light is reflected and scattered in the viewing direction.]

The body reflectance

R_\infty(\lambda) = a(\lambda) - b(\lambda)   (3.2)

depends on the absorption and scattering coefficients through

a(\lambda) = 1 + \frac{k(\lambda)}{s(\lambda)}, \qquad b(\lambda) = \sqrt{a(\lambda)^2 - 1}.   (3.3)

Simplification is obtained by considering neutral interface reflection, assuming that the Fresnel reflectance coefficient has a constant value over the spectrum. For commonly used materials, interface reflection is constant with respect to wavelength to within a few percent across the visible spectrum [8, 18]. Equation (3.1) reduces to

E_R(\lambda) = e(\lambda) \left(1 - \rho_f(\vec{n}, \vec{s}, \vec{v})\right)^2 R_\infty(\lambda) + e(\lambda)\, \rho_f(\vec{n}, \vec{s}, \vec{v}).   (3.4)

The influence of the Fresnel reflectance varies from perfectly diffuse body reflectance (ρf = 0), or Lambertian reflection, to total mirroring of the illuminating source (ρf = 1). Hence, the spectral color of E_R is an additive mixture of the color of the light source and the perfectly diffuse body reflectance color.
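To make the reflection model concrete, the following sketch evaluates (eqs. 3.2–3.4) for a sampled spectrum; the illuminant and the absorption and scattering coefficients are illustrative stand-ins, not measured data.

```python
import numpy as np

def body_reflectance(k, s):
    """Perfectly diffuse body reflectance R_inf (eqs. 3.2 and 3.3) from
    the absorption k(lambda) and scattering s(lambda) coefficients."""
    a = 1.0 + k / s
    b = np.sqrt(a ** 2 - 1.0)
    return a - b

def reflected_spectrum(e, k, s, rho_f):
    """Reflected spectrum E_R (eq. 3.4) under neutral interface
    reflection: an additive mixture of the body reflectance color and
    the color of the light source."""
    return e * (1.0 - rho_f) ** 2 * body_reflectance(k, s) + e * rho_f

# Illustrative input: flat white illuminant, hypothetical material.
lam = np.linspace(400.0, 700.0, 61)
e = np.ones_like(lam)                   # illuminant e(lambda)
k = 0.5 + 0.4 * np.sin(lam / 60.0)      # absorption (made-up curve)
s = np.full_like(lam, 1.0)              # wavelength-independent scattering
E_matte = reflected_spectrum(e, k, s, rho_f=0.0)   # Lambertian limit
E_gloss = reflected_spectrum(e, k, s, rho_f=0.2)   # with interface term
```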
Because of projection of the energy distribution on the image plane, the vectors ~n, ~s and ~v depend on the position at the imaging plane. The energy of the incoming spectrum at a point ~x on the image plane is then given by

    ER(λ, ~x) = e(λ, ~x)(1 − ρf(~x))² R∞(λ, ~x) + e(λ, ~x)ρf(~x)    (3.5)

where the spectral distribution at each point ~x is generated by a specific material patch.

Figure 3.2: Illustration of the photometric model. The object, refractive index n2, is illuminated by e(λ) (medium refractive index n1). When the material is transparent, light is transmitted through the material, enters medium n3, and is observed.

The major assumption made for the model of (eq. 3.5) is that locally planar surface patches are examined, for which the material is homogeneously colored. These constraints are imposed by the Kubelka-Munk theory, resulting in isotropic scattering of light within the material. The assumption is valid when the resolution is fine enough to consider locally uniformly colored patches, whereas individual staining particles are not resolved. Further, the thickness of the layer is assumed to be such that no light reaches the other side of the material. For everyday scenes, these assumptions seem to be justified. Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the surface patch. As discussed in [18, 20], deviations of ρf over the visible spectrum are small for commonly used materials; therefore the Fresnel reflectance coefficient may be considered constant. The internally Fresnel reflected light contributes little in many cases [22], and is ignored in the model.

3.1.2 Color Formation for Transmission of Light

Consider a homogeneously colored material patch of uniform thickness d and infinitesimal area, characterized by its absorption coefficient k(λ) and scattering coefficient s(λ). When illuminated by incident light with spectral distribution e(λ), absorption and scattering by the material determine its transmission color (fig. 3.2), while Fresnel interface reflectance occurs at both the front and back surface boundaries. When the layer is thin, such that the material is transparent, the transmitted spectrum through the layer in the viewing direction ~v, ignoring the effect of interreflections between the material surfaces, is given by [8, 22]

    ET(λ) = e(λ)(1 − ρf(λ, ~n, ~s, ~v))(1 − ρb(λ, ~n, ~s, ~v)) b(λ) / (a(λ) sinh[b(λ)s(λ)l(~n, ~s, ~v)c] + b(λ) cosh[b(λ)s(λ)l(~n, ~s, ~v)c])    (3.6)

where again ~n is the material patch normal and ~s the direction of the illumination source. Further, c is the staining concentration and l the distance traveled by the light through the material. The terms ρf and ρb denote the Fresnel front and back surface reflectance coefficients, respectively. The factors a and b depend on the absorption and scattering coefficients as given by (eq. 3.3).

Simplification is obtained by considering neutral interface reflection, assuming that the Fresnel reflectance coefficients have a constant value over the spectrum. In that case, the Fresnel reflectance affects the intensity of the transmitted light only. Further, by considering a small angle of incidence at the transparent layer, the path length l(~n, ~s, ~v) = d. Equation (3.6) reduces to

    ET(λ) = e(λ)(1 − ρf(~n, ~s, ~v))(1 − ρb(~n, ~s, ~v)) b(λ) / (a(λ) sinh[b(λ)s(λ)dc] + b(λ) cosh[b(λ)s(λ)dc]) .    (3.7)
Because of projection of the energy distribution on the image plane, the vectors ~n, ~s and ~v depend on the position ~x at the imaging plane,

    ET(λ, ~x) = e(λ, ~x)(1 − ρf(~x))(1 − ρb(~x)) b(λ, ~x) / (a(λ, ~x) sinh[b(λ, ~x)s(λ, ~x)d(~x)c(~x)] + b(λ, ~x) cosh[b(λ, ~x)s(λ, ~x)d(~x)c(~x)])    (3.8)

where the spectral distribution at each point ~x is generated by a specific transparent patch.

One of the assumptions made for the model of (eq. 3.8) is that locally planar material patches with parallel sides are examined, for which the material is homogeneously colored. The assumption is valid when the material is neither fluorescent nor in any sense optically active, and the resolution is fine enough to consider locally uniformly colored patches, while individual stain particles are not resolved. Again, these constraints are imposed by the Kubelka-Munk theory. Further, normal incidence of light at the layer is assumed, so that the optical path length through the layer approximates its thickness. In transmission light microscopy, the preparation and observation conditions fairly justify these assumptions. Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the transparent patch. As discussed in [18], deviations of ρf and ρb over the visible spectrum are small for commonly used materials. For example, the refractive index of immersion oil often used in microscopy varies only 3.3% over the visible spectrum. Therefore, the Fresnel reflectance coefficients ρf and ρb may be considered constant over the spectrum. The contribution of internally Fresnel reflected light is small in many cases [22], and is therefore ignored in the model.

3.1.3 Special Cases

Thus far, we have achieved a photometric model for spectral color formation which is applicable to both reflecting and transmitting materials, and valid under a wide variety of circumstances and materials. The following special cases can be derived.

For matte, dull surfaces, the Fresnel coefficient can be considered negligible, ρf(~x) ≈ 0, for which ER (eq. 3.5) reduces to the Lambertian model for diffuse body reflection,

    ER(λ, ~x) = e(λ, ~x)R∞(λ, ~x)    (3.9)

as expected.

By introducing cb(λ) = e(λ)R∞(λ), ci(λ) = e(λ), mb(~n, ~s, ~v) = (1 − ρf(~n, ~s, ~v))² and mi(~n, ~s, ~v) = ρf(~n, ~s, ~v), (eq. 3.4) may be reformulated as

    ER(λ) = mb(~n, ~s, ~v)cb(λ) + mi(~n, ~s, ~v)ci(λ)    (3.10)

which corresponds to the dichromatic reflection model proposed by Shafer [20].

For light transmission, when the scattering coefficient is low compared to the absorption coefficient, s(λ) ≪ k(λ), ET (eq. 3.8) reduces to Bouguer's or Lambert-Beer's law for absorption [22],

    ET(λ, ~x) = e(λ, ~x)(1 − ρf(~x))(1 − ρb(~x)) exp(−k(λ, ~x)d(~x)c(~x))    (3.11)

as expected.

Further, a unified model for both reflection and transmission of light is obtained when considering Lambertian reflection and a uniform illumination for both cases. For matte, dull surfaces and a uniform illumination affected by shading, ER (eq. 3.5) reduces to a multiplicative (Lambertian) model for body reflection,

    ER(λ, ~x) = e(λ)i(~x)R∞(λ, ~x)    (3.12)

where e(λ) is the colored but spatially uniform illumination and i(~x) denotes the intensity distribution due to the surface geometry.
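The transmission model (eq. 3.7) and its Lambert-Beer limit (eq. 3.11) can be evaluated directly; a minimal sketch follows (Python/numpy; the coefficient arrays, thickness and concentration are up to the caller, and the Fresnel values are hypothetical defaults). For small s the two functions agree, which is a convenient numerical check of the limit case.

```python
import numpy as np

def km_transmission(e, k, s, d, c, rho_f=0.04, rho_b=0.04):
    """Kubelka-Munk transmitted spectrum, eq. (3.7), for a transparent layer
    of thickness d and stain concentration c under normal incidence."""
    a = 1.0 + k / s
    b = np.sqrt(a**2 - 1.0)
    bsdc = b * s * d * c
    return (e * (1.0 - rho_f) * (1.0 - rho_b) * b
            / (a * np.sinh(bsdc) + b * np.cosh(bsdc)))

def lambert_beer(e, k, d, c, rho_f=0.04, rho_b=0.04):
    """Weak-scattering limit s << k, eq. (3.11)."""
    return e * (1.0 - rho_f) * (1.0 - rho_b) * np.exp(-k * d * c)
```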
Similarly, for a uniformly illuminated transparent material, with intensity affected by shading and Fresnel reflectance, ET (eq. 3.8) may be rewritten as

    ET(λ, ~x) = e(λ)i(~x)C(λ, ~x)    (3.13)

where e(λ) is the uniform illumination, i(~x) denotes the intensity distribution, including Fresnel reflectance at the front and back surface, and C(λ, ~x) represents the total extinction coefficient, that is, the combined absorption and scattering coefficient within the transparent layer.

A general model for spectral image formation, useful for both reflection and transmission of light, may now be written as a multiplicative model,

    E(λ, ~x) = e(λ)i(~x)m(λ, ~x)    (3.14)

where m(λ, ~x) denotes the material transmittance or reflectance function. Again, e(λ) is the colored but spatially uniform illumination and i(~x) denotes the intensity distribution. The validity of the model may be derived from models (eq. 3.5) and (eq. 3.8). For reflection of light, the model is valid for matte, dull surfaces, for which the Fresnel reflectance is negligible, and for isotropic light scattering within the material. For light transmission, the model is valid for neutral interface reflection, a small angle of incidence to the surface normal, and isotropic light scattering within the material. The model as such is used in the next sections to derive color invariant material properties.

3.2 Illumination Invariant Properties of Object Reflectance or Transmittance

Any method for finding invariant color properties relies on a photometric model and on assumptions about the physical variables involved. For example, hue is known to be insensitive to surface orientation, illumination direction, intensity and highlights, under a white illumination [6]. Normalized rgb is an object property for matte, dull surfaces illuminated by white light. When the illumination color varies or is not white, other object properties, related to constant physical parameters, should be measured. In this section, expressions for determining material changes in images will be derived, robust to a change in illumination color over time. Therefore, the photometric model derived in section 3.1 is taken into account.

Consider the photometric reflection model (eq. 3.14) and an illumination with locally constant color,

    E(λ, ~x) = e(λ)i(~x)m(λ, ~x)    (3.15)

where e(λ) represents the illumination spectrum. The assumption allows for the extraction of expressions describing material changes independent of the illumination. Without loss of generality, we restrict ourselves to the one dimensional case; two dimensional expressions may be derived according to Chapter 4. Differentiation of (eq. 3.15) with respect to λ results in

    ∂E/∂λ = i(x)m(λ, x) ∂e/∂λ + i(x)e(λ) ∂m/∂λ .    (3.16)

Dividing (eq. 3.16) by (eq. 3.15) gives the relative differential,

    (1/E(λ, x)) ∂E/∂λ = (1/e(λ)) ∂e/∂λ + (1/m(λ, x)) ∂m/∂λ .    (3.17)

The result consists of two terms, the former depending on the illumination color and the latter depending on material properties. Since the illumination color is constant with respect to x, differentiation with respect to x yields a material property only,

    ∂/∂x { (1/E(λ, x)) ∂E/∂λ } = ∂/∂x { (1/m(λ, x)) ∂m/∂λ } .    (3.18)
Within the Kubelka-Munk model, assuming matte, dull surfaces or transparent layers, and assuming a single light source, Nλx determines changes in object reflectance or transmittance,

    Nλx = (1/E(λ, x)) ∂²E/∂λ∂x − (1/E(λ, x)²) (∂E/∂λ)(∂E/∂x)    (3.19)

which determines material changes independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color. The expression results from differentiation of (eq. 3.18).

The expression given by (eq. 3.19) is the fundamental lowest order illumination invariant. Any spatio-spectral derivative of (eq. 3.19) inherently depends on the body reflectance or object transmittance only. According to [17], a complete and irreducible set of differential invariants is obtained by taking all higher order derivatives of the fundamental invariant,

    Nλxλ^m x^n = ∂^(m+n)/∂λ^m ∂x^n { (1/E(λ, x)) ∂²E/∂λ∂x − (1/E(λ, x)²) (∂E/∂λ)(∂E/∂x) }    (3.20)

for m ≥ 0, n ≥ 0. Application of the chain rule for differentiation yields the higher order expressions in terms of the spatio-spectral energy distribution. For instance, the spectral derivative of Nλx is given by

    Nλλx = (Eλλx E² − Eλλ Ex E − 2Eλx Eλ E + 2Eλ² Ex) / E³    (3.21)

where E(λ, x) is written as E for simplicity and indices denote differentiation. Note that these expressions are valid everywhere E(λ, x) > 0. These invariants may be interpreted as the spatial derivative of the normalized spectral slope Nλ and curvature Nλλ of the reflectance function R∞. Expressions for higher order derivatives are straightforward.

A special case of (eq. 3.20) arises for Lambert-Beer absorption (eq. 3.11) and slices of locally constant thickness. Under these circumstances, ratios of invariants from the set N,

    N′ = N_{m,n} / N_{p,q}    (3.22)

for m, p ≥ 1 and n, q ≥ 0, are independent of the slice thickness. The property is proven by differentiating (eq. 3.11) with respect to λ and dividing by (eq. 3.11), which results in

    (1/ET(λ, x)) ∂ET/∂λ = (1/e(λ)) ∂e/∂λ − dc(x) ∂k/∂λ .    (3.23)

Differentiation of the expression with respect to x yields

    ∂/∂x { (1/ET(λ, x)) ∂ET/∂λ } = −dc(x) ∂²k/∂λ∂x − d (∂k/∂λ)(∂c/∂x) .    (3.24)

By taking ratios of higher order derivatives, the constant thickness d is eliminated.

Summarizing, we have derived a complete set of color constant expressions determining object reflectance or transmittance. The expressions are invariant to a change of illumination over time. The major assumption underlying the proposed invariants is a single colored illumination, effectuating a spatially constant illumination spectrum. For an illumination color varying slowly over the scene with respect to the spatial variation of the object reflectance or transmittance, simultaneous color constancy is achieved by the proposed invariant. We have proven that spatial differentiation is necessary to achieve color constancy when pre-knowledge about the illuminant is not available. Hence, any color constant system should perform both spectral and spatial comparison in order to be invariant against illumination changes, which confirms the theory of relational color constancy as proposed in [4]. Accurate estimates of spatio-spectral differential quotients can be obtained by applying the Gaussian color model as described in Chapter 2.
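Given Gaussian-regularized measurements of E and its derivatives (arrays over image positions), the invariants (eq. 3.19) and (eq. 3.21) are simple pointwise expressions. A minimal sketch (Python/numpy; the derivative arrays are assumed to be measured with the Gaussian color model of Chapter 2):

```python
import numpy as np

def N_lx(E, El, Ex, Elx):
    """Fundamental illumination invariant N_lambda_x, eq. (3.19);
    valid where E > 0."""
    return Elx / E - El * Ex / E**2

def N_llx(E, El, Ell, Ex, Elx, Ellx):
    """Spectral derivative of the fundamental invariant, eq. (3.21)."""
    return (Ellx * E**2 - Ell * Ex * E - 2.0 * Elx * El * E
            + 2.0 * El**2 * Ex) / E**3
```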
3.3 Experiments

3.3.1 Overview

The transmission of 168 patches from a calibration grid (IT8.7/1, Agfa, Mortsel, Belgium) was measured (Spectrascan PR-713PC, Photo Research, Chatsworth, CA) from 390 nm to 730 nm, resampled at 5 nm intervals. The patches include achromatic colors, skin-like tints and full colors (fig. 3.3). Each patch i is represented by its spectral transmission m̂i.

For daylight, incandescent and halogen light, the emission spectra are known to be a one parameter function of color temperature. For these important classes of illuminants, the spectral energy distributions ek(λ) were calculated according to the CIE method as described in [22]. Daylight illuminants were calculated in the range of 4,000K up to 10,000K color temperature in steps of 500K. The 4,000K and 10,000K illuminants represent extremes of daylight, whereas 6,500K represents average daylight. Emission spectra of halogen and incandescent lamps are equivalent to those of blackbody radiators, generated from 2,000K up to 5,000K according to [22, Section 1.2.2]. For the case of fluorescent light, illuminants F1–F12 are used, as given by [7]. These are 12 representative spectral power distributions for fluorescent lamps.

Figure 3.3: The CIE 1964 chromaticity diagram of the colors in the calibration grid used for the experiments, illuminated by average daylight D65.

Considering (eq. 3.14), the spectrum s_i^k(λ) transmitted by a planar patch i under illuminant k is given by

    s_i^k(λ) = ek(λ)mi(λ)    (3.25)

where mi(λ) is the spectral transmittance and ek(λ) the illumination spectrum. Color values are calculated by weighted summation over the transmitted spectrum s_i^k at 5 nm intervals. For the CIE 1964 XYZ sensitivities, the XYZ value is obtained by [22, Section 3.3.8]

    X = (1/k10) Σλ x̄10(λ)ek(λ)mi(λ)
    Y = (1/k10) Σλ ȳ10(λ)ek(λ)mi(λ)
    Z = (1/k10) Σλ z̄10(λ)ek(λ)mi(λ)    (3.26)

where k10 is a constant normalizing Yw = 100, Yw being the intensity of the light source. Similarly, for the Gaussian color model (see Chapter 2) we have

    E = Δλ Σλ G(λ; λ0, σλ)ek(λ)mi(λ)
    Eλ = σλ Δλ Σλ Gλ(λ; λ0, σλ)ek(λ)mi(λ)
    Eλλ = σλ² Δλ Σλ Gλλ(λ; λ0, σλ)ek(λ)mi(λ)    (3.27)

where Δλ = 5 nm. Further, σλ = 55 nm and λ0 = 520 nm to be colorimetric with human vision (see Chapter 2).

Color constancy is examined by evaluating edge strength under different simulated illumination conditions. Borders are formed by combining all patches with one another, yielding 14,028 different color combinations. A ground truth is obtained by taking a perfect white light illuminant; the reference boils down to an equal energy spectrum. The ground truth represents the patch transmission function up to multiplication by a constant a,

    s_i^ref(λ) = a mi(λ) .    (3.28)

The difference in edge strength for two patches illuminated by the test illuminant and the reference illuminant indicates the error in color constancy. We define the color constancy ratio as

    εk = 1 − | dk(i, j) − dref(i, j) | / dref(i, j)    (3.29)

where dk is the color difference between two patches i, j under the test illuminant k, and dref is the difference between the same two patches under the reference illuminant, that is, equal energy illumination. The color constancy ratio εk measures the deviation in edge strength between two patches i, j due to illuminant k, relative to the edge strength under the reference illuminant.
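The Gaussian color model values of (eq. 3.27) used throughout the experiments reduce to Gaussian-weighted sums over the sampled spectrum. A minimal sketch (Python/numpy; assumes the spectrum, here the product ek(λ)mi(λ), is sampled at equal steps):

```python
import numpy as np

def gaussian_color_model(wavelengths, spectrum, lam0=520.0, sigma_l=55.0):
    """E, E_lambda, E_lambdalambda of eq. (3.27): Gaussian-weighted
    integrals of the transmitted spectrum e_k(lambda) m_i(lambda)."""
    dl = wavelengths[1] - wavelengths[0]
    u = (wavelengths - lam0) / sigma_l
    G = np.exp(-0.5 * u**2) / (sigma_l * np.sqrt(2.0 * np.pi))
    Gl = -u / sigma_l * G                    # dG/dlambda
    Gll = (u**2 - 1.0) / sigma_l**2 * G      # d2G/dlambda2
    E = dl * np.sum(G * spectrum)
    El = sigma_l * dl * np.sum(Gl * spectrum)
    Ell = sigma_l**2 * dl * np.sum(Gll * spectrum)
    return E, El, Ell
```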
Three experiments are performed. One experiment evaluates the performance of the proposed invariant (eq. 3.19) under ideal conditions. That is, it evaluates Nλx at scale σλ = 5 nm, for multiple small-band measurements with Δλ = 5 nm covering the visible spectrum. A second experiment assesses the influence of broad-band filters on color constancy: the first experiment is repeated, but now for σλ = 55 nm filters, again with Δλ = 5 nm covering the visible spectrum. The final experiment evaluates color constancy for a colorimetric system detecting color differences. Three broad-band measures (σλ = 55 nm) are taken at λ0 = 515 nm. The proposed invariant is evaluated against the color constancy performance of the von Kries transform [21] and the uv color space [22].

3.3.2 Small-Band Experiment

For each patch transmission, 69 Gaussian weighted samples were taken every 5 nm with σλ = 5 nm. The invariant Nλx was calculated between each combination of two patches for each central wavelength λ0 of the filters. For the experiment, color difference is defined by

    ds = √( Σc Nλx^(λc)(i, j)² )    (3.30)

where λc denotes the central wavelength of the c-th filter (σλ = 5 nm), and Nλx^(λc)(i, j) the edge strength (eq. 3.19) between patches i and j for filter c. Color constancy is determined by (eq. 3.29), using ds as the measure for color difference.

The results for the experiment are shown in tab. 3.1. Average constancy for daylight and blackbody illuminants is 99.9 ± 0.2%, which amounts to perfect color constancy. For the fluorescent illuminants, average constancy is 97.9 ± 1.3%, almost perfectly color constant. The small error is caused by the spectral spikes in the fluorescent emission spectra, smoothed to the filter size of σλ = 5 nm. The experiment demonstrates that perfect illumination invariance can be achieved by using the proposed invariants and a spectrophotometer.

Table 3.1: Results for the small-band experiment for invariant Nλx with σλ = 5 nm and 69 spectral samples, 5 nm apart. Average percentage constancy ε̄ over the 14,028 color edges is given, together with standard deviation σ.

  Daylight              Blackbody             Fluorescent
  T [K]   ε̄ [%]  (σ)    T [K]   ε̄ [%]  (σ)    Ill.  ε̄ [%]  (σ)
  4000    99.9  (0.2)   2000    99.9  (0.1)   F1    99.4  (0.5)
  4500    99.9  (0.2)   2500    99.9  (0.1)   F2    99.1  (0.8)
  5000    99.9  (0.2)   3000    99.9  (0.0)   F3    98.7  (1.1)
  5500    99.9  (0.2)   3500    99.9  (0.0)   F4    98.2  (1.6)
  6000    99.9  (0.2)   4000    100.0 (0.0)   F5    99.5  (0.5)
  6500    99.9  (0.2)   4500    100.0 (0.0)   F6    99.0  (0.9)
  7000    99.9  (0.1)   5000    100.0 (0.0)   F7    99.5  (0.5)
  7500    99.9  (0.1)                         F8    99.1  (0.7)
  8000    99.9  (0.1)                         F9    98.8  (1.0)
  8500    99.9  (0.1)                         F10   95.6  (1.7)
  9000    99.9  (0.1)                         F11   94.4  (2.0)
  9500    99.9  (0.1)                         F12   93.2  (2.4)
  10000   99.9  (0.1)
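The evaluation procedure of (eq. 3.29) and (eq. 3.30) is summarized by the sketch below (Python/numpy). As a simplifying assumption of mine, not stated in the text, the edge strength of Nλx across an ideal border between two uniform patches is taken proportional to the difference of Eλ/E between the patches.

```python
import numpy as np

def d_small_band(E_i, El_i, E_j, El_j):
    """Color difference d_s, eq. (3.30): root squared sum, over the filters'
    central wavelengths, of the N_lambda_x edge strength between patches
    i and j (inputs are arrays indexed by filter)."""
    N_edge = El_j / E_j - El_i / E_i    # step-edge sketch of N_lambda_x
    return np.sqrt(np.sum(N_edge**2))

def constancy_ratio(d_test, d_ref):
    """Color constancy ratio epsilon_k, eq. (3.29)."""
    return 1.0 - abs(d_test - d_ref) / d_ref
```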
3.3.3 Broad-Band Experiment

The experiment investigates the influence of broad-band filters by repeating the previous experiment, but now for σλ = 55 nm. Hence, 69 largely overlapping Gaussian weighted samples of the transmission spectrum are obtained. The results (tab. 3.2) show a constancy for daylight of 98.7 ± 1.5%. For blackbody radiators, a constancy of 97.1 ± 2.6% is achieved. These numbers are close to the results obtained for small-band filters. For fluorescent illuminants, the error increases to 15% (average constancy 88.5 ± 9.9%) when using broad-band filters. Hence, approximation of derivatives with broad-band filters is valid under daylight and blackbody illumination.

Table 3.2: Results for the broad-band experiment for invariant Nλx with σλ = 55 nm and 69 spectral samples, 5 nm apart. Average percentage constancy ε̄ and standard deviation σ are given over the 14,028 different edges.

  Daylight              Blackbody             Fluorescent
  T [K]   ε̄ [%]  (σ)    T [K]   ε̄ [%]  (σ)    Ill.  ε̄ [%]  (σ)
  4000    97.3  (2.3)   2000    93.2  (5.3)   F1    89.6  (9.2)
  4500    99.8  (1.9)   2500    95.8  (3.3)   F2    86.1  (11.3)
  5000    99.2  (1.6)   3000    97.1  (2.2)   F3    84.0  (12.0)
  5500    99.5  (1.5)   3500    97.9  (1.5)   F4    82.2  (12.3)
  6000    99.7  (1.3)   4000    98.4  (1.1)   F5    89.1  (9.3)
  6500    99.8  (1.3)   4500    98.7  (0.8)   F6    85.1  (11.7)
  7000    99.9  (1.2)   5000    98.9  (0.7)   F7    94.7  (7.1)
  7500    99.0  (1.2)                         F8    95.4  (6.8)
  8000    99.1  (1.2)                         F9    94.0  (7.6)
  8500    99.1  (1.2)                         F10   88.0  (9.8)
  9000    99.1  (1.2)                         F11   87.5  (9.8)
  9500    99.1  (1.2)                         F12   86.8  (9.8)
  10000   99.1  (1.2)

3.3.4 Colorimetric Experiment

For the colorimetric experiment, Gaussian weighted samples are taken at λ0 = 520 nm and σλ = 55 nm. Color difference is defined by

    dN = √( Nλx(i, j)² + Nλλx(i, j)² )    (3.31)

where Nλx(i, j) (eq. 3.19) and Nλλx(i, j) (eq. 3.21) measure the total chromatic edge strength between patches i and j. Color constancy is determined by (eq. 3.29), using dN as the measure for color difference.

For comparison, the experiment is repeated with the CIE XYZ 1964 sensitivities for observation. Color difference is defined by the Euclidean distance in the CIE 1976 u′v′ color space [22, Section 3.3.9],

    duv = √( (u′i − u′j)² + (v′i − v′j)² )    (3.32)

where i, j represent the different patches. Color constancy is determined by (eq. 3.29), using duv as the measure for color difference. Note that for the u′v′ color space no information about the light source is included. Further, u′v′ space is similar to uv space up to a transformation of the achromatic point. The additive transformation of the white point makes uv space a color constant space. Differences in u′v′ are equal to differences in uv space. Hence, duv is an illumination invariant measure of color difference.

As a well-known reference, the von Kries transform for chromatic adaptation [21] is evaluated in a similar experiment. The von Kries method is based on Lambertian reflection, assuming that the (known) sensor responses to the illuminant may be used to eliminate the illuminant from the measurement. For the experiment, von Kries adaptation is applied to the measured color values, and the result is transformed to the equal energy illuminant [7]. Thereafter, the color difference between patches i and j taken under the test illuminant is calculated according to (eq. 3.32). Comparison to the color difference between the same two patches under the reference illuminant is obtained by (eq. 3.29), using the von Kries transformed u′v′ distance as the measure for color distance.
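A sketch of the von Kries adaptation step (Python/numpy; classical von Kries scales cone-like responses, so applying the diagonal scaling directly to tristimulus values, as below, is a simplification of mine):

```python
import numpy as np

def von_kries_adapt(xyz, white_test, white_ref):
    """Diagonal (von Kries) adaptation: scale each channel by the ratio of
    the reference white to the test-illuminant white response [21]."""
    scale = np.asarray(white_ref, float) / np.asarray(white_test, float)
    return np.asarray(xyz, float) * scale
```

After adaptation to the equal energy illuminant, color differences are taken in u′v′ as in (eq. 3.32).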
Results for the color constancy measurements are given for daylight illumination (tab. 3.3), blackbody radiators (tab. 3.4), and fluorescent light (tab. 3.5). Average constancy over the different phases of daylight is 91.8 ± 6.1% for the proposed invariant. Difference in u′v′ color space performs similarly, with an average of 91.9 ± 6.3%. The von Kries transform is 5% more color constant, 96.0 ± 3.3%. As expected, the von Kries transform performs better, given that the color of the illuminant is taken into account.

Table 3.3: Results for the different colorimetric experiments with daylight illumination, ranging from 4,000K to 10,000K color temperature. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

  T [K]    N ε̄ [%] (σ)    von Kries ε̄ [%] (σ)    u′v′ ε̄ [%] (σ)
  4000     92.2 (5.6)     96.1 (3.2)              86.9 (10.0)
  4500     94.5 (4.2)     97.9 (1.8)              91.1 (7.1)
  5000     94.9 (2.8)     99.2 (0.7)              94.5 (4.6)
  5500     94.1 (1.8)     98.9 (1.0)              96.6 (2.0)
  6000     93.2 (2.7)     97.9 (1.7)              97.6 (1.8)
  6500     92.5 (4.0)     96.9 (2.4)              96.1 (2.7)
  7000     91.8 (5.2)     96.1 (2.9)              94.3 (3.8)
  7500     91.2 (6.2)     95.4 (3.4)              92.7 (4.9)
  8000     90.6 (7.0)     94.8 (3.8)              91.2 (6.0)
  8500     90.1 (7.6)     94.3 (4.2)              89.9 (6.9)
  9000     89.6 (8.2)     93.8 (4.5)              88.8 (7.7)
  9500     89.2 (8.7)     93.4 (4.8)              87.8 (8.4)
  10000    88.8 (9.1)     93.0 (5.1)              86.9 (9.1)

For blackbody radiators, the proposed invariant is on average 88.9 ± 12.5% color constant. The proposed invariant is more color constant than u′v′ differences, on average 82.4 ± 15.1%. Again, the von Kries transform is even better, with an average of 93.4 ± 6.8%. For these types of illuminants, often running at a low color temperature, variation due to illumination color is drastically reduced by the proposed method. The proposed method is less color constant than von Kries adaptation, which requires knowledge of the color of the light source. In comparison to u′v′ color differences, the proposed invariant offers better performance for low color temperature illuminants.

Color constancy for fluorescent illuminants is on average 85.0 ± 11.8% for the proposed invariant, 84.7 ± 10.5% for u′v′ difference, and 89.4 ± 8.8% for the von Kries transform. As already pointed out for tab. 3.2, the large integration filters are not capable of offering color constancy for the class of fluorescent illuminants. The use of broad-band filters limits the applicability to smooth spectra, for which the Gaussian weighted differential quotients as derived in Chapter 2 are accurate estimations. For outdoor scenes, halogen illumination and incandescent light, the illumination spectra may be considered smooth, as shown by the experimental results of tab. 3.2 versus tab. 3.1.

3.4 Discussion

This chapter presents a physics-based background for color constancy, valid for both light reflectance and light transmittance. To achieve that goal, the Kubelka-Munk theory is used as a model for color image formation. By considering spatial and spectral derivatives of the formation model, object reflectance properties are derived independent of the spectral energy distribution of the illuminant. Knowledge about the spectral power distribution of the illuminant is not required for the proposed invariant, as opposed to the well-known von Kries transform for color constancy [21].

The robustness of our invariant (eq. 3.19) is assured by using the Gaussian color model, introduced in Chapter 2.
Table 3.4: Results for the different colorimetric experiments with blackbody radiators from 2,000K to 5,000K color temperature. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

  T [K]   N ε̄ [%] (σ)    von Kries ε̄ [%] (σ)    u′v′ ε̄ [%] (σ)
  2000    75.6 (24.5)    85.6 (12.4)             65.8 (24.9)
  2500    82.5 (16.5)    89.0 (9.4)              72.3 (20.6)
  3000    87.1 (11.2)    91.9 (6.8)              78.3 (16.3)
  3500    90.8 (7.5)     94.3 (4.7)              83.7 (12.4)
  4000    93.7 (4.9)     96.3 (3.0)              88.4 (8.9)
  4500    96.0 (3.0)     97.9 (1.7)              92.5 (6.0)
  5000    96.9 (1.7)     99.1 (0.7)              95.9 (3.4)

Table 3.5: Results for the colorimetric experiments with representative fluorescent illuminants. Average percentage constancy ε̄ and standard deviation σ for the proposed invariant N, the von Kries transform, and u′v′ difference.

  Ill.   N ε̄ [%] (σ)    von Kries ε̄ [%] (σ)    u′v′ ε̄ [%] (σ)
  F1     82.4 (14.4)    89.4 (7.9)              88.6 (7.9)
  F2     82.7 (12.4)    87.8 (7.9)              82.9 (7.5)
  F3     79.9 (13.5)    85.5 (9.8)              76.4 (11.4)
  F4     77.2 (15.4)    83.6 (11.6)             71.4 (14.9)
  F5     81.1 (15.4)    88.1 (8.6)              87.4 (8.9)
  F6     80.6 (13.7)    85.9 (8.9)              79.7 (8.8)
  F7     90.2 (7.8)     95.2 (3.7)              93.7 (3.9)
  F8     93.6 (3.1)     97.8 (1.6)              94.6 (4.4)
  F9     93.3 (4.4)     95.3 (3.6)              90.1 (7.8)
  F10    87.2 (9.1)     91.1 (8.8)              91.7 (9.2)
  F11    87.1 (10.1)    88.3 (11.2)             85.5 (12.6)
  F12    84.9 (13.6)    85.0 (13.8)             74.9 (18.7)

The Gaussian color model is considered an adequate approximation of the human tri-stimulus sensitivities. The Gaussian color model measures the intensity, first, and second order derivative of the spectral energy distribution, combined in a well-established spatial observation theory. Application of the Gaussian color model in color constancy ensures compatibility with colorimetry, while inherently physically sound and robust measurements are derived.

From a different perspective, color constancy was considered in [1, 14]. The background is experimental colorimetry, where subjects are asked to match the reference and test illumination condition. As a consequence, their experiments do not include shadow and shading. The result of their approach shows approximate color constancy under natural illuminants. However, their approach is unable to cope with color constancy of three dimensional scenes, where shadow plays an important role. The advantage of our physical approach over an empirical colorimetric approach is that invariant properties are deduced from the image formation model. Our proposed invariant (eq. 3.19) is designed to be insensitive to intensity changes due to the scene geometry.

The proposed invariant (eq. 3.19) is evaluated by experiments on spectral data of 168 transparent patches, illuminated by daylight, blackbody, and fluorescent illuminants. Average constancy is 90 ± 5% for daylight, 90 ± 10% for blackbody radiators, and 85 ± 10% for fluorescent illuminants. The performance of the proposed method is slightly less than that of the von Kries transform. Average constancy for von Kries on the 168 patches is 95 ± 3% for daylight, 95 ± 5% for blackbody radiators, and 90 ± 10% for fluorescent illuminants. This is explained by the fact that the von Kries transform requires explicit knowledge of material and illuminant, and even then the difference is small. There are many circumstances where such knowledge of material and illuminant is missing, especially in image retrieval from large databases, or when calibration is not practically feasible, as is frequently the case in light microscopy. The proposed method requires knowledge about the material only, and hence is applicable under a larger set of imaging circumstances.
As an alternative for color constancy under an unknown illuminant, one could use Luv color space differences [22] instead of the proposed method. We have evaluated color constancy for both methods. The proposed invariant offers similar performance to u′v′ color differences. This is remarkable, given the different backgrounds against which the methods are derived. Whereas u′v′ is derived from colorimetric experiments, hence from human perception, the proposed invariant N is derived from measurement theory (the physics of observation) and physical reflection models. Apparently, it is the physical cause of color, and the environmental variation in physical parameters, to which the human visual system adapts.

As pointed out in [14], mechanisms responding to cone-specific contrast offer a better correspondence with human vision than a system that estimates illuminant and reflectance spectra. The research presented here raises the question whether the illuminant is estimated at all in pre-attentive vision. The physical model presented demands spatial comparison in order to achieve color constancy, thereby confirming relational color constancy as a first step in color constant vision [4, 16]. Hence, low-level mechanisms such as the color constant edge detection reported here may play a role in front-end vision.

Bibliography

[1] D. H. Brainard. Color constancy in the nearly natural image: 2. Achromatic loci. J. Opt. Soc. Am. A, 15:307–325, 1998.
[2] M. D'Zmura and P. Lennie. Mechanisms of color constancy. J. Opt. Soc. Am. A, 3(10):1662–1672, 1986.
[3] G. D. Finlayson. Color in perspective. IEEE Trans. Pattern Anal. Machine Intell., 18(10):1034–1038, 1996.
[4] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.
[5] B. V. Funt and G. D. Finlayson. Color constant color indexing. IEEE Trans. Pattern Anal. Machine Intell., 17(5):522–529, 1995.
[6] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec., 32:453–464, 1999.
[7] R. W. G. Hunt. Measuring Colour. Ellis Horwood Limited, Hertfordshire, England, 1995.
[8] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley, New York, NY, 1975.
[9] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.
[10] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern., 63:291–297, 1990.
[11] P. Kubelka. New contribution to the optics of intensely light-scattering materials. Part I. J. Opt. Soc. Am., 38(5):448–457, 1948.
[12] P. Kubelka and F. Munk. Ein Beitrag zur Optik der Farbanstriche. Z. Techn. Physik, 12:593, 1931.
[13] E. H. Land. The retinex theory of color vision. Sci. Am., 237:108–128, 1977.
[14] M. P. Lucassen and J. Walraven. Color constancy under natural and artificial illumination. Vision Res., 37:2699–2711, 1996.
[15] L. T. Maloney and B. A. Wandell. Color constancy: a method for recovering surface spectral reflectance. J. Opt. Soc. Am. A, 3:29–33, 1986.
[16] S. M. C. Nascimento and D. H. Foster. Relational color constancy in achromatic and isoluminant images. J. Opt. Soc. Am. A, 17(2):225–231, 2000.
[17] P. Olver, G. Sapiro, and A. Tannenbaum. Differential invariant signatures and flows in computer vision: A symmetry group approach. In B. M. ter Haar Romeny, editor, Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[18] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.
[19] G. Sapiro. Color and illuminant voting. IEEE Trans. Pattern Anal. Machine Intell., 21(11):1210–1215, 1999.
[20] S. A. Shafer. Using color to separate reflection components. Color Res. Appl., 10(4):210–218, 1985.
[21] J. von Kries. Influence of adaptation on the effects produced by luminous stimuli. In D. L. MacAdam, editor, Sources of Color Vision. MIT Press, Cambridge, MA, 1970.
[22] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.

Chapter 4
Measurement of Color Invariants

submitted* to IEEE Transactions on Pattern Analysis and Machine Intelligence.
* Part of this work has appeared in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2000, vol. 1, pp. 50–57.

"Attaching significance to invariants is an effort to recognize what, because of its form or colour or meaning or otherwise, is important or significant in what is only trivial or ephemeral." – H. W. Turnbull

It is well known that color is a powerful cue in the distinction and recognition of objects. Segmentation based on color, rather than just intensity, provides a broader class of discrimination between material boundaries. Modeling the physical process of color image formation provides a clue to the object-specific parameters [6, 8, 19]. To reduce some of the complexity intrinsic to color images, parameters with known invariance are of prime importance.

Current methods for the measurement of color invariance require a fully sampled spectrum as input data, usually acquired by a spectrometer. Angelopoulou et al. [1] use the spectral gradient to estimate surface reflectance from multiple images of the same scene, captured with different spectral narrow-band filters. The assumptions underlying their approach require a smoothly varying illumination. Their method is able to accurately estimate surface reflectance independent of the scene geometry. Stokman and Gevers [20] propose a method for edge classification from spectral images. Their method aims at detecting edges and assigning to each one of the types: shadow or geometry edge, highlight edge, or material edge. Under the assumption of spectral narrow-band filters, and for a known illumination spectrum, they prove their method to be accurate in edge classification. The broad use of these approaches is hampered by the fact that spectrometers are both slow and expensive. In addition, they do not provide two-dimensional spatial resolution easily. In this chapter, we aim at a broad range of color invariants measured from RGB-cameras.

To that end, differential geometry is adopted as the framework for feature detection and segmentation of images. Its impact in computer vision is overwhelming, but mostly limited to grey-value images [5, 15, 21]. Embedding the theory in the scale-space paradigm [11, 13] resulted in well-posed differential operators robust against noisy measurements, with the Gaussian aperture as the fundamental operator. Only a few papers are available on color differential geometry [7, 18], which are mainly based on the color gradient proposed by Di Zenzo [23]. In that paper, an expression for the color gradient is derived by analysis of the eigensystem of the color structure tensor. In [2], curvature and zero-crossing detection is investigated for the directional derivative of the color gradient.
For these geometrical invariants, no physical model is taken into account, yielding measurements which are highly influenced by the specific imaging circumstances, such as shadow, illumination, and viewpoint.

We consider the introduction of wavelength in the scale-space paradigm, as suggested by Koenderink [12]. This leads to a spatio-spectral family of Gaussian aperture functions, introduced in Chapter 2 as the Gaussian color model. Hence, the Gaussian color model may be considered an extension of the differential geometry framework into the spatio-spectral domain. We apply the spatio-spectral scale-space to the measurement of photometric and geometric invariants.

In Chapter 3, we discussed the use of the Shafer model [19], effectively based on the older Kubelka-Munk theory [14], to measure object reflectance independent of illumination color. The Kubelka-Munk theory models the reflected spectrum of a colored body [10, 22], based on a material-dependent scattering and absorption function, under the assumption that light is isotropically scattered within the material. The theory has proven to be successful for a wide variety of materials and applications [10]. Therefore, the Kubelka-Munk theory is well suited for determining material properties from color measurements. We use the Kubelka-Munk theory for the definition of object reflectance properties, for a wide range of assumptions regarding imaging conditions.

The measurement of invariance involves a balance between constancy of the measurement regardless of the disturbing influence of the unwanted transform on the one hand, and retained discriminating power between truly different states of the objects on the other. As a general rule, for features allowing ignorance of a larger set of disturbing factors, less discriminative power can be expected. We refer to such features as broad features. Hence, both the invariance and the discriminating power of a method should be investigated simultaneously. Only this allows assessing the practical performance of the proposed method. In this chapter, we extensively investigate invariant properties and discriminative power.

The chapter is organized as follows. Section 4.1 describes a physical model for image formation, based on the Kubelka-Munk theory. The first contribution of this chapter is a complete set of invariant expressions, derived for basically three different imaging conditions (section 4.2). A second important contribution considers the robust measurement of invariant expressions from RGB-images (section 4.3). Further, section 4.3 demonstrates the performance of the features in terms of invariance and discriminative power between different colored patches, which may be considered a third contribution.

4.1 Color Image Formation Model

In Chapter 3, image formation is modeled by means of the Kubelka-Munk theory [10, 22] for colorant layers. Under the assumption that light within the material is isotropically scattered, the material layer may be characterized by a wavelength dependent scattering coefficient and absorption coefficient. The model unites both reflectance of light and transparent materials. The class of materials for which the theory is useful ranges from dyed paper and textiles, opaque plastics, and paint films, up to enamel and dental silicate cements [10]. In the sequel we will derive color invariant expressions under various imaging conditions. Therefore, an image formation model adequate for reflectance of light in real-world scenes is considered.
We consider the Kubelka-Munk theory as a general model for color image formation. The photometric reflectance model resulting from the Kubelka-Munk theory is given by (see Chapter 3)

    E(λ, ~x) = e(λ, ~x)(1 − ρf(~x))² R∞(λ, ~x) + e(λ, ~x)ρf(~x)    (4.1)

where ~x denotes the position at the imaging plane and λ the wavelength. Further, e(λ, ~x) denotes the illumination spectrum and ρf(~x) the Fresnel reflectance at ~x. The material reflectivity is denoted by R∞(λ, ~x). The reflected spectrum in the viewing direction is given by E(λ, ~x). When redefining symbols cb(λ, ~x) = e(λ, ~x)R∞(λ, ~x), ci(λ, ~x) = e(λ, ~x), mb(~x) = (1 − ρf(~x))² and mi(~x) = ρf(~x), (eq. 4.1) reduces to

    E(λ, ~x) = mb(~x)cb(λ, ~x) + mi(~x)ci(λ, ~x)    (4.2)

which is the dichromatic reflection model by Shafer [19]. Concerning the Fresnel reflectance, the photometric model assumes a neutral interface at the surface patch. As discussed in [17, 19], deviations of ρf over the visible spectrum are small for commonly used materials; therefore the Fresnel reflectance coefficient may be considered constant.

The following special case can be derived. For matte, dull surfaces, the Fresnel coefficient can be considered negligible, ρf(~x) ≈ 0, for which E(λ, ~x) (eq. 4.1) reduces to the Lambertian model for diffuse body reflection,

    E(λ, ~x) = e(λ, ~x)R∞(λ, ~x)    (4.3)

as expected.

4.2 Determination of Color Invariants

Any method for finding invariant color properties relies on a photometric model and on assumptions about the physical variables involved. For example, hue is known to be insensitive to surface orientation, illumination direction, intensity and highlights, under a white illumination [8]. Normalized rgb is an object property, but only for matte, dull surfaces and only when illuminated by white light. When the illumination color is not white, other object properties should be measured. In this section, expressions for determining invariant properties in color images will be derived for three different imaging conditions, taking into account the photometric model derived in section 4.1. The imaging conditions are assumed to be the 5 relevant out of the 8 combinations of:

a. white or colored illumination,
b. matte, dull object or general object, and
c. uniformly stained object or generally colored object.

Further specializations, such as uniform illumination or a single illumination spectrum, may be considered. Note that each essentially different condition of the scene, object or recording circumstances results in different suited invariant expressions. For notational convenience, we first concentrate on the one dimensional case; two dimensional expressions will be derived later when introducing geometrical invariants.

4.2.1 Invariants for White but Uneven Illumination

Consider the photometric reflection model (eq. 4.1). For white illumination, the spectral components of the source are approximately constant over the wavelengths. Hence, a spatial component i(x) denotes the intensity variations, resulting in

    E(λ, x) = i(x) { ρf(x) + (1 − ρf(x))² R∞(λ, x) } .    (4.4)

The assumption allows the extraction of expressions describing object reflectance independent of the Fresnel reflectance. Let indices of λ and x indicate differentiation; from now on we drop (λ, x) from E(λ, x) when this causes no confusion.
Lemma 6 Within the Kubelka-Munk model, assuming dichromatic reflection and white illumination,

    H = Eλ / Eλλ

is an object reflectance property independent of viewpoint, surface orientation, illumination direction, illumination intensity and Fresnel reflectance coefficient.

Proof: Differentiating (eq. 4.4) with respect to λ twice results in

    Eλ = i(x)(1 − ρf(x))² ∂R∞(λ, x)/∂λ

and

    Eλλ = i(x)(1 − ρf(x))² ∂²R∞(λ, x)/∂λ² .

Hence, their ratio depends on derivatives of the object reflectance function R∞(λ, x) only, which proves the lemma. □

To interpret H, consider the local Taylor expansion at λ0, truncated at second order,

    E(λ0 + Δλ) ≈ E(λ0) + Δλ Eλ(λ0) + ½ Δλ² Eλλ(λ0) .    (4.5)

The extremum of E(λ0 + Δλ) is at the Δλ for which the first order derivative vanishes,

    d/dΔλ {E(λ0 + Δλ)} = Eλ(λ0) + Δλ Eλλ(λ0) = 0 .    (4.6)

Hence, for Δλ near the origin λ0,

    Δλmax = −Eλ(λ0) / Eλλ(λ0) .    (4.7)

In conclusion, the property H is related to the hue (i.e., arctan(Δλmax)) of the material. For Eλλ(λ0) < 0 the result is a maximum and describes a Newtonian (prism) color, whereas for Eλλ(λ0) > 0 the result is a minimum and indicates a non-Newtonian (slit) color.

Of significant importance is the derivation of a complete set Ψ of functionally independent (irreducible) differential invariants Ψi. Completeness states that all possible independent invariants for the unwanted distortion are present in the set Ψ. Following Olver et al. [16], the basic method for constructing a complete set of differential invariants is to use invariant differential operators. A differential operator is said to be invariant under a given distortion if it maps differential invariants to higher order differential invariants. Hence, by iteration, such an operator produces a hierarchy of differential invariants of arbitrarily large order n, given a lowest order invariant. The lowest order invariant is referred to as the fundamental invariant. Summarizing, for a lowest order color invariant, a differential operator may be defined to construct complete, irreducible sets of color invariants under the same imaging conditions by iteration.

Proposition 7 A complete and irreducible set of color invariants, up to a given differential order, is given by all derivatives of the fundamental color invariant.

In the sequel we will define the generating differential operator given the lowest order fundamental invariant. The expression given by Lemma 6 is a fundamental lowest order invariant. As a result of Proposition 7, differentiation of the expression for H with respect to x or λ results in object reflectance properties under a white illumination. Note that H is ill-defined when the second order spectral derivative vanishes. We prefer to compute differentials of arctan(H), a monotonic function of H, for which the spatial derivatives yield better numerical stability.

Corollary 8 Within the Kubelka-Munk model, a complete and irreducible set of invariants for dichromatic reflection and a white illumination is given by

    Hλ^m x^n = ∂^(m+n)/∂λ^m ∂x^n { arctan(Eλ / Eλλ) }    (4.8)

for m, n ≥ 0.

Application of the chain rule for differentiation yields the higher order expressions in terms of the spatio-spectral energy distribution. For illustration, we give the expression for the first spatial derivative at second spectral order. The hue spatial derivative is given by

    Hx = (Eλλ Eλx − Eλ Eλλx) / (Eλ² + Eλλ²)    (4.9)

admissible for Eλ² + Eλλ² > 0. In the sequel we also need an expression for color saturation S,

    S = (1/E(λ, x)) √(Eλ² + Eλλ²) .    (4.10)
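Numerically, H and S follow directly from the spectral derivatives; a minimal sketch (Python/numpy; using arctan2 rather than a plain ratio is a robustness choice of mine, not part of the derivation):

```python
import numpy as np

def hue_invariant(El, Ell):
    """H-type invariant arctan(E_lambda / E_lambdalambda),
    cf. Lemma 6 and Corollary 8."""
    return np.arctan2(El, Ell)

def saturation(E, El, Ell):
    """Pseudo invariant S, eq. (4.10): spectral purity of the color."""
    return np.sqrt(El**2 + Ell**2) / E

def hue_x(El, Ell, Elx, Ellx):
    """Spatial hue derivative H_x, eq. (4.9); admissible where
    E_lambda^2 + E_lambdalambda^2 > 0."""
    return (Ell * Elx - El * Ellx) / (El**2 + Ell**2)
```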
4.2.2 Invariants for White but Uneven Illumination and Matte, Dull Surfaces

A class of tighter invariants may be derived when the object is matte and dull. Consider the photometric reflection model (eq. 4.4) for matte, dull surfaces with low Fresnel reflectance, ρf(~x) ≈ 0,

    E = i(x)R∞(λ, x) .    (4.11)

These assumptions allow the derivation of expressions describing object reflectance independent of the intensity distribution.

Lemma 9 Within the Kubelka-Munk model, assuming matte, dull surfaces and a white illumination,

    Cλ = Eλ / E

is an object reflectance property independent of the viewpoint, surface orientation, illumination direction and illumination intensity.

Proof: Differentiation of (eq. 4.11) with respect to λ and normalization by (eq. 4.11) results in an equation depending on object properties only,

    Eλ / E = (1 / R∞(λ, x)) ∂R∞(λ, x)/∂λ

which proves the lemma. □

The property Cλ may be interpreted as describing object color regardless of intensity. As a result of Proposition 7, all normalized higher order spectral derivatives of Cλ, and their spatial derivatives, result in object reflectance properties under white illumination. The normalization by E is to be evaluated at the spectral wavelength of interest, and therefore is considered locally constant with respect to λ.

Corollary 10 Within the Kubelka-Munk model, a complete and irreducible set of invariants for matte, dull surfaces under a white illumination is given by

    Cλ^m x^n = ∂^n/∂x^n { Eλ^m / E }    (4.12)

for m ≥ 1, n ≥ 0.

Specific first spatial and second spectral order expressions are given by

    Cλλ = Eλλ / E
    Cλx = (Eλx E − Eλ Ex) / E²
    Cλλx = (Eλλx E − Eλλ Ex) / E² .    (4.13)

Note that these expressions are valid everywhere E > 0. These invariants may be interpreted as the spatial derivatives of the intensity normalized spectral slope Cλ and curvature Cλλ.

4.2.3 Invariants for White, Uniform Illumination and Matte, Dull Surfaces

For uniform illumination, consider again the photometric reflection model (eq. 4.11) for matte, dull surfaces, now with a white and uniform illumination of intensity i,

    E(λ, x) = i R∞(λ, x) .    (4.14)

The assumption of a white and uniformly illuminated object may be achieved under well defined circumstances, such as the photography of art. These assumptions allow the derivation of expressions describing object reflectance independent of the intensity level.

Lemma 11 Within the Kubelka-Munk model, assuming matte, dull surfaces, planar objects, and a white and uniform illumination,

    Wx = Ex / E

determines changes in object reflectance independent of the illumination intensity.

Proof: Differentiation of (eq. 4.14) with respect to x and normalization by (eq. 4.14) results in

    Ex / E = (1 / R∞(λ, x)) ∂R∞(λ, x)/∂x .

This is an object reflectance property. □

The property Wx may be interpreted as an edge detector specific for changes in the spectral distribution. Under common circumstances, a geometry dependent intensity term is present; hence Wx does not represent pure object properties, but will include shadow edges where present. As a result of Proposition 7, all normalized higher order derivatives of Wx yield object reflectance properties under a white and uniform illumination. The normalization by E is to be evaluated at the spatial and spectral point of interest. Hence it is considered locally constant.
Corollary 12 Within the Kubelka-Munk model, a complete and irreducible set of invariants for matte, dull surfaces, planar objects, under a white and uniform illumination is given by

    Wλ^m x^n = Eλ^m x^n / E    (4.15)

for m ≥ 0, n ≥ 1.

Specific expressions for E > 0, up to first spatial and second spectral order, are given by

    Wλx = Eλx / E
    Wλλx = Eλλx / E .    (4.16)

These invariants may be interpreted as the intensity normalized spatial derivatives of the spectral intensity E, spectral slope Eλ and spectral curvature Eλλ.

4.2.4 Invariants for Colored but Uneven Illumination

For colored illumination, when the spectral energy distribution of the illumination does not vary over the scene, the illumination may be decomposed into a spectral component e(λ) representing the illumination color, and a spatial component i(x) denoting the variations in intensity due to the scene geometry. Hence, for matte, dull surfaces, ρf → 0,

    E = e(λ)i(x)R∞(λ, x) .    (4.17)

The assumption allows us to derive expressions describing object reflectance independent of the illumination.

Lemma 13 Within the Kubelka-Munk model, assuming matte, dull surfaces and a single illumination spectrum,

    Nλx = (Eλx E − Eλ Ex) / E²

determines changes in object reflectance independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color.

Proof: Differentiation of (eq. 4.17) with respect to λ results in

    Eλ = i(x)R∞(λ, x) ∂e(λ)/∂λ + e(λ)i(x) ∂R∞(λ, x)/∂λ .

Dividing by (eq. 4.17) gives the relative differential,

    Eλ / E = (1/e(λ)) ∂e(λ)/∂λ + (1/R∞(λ, x)) ∂R∞(λ, x)/∂λ .

The result consists of two terms, the former depending on the illumination color only and the latter depending on the body reflectance only. Differentiation with respect to x yields

    ∂/∂x { Eλ / E } = ∂/∂x { (1/R∞(λ, x)) ∂R∞(λ, x)/∂λ } .

The right hand side depends on object properties only. This proves the lemma. □

The invariant Nλx may be interpreted as the spatial derivative of the spectral change of the reflectance function R∞(λ, x), and therefore indicates transitions in object reflectance. Hence, Nλx determines material transitions regardless of illumination color and intensity distribution. As a result of Proposition 7, further differentiation of Nλx results in object reflectance properties under a colored illumination.

Corollary 14 Within the Kubelka-Munk model, a complete and irreducible set of invariants for matte, dull surfaces and a single illumination spectrum is given by

    Nλ^m x^n = ∂^(m+n−2)/∂λ^(m−1) ∂x^(n−1) { (Eλx E − Eλ Ex) / E² }    (4.18)

for m ≥ 1, n ≥ 1.

The third order example is the spectral derivative of Nλx, for E(λ, x) > 0,

    Nλλx = (Eλλx E² − Eλλ Ex E − 2Eλx Eλ E + 2Eλ² Ex) / E³ .    (4.19)

4.2.5 Invariants for a Uniform Object

For a uniformly colored planar surface, the reflectance properties are spatially constant. Hence the reflectance function R∞ and Fresnel coefficient ρf are independent of x,

    E = e(λ, x) { ρf + (1 − ρf)² R∞(λ) } .    (4.20)

For a single illumination source, expressions describing interreflections may be extracted, i.e., the reflected spectrum of surrounding materials.

Lemma 15 Within the Kubelka-Munk model, assuming dichromatic reflection, a single illumination source, and a uniformly colored planar surface,

    Uλx = (Eλx E − Eλ Ex) / E²

determines interreflections of colored objects, independent of the object spectral reflectance function.
Proof: Differentiating (eq. 4.20) with respect to λ results in

    Eλ = {ρf + (1 − ρf)² R∞(λ)} ∂e(λ, x)/∂λ + e(λ, x)(1 − ρf)² ∂R∞(λ)/∂λ .

Normalization by (eq. 4.20) results in

    Eλ / E = (1/e(λ, x)) ∂e(λ, x)/∂λ + [ (1 − ρf)² / (ρf + (1 − ρf)² R∞(λ)) ] ∂R∞(λ)/∂λ .

Differentiation with respect to x results in

    ∂/∂x { Eλ / E } = ∂/∂x { (1/e(λ, x)) ∂e(λ, x)/∂λ }

which depends on the illumination only. Differentiation yields the lemma. □

The property Uλx may be interpreted as describing edges due to interreflections and specularities. When ambient illumination is present, casting a different spectral distribution, the invariant describes shadow edges due to the combined ambient and incident illumination. Note that the expression of Lemma 15 is identical to the expression of Lemma 13. Consequently, changes in object reflectance cannot be distinguished from interreflections in single images. Further differentiation of Uλx yields interreflections when assuming a uniformly colored planar surface. The result is identical to (eq. 4.19).

4.2.6 Summary of Color Invariants

In conclusion, within the Kubelka-Munk model, various sets of invariants are derived, as summarized in tab. 4.1. The class of materials for which the invariants are useful ranges from dyed paper and textiles, opaque plastics, and paint films, up to enamel and dental silicate cements [10]. The invariant sets may be ordered by broadness of invariance, where broader sets allow ignorance of a larger set of disturbing factors than tighter sets.

Table 4.1: Summary of the various color invariant sets and their invariance to specific imaging conditions. Invariance is denoted by "+", whereas sensitivity to the imaging condition is indicated by "–". Note that the reflected spectral energy distribution E is sensitive to all the conditions cited.

                             H   N   U   C   W   E
  viewing direction          +   +   +   +   –   –
  surface orientation        +   +   +   +   –   –
  highlights                 +   –   –   –   –   –
  illumination direction     +   +   +   +   –   –
  illumination intensity     +   +   +   +   +   –
  illumination color         –   +   +   –   –   –
  interreflection            –   –   –   –   –   –

The table offers the solution of using the narrowest set of invariants for known imaging conditions, since H ⊂ N = U ⊂ C ⊂ W ⊂ E. In case the recording circumstances are unknown, the table offers a broad to narrow hierarchy. Hence, an incremental strategy of invariant feature extraction may be applied. Combination of invariants opens up the way to edge type classification, as suggested in [9]. The vanishing of edges for certain invariants indicates whether their cause is shading, specular reflectance, or material boundaries.
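A sketch of such an incremental edge-type decision (Python; the threshold t and the decision order are hypothetical choices of mine, following the invariance pattern of tab. 4.1):

```python
def classify_edge(Ew, Ww, Cw, Hw, t=0.01):
    """Assign an edge a cause by the broadest detector in which it vanishes:
    E responds to everything; W is invariant to the illumination intensity
    level; C additionally to the scene geometry (shading); H additionally
    to highlights (cf. tab. 4.1)."""
    if Ew < t:
        return "no edge"
    if Ww < t:
        return "illumination intensity edge"
    if Cw < t:
        return "shading or geometry edge"
    if Hw < t:
        return "highlight (specular) edge"
    return "material edge"
```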
4.2.7 Geometrical Color Invariants in Two Dimensions

So far, we have established color invariant descriptors based on differentials in the spectral domain and in one spatial dimension. When applied in two dimensions, the result depends on the orientation of the image content. In order to obtain meaningful image descriptions, it is crucial to derive descriptors which are invariant with respect to translation, rotation, and scaling. For the grey-value luminance L, geometrical invariants are well established [5]. Translation and scale invariance is obtained by examining the (Gaussian) scale-space, which is a natural representation for investigating the scaling behavior of image features [11]. Florack et al. [5] extend the Gaussian scale-space with rotation invariance by considering, in a systematic manner, local gauge coordinates. The coordinate axes w and v are aligned with the gradient direction and the isophote tangent direction, respectively. Hence, the first order gradient gauge invariant is the magnitude of the luminance gradient,

  Lw = √(Lx² + Ly²) .    (4.21)

Note that the first order isophote gauge invariant is zero by definition. The second order invariants are given by

  Lvv = (Lx² Lyy − 2 Lx Ly Lxy + Ly² Lxx) / Lw²    (4.22)

related to isophote curvature,

  Lvw = (Lx Ly (Lyy − Lxx) − (Lx² − Ly²) Lxy) / Lw²    (4.23)

related to flow-line curvature, and

  Lww = (Lx² Lxx + 2 Lx Ly Lxy + Ly² Lyy) / Lw²    (4.24)

related to isophote density. Note that the Laplacian operator ∆L = Lxx + Lyy is an invariant, and hence ∆L = Lvv + Lww. A sketch of these computations in terms of Gaussian derivatives is given after tab. 4.3.

On the basis of these spatial results, we combine (eq. 4.21)–(eq. 4.24) with the color invariants for the 1D case established before. The resulting first order expressions are given in tab. 4.2. Two or three measures for edge strength are derived, one for each spectral differential order; the only exception is H. Total edge strength due to differences in the energy distribution may be defined by the root squared sum of the edge strengths under a given imaging condition. A summary of the total edge strength measures, ordered by degree of invariance, is given in tab. 4.3. For completeness, the spatial second order derivatives in two dimensions are given in tab. 4.4 and tab. 4.5. The derivation of higher order invariants is straightforward. Usually many derivatives are involved here, raising some doubt on the sustainable computational accuracy of the result.

Table 4.2: Summary of the first order geometrical invariants for the various color invariant sets. See tab. 4.1 for the invariant classes.

  Hw   = √(Hx² + Hy²)
       = √((Eλλ Eλx − Eλ Eλλx)² + (Eλλ Eλy − Eλ Eλλy)²) / (Eλ² + Eλλ²)
  Cλw  = √(Cλx² + Cλy²)
       = (1/E²) √((Eλx E − Eλ Ex)² + (Eλy E − Eλ Ey)²)
  Cλλw = √(Cλλx² + Cλλy²)
       = (1/E²) √((Eλλx E − Eλλ Ex)² + (Eλλy E − Eλλ Ey)²)
  Îw   = √(Wx² + Wy²)
       = (1/E) √(Ex² + Ey²)
  Wλw  = √(Wλx² + Wλy²)
       = (1/E) √(Eλx² + Eλy²)
  Wλλw = √(Wλλx² + Wλλy²)
       = (1/E) √(Eλλx² + Eλλy²)
  Nλw  = √(Nλx² + Nλy²)
       = (1/E²) √((Eλx E − Eλ Ex)² + (Eλy E − Eλ Ey)²)
  Nλλw = √(Nλλx² + Nλλy²)
       = (1/E³) √(A² + B²)
  where A = Eλλx E² − Eλλ Ex E − 2 Eλx Eλ E + 2 Eλ² Ex
  and   B = Eλλy E² − Eλλ Ey E − 2 Eλy Eλ E + 2 Eλ² Ey .

Table 4.3: Summary of the total edge strength measures for the various color invariant sets, ordered by degree of invariance. The edge strength Ew is not invariant to any change in imaging conditions. See tab. 4.1 for the invariant classes.

  Ew = √(Ex² + Eλx² + Eλλx² + Ey² + Eλy² + Eλλy²)
  Ww = √(Wx² + Wλx² + Wλλx² + Wy² + Wλy² + Wλλy²)
  Cw = √(Cλx² + Cλλx² + Cλy² + Cλλy²)
  Nw = √(Nλx² + Nλλx² + Nλy² + Nλλy²)
  Hw = √(Hx² + Hy²)
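As an illustration of the gauge coordinate computations of (eq. 4.21)–(eq. 4.24), consider the following sketch. It applies to any scalar plane, luminance or a color invariant alike; the small guard added to Lw² is our assumption, to avoid division by zero in flat image regions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gauge_invariants(L, sigma=1.0, eps=1e-12):
    """First and second order gauge invariants (eqs. 4.21-4.24) of a scalar plane L."""
    d = lambda oy, ox: gaussian_filter(L.astype(float), sigma, order=(oy, ox))
    Lx, Ly = d(0, 1), d(1, 0)
    Lxx, Lxy, Lyy = d(0, 2), d(1, 1), d(2, 0)
    Lw2 = Lx**2 + Ly**2 + eps                               # squared gradient magnitude
    Lw  = np.sqrt(Lw2)                                      # eq. 4.21
    Lvv = (Lx**2*Lyy - 2*Lx*Ly*Lxy + Ly**2*Lxx) / Lw2       # eq. 4.22
    Lvw = (Lx*Ly*(Lyy - Lxx) - (Lx**2 - Ly**2)*Lxy) / Lw2   # eq. 4.23
    Lww = (Lx**2*Lxx + 2*Lx*Ly*Lxy + Ly**2*Lyy) / Lw2       # eq. 4.24
    return Lw, Lvv, Lvw, Lww

# Sanity check: Lvv + Lww equals the Laplacian Lxx + Lyy (up to the eps guard).
```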
4.3 Measurement of Color Invariants

Up to this point we have considered invariant expressions describing material properties under some general assumptions. They are derived from expressions exploring the infinite-dimensional Hilbert space of spectra at an infinitesimally small spatial neighborhood. As shown in Chapter 2, the spatio-spectral energy distribution is measurable only at a certain spatial extent and a certain spectral bandwidth. Hence, physical measurements imply integration over the spectral and spatial dimensions. In this section we exploit the Gaussian color model as presented in Chapter 2 to define measurable color invariants.

4.3.1 Measurement of Geometrical Color Invariants

Measurement of the geometrical color invariants is obtained by substituting the Gaussian basis, as derived from the RGB measurements (see Chapter 2), into the invariant expressions derived in section 4.2. Measured values for the geometrical color invariants given in tab. 4.2 and tab. 4.4 are obtained by substituting the measured values Ê, Êλ, and Êλλ at a given scale σx for E, Eλ, and Eλλ. In this section, we demonstrate the color invariant properties for each of the assumed imaging conditions by applying the invariants to an example image. The invariants regarding a uniform object are not demonstrated separately, since the expressions are included in the invariants for colored illumination.

Measurement of invariants for white illumination

The invariant Ĥ is representative for the hue or dominant color of the material, disregarding intensity and highlights. The pseudo-invariant Ŝ (eq. 4.10) denotes the purity of the color, and is therefore sensitive to highlights, since at these points color is desaturated. An example is shown in fig. 4.1. The invariant Ĥw represents the hue gradient magnitude, detecting color edges independent of intensity and highlights, as demonstrated in fig. 4.1. Common expressions for hue are known to be noise sensitive. Within the scale-space framework, Gaussian regularization offers a trade-off between noise and detail sensitivity. The influence of noise on the hue gradient magnitude Hw for various σx is shown in fig. 4.2. The influence of noise on the hue edge detection is drastically reduced for larger observational scale σx.

Table 4.4: Summary of the spatial second order derivatives for the various color invariant sets. Derivatives with respect to yy follow from xx by symmetry.

  Hxx   = [ (Eλ² + Eλλ²)(Eλλ Eλxx − Eλ Eλλxx)
            − 2 (Eλλ Eλx − Eλ Eλλx)(Eλ Eλx + Eλλ Eλλx) ] / (Eλ² + Eλλ²)²
  Cλxx  = (Eλxx E² − Eλ Exx E − 2 Eλx Ex E + 2 Eλ Ex²) / E³
  Cλλxx = (Eλλxx E² − Eλλ Exx E − 2 Eλλx Ex E + 2 Eλλ Ex²) / E³
  Wxx   = Exx / E
  Wλxx  = Eλxx / E
  Wλλxx = Eλλxx / E
  Nλxx  = (Eλxx E² − Eλ Exx E − 2 Eλx Ex E + 2 Eλ Ex²) / E³
  Nλλxx = (Eλλxx E − Eλλ Exx − 2 Eλxx Eλ − 2 Eλλx Ex − 2 Eλx²) / E²
          + (2 Eλλ Ex² E + 8 Eλx Eλ Ex E + 2 Eλ² Exx E − 6 Eλ² Ex²) / E⁴

  Hxy   = [ (Eλ² + Eλλ²)(Eλλ Eλxy − Eλ Eλλxy − Eλy Eλλx + Eλx Eλλy)
            − 2 (Eλλ Eλx − Eλ Eλλx)(Eλ Eλy + Eλλ Eλλy) ] / (Eλ² + Eλλ²)²
  Cλxy  = (Eλxy E² + Eλx Ey E − Eλy Ex E − Eλ Exy E − 2 Eλx Ey E + 2 Eλ Ex Ey) / E³
  Cλλxy = (Eλλxy E² + Eλλx Ey E − Eλλy Ex E − Eλλ Exy E − 2 Eλλx Ey E + 2 Eλλ Ex Ey) / E³
  Wxy   = Exy / E
  Wλxy  = Eλxy / E
  Wλλxy = Eλλxy / E
  Nλxy  = (Eλxy E² + Eλx Ey E − Eλy Ex E − Eλ Exy E − 2 Eλx Ey E + 2 Eλ Ex Ey) / E³
  Nλλxy = (Eλλxy E − Eλλ Exy − 2 Eλxy Eλ − Eλλx Ey − Eλλy Ex − 2 Eλx Eλy) / E²
          + (2 Eλλ Ex Ey E + 4 Eλx Eλ Ey E + 4 Eλy Eλ Ex E + 2 Eλ² Exy E − 6 Eλ² Ex Ey) / E⁴
Table 4.5: Summary of the second order geometrical invariants for the various color invariant sets. For each invariant Q ∈ {H, Cλ, Cλλ, W, Wλ, Wλλ, Nλ, Nλλ}, the second order gauge invariants follow the pattern of (eq. 4.22)–(eq. 4.24):

  Qvv = (Qx² Qyy − 2 Qx Qy Qxy + Qy² Qxx) / Qw²
  Qvw = (Qx Qy (Qxx − Qyy) − (Qx² − Qy²) Qxy) / Qw²
  Qww = (Qx² Qxx + 2 Qx Qy Qxy + Qy² Qyy) / Qw²

with the spatial derivatives Qx, Qxy, ... as given in tab. 4.4. For the W-family the expressions reduce to derivatives of E; the zeroth spectral order member is denoted I, in accordance with tab. 4.2.

Figure 4.1: Example of the invariants associated with Ĥ. The example image is shown in (a), invariant Ĥ in (b), the derived expression Ŝ in (c), and gradient magnitude Ĥw in (d). Intensity changes and highlights are suppressed in the Ĥ and Ĥw images. The Ŝ image shows a low purity at color borders, due to the mixing of colors on the two sides of the border. For all pictures, σx = 1 pixel and the image size is 256 × 256.

Figure 4.2: The influence of white additive noise on gradient magnitude Ĥw. Independent Gaussian zero-mean noise is added to each of the RGB channels, SNR = 5 (a), and Ĥw is determined for σx = 1 (b), σx = 2 (c), and σx = 4 pixels (d), respectively. Note the noise robustness of the hue gradient Ĥw for larger σx.

Measurement of invariants for white illumination and matte, dull surfaces

The invariants Ĉλ and Ĉλλ represent normalized color; consequently, their spatial derivatives measure the normalized color gradients. Ĉλw may be interpreted as the color gradient magnitude for transitions in the first order spectral derivative, whereas Ĉλλw detects edges related to the second order spectral derivative. An example of the normalized colors and their gradients is shown in fig. 4.3.

Figure 4.3: Examples of the normalized colors Ĉλ denoting the first spectral derivative (a), Ĉλλ denoting the second spectral derivative (b), and their gradient magnitudes Ĉλw (c) and Ĉλλw (d), respectively. Note that intensity edges are suppressed, whereas highlights are still present.

Figure 4.4: Examples of the gradient magnitudes Îw (a), Ŵλw (b), and Ŵλλw (c), respectively. Note that all images show edges due to intensity differences and highlights. Îw shows purely intensity edges or shadow edges, while Ŵλw and Ŵλλw show color edges.
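In practice, the hatted quantities are measured from an RGB recording through the Gaussian color model, after which Ĥ and Ŝ follow directly from Êλ and Êλλ. A minimal sketch of this pipeline is given below; the RGB-to-(Ê, Êλ, Êλλ) coefficients are quoted from the published Gaussian color model fit and, like the function names and the definitions H = arctan(Eλ/Eλλ) and S = √(Eλ² + Eλλ²)/E read off from (eq. 4.9)–(eq. 4.10), should be treated as assumptions here rather than as the exact implementation used for the figures.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Assumed linear RGB -> (E, El, Ell) transform of the Gaussian color model
# (Chapter 2); coefficients as published for the colorimetric fit.
M = np.array([[0.06,  0.63,  0.27],
              [0.30,  0.04, -0.35],
              [0.34, -0.60,  0.17]])

def gaussian_color_model(rgb, sigma=1.0):
    """Measure (E, El, Ell) at spatial scale sigma from an (h, w, 3) RGB image."""
    E, El, Ell = np.einsum('ij,yxj->iyx', M, rgb.astype(float))
    return tuple(gaussian_filter(c, sigma) for c in (E, El, Ell))

def hue_purity(E, El, Ell, eps=1e-10):
    """Hue H and purity S, assumed as in (eq. 4.9)-(eq. 4.10)."""
    H = np.arctan2(El, Ell)                    # insensitive to intensity and highlights
    S = np.sqrt(El**2 + Ell**2) / (E + eps)    # desaturates at highlights
    return H, S
```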
Measurement of invariants for white and uniform illumination and matte, dull surfaces

The invariant Îw denotes intensity or shadow edges, whereas the invariants Ŵλw and Ŵλλw represent color edges. Ŵλw may be interpreted as the gradient magnitude for the first spectral derivative. A similar interpretation holds for Ŵλλw, but here edges caused by the second spectral derivative are detected. An example of the gradients is shown in fig. 4.4.

Measurement of invariants for colored illumination

The invariants N̂λw and N̂λλw may be interpreted as the reflectance function gradient magnitudes for the spectral first and second order derivatives, respectively. Hence, material edges are detected independent of illumination intensity and illumination color. An example of the gradients N̂λw and N̂λλw is shown in fig. 4.5. In Chapter 3, illumination color invariance is investigated for the proposed edge strength, resulting in a significant reduction of chromatic variation due to illumination color; see Chapter 3 for a more elaborate discussion of the subject.

Figure 4.5: Examples of the gradient magnitudes N̂λw (a) and N̂λλw (b). Note that intensity edges are suppressed. Further, note that the assumptions underlying this invariant do not account for highlights and interreflections, as is seen in the figure.

Total color gradients

The expressions for total gradient magnitude are given by Êw, Ŵw, Ĉw, N̂w, and Ĥw. The proposed edge strength measures may be ordered by degree of invariance, yielding Êw as a measure of spectral edge strength, Ŵw as a measure of color edge strength disregarding intensity level, Ĉw as a measure of chromatic edge strength disregarding intensity distribution, N̂w as a measure of chromatic edge strength disregarding illumination, and Ĥw as a measure of dominant-wavelength edge strength disregarding intensity and highlights. An example of the proposed measures is shown in fig. 4.6.

Figure 4.6: Examples of the total color edge strength measures. a. Ŵw, invariant to a constant gain or intensity factor; note that this image shows intensity, color, and highlight boundaries. b. Ĉw and c. N̂w, invariant to shading. d. Ĥw, invariant to shading and highlights. The effects of intensity and highlights on the different invariants are in accordance with tab. 4.1.

4.3.2 Discriminative Power for RGB Recording

In order to investigate the discriminative power of the proposed invariants, edge detection between 1013 different colors of the PANTONE color system is examined. (PANTONE is a trademark of Pantone, Inc.; we use the PANTONE edition 1992–1993, Groupe BASF, Paris, France.) The 1013 PANTONE colors are recorded by an RGB camera (Sony DXC-930P) under a 5200 K daylight simulator (Little Light, Grigull, Jungingen, Germany). Purely achromatic patches are removed from the dataset, leaving 1000 colored patches; in this way, numerically unstable results for set Ĥ are avoided. Color edges are formed by combining each of the patches with all others, yielding 499,500 different edges. Edges are defined virtually by computing the left-hand part of the filter on one patch and the right-hand part on one of the other patches. The total edge strength measures for invariants Ê, Ŵ, Ĉ, N̂, and Ĥ (tab. 4.3) are measured for each color combination at a scale of σx = {0.75, 1, 2, 3} pixels, hence evaluating the total performance of each set of invariants.
Discrimination between colors is determined by evaluating the ratio of the discriminatory contrast between patches to the within-patch noise,

  DNRc(i, j) = ĉij / max_k √( (1/N²) Σ_{x,y} ĉk(x, y)² )    (4.25)

where ĉ denotes one of the edge strength measures for Ê, Ŵ, Ĉ, N̂, or Ĥ, respectively. Further, ĉij denotes the edge strength between patches i and j, and ĉk denotes the response of the edge detector to noise within patch k. Hence, for detector ĉ, the denominator in expression (eq. 4.25) expresses the maximum response over the 1000 patches due to noise, whereas the numerator expresses the response due to the color edge. Two colors are defined to be discriminable when DNR ≥ 3, a conservative threshold.

The results of the experiment are shown in tab. 4.6. For colors uniformly distributed in color space, for the configuration used and spatial scale σx = 0.75, about 970 colors can be distinguished from one another (Ê). For invariant Ŵ, performance reduces to 950 colors. A further decrease is observed for Ĉ and N̂, which distinguish approximately 700 and 630 colors, respectively. The lowest discriminative power is achieved by invariant set Ĥ, which discriminates approximately 440 colors.

Table 4.6: For each invariant, the number of colors is given which can be discriminated from one another in the PANTONE color system (1000 colors). The number refers to the amount of colors still distinguishable under the conservative criterion DNR ≥ 3, given the hardware and spatial scale σx. For σx ≥ 2, Ê and Ŵ discriminate between all patches; hence these results are saturated.

        σx = 0.75   σx = 1   σx = 2   σx = 3
  Ê        970        983      1000     1000
  Ŵ        944        978      1000     1000
  Ĉ        702        820       949      970
  N̂        631        757       962      974
  Ĥ        436        461       452      462

When the spatial scale σx increases, discrimination improves: a larger spatial scale yields a better reduction of noise, and hence a more accurate estimate of the true color is obtained. The results shown for σx ≥ 2 are saturated for Ê and Ŵ; hence, a larger set of colors can be discriminated than shown here. Note that for σx ≥ 2 the performance of Ĉ is comparable to that of N̂, again indicating saturation. Note also that the power of discrimination, expressed as the amount of discriminable colors, decreases with increasing broadness of invariance.

These are very encouraging results given a standard RGB camera rather than a spectrophotometer. To discriminate 450 to 950 colors, while maintaining invariance, on the basis of just two patches in the image is helpful for many practical image retrieval problems.
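A minimal sketch of the discrimination criterion of (eq. 4.25), under the assumption that the per-patch noise responses and the virtual edge responses have been computed beforehand; the names are illustrative only.

```python
import numpy as np

def discriminable(edge_strength_ij, patch_noise_responses, threshold=3.0):
    """Apply the DNR criterion of (eq. 4.25) to one color pair (i, j).

    edge_strength_ij      : scalar edge strength measured across the virtual
                            edge between patches i and j
    patch_noise_responses : iterable of 2-D response images of the same edge
                            detector inside each uniform patch (noise only)
    """
    # Denominator of eq. 4.25: worst-case RMS noise response over all patches.
    noise = max(np.sqrt(np.mean(r.astype(float) ** 2))
                for r in patch_noise_responses)
    return edge_strength_ij / noise >= threshold
```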
4.3.3 Evaluation of Scene Geometry Invariance

In this section, illumination and viewing direction invariance is evaluated by experiments on a collection of real-world surfaces. Colored patches from the CUReT database (http://www.cs.columbia.edu/CAVE/curet/) are selected [3]. The database consists of planar patches of common materials, captured under various illumination and viewing directions. Hence, recordings of 27 colored material patches, each captured under 205 different illumination and viewing directions, are considered. Color edges are formed by combining the patch for each imaging condition with the others, yielding 205 × (205 − 1)/2 = 20,910 different edges per material. Edges are defined virtually by computing the left-hand part of the filter on one patch and the right-hand part on the other. The total edge strength measures for invariants Ê, Ŵ, Ĉ, N̂, and Ĥ (tab. 4.3) are obtained for each material at scale σx = 3 pixels. The root squared sum over the measured edge strengths indicates the sensitivity to scene geometry for the material and edge strength measure under consideration. For the spectral edge strength E, the edge strength was normalized to the average intensity over all viewing conditions; in this way, comparison between the various measured edge strengths is possible.

The results are shown in tab. 4.7. A high value of Ê indicates influence of scene geometry on material surface reflectance. By construction of the database, which contains planar patches, the value for Ŵ approximates the value for Ê. Exceptions are surfaces with rough texture, exhibiting shadow edges larger than the measurement scale (straw, cracker a). Further, the selected center point in the 205 recordings does not correspond to one identical point on the material patches, causing an error for non-uniformly colored patches (orange peel, peacock feather), or for patches exhibiting intensity variations (rabbit fur, brick b, moss).

The measured error for Ĉ and N̂ is approximately similar. White light is used for the recordings; hence both Ĉ and N̂ reduce the measurement variation due to changes in intensity. Exceptions are scattering materials with fine texture relative to the measurement scale σx = 3 pixels (velvet, rug b), causing highlights to influence the measured surface reflectance. Overall, for Ĉ and N̂, the variation in edge strength due to illumination and viewing direction is reduced drastically. Even for these non-Lambertian real-world surfaces, the invariant sets N̂ and Ĉ are highly robust against changes in scene geometry.

For Ĥ, the results are influenced by numerical stability. Highly saturated materials (velvet, artificial grass, lettuce leaf, soleirolia plant, moss) result in a small error since Êλ² + Êλλ² ≫ 0. The exception is again the non-uniformly colored orange peel. Note that the error due to highlights for velvet is much smaller than that measured for Ĉ and N̂. For materials with lower saturation, the errors become larger. Overall, the influence of illumination and viewing direction is only slightly reduced for Ĥ.

In conclusion, the table demonstrates the error to expect for real, commonly non-Lambertian surfaces. These results demonstrate the usefulness of the various invariant sets for material classification and recognition based on surface reflectance properties.

Table 4.7: Results for the scene geometry invariance evaluation on the CUReT dataset [3] (205 recordings per material). The root squared sum in measured total edge strength over the 205 recordings under different viewing and illumination directions is given for each of the materials.
A high value for Ê indicates a large influence of scene geometry on surface reflectance for the considered material. A low value of the variation of invariant Ŵ, Ĉ, N̂, or Ĥ relative to Ê indicates robustness against scene geometry for the invariant under consideration. The table offers an indication of the error to expect in estimating surface reflectance for real materials.

  material                  E       W/E    C/E    N/E    H/E
  Velvet                    102.7   1.01   0.86   1.45   0.03
  Pebbles                   25.0    1.07   0.15   0.15   0.59
  Artificial Grass          29.1    0.97   0.22   0.23   0.28
  Roof Shingle              27.4    1.04   0.30   0.30   2.00
  Cork                      34.5    0.93   0.20   0.19   0.59
  Rug b                     28.2    1.01   0.45   0.38   0.58
  Sponge                    29.0    1.09   0.21   0.21   0.62
  Lambswool                 29.6    1.09   0.18   0.17   0.53
  Lettuce Leaf              41.3    1.02   0.13   0.15   0.26
  Rabbit Fur                26.0    1.15   0.19   0.18   0.68
  Roof Shingle (zoomed)     34.2    0.98   0.19   0.19   0.97
  Human Skin                39.7    0.99   0.10   0.09   0.47
  Straw                     33.7    1.13   0.12   0.12   0.38
  Brick b                   77.6    0.55   0.06   0.06   0.30
  Corduroy                  27.3    1.14   0.07   0.07   0.47
  Linen                     39.1    0.90   0.11   0.11   0.62
  Brown Bread               29.7    1.06   0.19   0.19   0.54
  Corn Husk                 33.3    1.05   0.10   0.11   0.41
  Soleirolia Plant          39.4    0.98   0.15   0.17   0.24
  Wood a                    34.8    0.94   0.19   0.18   0.58
  Orange Peel               73.3    0.73   0.30   0.28   0.36
  Wood b                    34.3    0.97   0.13   0.13   0.36
  Peacock Feather           61.3    0.66   0.16   0.16   0.70
  Tree Bark                 37.0    0.97   0.16   0.16   0.48
  Cracker a                 28.7    1.22   0.19   0.19   0.71
  Cracker b                 25.3    1.01   0.30   0.27   0.75
  Moss                      46.9    0.84   0.13   0.15   0.20

4.3.4 Localization Accuracy for the Geometrical Color Invariants

Rotational and translational invariance remains to be evaluated. The independence of the derived expressions of the chosen coordinate system is shown mathematically in [4], and is demonstrated by the examples shown in fig. 4.6. The measurement problem related to rotation and translation invariance is the accuracy of edge localization between different colors.

In order to investigate the localization accuracy of the proposed invariants, the edge location is evaluated between 1000 different colors of the PANTONE system. The uncoated patches as described in the preceding section (section 4.3.2) are used to form 499,500 different color edges. The total edge strength measures for invariants Ê, Ŵ, Ĉ, N̂, and Ĥ (tab. 4.3) are measured for each color combination at a scale of σx = {1, 2, 4} pixels, hence evaluating the total performance of each set of invariants. For color pairs that can be distinguished (see section 4.3.2), the edge position between different patches is determined by tracing the maximum response along the edge in the resulting edge strength image. The average deviation between the measured edge location and the real edge location is considered a good measure of the localization accuracy. The root mean squared error in edge location is determined over all color pairs for each of the total edge strength measures.

The results of the experiment are shown in tab. 4.8. For the invariants Ê, Ŵ, Ĉ, and N̂, localization accuracy degrades for higher spatial scale σx. This is a well known property of Gaussian smoothing in the intensity domain. The invariants all result in a larger localization error than Ê, due to the severe reduction in edge contrast. The localization error for Ĉ is almost identical to the error for N̂, as expected. Note that the localization error remains within the spatial scale σx. For the invariant Ĥ, edge strength is normalized by the squared sum of the spectral derivatives (eq. 4.9); hence, localization accuracy improves for higher spatial scale, due to a better estimation of the local chromaticity.

Table 4.8: Results for the edge localization experiment, relative to pixel size. For each invariant, the root mean squared error in measured edge position over the color pairs from the PANTONE system is given.

        σx = 1   σx = 2   σx = 4
  Ê      0.22     0.28     0.36
  Ŵ      0.51     1.03     1.82
  Ĉ      0.66     1.49     2.44
  N̂      0.65     1.45     2.38
  Ĥ      2.46     1.63     0.70

In conclusion, edge localization accuracy is slightly reduced for the invariant sets in comparison to Ê. However, precision remains within the spatial differential scale σx. The results show Ĥ to be noise sensitive for small spatial scale σx < 2.
4.4 Conclusion

We have derived geometrical color invariant expressions describing material properties under three independent assumptions regarding the imaging conditions: a. white or colored illumination, b. matte, dull object or general object, and c. uniformly stained object or generally colored object. The reflectance model under which the invariants remain valid is useful for a wide range of materials [10]. Experiments on an example image showed the invariant sets C and N to be successful in disregarding shadow edges, whereas the set H is shown to be successful in discounting both shadow edges and highlights. In Chapter 3 the degree of illumination color invariance for set N̂ is investigated.

We showed the discriminative power of the invariants to be orderable by broadness of invariance. The highest discriminative power is obtained by set Ŵ (950 colors out of 1000), which ignores the tightest set of disturbing conditions, namely overall illumination intensity or camera gain. Discrimination degrades for set Ĉ (700 colors), which is invariant to shading effects. Set N̂, invariant to shading and illumination color, discriminates between 630 colors, whereas set Ĥ, invariant to shadows and highlights, has the lowest discriminative power (440 colors). Discriminative power increases when considering a larger spatial scale σx, thereby taking a larger neighborhood into account for determining the color value; hence, a larger spatial scale results in a more accurate estimate of color at the point of interest. The aim of the chapter is reached in that high color discrimination resolution is achieved while maintaining constancy against disturbing imaging conditions, both theoretically and experimentally.

We have restricted ourselves in several ways. We have derived expressions up to the second spatial order, and investigated their performance only for the spatial gradient. The derivation of higher order derivatives is straightforward, and may aid in corner detection [21]. Usually many derivatives are involved here, raising some doubt on the sustainable accuracy of the result; consequently, a larger spatial scale may be necessary to increase the accuracy of measurements involving higher order derivatives. Further, we have only considered spectral derivatives up to second order, yielding compatibility with human color vision. For a spectrophotometer, measurements can be obtained at different positions λ0, for different scales σλ, and for higher spectral differential order, thereby exploiting the generality of the Gaussian color model.

We provided different classes of color invariants under general assumptions regarding the imaging conditions, and we have shown how to reliably measure color invariants from RGB images by using the Gaussian color model. The Gaussian color model extends the differential geometry approach from grey-value images to multi-spectral differential geometry. Further, we experimentally proved the color invariants to be successful in discounting shadows and highlights, resulting in accurate measurements of surface reflectance properties. The presented framework for color measurement is well-defined on a physical basis; hence, it is theoretically better founded, as well as experimentally better evaluated, than existing methods for the measurement of color features in RGB images.

Bibliography

[1] E. Angelopoulou, S. Lee, and R. Bajcsy. Spectral gradient: A material descriptor invariant to geometry and incident illumination.
In Proceedings of the Seventh IEEE International Conference on Computer Vision, pages 861–867. IEEE Computer Society, 1999.

[2] A. Cumani. Edge detection in multispectral images. CVGIP: Graphical Models and Image Processing, 53(1):40–51, 1991.

[3] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance and texture of real world surfaces. ACM Trans. Graphics, 18:1–34, 1999.

[4] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Scale and the differential structure of images. Image and Vision Computing, 10(6):376–388, 1992.

[5] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Cartesian differential invariants in scale-space. Journal of Mathematical Imaging and Vision, 3(4):327–348, 1993.

[6] R. Gershon, D. Jepson, and J. K. Tsotsos. Ambient illumination and the determination of material changes. J. Opt. Soc. Am. A, 3:1700–1707, 1986.

[7] T. Gevers, S. Ghebreab, and A. W. M. Smeulders. Color invariant snakes. In P. H. Lewis and M. S. Nixon, editors, Proceedings of the Ninth British Machine Vision Conference, pages 659–670. University of Southampton, 1998.

[8] T. Gevers and A. W. M. Smeulders. Color based object recognition. Pat. Rec., 32:453–464, 1999.

[9] T. Gevers and H. Stokman. Reflectance based edge classification. In Proceedings of Vision Interface, pages 25–32. Canadian Image Processing and Pattern Recognition Society, 1999.

[10] D. B. Judd and G. Wyszecki. Color in Business, Science, and Industry. Wiley, New York, NY, 1975.

[11] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.

[12] J. J. Koenderink and A. Kappers. Color Space. Utrecht University, The Netherlands, 1998.

[13] J. J. Koenderink and A. J. van Doorn. Receptive field families. Biol. Cybern., 63:291–297, 1990.

[14] P. Kubelka and F. Munk. Ein Beitrag zur Optik der Farbanstriche. Z. Techn. Physik, 12:593, 1931.

[15] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[16] P. Olver, G. Sapiro, and A. Tannenbaum. Differential invariant signatures and flows in computer vision: A symmetry group approach. In B. M. ter Haar Romeny, editor, Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[17] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.

[18] G. Sapiro and D. L. Ringach. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Processing, 5(11):1582–1586, 1996.

[19] S. A. Shafer. Using color to separate reflection components. Color Res. Appl., 10(4):210–218, 1985.

[20] H. Stokman and T. Gevers. Detection and classification of hyper-spectral edges. In Proceedings of the Tenth British Machine Vision Conference, pages 643–651. CRI Repro Systems Ltd., 1999.

[21] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.

[22] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley, New York, NY, 1982.

[23] S. Di Zenzo. A note on the gradient of a multi-image. Comput. Vision Graphics Image Processing, 33:116–125, 1986.

Part II

Geometrical Structure

Chapter 5

Robust Autofocusing in Microscopy

appeared in Cytometry, vol. 39, pp. 1–9, 2000. European patent application filed under no. 99201795.4 on June 4, 1999.

“The way the Nutri-Matic machine functioned was very interesting.
When the Drink button was pressed it made an instant but highly detailed examination of the subject’s taste buds, a spectroscopic analysis of the subject’s metabolism and then sent tiny experimental signals down the neural pathways to the taste centres of the subject’s brain to see what was likely to go down well. However, it invariably produced a plastic cup filled with a liquid which was almost, but not quite, entirely unlike tea.”

in The Hitch Hiker’s Guide to the Galaxy, by Douglas Adams.

Along with the introduction of high throughput screenings, quantitative microscopy is gaining importance in pharmaceutical research. Fully automatic acquisition of microscope images in unattended operation, coupled to an automatic image analysis system, allows for the investigation of morphological changes. Time-lapse experiments reveal the effect of drug compounds on the dynamics of living cells. Histochemical assessment of fixed tissue sections is used to quantify pathological modification.

A critical step in automatic screening is focusing. Fast and reliable autofocus methods for the acquisition of microscope images are indispensable for routine use on a large scale. Autofocus algorithms should be generally applicable to a large variety of microscopic modes and to a large variety of preparation techniques and specimen types. Although autofocusing is a long-standing topic in the literature [4, 5, 7, 8, 9, 10, 11, 19], no such generally applicable solution is available. Methods are often designed for one kind of imaging mode and have been tested under well-defined circumstances. The assumptions made for determining the focal plane in fluorescence microscopy are not compatible with those for phase contrast microscopy, and this holds true throughout. We consider the design of a method which is generally applicable in light microscopy.

From Fourier optics [13] it has been deduced that well-focused images contain more detail than images out of focus. A focus score is used to measure the amount of detail. The focus curve can be estimated by sampling the focus score for different levels of focus; some examples of focus curves are shown in fig. 5.2. Best focus is found by searching for the optimum in the focus curve. In a classical approach, the value of the focus score is estimated for a few focus positions [8, 19, 2]. Evaluation of the scores indicates where on the focus curve to take the next sample. Repeating the process iteratively should ensure convergence to the focal plane. A major drawback is that such an optimization procedure presupposes a. a unimodal focus curve, and b. a broad-tailed extremum to obtain a wide focus range, assumptions which do not hold true in general. In reality, the focus curve depends on the microscope setup, imaging mode, and preparation characteristics [16]. When the assumed shape of the focus curve does not match the real focus curve, or when local extrema emerge, convergence to the focal plane is not guaranteed.

Groen et al. [5] specify criteria for the design of autofocus procedures. We adopt these criteria of good focusing: a. accuracy, b. reproducibility, c. general applicability, and d. insensitivity to other parameters. Insensitivity to other parameters is taken to include robustness against noise and against optical artifacts common to microscopic image acquisition. Further, we reject the criterion of unimodality of the focus curve, which cannot be achieved in practice [16, 15].
As a consequence, the range or broadness of the extremum in the focus curve is of less relevance.

In this report, an autofocus method is presented which is generally applicable in different microscopic modes. The aim was to develop a method especially suited for an unattended operational environment, such as high throughput screenings. Therefore, the method should be robust against confounding factors common in microscopy, such as noise, optical artifacts, and dust on the preparation surface. To evaluate the performance of the autofocus method, experiments have been conducted in screening applications.

5.1 Material and Methods

5.1.1 The Focus Score

From Fourier optics, measurement of the focus score can best be based on the energy content of a linearly filtered image [5, 13]. From [5, 8, 16] it can be deduced that an optimal focus score is produced by the gradient filter. Scale-space theory [20] leads to the use of the first order Gaussian derivative to measure the focus score. The σ of the Gauss filter determines the scale of prominent features. The focus function becomes

  F(σ) = (1/NM) Σ_{x,y} [f(x, y) ∗ Gx(x, y, σ)]² + [f(x, y) ∗ Gy(x, y, σ)]²
       = (1/NM) Σ_{x,y} fx² + fy²    (5.1)

where f(x, y) is the image grey value, Gx(x, y, σ) and Gy(x, y, σ) are the first order Gaussian derivatives in the x- and y-direction at scale σ, NM is the total number of pixels in the image, and fx, fy are the image derivatives at scale σ in the x- and y-direction, respectively.

Often, a trade-off between noise sensitivity and detail sensitivity can be observed for a specific microscope set-up. For example, in fluorescence microscopy the signal to noise ratio (SNR) is often low, and relatively smooth images are examined. For phase contrast microscopy, the SNR is high, and small details (the phase transitions) have to be detected. The accuracy of autofocusing depends on the signal to noise ratio as propagated through the focus score filter [19]. Therefore, the σ of the Gaussian filter should be chosen such that noise is maximally suppressed, while the response to details of interest in the image is preserved. For bar-like structures, the value of σ should conform to [17]

  σ ≈ d / (2√3)    (5.2)

where d is the thickness of the bar. Assuming that the smallest detail to be focused may be considered bar shaped, (eq. 5.2) gives an indication of the minimal value of σ. Note that the filter response degrades for smaller values, whereas a very large value smooths all details to noise level.
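Expressed in a scipy environment, (eq. 5.1) and (eq. 5.2) amount to the following sketch; the function names are ours, and scipy's FIR Gaussian derivative stands in for whatever filter implementation is at hand.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def focus_score(f, sigma):
    """Focus score of (eq. 5.1): mean energy of the first order Gaussian derivatives."""
    f = f.astype(float)
    fx = gaussian_filter(f, sigma, order=(0, 1))   # derivative in the x-direction
    fy = gaussian_filter(f, sigma, order=(1, 0))   # derivative in the y-direction
    return np.mean(fx**2 + fy**2)

def detail_scale(d):
    """Scale selection of (eq. 5.2) for a bar-like detail of thickness d pixels."""
    return d / (2.0 * np.sqrt(3.0))
```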
5.1.2 Measurement of the Focus Curve

Consider a system consisting of the following hardware: 1) a microscope with scanning stage and position controller for both the axial and the lateral directions, 2) a camera placed on the microscope recording its field of view, and 3) a video digitizer connected to a computer system, writing the camera output into the computer's memory at video rate. The computer system is able to send positioning commands to the stage controller. Examples of such systems will be given later.

The focal plane of the microscope is assumed to be within a pre-defined interval ∆z around the start z-position z. The scanning stage is moved down to the position zmin = z − ½∆z. Backlash correction is applied by sending the stage further down than necessary, and raising it again to the given position [10]. In this way, focus positions are always reached from the same direction; as a result, mechanical tolerance in the cog-wheels is eliminated.

At t = 0 ms, the stage controller starts raising the stage to traverse the complete focus interval ∆z. During the stage movement through focus, successive images of the preparation are captured at 40 ms intervals (video rate). The focus score of each captured image is calculated. The image buffer is re-used for the next video frame, necessitating only two image memory buffers to be active at any time: one of the buffers is used for focus score calculation of the previously captured image, while the other is used for capturing the next image. Calculation of the focus score should thus be performed within one video frame time. As soon as the stage has reached the end of the focus interval, timing is stopped at t = td ms. An estimate of the focus curve is thus obtained for the complete focus interval. The global optimum in the estimate of the focus curve represents the focal plane.

Now, each z-position is related to the time at which the corresponding image has been captured. When linear movement of the stage is assumed, the position at which the image at time ti is taken corresponds to

  zi = (ti / td) ∆z + zmin    (5.3)

where td represents the travel duration, ∆z is the focus interval, and zmin is the start position (the position at t = 0 ms).

Since the focus curve is parabolic around the focal plane [10, 11, 19], high focus precision can be achieved by quadratic interpolation. When assuming linear stage movement, z = vt + zmin, the focus curve around the focal plane can be approximated by

  s(t) = c + bt + at²    (5.4)

The exact focus position is obtained by fitting a parabola through the detected optimum and its neighboring measurements. Consider the detected optimum s(to) = so at time t = to. The time axis may be redefined such that the detected optimum is at time t = 0. Then, the neighboring scores are given by (sn, tn) and (sp, tp), respectively. Solving for a, b, and c gives

  c = so ,
  b = (−so tn² + sp tn² + so tp² − sn tp²) / (tn² tp − tn tp²) ,
  a = (so tn − sp tn − so tp + sn tp) / (tn² tp − tn tp²) .    (5.5)

The peak of the parabola, and thus the elapsed time to the focus position, is given by

  tf = −b / (2a) + to = (so tn² − sp tn² − so tp² + sn tp²) / ( 2 (so tn − sp tn − so tp + sn tp) ) + to    (5.6)

The focal plane is at position

  zf = (tf / td) ∆z + zmin    (5.7)

to which the stage is moved, taking the backlash correction into account.

5.1.3 Sampling the Focus Curve

The depth of field of an optical system is defined as the axial distance from the focal plane over which details can still be observed with satisfactory sharpness. The thickness of the slice which can be considered in focus is then given by [14, 26]

  zd = λ / ( 2n ( 1 − √(1 − (NA/n)²) ) )    (5.8)

where n is the refractive index of the medium, λ the wavelength of the light used, and NA the numerical aperture of the objective. The focus curve is sampled at Nyquist rate when measured at zd intervals [18]. The parabolic fitting ensures that the focus position is centered within thick specimens, i.e. specimens much larger than zd. Common video hardware captures frames at a fixed rate; thus, the sampling density of the focus curve can only be influenced by adjusting the stage velocity to travel zd µm per video frame time.
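The interpolation of (eq. 5.5)–(eq. 5.6) and the slice thickness of (eq. 5.8) translate directly into code; a sketch, with names of our choosing:

```python
import math

def refine_peak(tn, sn, to, so, tp, sp):
    """Quadratic interpolation (eqs. 5.5-5.6) of the focus optimum (to, so)
    from its neighboring samples (tn, sn) and (tp, sp)."""
    tn, tp = tn - to, tp - to          # redefine the time axis: optimum at t = 0
    num = so*tn**2 - sp*tn**2 - so*tp**2 + sn*tp**2
    den = 2.0 * (so*tn - sp*tn - so*tp + sn*tp)
    return to + num / den              # eq. 5.6

def depth_of_field(wavelength, n, NA):
    """In-focus slice thickness zd of (eq. 5.8); wavelength and result in micrometers."""
    return wavelength / (2.0 * n * (1.0 - math.sqrt(1.0 - (NA / n) ** 2)))

# For example, the 5x / NA 0.15 bright-field setup of tab. 5.1 at 530 nm gives
# depth_of_field(0.53, 1.0, 0.15) = 23.4 um, the sampling distance of the focus curve.
```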
In order to calculate the focus score within video frame time on current sensors and computer systems, simplification of the focus function (eq. 5.1) is considered. For biological preparations, details are distributed isotropically over the image. The response of the filter in one direction is then adequate for determination of the focal plane. Further computation time can be saved by estimating the filter response from a fraction of the scan lines in the image. The focus function then becomes

  F(σ) = (L/NM) Σ_{x,y} [f(x, y) ∗ Gx(x, y, σ)]² .    (5.9)

For our purpose, every sixth row (L = 6) is used. A recursive implementation of the Gaussian derivative filter is used [22], for which the computation time is independent of the value of σ. The calculation time is kept under 40 ms for all computer systems used in the experiments, even when the system is running other tasks simultaneously. Comparison between the focus curve calculated in two dimensions for the whole image (eq. 5.1) and the response of (eq. 5.9) reveals only marginal differences for all experiments.
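A sketch of the simplified score (eq. 5.9); the thesis relies on a recursive Gaussian derivative filter [22] whose cost is independent of σ, for which scipy's FIR approximation is substituted here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def fast_focus_score(f, sigma, step=6):
    """Focus score of (eq. 5.9): x-derivative energy on every step-th scan line."""
    rows = f[::step, :].astype(float)                     # a fraction of the scan lines
    fx = gaussian_filter1d(rows, sigma, axis=1, order=1)  # 1-D Gaussian derivative
    # The mean over the sampled pixels equals (L/NM) times the sum of eq. 5.9,
    # up to boundary effects of the filter.
    return np.mean(fx**2)
```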
5.1.4 Large, Flat Preparations

For the acquisition of multiple aligned images from large, flat preparations, the variation in focus position is assumed small but noticeable at high magnification. Proper acquisition of adjacent images can then be obtained by focusing only a few fields. Within the preparation, the procedure starts by focusing the first field. Fields surrounding the focused field are captured, until the next field to capture is a given distance away from the initially focused field. The deviation from best focus is then corrected for by focusing over a small interval. The preparation is scanned, keeping track of the focus position at fields further away than a given distance from the nearest of all previously focused fields. The threshold distance for which focusing is skipped depends on the preparation flatness and the magnification, and has to be empirically optimized for efficiency. Fields that have been skipped for focusing are positioned at the focus level of the nearest focused field. Small variations in focus position while scanning the preparation are thus corrected during acquisition.

5.1.5 Preparation and Image Acquisition

The autofocus algorithm has been intensively tested in the following applications: a. quantitative neuronal morphology, b. time-lapse experiments of cardiac myocyte dedifferentiation, c. immunohistochemical label detection in fixed tissue, d. C. Elegans GFP-VM screening, e. acquisition of smooth muscle cells, and f. immunocytochemical label detection in fixed cells. Each of these applications is described below.

The software package SCIL Image version 1.4 [21] (TNO-TPD, Delft, The Netherlands) is used for image processing, extended with the autofocus algorithm and functions for automatic stage control and image capturing. All preparations are observed on Zeiss invert microscopes (Carl Zeiss, Oberkochen, Germany), except for the immunohistochemical label detection, which is observed with a Zeiss Axioskop. The wavelength of the light used is 530 nm, unless stated differently. For automatic position control, the microscopes are equipped with a scanning stage and MAC4000 or (comparable) MC2000 controller (Märzhäuser, Wetzlar, Germany). At power-on, the stage is calibrated and an initial focus level is indicated manually. Backlash correction is empirically determined. For each application, the focus interval ∆z is determined by evaluating the variability in the z-position between focus events.

Quantitative Neuronal Morphology in Bright-field Mode

Morphological changes of neurons are automatically quantified as described in [12]. Briefly, PC12 cells were plated in poly-L-lysine (Sigma, St. Louis, MO) coated 12-well plates. In each well 5 × 10⁴ cells were seeded. After 24 hours the cells were fixed with 1% glutaraldehyde for 10 minutes. Then the cells were washed twice with distilled water. The plates were dried in an incubator.

The plates are examined in bright-field illumination mode; for details see tab. 5.1. The camera used is an MX5 (Adaptec, Eindhoven, The Netherlands) 780 × 576 video frame transfer CCD with pixel size 8.2 × 16.07 µm², operating at room temperature with auto gain turned off. Adjacent images are captured by an Indy R4600 132 MHz workstation (Silicon Graphics, Mountain View, CA), resulting in an 8 × 8 mosaic image for each well. Prior to the acquisition of the well, autofocusing at the center of the scan area is performed. The smallest details to focus are the neurites, which are about 3 pixels thick, yielding σ = 1.0 (eq. 5.2). The wavelength of the illumination is about 530 nm, resulting in a 23.4 µm depth of field (eq. 5.8). The effective stage velocity is somewhat different due to rounding off to the controller's built-in speeds. Due to the low magnification, backlash correction is not necessary.

Cardiac Myocyte Dedifferentiation in Phase Contrast Mode

Cardiac myocytes were isolated from adult rat (ca. 250 g) heart by collagenase perfusion as described in [3]. The cell suspension containing cardiomyocytes and fibroblasts was seeded on laminin coated plastic petri dishes, supplied with M199 and incubated for one hour. Thereafter, unattached and/or dead cells were washed away by rinsing once with M199. The petri dishes were filled with M199 + 20% fetal bovine serum and incubated at 37°C.

The petri dishes are examined in phase contrast mode; for details see tab. 5.1. During the experiment, the ambient temperature is maintained at 37°C. Time-lapse recordings (15 hours) are made in 6 manually selected fields, one in each of the 6 petri dishes. The scanning stage visits the selected fields at 120 second intervals. Fields are captured using a CCD camera (TM-765E, Pulnix, Alzenau, Germany) and added to JPEG compressed digital movies (Indy workstation with Cosmo compressor card, SGI, Mountain View, CA), one for each selected field. Autofocusing is applied once per cycle, successively refocusing all fields in 6 cycles. The smallest details to focus are the cell borders.

Immunohistochemical Label Detection in Bright-field Mode

Sections of the amygdala of mice injected with a toxic compound were cut at 15 µm thickness through the injection site. They were subsequently immunostained for the presence of the antigen, using a polyclonal antibody (44-136, Quality Control Biochemicals Inc., Hopkinton, MA), and visualized using the chromogen DAB.

Four microscope slides (40 brain slices) at a time are mounted on the scanning stage and observed in bright-field illumination mode, see tab. 5.1. Adjacent images are captured (Meteor/RGB frame-grabber, Matrox, Dorval, Quebec, Canada, in an Optiplex GXi PC with Pentium 200 MHz MMX, Dell, Round Rock, TX) by use of an MX5 CCD camera (Adaptec, Eindhoven, The Netherlands). As a result, mosaics of complete brain slices are stored on disk. Prior to acquisition, autofocusing at approximately the center of the brain slice is performed, the smallest details to focus being tissue structures. Due to the low magnification, backlash correction is not necessary.
C. Elegans GFP-VM Screening in Fluorescence Mode

Individual C. Elegans worms transgenic for GFP expressing vulval muscles (GFP-VM) were selected from stock, and one young adult hermaphrodite (P0) was placed in each of the 60 center wells of a 96-well plate (Costar, Acton, MA) filled with natural growth medium, and incubated for five days at 25°C to allow the F1 progeny to reach adult stage. Before image acquisition, fluorescent beads (F-8839, Molecular Probes, Eugene, OR) are added to the wells as background markers for the focus algorithm.

The well plate is examined in fluorescence mode, see tab. 5.1. A FITC filter (B, Carl Zeiss, Oberkochen, Germany) in combination with a 100 W Xenophot lamp is used to excite the GFP. Images are captured (O2 R5000 180 MHz workstation, Silicon Graphics, Mountain View, CA) using an intensified CCD camera (IC-200, PTI, Monmouth Junction, NJ). Each of the selected wells is scanned and the adjacent images, completely covering the well, are stored on disk. Variability in the z-position between the centers of the wells turned out to be within 250 µm, which is taken as the focus interval for initial focusing. After autofocusing on the well center, deviation from best focus while scanning the well is corrected over one-fifth of the initial focus interval. Focusing all fields further than 3 fields away from a focused field was sufficient to keep track of the focal plane. The diameter of the fluorescent spheres is 15 µm (30 pixels), which is much larger than zd. Since the spheres are homogeneously stained, the smallest detail to consider in the z-direction is a cylindrically shaped slice through the spheres, the cylinder height being determined by the horizontal resolution. Therefore, stage velocity during focusing is reduced to approximately one third of the sphere diameter per video frame time.

Acquisition of Smooth Muscle Cells in Phase Contrast Mode

Smooth muscle cells were enzymatically isolated from the circular muscle layer of guinea-pig ileum by a procedure adapted from [1]. Dispersed cells were suspended in a HEPES buffered saline containing 1 mM CaCl2. Aliquots (200 µl) of the cell suspension were distributed over test tubes and maintained at 37°C for 30 minutes. Then, 800 µl of medium containing the compound to be tested was added and the cells were incubated for 30 seconds. The reaction was stopped by the addition of 1% glutaraldehyde.

A drop of each cell suspension is brought onto a microscope glass slide and observed in phase contrast mode (see tab. 5.1). A region containing sufficient cells is selected manually and adjacent images are captured (Indy R4600 132 MHz workstation, Silicon Graphics, Mountain View, CA) using an MX5 CCD camera (Adaptec, Eindhoven, The Netherlands). Autofocusing is performed at approximately the center of the selected area, the smallest details being the elongated cells.

Immunocytochemical Label Detection in Fluorescence Mode

Human fibroblasts were seeded in a 96-well plate (Costar, Acton, MA) at 7000 cells per well, in 2% FBS/Optimem. Cells were immunostained according to [6] with primary antibody rabbit anti-human NF-κB (p65) (Santa Cruz Biotechnology, Santa Cruz, CA) and secondary Cy3 labeled sheep anti-rabbit (Jackson, West Grove, PA).
Further, nuclear counter-staining with Hoechst 33342 (Molecular Probes, Eugene, OR) was applied. Well plates are examined in fluorescence mode, see tab. 5.1. A DAPI-FITC-TRITC filter (XF66, Omega Optical, Brattleboro, VT) in combination with a 100 W Xenophot lamp is used to excite the cells (emission of the nuclei at 450 nm, of the immuno signal at 600 nm). Adjacent images are captured (O2 R5000 180 MHz workstation, Silicon Graphics, Mountain View, CA) using an intensified CCD camera (IC-200, PTI, Monmouth Junction, NJ). Autofocusing is performed at approximately the center of the scan area, the smallest details being the nuclei. The cell thickness is about 5–15 µm, much larger than zd. Therefore, during focusing, the stage velocity is reduced to approximate the cell thickness.

Table 5.1: Summary of the experimental setup and parameter settings for the various experiments. The value for σ (eq. 5.2) is given together with the smallest structure d in pixels. The focus interval ∆z and depth of field zd are given in µm; the effective velocity veff used during focusing is given in µm / 40 ms.

  application              mode      obj. (NA)      σ (d)      ∆z      zd          veff
  Quant neuronal morph     bright    5× (0.15)      1.0 (3)    500     23.4        24.7
  Cardiac myocyte dediff   phase     32× (0.4)      1.0 (4)    100     3.2         2.5
  Immunohist label det     bright    2.5× (0.075)   1.0 (3)    1,000   94          98.7
  C. Elegans screening     fluoresc  40× (0.6)      8.5 (30)   50      1.33        4.94
  Acq smooth muscle        phase     10× (0.3)      1.0 (4)    500     5.75        4.94
  Immunocyt label det      fluoresc  40× (0.6)      8.5 (30)   250     1.13/1.50   4.94

5.1.6 Evaluation of Performance for High NA

The autofocus algorithm performance is objectively evaluated by comparing the random focus error with that of observers. For this purpose, 2 µm epon sections of dog left ventricle cardiac myocytes, stained with periodic acid Schiff and toluidine blue, are observed with a Zeiss Axioplan. A high-NA objective (40×, NA 1.4, oil immersion) is used, for which the depth of field is zd = 0.36 µm (eq. 5.8). Autofocusing is non-trivial under these circumstances. Unfocused, arbitrarily selected fields (20 in total) are visited and manually focused by two independent experienced observers. The focus positions are recorded for both observers. Similarly, the focus positions found by the autofocus algorithm are recorded (σ = 1.0, backlash correction 15 µm, ∆z = 25 µm). Comparison of the random error between observers, and between observer and autofocus, gives an objective evaluation of autofocus performance.

Figure 5.1: Focus function as measured for the smooth muscle cells in phase contrast mode. The focus score (arbitrary units) of one representative field is plotted as a function of the z-position over the interval −250 to 250 µm. The peaks are caused by phase transition effects; the focal plane for the cell bodies is at −75 µm.

5.2 Results

5.2.1 Autofocus Performance Evaluation

The focus algorithm was not able to focus accurately on the smooth muscle cells. Figure 5.1 shows a representative focus curve measured with σ = 1.0. Measurement of the focus curve at other scales resulted in similar curves. The peaks are caused by phase transitions occurring when scanning through focus. For different focus positions, bright halos appear around the cells due to light diffraction [15]. The area of the cell bodies is small compared to the size of the halos, and thus the relevant image information content is too low. These circumstances caused failure of the focus algorithm to accurately focus on the cell bodies.

For the other applications, fig. 5.2 shows the average focus curves, not considering complete failures. The variation in focus score is mainly due to the different number of cells or amount of tissue present in each field.
For the time lapse of the cardiac myocytes (fig. 5.2b), the variation in focus score is caused by the dedifferentiation of the cardiac myocytes over time. The variation in focus score for the immunohistochemical label detection (fig. 5.2c) is caused by contrast differences between slices. Further, for the quantitative neuronal morphology (fig. 5.2a), the measured focus curve with the lowest maximum score (peak at 0.004) is at a field containing only some dead cells. Note the local maximum beneath focus, caused by a 180° phase shift in the point spread function of the optical system [25].

Table 5.2 shows a summary of the autofocus performance. All fields were accurately focused according to an experienced observer, except for a few complete failures. Focus could not be determined on empty fields, as is the case for the 14 failures in the C. Elegans GFP-VM screening. For the immunohistochemical label detection, focusing failed on 2 fields, which did not contain enough contrast for focusing. Further, for 2 fields in the immuno signal of the immunocytochemical label detection, the camera was completely saturated (bloomed) due to preparation artifacts, causing the autofocus algorithm to fail. For the C. Elegans GFP-VM screening, the total acquisition time for a 96-well plate was 4.5 hours for 28,000 images, which is reasonable given the time needed for preparation.

Table 5.2: Summary of the results for the various experiments. The total number of focus events is denoted by # events. The time needed for focusing is given by tfoc in seconds, and as a percentage of the total acquisition time tacq.

  application              mode      # events   fail   (correct)   tfoc   tfoc/tacq   (tacq)
  Quant neuronal morph     bright    180        0      (100%)      1.7    7.5%        (4.5 min)
  Cardiac myocyte dediff   phase     75         0      (100%)      2.8    —           —
  Immunohist label det     bright    100        2      (98%)       1.5    7%          (3 min)
  C. Elegans screening     fluoresc  1800       14     (> 99%)     1.1    12%         (4.5 hour)
  Immunocyt label det      fluoresc  300        2      (> 99%)     2.8    14%         (20 min)

In summary, failure is caused by a shortage of relevant image information content. The proposed algorithm was completely successful in determining the correct focus position for the thoroughly stained preparations of the quantitative neuronal morphology, even for fields containing only a few dead cells. Further, complete success was achieved for the cardiac myocyte dedifferentiation: despite the morphological changes in image content during the experiment, none of the time lapse movies was out of focus at any time. A high success rate was obtained for the immunohistochemical label detection, failing for 2 fields containing not enough contrast. For the fluorescence applications, the images were highly degraded by the presence of random noise (SNR ≤ 10 dB) due to fluorescent bacteria (C. Elegans screening), camera noise, and structural noise caused by earth loops in combination with the extremely sensitive CCD camera. Nevertheless, a high success rate was achieved.

5.2.2 Evaluation of Performance for High NA

Comparison between observer 1 and observer 2 resulted in an average error of 0.070 µm, whereas autofocus versus observer 1 resulted in a 0.423 µm error. Hence, the autofocus method as implemented is slightly biased. The root mean squared error was 0.477 µm between observers, and 0.494 µm between autofocus and observer, both of which are in the range of the depth of field of the objective used. The maximum error between observers was 1.27 µm, and for autofocus versus observer 1.12 µm, both within the slice thickness of 2 µm.
In conclusion, even for high-NA objectives, autofocus performance is comparable to that of experienced observers.

Figure 5.2: Average focus score (arbitrary units) as function of the z-position measured for different applications. a. Quantitative neuronal morphology. b. Cardiac myocyte dedifferentiation. c. Immunohistochemical label detection. d. C. Elegans GFP-VM screening. e. Immunocytochemical label detection nuclei and f. immuno signal, respectively. The measured focus curves indicated by "min" and "max" represent the focus events resulting in the lowest and highest maximum score, indicating variability and the influence of noise on the estimate of the focus score.

5.2.3 Comparison of Performance with Small Derivative Filters

In order to evaluate the effect of the scale σ on the estimate of the focus score, experiments with σ = 0.5 were performed. For the quantitative neuronal morphology, accurate focusing with σ = 0.5 was not possible for 1 out of 24 fields. In this case, the algorithm focused on the reversed phase contrast image. Application of the small scale in focusing of the cardiac myocyte dedifferentiation failed whenever fungal contamination at the medium surface occurred, the contamination then being taken as the focal plane. Taking σ = 1.0 solved this problem, the algorithm then focusing persistently on the myocytes. Focusing with σ = 0.5 on the immunohistochemical label detection resulted in focusing on dust particles at the glass surface for 5 out of 24 fields. For the fluorescence applications, accurate focusing was not possible with σ = 0.5, due to the small signal-to-noise ratio (SNR ≤ 10 dB). Experiments with σ = 0.75 resulted in inaccurate focusing for 18 out of 30 fields for the C. Elegans GFP-VM screening. Further, the algorithm was not able to focus accurately on 13 out of 30 fields for the nuclei in the immunocytochemical label detection, and failed for 17 out of 30 fields on the immuno signal. Repeating these experiments with the values of σ as given in tab. 5.1 resulted in accurate focus for all fields.

5.2.4 General Observations

Increasing the scale σ results in robustness against noise and artifacts. A larger scale resulted in robustness against phase reversion (quantitative neuronal morphology), fungal contamination at the medium surface (cardiac myocyte dedifferentiation), dust on the glass surface (immunohistochemical label detection) and noise (the fluorescence applications). The performance of small differential filters, as used in [2, 5, 16, 19], is poor, given the number of inaccurately focused images for σ = 0.5 or σ = 0.75.

For the different applications, the chosen focus interval was effectively used for about 30%, i.e. the top of the measured focus curve was commonly within one-third of the focus interval centered at the origin. The focus interval should not be taken too narrow, to ensure that the focal plane is inside the interval regardless of the manual placement of the preparations.
An effective use of 30% of the interval for 95% of the focus events seems an acceptable rule of thumb.

The time needed for the autofocus algorithm varied from 1.5 up to 2.8 seconds for current sensors and computer systems, which is in the same time range as experienced observers. Focus time is completely determined by the depth of field and the video frame time, both of which can be considered given quantities, and by the size of the focus interval. Therefore, further reduction of focus time can only be achieved by a smaller focus interval, on the condition that the variability in preparation position is limited. When positional variability is low or well known, the focus interval ∆z can be reduced to exactly fit the variability. For the applications given, focus time can be reduced by up to a factor of 3 in this way.

Failure of the autofocus algorithm due to a shortage of image content can be well predicted. If the focal plane is inside the focus interval, there should be a global maximum in the estimate of the focus curve. Comparing the maximum focus score so with the highest of the focus scores at the ends of the focus interval, se = max(s(0), s(td)), which are certainly not in focus, determines the signal content with respect to noise. When the maximum score does not significantly exceed the focus scores at the ends of the interval, or (so − se)/se < α, the found focus position should be rejected. In this case, focusing can better be based on a neighboring field. For the reported results, a threshold of α = 10% safely predicts all failures.

5.3 Discussion

The success of automatic morphological screenings stands or falls with the accuracy of autofocus procedures. Although focusing is trivial for a trained observer, automatic systems often fail to focus images in different microscopic modalities. Autofocus procedures are often optimized for one specific preparation, visualized in one microscopic imaging mode. This report presents a method for autofocusing in multi-mode light microscopy. The objective was to develop a focus algorithm which is generally applicable in microscopy, and robust against confounding factors common in microscopy.

Defocused images inherently have less information content than well-focused images [5, 8]. Focus functions based on this criterion, such as the Gaussian derivative filter used in the presented method, by definition respond to the best focus position with a local maximum. Reliable focusing, without taking a-priori information into account, is possible whenever the best focus response becomes the global maximum. This criterion is fulfilled when the information content due to the signal is higher than that of noise and of optical artifacts inherent to some modes of microscopic image formation. Sampling of the focus curve at the Nyquist rate over the complete focal range guarantees detection of the global maximum. Consequently, the present autofocus method is generally applicable in any microscopic mode, whenever the amount of detail in the preparation has a larger influence than artifacts and noise.

The effectiveness of the proposed method has been evaluated experimentally for the following specimens: neuronal cells in bright-field, cardiac myocytes in phase contrast, neuronal tissue sections in bright-field, fluorescent beads and GFP-VM expressing C. Elegans nematodes, smooth muscle cells in phase contrast, and immunocytochemically fluorescent labeled fibroblasts.
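As an aside, the rejection criterion of section 5.2.4 is compact enough to state in code. The following is a minimal sketch (the function and variable names are ours, not from the thesis), assuming the focus scores have been sampled over the complete focus interval:

```python
def accept_focus(scores, z_positions, alpha=0.10):
    """Reject a focus result when the peak does not rise significantly above
    the focus scores at the ends of the interval, i.e. when (s_o - s_e)/s_e < alpha.

    scores: focus scores s(z) sampled over the complete focus interval.
    z_positions: the z-position of each sample.
    Returns (accepted, z_at_peak)."""
    s_o = max(scores)                      # maximum focus score over the interval
    s_e = max(scores[0], scores[-1])       # interval ends, certainly out of focus
    accepted = (s_o - s_e) / s_e >= alpha  # relative excess over the interval ends
    return accepted, z_positions[scores.index(s_o)]
```

When the test fails, focusing would fall back on a neighboring field, as described above.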
The method was not able to focus the smooth muscle cells accurately, due to a lack of relevant image information content. For the other experiments, 2830 fields were focused with an overall success rate of 99.4%, and failure for the remaining 0.6% could be safely predicted. For each new specimen and microscope set-up, it suffices to set the parameters for the scale σ, the focus interval ∆z and the focus speed, which can be derived from the size of the structures in the specimen, the illumination used, and the objective NA. In addition, for the scanning of large preparations, the distance after which focus has to be corrected and the fraction of the focus interval to correct for should be set.

In contrast to other autofocus methods, the proposed algorithm is robust against confounding factors like: a. noise, b. optical artifacts inherent to a particular mode of microscopic image formation, such as halos in phase-contrast microscopy, and c. artifacts such as dust and fungal contamination, lying at a different focus level than the preparation. Focusing is performed within 2 or 3 seconds, which is in the same time range as trained observers. Moreover, even for high-NA objectives, autofocus accuracy is comparable to that of experienced observers. For high-magnification imaging of thick specimens, the method can easily be combined with focal plane reconstruction techniques [23, 24]. No constraints have been imposed on the focus curve other than that the global maximum indicates the focal plane. Hence, the method is generally applicable in light microscopy. The reliability of the proposed autofocus method allows for unattended operation on a large scale.

Bibliography

[1] K. N. Bitar and G. M. Makhlouf. Receptors on smooth muscle cells: Characterization by contraction and specific antagonists. J. Physiology, 242:G400–407, 1982.
[2] F. R. Boddeke, L. J. van Vliet, H. Netten, and I. T. Young. Autofocusing in microscopy based on the OTF and sampling. Bioimaging, 2:193–203, 1994.
[3] L. Ver Donck, P. J. Pauwels, G. Vandeplassche, and M. Borgers. Isolated rat cardiac myocytes as an experimental model to study calcium overload: the effect of calcium-entry blockers. Life Sci., 38:765–772, 1986.
[4] L. Firestone, K. Cook, K. Culp, N. Talsania, and K. Preston Jr. Comparison of autofocus methods for automated microscopy. Cytometry, 12:195–206, 1991.
[5] F. C. A. Groen, I. T. Young, and G. Ligthart. A comparison of different focus functions for use in autofocus algorithms. Cytometry, 6:81–91, 1985.
[6] T. Henkel, U. Zabel, K. van Zee, J. M. Müller, E. Fanning, and P. A. Baeuerle. Intramolecular masking of the nuclear location signal and dimerization domain in the precursor for the p50 NF-κB subunit. Cell, 68:1121–1133, 1992.
[7] E. T. Johnson and L. J. Goforth. Metaphase spread detection and focus using closed circuit television. J. Histochem. Cytochem., 22:536–545, 1974.
[8] E. Krotkov. Focusing. Int. J. Computer Vision, 1:223–237, 1987.
[9] S. J. Lockett, K. Jacobson, and B. Herman. Application of 3D digital deconvolution to optically sectioned images for improving the automatic analysis of fluorescent-labeled tumor specimens. Proc. SPIE, 1660:130–139, 1992.
[10] D. C. Mason and D. K. Green. Automatic focusing of a computer-controlled microscope. IEEE Trans. Biomed. Eng., 22:312–317, 1975.
[11] M. L. Mendelsohn and B. H. Mayall. Computer-oriented analysis of human chromosomes-III: Focus. Comput. Biol. Med., 2:137–150, 1971.
[12] R. Nuydens, C. Heers, A. Chadarevian, M.
de Jong, R. Nuyens, F. Cornelissen, and H. Geerts. Sodium butyrate induces aberrant tau phosphorylation and programmed cell death in human neuroblastoma cells. Brain Res., 688:86–94, 1995.
[13] A. Papoulis. The Fourier Integral and Its Applications. McGraw-Hill, New York, 1960.
[14] M. Pluta. Advanced Light Microscopy, volume 1. Elsevier, Amsterdam, 1988.
[15] M. Pluta. Advanced Light Microscopy, volume 2. Elsevier, Amsterdam, 1989.
[16] J. H. Price and D. A. Gough. Comparison of phase-contrast and fluorescence digital autofocus for scanning microscopy. Cytometry, 16:283–297, 1994.
[17] C. Steger. An unbiased detector of curvilinear structures. IEEE Trans. Pattern Anal. Machine Intell., 20:113–125, 1998.
[18] N. Streibl. Depth transfer by an imaging system. Opt. Act., 31:1233–1241, 1984.
[19] M. Subbarao and J. K. Tyan. Selecting the optimal focus measure for autofocusing and depth-from-focus. IEEE Trans. Pattern Anal. Machine Intell., 20:864–870, 1998.
[20] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[21] R. van Balen, D. Koelma, T. K. ten Kate, B. Mosterd, and A. W. M. Smeulders. ScilImage: a multi-layered environment for use and development of image processing software. In H. I. Christensen and J. L. Crowley, editors, Experimental Environments for Computer Vision & Image Processing, pages 107–126. World Scientific Publishing, 1994.
[22] L. J. van Vliet, I. T. Young, and P. W. Verbeek. Recursive Gaussian derivative filters. In Proceedings ICPR '98, pages 509–514. IEEE Computer Society Press, 1998.
[23] H. S. Whu, J. Barbara, and J. Gil. A focusing algorithm for high magnification cell imaging. J. Microscopy, 184:133–142, 1996.
[24] T. T. E. Yeo, S. H. Ong, Jayasooriah, and R. Sinniah. Autofocusing for tissue microscopy. Imaging Vision Comput., 11:629–639, 1993.
[25] I. T. Young, J. J. Gerbrands, and L. J. van Vliet. Fundamental Image Processing. Delft University of Technology, Delft, 1995.
[26] I. T. Young, R. Zagers, L. J. van Vliet, J. Mullikin, F. Boddeke, and H. Netten. Depth-of-focus in microscopy. In Proceedings 8th SCIA, pages 493–498, 1993.

Chapter 6 Segmentation of Tissue Architecture by Distance Graph Matching

appeared in Cytometry, vol. 35, pp. 11–22, 1999.

"Cell and tissue, shell and bone, leaf and flower, are so many portions of matter, and it is in obedience to the laws of physics that their particles have been moved, moulded and conformed. They are no exceptions to the rule that God always geometrizes. Their problems of form are in the first instance mathematical problems, their problems of growth are essentially physical problems, and the morphologist is, ipso facto, a student of physical science." – D'Arcy W. Thompson.

Quantitative morphological analysis of fixed tissue plays an increasingly important role in the study of biological and pathological processes. Specific detection issues can be approached by classical staining methods, enzyme histochemical analysis, or immunohistochemical processes. The tissue can not only be characterized by the properties of individual cells, such as staining intensity or expression of specific proteins, but also by the geometrical arrangement of the cells [4, 6, 10]. Interesting tissue parameters are derived from the topographical relationship between cells. For instance, topographical analysis in tumor grading can significantly improve routine diagnosis [3, 8, 16].
Studies of growing cancer cell lines have revealed a non-random distribution of cells [5, 14]. Partitioning of epithelial tissue by cell topography is used for quantitative evaluations [17]. We propose a new method for the partitioning of tissues. As an example, the structural integrity of hippocampal tissue after ischemia will be examined.

As a first step, tissue parts of interest have to be segmented into cell clusters. Segmentation of cell clusters can be based on distances between the centers of gravity of the cells. The recognition of tissue architecture is then reduced to determining borders of point patterns. The problem, phrased as such, can be solved by the application of neighbor graphs and their partitioning.

The Voronoï graph is often applied as a modeling tool for point patterns [3, 5, 13, 16]. The Voronoï graph is defined by polygons Z(p), where each polygon defines the area for which all points are closer to marker p than to any other marker [22]. A polygon Z(p) is called the zone of (geometrical) influence of p. The neighbors of marker p are defined by the set of all markers for which the zone of influence touches that of p. Such a tessellation of the plane depends on the spatial distribution of the cell markers. Cluster membership is determined by evaluation of geometrical feature measurements on the zones of influence [1]. Rodenacker et al. [17] used the Voronoï graph for partitioning epithelial tissue. Segmentation was obtained by propagating the neighbors from the basal layer of the epithelial tissue to the surface. Borders between basal, intermediate and superficial areas were determined by examining the occupied surface of propagation. In this way, one third of the total area of the Voronoï graph was assigned to each of the regions, yielding three regions with approximately similar areas in terms of zones of influence.

As discussed elsewhere [5, 12], the Voronoï graph is sensitive to detection errors. Removal or insertion of one object will change the characteristics of the Voronoï graph. A second drawback is that the Voronoï graph is ill-defined at cluster borders. This makes the Voronoï graph unsuited for robust segmentation of tissue architecture.

Another option for the recognition of point patterns is a modification of the Voronoï graph: the k-nearest neighbor graph [11, 15, 19]. The neighbors of a point p are ordered as the nearest, second-nearest, up to kth-nearest neighbor of p. The k-nearest neighbor graph is defined by connecting each point to its k nearest neighboring points [22]. The strength of each connection is weighted by the distance between the points. Similarity between k-nearest neighbor graphs is determined by comparing the graphs extracted from detected point patterns with prototype k-nearest neighbor graphs. In Schwarz and Exner [19], the distance distribution to one of the nearest neighbors was used for the separation of clusters from a background of randomly disposed points. The main drawback is that not all patterns can be discriminated by considering only one specific k-nearest neighbor distance. Lavine et al. [11] used sequences of sorted interpoint distances extracted from noisy point images to match the image with one of a set of prototype patterns. Similarity between prototype and point set is based on a rankwise comparison. From the two sorted interpoint distance vectors, the corresponding (relative) difference vector is calculated.
The number of components exceeding a given threshold is used for discrimination between patterns. A major disadvantage of the rankwise comparison is that all components have to be detected. When the nearest neighbor is missed in the detection, the first one in rank is compared with the second one. Thus, failure to detect one cell results in poor similarity.

Automatic segmentation of tissue architecture is difficult because biological variability and tissue preparation have a major influence on the tissue at hand. The detection and classification of individual cells in the tissue is prone to error. Although most authors [3, 5, 12] were aware of the lack of robustness in the quantification of tissue architecture, little effort was made to incorporate the uncertainty of cell detection in tissue architecture quantification methods. Lavine et al. [11] showed that the k-nearest neighbor graph is well-suited for point pattern recognition under spatial distortions, but the method used is not able to anticipate cell detection errors.

In this chapter we present a robust method for tissue architecture segmentation, based on the k-nearest neighbor graph. A sequence comparison algorithm is used to allow for missing or extra cells in the detected point set. Uncertainty in cell classification is incorporated into the matching process. Experiments show that the robustness of the method presented is superior to that of existing methods. The method is demonstrated by segmentation of the CA region in rat hippocampi, where the structural integrity of the CA1 cell layer is affected by ischemia. The correlation between manual scoring and automatic analysis of CA1 preservation is shown to be excellent.

6.1 Materials and Methods

6.1.1 Hippocampal Tissue Preparation

Rat brains were fixed by intracardiac perfusion with diluted Karnovsky's fixative (2% formaldehyde, 2.5% glutaraldehyde in Sörensen's phosphate buffer; pH 7.4). They were immersed overnight in the same fixative. Coronal vibratome sections of the dorsal hippocampus were prepared stereotaxically 3.6 mm caudally to the bregma (Vibratome 1000, TPI, St. Louis, MO). Slices (200 µm) were postfixed with 2% osmium tetroxide, dehydrated in a graded ethanol series, and routinely embedded in Epon. Epon sections were cut at 2 µm and stained with toluidine blue.

6.1.2 Image Acquisition and Software

Images were captured by a CCD camera (MX5, Adimec, Eindhoven, The Netherlands), which is a 780 × 576 video frame transfer CCD with pixel size 8.2 × 16.07 µm², operating at room temperature with auto gain turned off. The camera was mounted on top of an Axioskop in bright-field illumination mode (Carl Zeiss, Oberkochen, Germany). The microscope was equipped with a scanning stage for automatic position control (stage and MC2000 controller, Märzhäuser, Wetzlar, Germany). The scanning stage was calibrated for a 10× magnification, and adjacent 512 × 512 images were captured to ensure that complete hippocampi were scanned. Typical composite image sizes were 6144 × 4096 pixels, or 4.94 × 3.30 mm².

Figure 6.1: Example of a k-nearest neighbor graph. The nodes represent cells in tissue, while the edges represent their relation. The relations in this graph are given by the two nearest neighboring cells, and edges are weighted by the distance between the cells.
For image processing, the software package SCIL-Image version 1.4 (TNO-TPD, Delft, The Netherlands) was used on an O2 workstation (SGI, Mountain View, CA). The package was extended with the distance graph matching algorithm.

6.1.3 K-Nearest Neighbor Graph

Consider an image of a tissue containing cells. Detection of cells in the image will result in m markers at possible cell locations. Let V be the set of m detected cell markers, V = {v1, v2, . . . , vm}. The elements in V are called vertices or nodes. A graph G(V, E) (fig. 6.1) defines how elements of V are related to one another. The relation between the vertices is defined by the set of edges E, in which an element eij connects vertex vi to vertex vj. A weighted graph is defined by the graph G(V, E), where a value is assigned to each edge eij.

The k-nearest neighbor graph of a node v is defined by the subset of k vertices closest to v. The edges between v and the neighboring vertices are weighted by the Euclidean distance, or N_v^k = {d1, d2, . . . , dk | di = dist(v, vi), di < di+1}. Taking k = 1 for all v ∈ V results in the nearest neighbor graph, in which each cell is connected to its closest neighbor. The average edge length in the k-nearest neighbor graph gives a measure of scale of the pattern of cells. Division of all distances di in a k-nearest neighbor graph by the average of all distances in the graph, d̄, normalizes the graph for scale, i.e., d̃i = di/d̄.

6.1.4 Distance Graph Matching

Point patterns of interest were extracted from the k-nearest neighbor graph. As an example, consider fig. 6.2. A regularly structured tissue was assumed, consisting of cells regularly distributed over the tissue. Such a point pattern reveals an equally spaced layout everywhere within the tissue borders. The surrounding of each cell belonging to the pattern can be modeled by the neighborhood of one single cell (fig. 6.2). The k-nearest neighbor graph of a typical pattern cell gives a characterization of the point pattern of interest. After selection of a typical cell, the pattern is given by a prototype k-nearest neighbor graph, with distance set P = {p1, p2, . . . , pk}, where pi denotes the prototype distances. Acceptance or rejection of a detected object as belonging to the cell cluster of interest is based on comparison of the observed k-nearest neighbor distances N_v^k to the prototype defined by the characteristic distances to the neighbors in P.

Figure 6.2: Extraction of tissue architecture. A typical relationship around a cell is obtained from an example of the tissue of interest (a). The prototype k-nearest neighbor graph is derived from distances to cells (b). All prototypes shown are considered equal to fit deformed tissue parts. Further freedom is given by a certain elasticity of the edges in the prototype graph. Extraction of the tissue architecture proceeds by fitting the prototype graph on each cell and its neighborhood in the tissue (c). Within the similar tissue parts, the graph will fit. Outside these regions, matching is limited to only one or two edges. In order to safeguard against cell detection errors, not all edges in the prototype have to fit the cellular neighborhood.

6.1.5 Distance Graph Comparison

The difference between observation and prototype set is expressed by the replacements necessary to match the prototype with the observation. This is referred to as the dissimilarity between sets [18].
For example, consider for simplicity the discrete observed set {3, 10, 11, 15, 20, 20, 21, 25} and the prototype {5, 5, 10, 10, 20, 20}. When disregarding the last distances in the observation (21, 25), two substitutions (3 ↦ 5, 11 ↦ 10), one insertion (5) and one deletion (15) transform the observed distance set into the prototype. So there are four modifications between prototype and observation. The extra distances at the end of the observed set are necessary for expanding the comparison when elements are deleted at the beginning of the set. Without these extra elements, deletion of one item at the beginning of the set implies the addition of an item at the end of the set. There will be no need for addition when there is a cell at the correct distance. Therefore, the number of elements in the observation, l, should be larger than the prototype length k to allow for expansion in the comparison.

A cost is assigned to each type of replacement. Let ci be the cost for insertion, cd the cost for deletion, cs the cost for substitution, and cm the cost for matching. In the example, 11 is closer to 10 than 3 is to 5, which can be reflected in their respective matching costs. The minimum total cost t, necessary to transform the observed set into the prototype, gives the similarity between the sets. The minimum cost is obtained by using a string matching algorithm [18] (see Appendix). The lowest possible value for the cost t is obtained when both sets are equal. The number of replacements is then zero, and thus the cost is zero. An upper bound for the cost necessary to match two sets is obtained when all elements are replaced. In this case, either all elements are inserted at the beginning of the set, or all elements are substituted, depending on the respective costs. The upper bound is then given by t_upper = k min(ci, cs). Normalization of the minimum total cost gives a correspondence measure, indicating how well the observed pattern matches the prototype, i.e.,

  C = (t_upper − t) / t_upper × 100%.  (6.1)

Discrimination between two known point patterns, cluster and background, can be based on example and counterexample. Consider the observed k-nearest neighbor graph N_v^k, the prototype P describing the pattern of interest, and a prototype B characterizing the background pattern. When elements in background B match elements in P, the cost t_backgr related to matching P with B is less than the upper bound for the minimum cost. Then, discrimination between the two patterns is enhanced by normalizing the correspondence to the cost given by matching P with background B, or

  C′ = (t_backgr − t) / t_backgr × 100%.  (6.2)

Note that C′ can be negative for patterns which correspond neither to the foreground prototype nor to the background prototype. The extension to multiclass problems can be made by considering prototype P for the class of interest, and prototypes B1, B2, . . . , Bn for the remaining classes. Matching P with each of the prototypes Bi gives the correspondences between the pattern of interest and the other patterns. The pattern Bi which is most similar to P results in the lowest matching cost, which should be used for normalization.

6.1.6 Cost Functions

The total cost depends on the comparison between each of the individual elements of N_v^k and P, and thus the replacements necessary to match them. The replacement operations are given by insertion (cost ci), deletion (cost cd), substitution (cs) and match (cm). The cost for matching cm is zero when the two distances are equal.
The difference between two distances is defined as their relative deviation, or δ = |di − pj|/pj. Here, di denotes the observed distance and pj the prototype distance with which to compare. Robustness against spatial distortion is obtained by allowing a percentage deviation α in the comparison of distances [11]. In this case, two distances are considered equal as long as their relative deviation is smaller than the tolerance α. A minimum value for α is given by the distance measurement error. When the deviation between two distances is higher than α, their correspondence is included in the matching cost. The correspondence C then depends on the total distance deviation between the compared elements. The matching cost is taken linearly proportional to the distance deviation, or

  cm = 0                        if δ ≤ α,
  cm = (δ − α) · cs / (cs − α)  if α < δ < cs,
  cm = cs                       otherwise.  (6.3)

The cost for matching is cs if δ ≥ cs, which is equivalent to a substitution operation.

For our case, cell detector properties determine the costs for insertion. For a sensitive detector, the probability of missing a cell is low. As a consequence, the cost for insertion should be high compared to deletion. Alternatively, a low-sensitivity cell detector will overlook cells, but fewer artifacts will be detected. Thus, the costs for insertion should be low relative to deletion. The insertion cost is therefore tuned to the cell detector performance, or

  ci / cd ∝ #A / #M.  (6.4)

Here, #A denotes the estimate of the average number of artifacts detected as cells, and #M denotes the estimate of the average number of missed cells.

The deletion cost is derived from object features. A probability distribution can be obtained from well-chosen measurements, e.g., the contour ratio, on a test set of objects. Afterwards, the probability P(vi) for object vi being a cell is extracted from the measured distribution. When an object has a low probability of being a cell, the object should be deleted. Therefore, rather than considering a fixed deletion cost, the probability of an object being a cell determines the deletion cost for that object, or

  cd(vi) ∝ P(vi).  (6.5)

As a result, the correspondence measure for the object under examination is only slightly affected by the deletion of artifacts. The rejection of detected objects as artifacts can be based on both the cell probability P(vi) and the correspondence C of the object to the cluster prototype.

6.1.7 Evaluation of Robustness on Simulated Point Patterns

Four algorithms, based on the Voronoï graph, the nearest neighbor distance, the method of Lavine et al. [11], and the proposed distance graph matching, were tested in simulations. The segmentation performance was measured as a function of the input distortion. The input consisted of a foreground point pattern embedded in a background pattern, distorted by some random process. For the simulations, two arbitrarily chosen patterns were generated. A hexagonal point pattern was embedded in a random point pattern with the same density, and the same pattern was placed in a hexagonal pattern with half the density (fig. 6.3). Artificial distortion was added to the sets by consecutive random removal, addition, and displacement of points. The distortion was regulated from 0% up to a maximum, resulting in a noisy realization of the ideal patterns. By removing points, the algorithm is tested for robustness against missing cells.
Addition of points reveals the robustness of the algorithm against false cell detections. Robustness against spatial distortion is examined by means of point displacement. Each of the four methods was tested for robustness against the given distortions. The combination of removal and displacement of points shows robustness against touching cells. The other combinations show the interaction of distortions on robustness.

The segmentation performance indicates how well the foreground pattern was discriminated from the background points. It was measured as a function of the distortion.

Figure 6.3: Point patterns used for the experiments. a. A regular hexagonal pattern inside a hexagonal pattern with half the density. b. A regular hexagonal pattern inside a random pattern with the same density.

The performance of the various algorithms was measured as one minus the ratio of false negatives combined with the ratio of false positives, or

  P = 1 − #Fb/#Truthf − #Bf/#Truthb.  (6.6)

Here, #Fb denotes the number of foreground markers classified as background, #Bf denotes the number of background markers classified as foreground, and #Truthf and #Truthb denote the true number of foreground and background markers, respectively, in the distorted data set.

6.1.8 Algorithm Robustness Evaluation

For the experiments, the area of the influence zones in the Voronoï graph was thresholded [7] in order to partition the test point patterns. The thresholds were chosen such that 10% distortion on the distance to the nearest neighbors was allowed for the undistorted foreground pattern. This yields calculation of the minimum and maximum area for scaled versions of the pattern, with scaling factors 0.9 and 1.1. With regard to the nearest-neighbor distance, thresholds were taken such that 10% perturbation in the nearest-neighbor distance was allowed, determined in the undistorted foreground pattern.

The method given by Lavine et al. [11] was tested for k ∈ {5, 10, 15, 20, 25}. Implementation of this method was achieved by using the distance graph matching algorithm. Examples of both foreground and background pattern were used for discrimination. Costs for insertion and deletion were taken as infinity (ci = cd = ∞); thus, only substitutions or matches were allowed. The allowed perturbation in the distances was set at 10% (cs = α = 0.1). The correspondence C′ (eq. 6.2) was thresholded at 50%.

Experiments for the proposed distance graph matching method were taken with prototype length k ∈ {5, 10, 15, 20, 25}. In order to allow the string matching to expand, the number of observed elements considered for matching was twice the length of the prototype set (l = 2k). Examples of both foreground and background pattern were used for discrimination. Substitution of cells was not allowed, except as a deletion followed by an insertion operation. This can be achieved by taking the cost for substitution equal to the sum of the costs for insertion and deletion (cs = ci + cd = c). The costs for insertion and deletion were taken as equal. The allowed perturbation in the distances was taken to be 10% (c = α = 0.1). The correspondence C′ (eq. 6.2) was thresholded at 50%. This way, parameters were set to permit a fair comparison between the four methods for tissue architecture segmentation.
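For reference, the performance measure of eq. 6.6 is straightforward to compute once the markers have been classified. The following minimal sketch illustrates it; the argument names are ours, not from the original implementation:

```python
def segmentation_performance(n_fg_as_bg, n_bg_as_fg, n_truth_fg, n_truth_bg):
    """Performance measure of eq. 6.6.

    n_fg_as_bg: foreground markers classified as background (#Fb)
    n_bg_as_fg: background markers classified as foreground (#Bf)
    n_truth_fg, n_truth_bg: true numbers of foreground and background markers
    Returns 1.0 for perfect segmentation; a value near 0.0 corresponds to
    random classification of the markers."""
    return 1.0 - n_fg_as_bg / n_truth_fg - n_bg_as_fg / n_truth_bg
```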
6.1.9 Robustness for Scale Measure

In order to investigate the influence of distortions on the scale normalization measure, the measure was tested in the simulations. The normalization factor d̄, the average neighbor distance, was calculated under addition, removal, and displacement of points. The percentage error relative to the initial scale measure, d̄ for 0% distortion, was measured as a function of the distortion. The number of neighbors k considered for calculation of the scale measure was taken to be {1, 5, 10, 15}.

6.1.10 Cell Detection

Cell domes were extracted from the hippocampal images by grey-level reconstruction [2], resulting in a grey-value image containing the tops of all mountains when considering the input image as a grey-level landscape. From the dome image, saturated transparent parts were removed, and the remaining objects were thresholded. The results contained cell bodies, neurite parts and artifacts. An opening was applied to remove the neurite parts. After labeling, the center of gravity of each object was calculated and used for determination of the k-nearest neighbor graphs. The reciprocal contour ratio (1/cr) was used as a measure for cell probability (eq. 6.5).

6.1.11 Hippocampal CA Region Segmentation

Segmentation of the CA region was obtained by supervised selection of an example region. An arbitrary section, unaffected by ischemia, was taken and, after cell detection, one of the cells in the CA1 region was manually selected. The neighborhood of the selected cell was used as a prototype for segmentation. No counter (background) example was taken. Each of the four algorithms was used for segmentation of the CA region. Parameters for segmentation were derived from the example neighborhood to permit a fair comparison between the methods.

Thresholds for the area of the influence zones in the Voronoï graph were derived from the example, such that 35% distortion on the distance to the nearest neighbor was allowed. For the nearest-neighbor method, thresholds were taken such that 35% distortion on the nearest-neighbor distance in the example was allowed. The method of Lavine et al. [11] was implemented by using the distance graph matching algorithm. Costs for insertion and deletion were taken as infinity (ci = cd = ∞), allowing only substitutions or matches with 35% tolerance (cs = α = 0.35). The deletion cost for individual objects was adjusted by the cell probability, derived from the contour ratio. For graph matching, 15 neighbors were taken into account. A cell was considered a cluster cell when the similarity between distance graph and prototype was at least 50%.

For the distance graph matching method, substitution of cells was not allowed, achieved by setting cs = ci + cd = c. The substitution cost was tuned to allow for 35% distortion in the distances, of which the last 25% was included in the correspondence measure (c = 0.35, α = 0.1). After visual examination of the detector performance, the insertion cost was set at twice the deletion cost. The deletion cost for individual objects was adjusted by the cell probability, derived from the contour ratio. For the distance graph matching, 15 neighbors were taken into account. Matching was allowed to expand to twice the number of neighbors in the prototype (l = 2k). A cell was considered a cluster cell when the similarity between distance graph and prototype was at least 50%.
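To make the matching procedure concrete before turning to the results, the sketch below implements the k-nearest neighbor distance set of section 6.1.3, the matching cost of eq. 6.3, and the string-matching comparison of sections 6.1.5–6.1.6 (the appendix gives the full dynamic programming formulation). This is a minimal illustration under our own naming and default parameter values, not the original SCIL-Image implementation:

```python
import numpy as np

def knn_distances(points, v, k):
    """Sorted distances from point v to its k nearest neighbors (N_v^k, section 6.1.3)."""
    d = np.linalg.norm(np.asarray(points) - np.asarray(points)[v], axis=1)
    return np.sort(d)[1:k + 1]  # skip the zero distance of v to itself

def match_cost(d, p, alpha=0.1, cs=0.35):
    """Cost for matching observed distance d to prototype distance p (eq. 6.3)."""
    delta = abs(d - p) / p  # relative deviation
    if delta <= alpha:
        return 0.0
    if delta < cs:
        return (delta - alpha) * cs / (cs - alpha)
    return cs               # equivalent to a substitution

def min_total_cost(obs, proto, ci=0.35, cd=0.35, alpha=0.1, cs=0.35):
    """Minimum cost t to transform the observed distance set into the prototype,
    by dynamic programming over deletions, insertions and matches (cf. eq. 6.7)."""
    l, k = len(obs), len(proto)
    INF = float("inf")
    C = [[INF] * (l + 1) for _ in range(k + 1)]
    C[0][0] = 0.0
    for i in range(k + 1):
        for j in range(l + 1):
            if j > 0:            # delete an observed distance
                C[i][j] = min(C[i][j], C[i][j - 1] + cd)
            if i > 0:            # insert a prototype distance
                C[i][j] = min(C[i][j], C[i - 1][j] + ci)
            if i > 0 and j > 0:  # match (or substitute)
                C[i][j] = min(C[i][j], C[i - 1][j - 1]
                              + match_cost(obs[j - 1], proto[i - 1], alpha, cs))
    # the whole prototype must be consumed; trailing observations may be ignored
    return min(C[k][k:])

def correspondence(obs, proto, ci=0.35, cd=0.35, alpha=0.1, cs=0.35):
    """Correspondence measure of eq. 6.1, in percent."""
    t_upper = len(proto) * min(ci, cs)
    t = min_total_cost(obs, proto, ci, cd, alpha, cs)
    return (t_upper - t) / t_upper * 100.0
```

A point would then be accepted as a cluster cell when its correspondence to the prototype exceeds the 50% threshold used in the experiments; eq. 6.2 additionally normalizes against a background prototype.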
6.2 Results

6.2.1 Algorithm Robustness Evaluation

Figure 6.4 shows the results of the performance of the algorithms on the simulated point patterns, where 0% performance corresponds to random classification of the markers. The distortion for removal and addition of points is given as the percentage of points removed or added. For displacement of points, the distortion is given as the percentage of displacement up to half the nearest neighbor distance (100%) of the undistorted hexagonal foreground pattern. When the distortion in displacement reaches 100%, the hexagonal pattern has become a random pattern, indistinguishable from the random background pattern (fig. 6.4f). The optimum performances which can be reached for the three types of distortion are shown in fig. 6.4b,d,f. In those cases, the segmentation result corresponds to correct classification of all (remaining) markers. The results of the combined experiments are examined for interaction between the different kinds of distortions, and their relation with the individual performances.

The behavior of the algorithms under all distortions remains similar for both test patterns. This suggests that the performance of the different methods is insensitive to the type of test pattern. For addition and displacement of points, the minimum and maximum performance over the 25 simulation trials remains within 20% of the average. For removal of points, the minimum and maximum performance was within 40% of the average for the Voronoï, Lavine et al. [11], and distance graph matching methods. The nearest neighbor method shows a deviation of 60% from the average for removal of points, which is due to the normalization of the performance measure to the number of markers (eq. 6.6).

Figure 6.4a–d reveals that thresholding the area of influence in the Voronoï graph is inadequate for determining cluster membership when cell detection is not reliable. No point can be removed or added without changing the Voronoï partition for all (Voronoï) neighbors surrounding the removed or added point. A second drawback is the high initial error of 20% and 35%, respectively. Under displacement of points (fig. 6.4e,f), segmentation based on the Voronoï graph is shown to be robust. Figure 6.4f reveals the bias (100% distortion, 10% performance) for the Voronoï graph at the image border. Points near the image border are all (correctly) classified as background due to their deviation from the normal area of influence, resulting in a better-than-random classification for the indistinguishable fore- and background. Experiments for the Voronoï method performed with thresholding the deviation on the nearest-neighbor distance at 5% gave only marginally better performances (data not shown). For the combination of displacement and removal, the resulting segmentation error showed both factors to be additive below 15% removal (data not shown). Similarly, for the displacement and addition of points, the combined error was shown to be the sum of the errors caused by applying each distortion separately. The performance under removal and addition of points is only slightly better than the sum of the individual errors.

Segmentation based on the nearest-neighbor distance behaves like the optimum when distorted by removal of points (fig. 6.4a,b). Under the condition of addition of points (fig. 6.4c,d), performance is as bad as with the Voronoï method.
Since 10% distortion on the nearest neighbor distances is allowed, the method performs well up to 10% displacement (fig. 6.4e,f). As shown elsewhere [19], segmentation based on one of the other k-nearest neighbors is able to improve the discrimination between patterns. The behavior of the method under distortions for higher k remains similar to the results shown for k = 1. The performance for the combinations removal-addition and removal-displacement was completely determined by addition and displacement (data not shown), respectively. As can be expected from fig. 6.4a,b, the influence of removal of points may be neglected. For the combination of addition and displacement of points, the effect on the segmentation error is the sum of the errors caused by each distortion separately.

For the method of Lavine et al. [11], the results are shown for k = 10. The initial segmentation error between the test point patterns (fig. 6.4a,b) is smaller than with both the Voronoï and the nearest-neighbor method. Taking more neighbors into account clearly results in better discrimination between point patterns. The performance under removal of points degrades faster than for the nearest-neighbor segmentation (fig. 6.4a,b), while the performance for addition of points (fig. 6.4c,d) degrades less severely for small distortions. The tolerance for spatial distortion is improved in comparison to the nearest-neighbor method. Analysis based on larger neighborhood sizes (k ∈ {15, 20, 25}) shows that the performance for removal and addition of points degrades faster, whereas the performance improves under the condition of displacement of points. Additionally, the initial error increases by a few percentage points. For k = 5, segmentation performance is comparable, except for the initial error, which increases by a few percentage points. The error due to both the combinations removal-displacement and addition-displacement was shown to be almost perfectly additive (data not shown). For the combination of addition and removal of points, the error due to removal is counteracted by the addition of points for large distortions.

The distance graph matching method performs slightly better than the method of Lavine et al. [11] for removal of points (fig. 6.4a,b). Under the condition of point addition, the distance graph matching method is clearly superior. The initial error in the discrimination between the hexagonal foreground and background is zero for both the distance graph method and that of Lavine et al. [11]. For the discrimination between hexagonal foreground and random background, the initial performance of the distance graph matching is better than with the method of Lavine et al. [11]. Performance for a small neighborhood size (k = 5) is comparable to the performance of the method of Lavine et al. [11]. For large neighborhood sizes (k ≥ 15), performance for removal and addition degrades faster, but remains better than with the method of Lavine et al. [11]. Under displacement of points, the performance increases for high k. Additionally, the initial error increases by a few percentage points. The performance for the combined distortion from addition and displacement of points was shown to be completely determined by the point displacement (data not shown). For removal and addition, the error due to removal was reduced by the random addition of points for severe distortions. The combination of removal and displacement was shown to be better than the sum of the respective errors.
From these experiments, it can be concluded that both thresholding the area of influence in the Voronoï graph and thresholding the distance to one of the nearest neighbors are not suitable for robust segmentation of tissue architecture. The experiments undertaken show the instability of the Voronoï graph under detection errors. The Voronoï graph is certainly useful for the determination of neighbors [16], but more robust parameters can be estimated from the Euclidean distance between these neighbors [23]. The proposed distance graph matching algorithm indeed has a better performance under detection errors than the method of Lavine et al. [11]. Therefore, the distance graph matching method is more suitable for use in the partitioning of tissue architecture.

Figure 6.4: Average segmentation performance is plotted as function of the distortion. Each point represents the average performance over 25 trials for the given percentage of distortion. For the method of Lavine et al. [11] and the distance graph matching method, results for k = 10 are shown. a. Point removal, hexagonal background. b. Point removal, random background. c. Point addition, hexagonal background. d. Point addition, random background. e. Point displacement, hexagonal background. f. Point displacement, random background.

Figure 6.5: Influence of removal, addition, and displacement of points on the scale normalization measure d̄ for k = 10. Average percentage error over 25 trials.

6.2.2 Robustness for Scale Measure

Robustness of the scale normalization was tested on both artificial data sets. Results for k = 10 on the hexagonal-hexagonal data set are shown in fig. 6.5. The result for k = 1 degrades for addition and displacement of points, while removal of points is more stable. The results for k = 5 and k = 15 are almost identical to the results shown for k = 10. The results for the hexagonal-random data set are almost identical to the hexagonal-hexagonal results for k ∈ {5, 10, 15}. The experiment shows that the average k-nearest neighbor distance is useful for normalization for scale when k is taken large enough.

6.2.3 Hippocampal CA Region Segmentation

The new method of distance graph matching was tested on the segmentation of the CA region in rat hippocampi (fig. 6.6a), based on the preservation of the CA1 structure after ischemia [9]. Here, the correlation between manual and automatic counting of the preserved cells in the CA1 region is shown. An example of the cell detection is shown in fig. 6.6b. As a result of the distance graph matching, all cells in the CA and hilus region were extracted from the image (fig. 6.6c). Only cluster cells are preserved in the segmented image. The CA1 region (fig. 6.6) is that part inside the CA region, starting orthogonally at the end of the CA inside the hilus, and ending where the CA region becomes thicker before the U-turn.
Manual counting was performed on 2–4 slices for each animal, resulting in a total number of preserved neurons counted over a total length of CA1 region (cells/mm) per animal. To demonstrate the usefulness of the proposed segmentation method, the correlation between these manual countings and automatic counting is shown. Due to the ambiguous definition of the CA1 region, manual indication of the CA1 region in the hippocampus image was necessary. For each hippocampus, three points were obtained, indicating the start (S), middle (M), and end (E) of the CA1 region. The segmented cells between start and end point, and within a reasonable distance from the line segment SME connecting the three points, were classified as belonging to the CA1 region. The average number of cells per unit length was calculated for the obtained cell cluster. The cluster length was taken to be the length of line segment SME.

Figure 6.6: Example of the segmentation of cell clusters in the hippocampus of a rat. The line segment SME indicates the CA1 region. All segmented cells in figure (c) between points S and E are considered part of the CA1 region. The length of the CA1 region is derived from the length of line SME. a. Hippocampus image as acquired by the setup. b. The resulting image from the cell detection. c. Cell clusters after the distance graph matching.

Figure 6.7 shows the correlation between the manual and automatic counting for each of the algorithms tested. Results obtained with segmentation based on the Voronoï graph and on the nearest-neighbor distance do not correlate well with manual counting. The method of Lavine et al. [11] is biased (mean error −12.9) and results in a mean squared error of 405.0 [20]. For the distance graph matching algorithm, the mean error is 0.1 and the mean squared error is 174.8.

6.3 Discussion

The geometrical arrangement of cells in tissues may reveal differences between physiological and pathological conditions based on structure. This intuitive notion does not imply that the arrangement can be captured easily in an algorithm. Quantification of tissue architecture, when successful and objectively measurable, opens the way to better assessment of treatment response. Before deriving parameters from tissue architecture, partitioning of the tissue into its parts of interest is necessary. We present a method for the segmentation of homogeneous tissue parts based on cell clustering. The objective is to develop a method which is robust under spatial distortions intrinsic to the acquisition of biological preparations, such as squeezing of the tissue as well as taking a two-dimensional transection through a three-dimensional block. These manipulation artifacts lead to two major confounding factors: a. distortion in the cell density, and b. errors in cell detection.

Distortion in cell density is reflected in the distance between cells. Irregularity or spatial distortion in the cell positions, and thus distortion in the neighbor distances, is inherent to tissues. Squeezing of tissue, or local nonrigid deformations, results in structural changes in cell density and thus changes in neighbor distances. Small changes in transection angle cause loss of cells in regions of the tissue.
A second source of error, in cell detection, is the classification of artifacts as cells; or else, cells may be overlooked during detection, causing a lack of proper definition of the local tissue architecture. When neighboring cells touch one another, they are often erroneously detected as one single cell. The method also deals with the uncertainty in cell classification often encountered in the automatic processing of tissues. Errors in the assignment of cells on cluster borders should be minimal to prevent influence of cluster shape on the segmentation result. The quantitative method enables reliable classification of areas by type of tissue.

Figure 6.7: Correlation between the average number of cells per mm CA1 length per animal counted manually, and the number of segmented cells per estimated mm CA1 length per animal. The dashed line indicates y = x. a. Voronoï method. b. Nearest-neighbor method. c. Method of Lavine et al. [11]. d. Distance graph matching method.

In contrast to other cell pattern segmentation methods, the proposed distance graph matching algorithm meets the various demands as formulated above. Detection errors such as missing cells or artifact detection are corrected by respective insertion and deletion operations. Deviation of the distances to neighboring cells is incorporated by allowing some tolerance in distance matching. Local deformation of the tissue has only minor influence as long as the deviation in distances remains within tolerance. The total sum of errors, combined with the deviation in distances, indicates how well the cell and its environment fit the prototype environment. A possible drawback of the algorithm is its insensitivity to orientation. It is possible for two different patterns to have the same distance graphs. Under these circumstances, segmentation is not possible by any algorithm based on interpoint distances.

Including cell probability in the matching process further improves segmentation performance. The interference between the probability indicating that the object is or is not a cell, and the fit of the object in the cluster prototype, allows a better rejection of artifacts, while cluster cell classification is less affected. Cell confidence levels can be derived from the evaluation of the probability distribution of cell features such as the contour ratio. In order to remain independent of microscope and camera settings, the cell features chosen should not depend on scale, absolute intensity, etc.

The selection of an example often involves a supervised (i.e., interactive) procedure. The design of such a procedure requires adherence to several principles [21]. Among other requirements, reproducibility under the same intention is considered the most important for our purpose. As a consequence, any prototype selection algorithm should only consider cells in conformity with the expert's intention.
Application of the method to the detection of the CA structure in rat hippocampi showed that even narrow elongated structures, only a few cells thick, can be well segmented using the proposed distance graph matching. Results obtained semi-automatically correlate well with manual countings of preserved cells in the CA1 region, as long as there are enough cells left to discern regular clusters. The other segmentation methods tested, based on the area of influence in the Voronoï graph, the distance to the nearest neighbor, and the method of Lavine et al. [11], resulted in poor correlation between automatic segmentation and the countings by the expert. For the case of CA region determination, the proposed method proved to be compatible with the perception of the pathologist. We have not applied the method to other tissue segmentation problems.

For the recognition of tissue architecture, the proposed distance graph matching algorithm has proven to be a useful tool. The method reduces the nonbiological variation in the analysis of tissue sections and thus improves confidence in the result. The present method can be applied to any field where regular patterns have to be recognized, as long as the directional distribution of neighbors may be neglected.

6.4 Appendix: Dynamic Programming Solution for String Matching

The dynamic programming solution for matching the observation with the prototype is given in fig. 6.8. The graph searches a small set (horizontal) inside a larger set (vertical). The graph represents horizontally the prototype set P = {p1, p2, . . . , pk} and vertically the input set N_v^l = {d1, d2, . . . , dl}. Each node C[i, j] in the graph represents the comparison between the ith element from the prototype and the jth element from the input set.

Figure 6.8: The dynamic programming solution for string matching.

The directional edges in the graph determine which operations (deletion, insertion, or matching/substitution) are necessary to obtain the observed and prototype distance at the same position in the comparison string. For instance, each valid path
When considering only the previous nodes, i.e., all nodes from which the one under consideration can be reached, the problem can be reformulated as a recurrence relation. In this case, the minimum cost path is given by the least of the minimum cost paths to the previous nodes, increased by the cost for traveling to the node of interest. Comparison begins at the "start" node, and each column is processed consecutively from top to bottom. In this manner, the minimum cost paths to the previous nodes are already determined when arriving at a particular node. The minimum cost to reach the node under consideration is then given by:

    C[i, j] = \min \begin{cases} C[i, j-1] + c_d \\ C[i-1, j] + c_i \\ C[i-1, j-1] + c_m \\ C[i-1, j-1] + c_s . \end{cases}    (6.7)

The initial value at "start" is zero; the cost from "start" to the first node is also zero. The costs assigned to nonexisting edges (at the border of the graph) are considered infinite. The "term" nodes at the bottom and right side of the graph are used for collecting the costs assigned to matching the last element in the observation (bottom) or the last element from the prototype (right). The term node C[k + 1, l + 1] describes the cost associated with matching the input set exactly to the prototype. The only interest is in finding the prototype in a (larger) number of observed distances, for which the cost is given by node C[k + 1, k + 1]. This is the first node where the observation is exactly transformed into the prototype. When additional insert and delete operations on the observed set result in a smaller matching cost, that path should be taken as the minimum cost path. Therefore, the minimum total cost is given by the minimum of the term nodes from C[term, k + 1] to C[term, l + 1]. The order of the string matching algorithm is O(l × k) [18]. Here, k is the number of neighbors in the prototype, and l is the number of neighbors taken from the observation. When the cost for matching is constant, as is the cost for substitution, algorithms with lower complexity are known for comparing ordered sequences.

Bibliography

[1] N. Ahuja and M. Tuceryan. Extraction of early perceptual structure in dot patterns: Integrating region, boundary, and component gestalt. Comput. Vision Graphics Image Process., 48(3):304–356, 1989.
[2] S. Beucher and F. Meyer. The morphological approach to segmentation: The watershed transformation. In E. R. Dougherty, editor, Mathematical Morphology in Image Processing, chapter 12, pages 433–481. Marcel Dekker, New York, 1993.
[3] G. Bigras, R. Marcelpoil, E. Brambilla, and G. Brugal. Cellular sociology applied to neuroendocrine tumors of the lung: Quantitative model of neoplastic architecture. Cytometry, 24:74–82, 1996.
[4] R. Chandebois. Cell sociology: A way of reconsidering the current concepts of morphogenesis. Acta Bioth., 25:71–102, 1976.
[5] F. Darro, A. Kruczynski, C. Etievant, J. Martinez, J. L. Pasteels, and R. Kiss. Characterization of the differentiation of human colorectal cancer cell lines by means of Voronoï diagrams. Cytometry, 14:783–792, 1993.
[6] K. J. Dormer. Fundamental Tissue Geometry for Biologists. Cambridge Univ. Press, London, 1980.
[7] C. Duyckaerts, G. Godefroy, and J. J. Hauw. Evaluation of neuronal numerical density by Dirichlet tessellation. J. Neurosci. Methods, 51:47–69, 1994.
[8] M. Guillaud, J. B. Matthews, A. Harrison, C. MacAulay, and K. Skov.
A novel image cytometry method for quantification of immunohistochemical staining of cytoplasmic antigens. Analyt. Cell. Pathol., 14:87–99, 1997.
[9] M. Haseldonckx, J. Van Reempts, M. Van de Ven, and L. Wouters. Protection with lubeluzole against delayed ischemic brain damage in rats. Stroke, 28:428–432, 1997.
[10] H. Honda. Geometrical models for cells in tissues. Int. Rev. Cytol., 81:191–248, 1983.
[11] D. Lavine, B. A. Lambird, and L. N. Kanal. Recognition of spatial point patterns. Pattern Rec., 16:289–295, 1983.
[12] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï diagrams and parametrization of the spatial relationships. J. Theor. Biol., 154:359–369, 1992.
[13] G. A. Meijer, J. A. M. Beliën, P. J. van Diest, and J. P. A. Baak. Image analysis in clinical pathology. J. Clin. Pathol., 50:365–370, 1997.
[14] J. Palmari, C. Dussert, Y. Berthois, C. Penel, and P. M. Martin. Distribution of estrogen receptor heterogeneity in growing MCF–7 cells measured by quantitative microscopy. Cytometry, 27:26–35, 1997.
[15] C. R. Rao and S. Suryawanshi. Statistical analysis of shape of objects based on landmark data. Proc. Natl. Acad. Sci. USA, 93:12132–12136, 1996.
[16] E. Raymond, M. Raphael, M. Grimaud, L. Vincent, J. L. Binet, and F. Meyer. Germinal center analysis with the tools of mathematical morphology on graphs. Cytometry, 14:848–861, 1993.
[17] K. Rodenacker and P. Bischoff. Quantification of tissue sections: Graph theory and topology as modelling tools. Pattern Rec. Lett., 11:275–284, 1990.
[18] D. Sankoff and J. B. Kruskal. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, 1983.
[19] H. Schwarz and H. E. Exner. The characterization of the arrangement of feature centroids in planes and volumes. J. Microscopy, 129:155–169, 1983.
[20] L. B. Sheiner and S. L. Beal. Some suggestions for measuring predictive performance. J. Pharmacokinet. Biopharm., 9(4):503–512, 1981.
[21] A. W. M. Smeulders, S. Delgado Olabariagga, R. van den Boomgaard, and M. Worring. Design considerations for interactive segmentation. In R. Jain and S. Santini, editors, Visual Information Systems 97, pages 5–12, San Diego, 1997. Knowledge Systems Institute.
[22] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–388, 1989.
[23] F. Wallet and C. Dussert. Multifactorial comparative study of spatial point pattern analysis methods. J. Theor. Biol., 187:437–447, 1997.

Chapter 7

A Minimum Cost Approach for Segmenting Networks of Lines

submitted to the International Journal of Computer Vision.

Alice came to a fork in the road. 'Which road do I take?' she asked. 'Where do you want to go?' responded the Cheshire cat. 'I don't know,' Alice answered. 'Then,' said the cat, 'it doesn't matter.'
– Lewis Carroll

The detection of lines in images is an important low-level task in computer vision. Successful techniques are available for the detection of curvilinear structures [4, 6, 12]. They are applied in pharmaceutical research, where interesting tissue parameters can be obtained by the extraction of blood vessels, neurites, or tissue layers. Furthermore, the extraction of roads, railroads, rivers, and channels from satellite or aerial images can be used to update geographic information systems. A higher level of information is obtained by connecting the lines into networks.
Applications can be found in the roads between crossings, the highways connecting cities, the railway system between stations, or the neurite system connecting the neurons, all yielding organizational information about the network under consideration. Extraction of line networks rests on the detection of connections, the vertices in the network, as well as their interconnecting curves. The linking of line points over the interconnections is an ill-defined problem, since the curves are likely to contain gaps and branches. It is more attractive to find the minimum cost path between vertices, the path which contains most line evidence. The vertices can be used to guide the line tracking. Network extraction is then reduced to tracing lines between vertices.

In this chapter, we consider the robust extraction of networks of lines by the application of minimum cost graphs. The design objective is robustness against gaps in lines, which we consider the most prominent source of error in network extraction. We propose a robust measure for edge saliency, which indicates confidence for each connection.

7.1 Network Extraction Algorithm

A network consists of vertices interconnected by lines.

Definition 16 (Network of Lines) A network of lines is defined by a set of vertices indicating line end points, and the corresponding set of lines representing interconnections, where none of the lines cross.

The definition above implies vertices at crossings. The network can be segmented by tracing the lines between vertices. Therefore, four steps are considered: a. the detection of line points, b. the detection of vertices, c. finding the optimal paths between neighboring vertices yielding the lines, and d. the extraction of the network graph from the set of vertices and lines. A flow diagram is given in fig. 7.1. Post-processing may include pruning of the graph to remove false branches, and the assignment of confidence levels to the found graph. Graph confidence is given by the saliency of the detected lines, and the basin coverage indicating how much line evidence is covered by the graph. If the network graph covers all line evidence, no lines are missed. However, if not all line evidence is covered by the graph, lines may be missed during extraction. Hence, basin coverage together with edge saliency indicate missed lines and spurious lines in the network graph. Each of these steps is described in further detail below.

7.1.1 Vertex Detection

For specific applications, the network vertices are geometrical structures which are more obvious to detect than the interconnecting lines. Often, these are salient points in the image. We assume these structures to be detected as landmarks to guide the line tracing algorithm. For a general method, one may rely on the detection of saddlepoints, T-junctions, and crossings to obtain vertices [9, 13].

7.1.2 Line Point Detection

Theoretically, in two dimensions, line points are detected by considering the second order directional derivative in the gradient direction [12]. For a line point, the first order directional derivative perpendicular to the line vanishes, whereas the second order directional derivative exhibits an extremum. Hence, the second order directional derivative perpendicular to the line is a measure of line contrast.

Figure 7.1: Flow diagram for network extraction. a. Action flow diagram, b. the corresponding data flow. Graph extraction results in the network graph, line saliency indicating confidence for the extracted lines, and basin coverage indicating missed lines.
The second order directional derivatives are calculated by considering the eigenvalues of the Hessian,

    H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}    (7.1)

given by

    \lambda_{\pm} = \frac{f_{xx} + f_{yy} \pm \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy}^2}}{2}    (7.2)

where f(x, y) is the grey-value function and indices denote differentiation. After ordering the eigenvalues by magnitude, |λ+| > |λ−|, λ+ yields the second order directional derivative perpendicular to the line. Bright lines are observed when λ+ < 0 and dark lines when λ+ > 0 [10]. For both types of lines, the magnitude |λ+| indicates line contrast. Note that this formulation is free of parameters.

In practice, one can only measure differential expressions at a certain observation scale [5, 7]. By considering Gaussian weighted differential quotients, f_x^σ = G_x(σ) ∗ f, a measure of line contrast is given by

    R(x, y, \sigma) = \sigma^2 \, \frac{|\lambda_+^{\sigma}|}{b^{\sigma}}    (7.3)

where σ, the Gaussian standard deviation, denotes the scale for observing the eigenvalues, and where line brightness b is given by

    b^{\sigma} = \begin{cases} f^{\sigma} & \text{if } \lambda_+^{\sigma} \le 0, \\ W - f^{\sigma} & \text{otherwise.} \end{cases}    (7.4)

Line brightness is measured relative to black for bright lines, and relative to the white level W (255 for an 8-bit camera) for dark lines. The original expression (eq. 7.2) is of dimension [intensity/pixel²]. Multiplication by σ², which is of dimension [pixel²], normalizes line contrast (eq. 7.3) for the differential scale. Normalization by line brightness b results in a dimensionless quantity. As a consequence, the value of R(.) lies within [0 . . . 1].

The response of the second order directional derivative |λ+| does not only depend on the image data; it is also affected by the Gaussian smoothing scale σ. By analyzing the response to a given line profile as a function of scale, one can determine the optimal scale for line detection. For a bar-shaped line profile of width w, the response of R(.) (eq. 7.3) as a function of the quotient q = w/σ is plotted in fig. 7.2. The response of R(.) is biased towards thin lines, and gradually degrades for larger w. For a thin line, w → 0, the response equals line contrast, whereas for a large value of w relative to σ the response vanishes. Hence, the value of σ should be large enough to capture the line width. For optimal detection of lines, the value of σ should at least equal the width of the thickest line in the image,

    \sigma \ge w .    (7.5)

When line thickness varies, one can set the value of σ to the size ŵ of the thickest line to be expected,

    \sigma = \hat{w} .    (7.6)

In this case, the response is slightly biased towards thin lines.

Figure 7.2: Response of R(.) (eq. 7.3) at the centerline as a function of relative line width q = w/σ.

The differential expression (eq. 7.3) is a point measure, indicating whether a given pixel belongs to a line structure or not. The result is not the line structure itself, but a set of points accumulating evidence for a line. In the sequel we will discuss how to integrate line evidence to extract line structures.
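As an illustration of the line point detector, a minimal sketch follows, computing (eq. 7.1) through (eq. 7.4) with Gaussian derivative filters. The function name is an assumption, and the small constant guarding the division is an implementation detail not present in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def line_contrast(f, sigma, W=255.0):
    """Line contrast R (eq. 7.3) from the scale-sigma Hessian (eq. 7.1).
    A sketch under the conventions of this chapter; axis 0 is y, axis 1 is x."""
    f = f.astype(float)
    fxx = gaussian_filter(f, sigma, order=(0, 2))   # d^2 f / dx^2
    fyy = gaussian_filter(f, sigma, order=(2, 0))   # d^2 f / dy^2
    fxy = gaussian_filter(f, sigma, order=(1, 1))   # d^2 f / dx dy
    # eigenvalues of the Hessian (eq. 7.2)
    root = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    lam1 = 0.5 * (fxx + fyy + root)
    lam2 = 0.5 * (fxx + fyy - root)
    # order by magnitude: lam_plus is the larger-magnitude eigenvalue
    lam_plus = np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)
    # line brightness b (eq. 7.4): relative to black for bright lines,
    # relative to the white level W for dark lines
    fs = gaussian_filter(f, sigma)
    b = np.where(lam_plus <= 0, fs, W - fs)
    return sigma ** 2 * np.abs(lam_plus) / np.maximum(b, 1e-10)
```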
7.1.3 Line Tracing

Consider a line and its two endpoints S1 and S2. Among all possible paths Ξ between S1 and S2, the path which integrates most line evidence is considered the best connection between the vertices. Therefore, we reformulate line tracing as a minimum cost optimization problem. First, let r(x, y, σ) be a cost function depending on R(.) (eq. 7.3),

    r(x, y, \sigma) = \frac{\epsilon}{\epsilon + R(x, y, \sigma)}    (7.7)

and let us define the path integral, taking σ for granted, to be

    c(S_1, S_2) = \min_{\Xi} \int_{S_1}^{S_2} r(x(p), y(p)) \, dp .    (7.8)

Here, (x(p), y(p)) is the Cartesian coordinate of the path parameterized by the linear path coordinate p. The path integral c(S1, S2) now yields the integrated cost (eq. 7.7) over the best defined path in terms of line contrast R(.). For high line contrast, the line is well-defined, and the cost r(.) ≈ ε/R(.) ≈ 0. For a low value of R(.), the cost approximates 1, such that the Euclidean shortest path is traced. Hence, the constant term ε in (eq. 7.7) determines the trade-off between following the maximum line contrast and taking the shortest route. The value of ε is typically very small compared to the line contrast, e.g. ε = 0.001, which assures that plateaus are crossed. Note that line extraction does not introduce additional parameters.

7.1.4 Graph Extraction

Now consider an image containing vertices S = {S1, S2, . . . , Sn}. For our case, lines connect neighboring vertices. The aim is to extract the network graph G = (S, E) with vertices S and edges E, the interconnecting lines given by the minimum cost paths. As there will be no crossing paths (see section 7.1.1), the graph G may be found by a local solution. Hence, we concentrate on connecting neighboring vertices. Neighbors are defined by assigning a zone of influence to each vertex, where each region Z(Si) defines the area for which all points p are closer to Si than to any other vertex [15],

    Z(S_i) = \{ p \in \mathbb{R}^2 : \forall A \in S \setminus \{S_i\},\; c(p, S_i) < c(p, A) \} .    (7.9)

Here, distance is measured with respect to the cost c(p, Si) (eq. 7.8). The regions of influence correspond to the catchment basins of the topographical watershed algorithm [11]. Neighboring vertices of Si are defined by the set of all vertices whose zone of influence touches that of Si. Hence, neighboring vertices share an edge in the topographical watershed segmentation. The minimum cost path Ψij between Si and Sj runs over the edge shared by Si and Sj.

The graph G is computed by applying the topographical watershed transform. First, the grey-weighted distance transform is applied to the cost image given by r (eq. 7.7), with the vertices S as mask. The grey-weighted distance transform propagates the costs from the masks over their influence areas, resulting in a wavefront collision at places where two zones of influence meet. The collision events result in the edges between neighboring vertices, yielding the watershed by topographic distance. The minimum cost path between two neighboring vertices runs over the minimum in their common edge. Therefore, any edge between two neighboring vertices is traced for its minimum. Steepest descent on each side of the saddlepoint results in the minimum cost path between the vertices. Tracing the steepest descents for all different borders between the zones of influence results in the network graph G. The described algorithm requires one distance transform to find the zones of influence. Hence, the order of the algorithm is determined by the grey-weighted distance transform, which is of order O(N²), N being the image dimension [14]. Note that the graph algorithm is free of parameters.
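To illustrate the optimization behind (eq. 7.8), a minimal sketch follows that computes the minimum integrated cost between two pixels with Dijkstra's algorithm on a 4-connected grid. The chapter itself obtains all paths at once through the grey-weighted distance transform; this pairwise version, with its hypothetical function name and connectivity choice, only demonstrates the principle.

```python
import heapq
import numpy as np

def min_cost_path_cost(r, start, goal):
    """Minimum integrated cost (eq. 7.8) between two pixels of the cost
    image r (eq. 7.7); `start` and `goal` are (y, x) tuples. Recovering
    the path itself via predecessor links is omitted for brevity."""
    h, w = r.shape
    dist = np.full((h, w), np.inf)
    dist[start] = r[start]
    heap = [(r[start], start)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if (y, x) == goal:
            return d
        if d > dist[y, x]:
            continue  # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + r[ny, nx]  # accumulate cost along the path
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, (ny, nx)))
    return np.inf
```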
7.1.5 Edge Saliency and Basin Coverage

A natural measure of edge saliency is the integrated line contrast (eq. 7.3) over the edge,

    s(S_1, S_2) = \int_{S_1}^{S_2} R(x(p), y(p)) \, dp    (7.10)

where S1, S2 are start and end node, and where (x(p), y(p)) is the path. Note that s(.), like R, is a dimensionless and parameter-free quantity. A confidence measure indicating how well the edge is supported by the image data is given by the average saliency over the line,

    \bar{s}(S_1, S_2) = \frac{1}{l} \, s(S_1, S_2)    (7.11)

where l is the path length. Again, s̄ is a dimensionless quantity with range [0 . . . 1], a high value indicating a well-defined line.

Each basin in the minimum cost graph is surrounded by a number of connected paths forming the basin perimeter. An indication of segmentation confidence for a basin B may be obtained by considering the average saliency over the surrounding lines, compared to the average line contrast inside the graph basins. The average saliency over the basin perimeter is given by

    \bar{s}_B(B) = \frac{1}{l} \oint_B R(x(p), y(p)) \, dp    (7.12)

l being the basin perimeter, and p representing the linear path coordinate. A high value, in the range [0 . . . 1], indicates a well-outlined basin. The average line contrast within basin B is measured by

    \bar{c}_B(B) = \frac{1}{A(B^{\ominus})} \iint_{B^{\ominus}} R(x, y) \, dx \, dy .    (7.13)

B^⊖ is the basin eroded by a band of thickness given by σ. Erosion is applied to prevent the detected line points, smoothed by the Gaussian at scale σ, from influencing the basin contrast. In (eq. 7.13), A(.) is the area of the eroded basin. The value of c̄B increases when line structures are present inside the basin, possibly due to a missed line in the graph. Coverage of the graph G is defined by the ratio of the line contrast remaining inside the basins relative to the line contrast covered by the graph edges,

    \bar{c}(B) = 1 - \frac{\bar{c}_B(B)}{\bar{s}_B(B)} .    (7.14)

When all line points are covered by the basin perimeter, c̄ will be close to one. For a basin containing a missed line, the average line contrast over the basin will be high. When a spurious edge outlines the basin, the summed contrast over the edges will be low, yielding a lower coverage value.

7.1.6 Thresholding the Saliency Hierarchy

The graph G is constructed such that neighboring vertices are connected, regardless of the absence of interconnecting lines. For a spurious connection saliency will be low, since evidence of a connecting line is lacking. Pruning of the graph for spurious lines may be achieved by thresholding on saliency. Pruning by saliency imposes a hierarchy on the graph, ranging from the graph G with all edges included, down to the graph consisting of the single best defined edge in terms of contrast. The threshold parameter indicates the saliency level in the hierarchy. Note the introduction of a parameter, indicating the application dependent hierarchy level of the graph. We propose two methods to prune edges by saliency. First, global pruning may proceed by removing all ill-defined lines for which

    \bar{s}(S_1, S_2) < \alpha .    (7.15)

The resulting graph consists of the most contrasting lines, removing lines for which contrast is below the threshold. The method is applicable when a clear distinction between lines and background is present.
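A minimal sketch of edge saliency (eqs. 7.10 and 7.11) and global pruning (eq. 7.15) follows; the representation of an edge as a list of pixel coordinates with unit steps, and the function names, are assumptions for illustration.

```python
import numpy as np

def edge_saliency(R, path):
    """Integrated (eq. 7.10) and average (eq. 7.11) saliency of a traced
    path, given as a list of (y, x) pixels; unit step length is assumed
    between consecutive path pixels."""
    values = np.array([R[y, x] for y, x in path])
    s = values.sum()           # integrated line contrast along the edge
    return s, s / len(values)  # (s, s_bar)

def prune_global(edges, R, alpha):
    """Global pruning (eq. 7.15): keep only edges whose average saliency
    reaches the threshold alpha; `edges` maps vertex pairs to paths."""
    return {pair: path for pair, path in edges.items()
            if edge_saliency(R, path)[1] >= alpha}
```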
For the case of a textured background, a local pruning method based on local comparison of edge saliency may be applied. Pruning of low confidence edges proceeds by removing all edges for which an alternative path with higher confidence can be found via other vertices. Path confidence between S1 and Sn via the vertices Si is defined by the average saliency over the edges of the path,

    \bar{s}(S_1, S_2, \ldots, S_n) = \frac{1}{l} \sum_{i=1}^{n-1} s(S_i, S_{i+1}) .    (7.16)

Here, l is the total path length. The direct path between S1 and Sn is pruned when

    \bar{s}(S_1, S_n) < \alpha \, \min \bar{s}(S_1, \ldots, S_n)    (7.17)

where the minimum is taken over all alternative paths between S1 and Sn. Locally ill-defined lines are removed from the graph, the degree of removal given by α. For α = 0, no lines are removed, whereas for α = 1 all lines are removed for which a detour via another vertex yields higher saliency. Hence, short ill-defined paths are pruned when longer well-defined paths exist between the vertices. The method is applicable for a textured background, and when enough connections are present to determine alternative routes between neighboring vertices.
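The following sketch applies the local pruning rule as reconstructed in (eq. 7.17). For brevity it considers only two-edge detours; the graph representation, function names, and default α are assumptions.

```python
def prune_local(graph, alpha=0.9):
    """Local pruning (eq. 7.17), restricted to two-edge detours.
    `graph` maps an edge (a, b), with a < b, to a dict holding its
    integrated saliency 's' (eq. 7.10) and path length 'l'."""
    def s_bar(*es):
        # path confidence: summed saliency over summed length (eq. 7.16)
        return sum(graph[e]['s'] for e in es) / sum(graph[e]['l'] for e in es)

    def key(u, v):
        return (u, v) if u < v else (v, u)

    vertices = {v for e in graph for v in e}
    kept = dict(graph)
    for (a, b) in list(graph):
        detours = [(key(a, v), key(v, b)) for v in vertices
                   if v not in (a, b)
                   and key(a, v) in graph and key(v, b) in graph]
        if detours:
            worst = min(s_bar(e1, e2) for e1, e2 in detours)
            if s_bar(key(a, b)) < alpha * worst:   # eq. 7.17
                del kept[key(a, b)]
    return kept
```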
7.1.7 Overview

The algorithm is illustrated in fig. 7.3. The figure shows the extraction of cell borders from heart tissue (fig. 7.3a). Extracted vertices are indicated in fig. 7.3b. Line contrast is calculated according to (eq. 7.3), shown in fig. 7.3c. The tracing of minimum cost paths is shown in fig. 7.3d. Most of the lines are correctly detected, together with some spurious lines. Local pruning of the graph results in fig. 7.3e. Here, all edges which are not supported by the image data are removed. Figure 7.3f shows the area coverage, where black indicates c̄ = 1 and white indicates c̄ = 0.

In summary, we have proposed a one-parameter algorithm for the extraction of line networks. The parameter indicates the saliency level in a hierarchical graph. The graph tessellates the image into regions, where each edge travels over the minimum cost path between vertices. The resulting graph is labeled by edge saliency and area coverage, both derived from line contrast.

7.1.8 Error Analysis

The robustness of the proposed algorithm can best be evaluated by considering the different types of errors that may occur in forming the network graph. Table 7.1 gives an overview of possible errors and their consequences for the network graph G. The columns give a complete representation of the consequences an error may have on the network graph G. The rows overview the errors which may result from the vertex and line detection. In the sequel we discuss the sensitivity of the proposed algorithm to these types of errors.

When the image contains textured regions, the texture may cause a high response of the line point detector. Hence, the algorithm will falsely respond to the texture as being an unbroken line and find an optimal path, as illustrated in fig. 7.4a. Further, when spurious line structures are present in the image data without being part of the network, distortions may occur when the line is near other interconnections. In that case, the best path between vertices may run via the spurious line. An example is shown in fig. 7.4b, where text interferes with dashed line structures. For missed lines, basin coverage degrades. As the line structure is not part of the network, such sensitivity is unwanted. Gaps in lines, or lines slightly off the vertex, illustrated by fig. 7.4c, have no consequences except that saliency degrades. When a line structure is of too low contrast to contribute enough to form a line, the line may be pruned after confidence thresholding. An example of a missed line is shown in fig. 7.4d. As a consequence, coverage degrades, thereby indicating the event of a missed line.

For the case of a falsely detected vertex off a line (no example available), the vertex will be connected to the network. Saliency of the spurious lines will be low as line evidence is missing from the image. Hence, pruning of the network by saliency is likely to solve such errors. Spurious or missed vertices at lines have, except for the insertion or deletion of a vertex, respectively, no consequence for the extracted network. An example of spurious vertices is given in fig. 7.4e. The measure of saliency is invariant to insertion and deletion of vertices. This is proven by considering the path integral (eq. 7.10). Insertion of a vertex Sx on the path from S1 to S2 results in

    s(S_1, S_x) + s(S_x, S_2) = \int_{S_1}^{S_x} R(x(p), y(p)) \, dp + \int_{S_x}^{S_2} R(x(p), y(p)) \, dp = \int_{S_1}^{S_2} R(x(p), y(p)) \, dp = s(S_1, S_2)

which is of course equal to the original saliency. Invariance to vertex deletion follows from the reverse argument. Since the average line contrast within the graph basins is not affected by insertion or deletion of vertices at edges, coverage (eq. 7.14) is invariant to vertex insertion or deletion at lines.

More critical is overlooking a vertex at a fork or line end. An example of a missing vertex is shown in fig. 7.4f. In both cases, an edge is missed in the resulting graph, and coverage degrades as not all line points are covered by the graph edges. When a vertex at a line end is missed, the line may be connected to a different vertex, causing the minimum cost path to cross the background. Pruning of the network by saliency is likely to solve the error.

Table 7.1: Types of errors, general for the extraction of networks of lines, and their consequences. Columns denote events in graph construction, whereas rows represent detection errors. Wanted sensitivity of the proposed algorithm is indicated by "+", unwanted sensitivity to errors by "–", and robustness of the proposed method by "o".

  Error type                 vertex  vertex  edge    edge    edge       saliency    coverage
                             insert  delete  insert  delete  deviation  (eq. 7.10)  (eq. 7.14)
  spurious line                o       o       o       o       –          o            –
  gap in line                  o       o       o       o       o          +            o
  line off vertex              o       o       o       o       o          +            o
  missed line                  o       o       o       –       o          o            +
  spurious vertex off line     –       o       –       o       o          +            o
  spurious vertex at line      –       o       o       o       o          o            o
  missed vertex at line        o       –       o       o       o          o            o
  missed vertex at fork        o       –       o       –       o          o            +
  missed vertex at line end    o       –       o       –       –          +            +

Apart from the errors general to the extraction of networks of lines, the proposed algorithm generates errors specific to minimum cost path based methods. By definition, only one path between two vertices can be of minimum cost. Any other path connecting the same vertices will be removed, as illustrated in fig. 7.5a. As a consequence, an edge is missed in the graph, and basin coverage degrades, indicating the event of a missed line. Further, when a better defined path exists in the neighborhood of the traced path, the algorithm tends to take a shortcut via the better defined path, as shown in fig. 7.5b. In that case, coverage degrades to indicate the missed line, whereas saliency increases due to the better defined route.

In conclusion, the proposed method is robust against: a. gaps in lines, b. lines slightly off their vertex, c. spurious lines, and d. spurious vertices at lines. The algorithm is sensitive to: a. missed lines, b. spurious vertices off lines, and c. missed vertices at forks.
For missed vertices, the resulting graph is degraded. For missed lines, the graph may be degraded, and confidence in the area in which the missed line is situated may be too high. Specific to the algorithm are the sensitivity to shortcuts, and the inability to trace more than one line between connections.

7.2 Illustrations

7.2.1 Heart Tissue Segmentation

Figure 7.3 illustrates the application of the proposed algorithm to the extraction of cells from heart tissue. The tissue consists of cardiac muscle cells, the dark textured areas, and blood vessels, the white discs. Cell borders are transparent lines surrounding all cardiac muscle cells. Due to the dense packing of cells, blood vessels are squeezed between the cells. The cell borders appear as bright lines connecting the blood vessels. Further, the dense packing causes gaps in the lines at places where the light microscopic resolving power is too low to examine the cell border.

In the cardiac muscle cell application, the blood vessels are considered as initial vertices. The vessels are detected by dome extraction [3] (fig. 7.3a). The extracted network graph, together with basin saliency and coverage, is shown in fig. 7.3d,e,f. The heart tissue segmentation is a successful application in that a large number of cells is correctly segmented by the proposed algorithm. Individual cell parameters may be estimated after selecting those cells with high saliency and coverage. The number of cells extracted from the tissue is in the same range as for qualitative studies based on interactive outlining of the cells by experts [1, 2, 16]. Hence, the algorithm enables the quantitative assessment of morphological changes in heart tissue at a cellular level.

7.2.2 Neurite Tracing

A second example (fig. 7.6) shows interactive segmentation of neurites. The neurite starting points at the cell bodies are interactively indicated, and used as initial markers for the network segmentation algorithm. The resulting network is shown in fig. 7.6b. In this case, pruning of lines is not possible since no alternative routes between the markers are present. Paths between cells which are not connected are removed by thresholding the saliency (eq. 7.15). Note that no errors are caused by the lack of line structure indicated in fig. 7.6a. The overall saliency of the result is s̄ = 0.44, indicating that the line contrast spans almost half the dynamic range of the camera. Coverage is c̄ = 0.95, indicating that 95% of the line structures present in the image are covered by the network graph. Hence, the result is considered highly accurate.

7.2.3 Crack Detection

An example of general line detection is shown in fig. 7.7, where cracks in ink at high magnification are traced. The image shows an ink line, at such a magnification that the ink completely covers the image. Cracks in the ink form white lines, due to the transmission of light, against a background of black ink. Note that no natural markers are present.

For the general case of line detection, saddlepoint detection may be used to extract markers. The saddlepoints on bright lines are detected by

    f_x^{\sigma} = 0, \quad f_y^{\sigma} = 0, \quad \lambda_+^{\sigma} < 0, \quad f_{xx}^{\sigma} f_{yy}^{\sigma} - (f_{xy}^{\sigma})^2 < -\alpha .    (7.18)

Here, α indicates salient saddlepoints, and is typically small to suppress spurious saddlepoints due to noise. The saddlepoints are used as markers for the network extraction algorithm. The detected saddlepoints are highlighted in fig. 7.7b.
The result of the proposed algorithm, with the saddlepoints as vertices, is shown in fig. 7.7c. Average saliency is thresholded (eq. 7.15) to remove paths which cross the background. Overall saliency of the graph is 0.313, and coverage 0.962. The cracks are successfully extracted by the proposed algorithm, except that the crack ends are missing when end markers are absent. In that case, the detected cracks are too short. Since no natural markers are present, the algorithm should be robust against marker insertion or deletion at lines. Figure 7.7d shows the result after random removal of half the markers of fig. 7.7c. Errors in the result include a shortcut via a more contrasting line. Further, line ends are pruned due to the absence of markers. Note that saliency is only marginally affected by the new situation, 0.316 instead of 0.313, whereas coverage likewise is reduced marginally, from 0.962 to 0.960. Hence, the algorithm is robust against variations in the threshold value α for saddlepoint detection (eq. 7.18).

7.2.4 Directional Line Detection

Characteristic for the proposed algorithm is that line evidence is accumulated over the line. When line evidence is absent, the algorithm optimizes the shortest path to the neighboring line parts to continue integration. As a result, when large gaps are present, the algorithm may find an alternative route by crossing the background to a neighboring line, tracking that line, and jumping back to the original line after the gap. The problem may be solved by including line orientation information in the algorithm.

To proceed, we consider directional filtering for the detection of line contrast. Consider (eq. 7.3), which was measured by isotropic Gaussian filters of scale σ. For directional filtering, we consider anisotropic Gaussian filters of scales σl and σs, for the longest and shortest axis, respectively, and of orientation θ. Hence, line contrast is given by

    R'(x, y, \sigma_l, \sigma_s, \theta) = \sigma_l \sigma_s \, \frac{|\lambda_+^{\sigma_l, \sigma_s, \theta}|}{b^{\sigma_l, \sigma_s}}    (7.19)

where b^{σl,σs} is given by (eq. 7.4). The scale σs depends on line width as given by (eq. 7.6), whereas σl is tuned to adequately capture line direction. Hence, σl should be large enough to bridge small gaps, but not too large, to prevent errors when line curvature is high. In practice, an aspect ratio of σs = ŵ and σl = 3σs is often sufficient.

Now that we have established how to filter in a particular direction, the filter needs to be tuned to the line direction. Two options are considered. First, eigenvector analysis of the Hessian results in the principal line direction. One could apply a first undirectional pass to obtain the line direction, as described by Steger [12]. A second pass then tunes the filter at each pixel to the line direction to obtain line contrast. Note that the filter orientation may be different for each position in the image plane. Instead of tuning the filter, sampling the image at different orientations may be applied. One applies (eq. 7.19) for a number of orientations. When the filter is correctly aligned with the line, the filter response is maximal, whereas a filter perpendicular to the line results in a low response. Hence, taking the per-pixel maximum line contrast over the orientations yields directional filtering.
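A minimal sketch of the orientation-sampling variant follows. It rotates the image rather than the filter, and simplifies the anisotropic measure to the cross-line second derivative instead of the full Hessian eigenvalue of (eq. 7.19); the function name, the rotation-based implementation, and these simplifications are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def directional_line_contrast(f, sigma_s, sigma_l,
                              angles=(0, 30, 60, 90, 120, 150), W=255.0):
    """Orientation-sampled anisotropic line contrast in the spirit of
    (eq. 7.19): filter at several orientations, keep the per-pixel maximum."""
    best = np.zeros(f.shape, dtype=float)
    for theta in angles:
        g = rotate(f.astype(float), theta, reshape=False, mode='nearest')
        # after rotation the candidate line runs along axis 0: smooth with
        # sigma_l along the line, differentiate twice across it with sigma_s
        d2 = gaussian_filter(g, (sigma_l, sigma_s), order=(0, 2))
        gs = gaussian_filter(g, (sigma_l, sigma_s))
        b = np.where(d2 <= 0, gs, W - gs)            # brightness as in eq. 7.4
        R = sigma_l * sigma_s * np.abs(d2) / np.maximum(b, 1e-10)
        back = rotate(R, -theta, reshape=False, mode='nearest')
        best = np.maximum(best, back)                # per-pixel max over angles
    return best
```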
The proposed method is applied to an example of a dashed line pattern, given in fig. 7.8a and taken from [8]. The example is taken from the hardest class, the "complex" patterns. The grey dots represent interactively selected markers, indicating crossings and line end points. Orientation filtering is applied at 0°, 30°, 60°, 90°, 120°, and 150°, and the per-pixel maximum line contrast is taken over the sampled orientations. The result after graph extraction and saliency thresholding is shown in fig. 7.8b. The crude sampling of orientation space causes some of the lines to be noisy; a finer sampling would enhance the result. Further, one line part is missed, due to a shorter line connecting the same markers. The text present in the example causes the algorithm to follow parts of the text instead of the original line. Note that isotropic line detection does not adequately extract the graph (fig. 7.8c).

7.3 Conclusion

The extraction and interpretation of networks of lines from images yields important organizational information about the network under consideration. We present a one-parameter algorithm for the extraction of line networks from images. The parameter indicates the extracted saliency level from a hierarchical graph. Input for the algorithm is the domain specific knowledge of interconnection points. The algorithm results in the network graph, together with edge saliency and catchment basin coverage.

The proposed method assigns a robust measure of saliency to each minimum cost path, based on the average path cost. Edges with a low saliency compared to alternative routes are removed from the graph, leading to an improved segmentation result. The correctness of the network extraction is indicated by the edge saliency and area coverage. Hence, confidence in the final result can be based on the overall network saliency.

Design issues are robustness against the general errors summarized in table 7.1. The proposed method is robust against: a. gaps in lines, b. lines slightly off their vertex, c. spurious lines, and d. spurious vertices at lines. The algorithm is sensitive to: a. missed lines, b. spurious vertices off lines, and c. missed vertices at forks. Thresholding on saliency reduces the errors caused by spurious vertices. Missed lines are signaled by the measure of coverage (eq. 7.14), indicating how much of the line evidence is covered by the network graph. Specific to the algorithm are the sensitivity to shortcuts and the inability to trace more than one line between connections. Any algorithm based on minimum cost paths is sensitive to these types of errors.

We restricted ourselves to locally defined line networks, where lines connect neighboring vertices. For globally defined networks, like electronic circuits, the algorithm can be adapted to yield a regional or global solution. Therefore, several distance transforms have to be applied, at the cost of a higher computational complexity. The pruning of the network and the measure of saliency remain applicable in the global case.

Incorporation of directional line information into the algorithm results in a better estimate of line contrast, and hence improves graph extraction. The eigenvector analysis of the directional derivatives yields an estimate of the local direction of the line. The directional information may be included by considering an anisotropic metric for the line contrast filtering. Experiments showed a better detection of the network graph for dashed line detection. The example given is considered a complex configuration, according to [8].
A disadvantage is a longer computation time, due to the anisotropic filtering pass.

The proposed method results in the extraction of networks from connection point to connection point. The routing from a starting connection to its final destination depends on the functionality of the network, and is not considered in this chapter. Correct interpretation of the network in the presence of distortion obviously requires information on the function of the network.

For the extraction of line networks the proposed method has proven to be a useful tool. The method is robust against gaps in lines, and against spurious vertices at lines, which we consider the most prominent sources of error in line detection. Hence, the proposed method enables reliable extraction of line networks. Furthermore, the method indicates detection confidence, thereby supporting error-proof interpretation of the network functionality. The proposed method is applicable to a broad variety of line networks, including dashed lines, as demonstrated by the illustrations. Hence, the proposed method yields a major step towards general line tracking algorithms.

Bibliography

[1] J. Ausma, M. Wijfels, F. Thoné, L. Wouters, M. Allessie, and M. Borgers. Structural changes of atrial myocardium due to sustained atrial fibrillation in the goat. Circulation, 96:3157–3163, 1997.
[2] C. A. Beltrami, N. Finato, M. Rocco, G. A. Feruglio, C. Puricelli, E. Cigola, F. Quaini, E. H. Sonnenblick, G. Olivetti, and P. Anversa. Structural basis of end-stage failure in ischemic cardiomyopathy in humans. Circulation, 89:151–163, 1994.
[3] S. Beucher and F. Meyer. The morphological approach to segmentation: The watershed transformation. In E. R. Dougherty, editor, Mathematical Morphology in Image Processing, chapter 12, pages 433–481. Marcel Dekker, New York, 1993.
[4] L. D. Cohen and R. Kimmel. Global minimum for active contour models: A minimal path approach. Int. J. Computer Vision, 24:57–78, 1997.
[5] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Scale and the differential structure of images. Image and Vision Computing, 10(6):376–388, 1992.
[6] J. Illingworth and J. Kittler. A survey of the Hough transform. Computer Vision Graphics Image Process., 44:87–116, 1988.
[7] J. J. Koenderink. The structure of images. Biol. Cybern., 50:363–370, 1984.
[8] B. Kong, I. T. Phillips, R. M. Haralick, A. Prasad, and R. Kasturi. A benchmark: Performance evaluation of dashed-line detection algorithms. In R. Kasturi and K. Tombre, editors, Graphics Recognition Methods and Applications, pages 270–285. Springer-Verlag, 1996.
[9] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[10] C. Lorenz, I. C. Carlsen, T. M. Buzug, C. Fassnacht, and J. Weese. A multiscale line filter with automatic scale selection based on the hessian matrix for medical image segmentation. In Scale Space Theories in Computer Vision, pages 152–163. Springer-Verlag, 1998.
[11] F. Meyer. Topographic distance and watershed lines. Signal Processing, 38:113–125, 1994.
[12] C. Steger. An unbiased detector of curvilinear structures. IEEE Trans. Pattern Anal. Machine Intell., 20:113–125, 1998.
[13] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[14] P. W. Verbeek and J. H. Verwer. Shading from shape, the Eikonal equation solved by grey-weighted distance transformation. Pattern Rec. Lett., 11:681–690, 1990.
[15] L. Vincent. Graphs and mathematical morphology. Signal Processing, 16:365–388, 1989.
[16] H. W. Vliegen, A. van der Laarse, J. A. N. Huysman, E. C. Wijnvoord, M. Mentar, C. J. Cornelisse, and F. Eulderink. Morphometric quantification of myocyte dimensions validated in normal growing rat hearts and applied to hypertrophic human hearts. Cardiovasc. Res., 21:352–357, 1987.

Figure 7.3: Example of line detection on heart tissue (a), observed by transmission light microscopy. The dark contours show the segmented blood vessels, superimposed on the original image. Line contrast R(.) is shown in (b), the minimum cost graph in (c). The final segmentation (d) after local pruning of spurious edges for α = 0.9 (eq. 7.17). The estimated saliency (e) (eq. 7.11) and area coverage (f) (eq. 7.14), dark representing high confidence in the result.

Figure 7.4: Example of failures in the line detection. a. The detection of a spurious line due to a textured region. b. The deviation of a line due to spurious line structures, the text, in the image. c. A gap in a line; the line is correctly detected by the algorithm without errors (result not shown). d. A missing connection due to lack of line evidence. e. Extra vertices on the line do not influence the algorithm performance. f. Missing a vertex at a fork, resulting in a missed line in the network graph.

Figure 7.5: Example of failures specific to algorithms based on minimum cost paths. a. The missing of a line due to the double linking of vertices, for which the best connection is preserved. b. A shortcut along a better defined line to optimally connect two vertices.

Figure 7.6: Extraction of a neurite network (a); note the gaps present in the neurites. The traced network is shown in (b). The dots represent the interactively indicated neurite start points at the cell bodies.

Figure 7.7: Extraction of a general line network. a. A high magnification image of ink, completely covering the image, distorted by white cracks through which light is transmitted. No natural markers are present. b. The saddlepoints at the bright lines (eq. 7.18). c. The detected lines, overall saliency s̄ = 0.313, coverage c̄ = 0.962. d. The result for half the number of markers, overall saliency s̄ = 0.316, coverage c̄ = 0.960. Note the shortcut and the removal of line ends.

Figure 7.8: Extraction of a dashed line network (a), taken from [8]; markers are interactively selected at line crossings and line end points, indicated by grey dots. The extracted network is shown in (b). Errors made include some shortcuts, and the missing of the bent line part due to the presence of a second, shorter connection between the markers. Note the difference with the isotropic result for scale σl (c). The scale σl was chosen to integrate line evidence over the gaps in the dashed lines. Deviation from the centerline in the isotropic case is the result of the large integration scale compared to the line width.

Chapter 8

Discussion

8.1 Color

In this thesis, we have developed a theory for the measurement of color in images, founded in physics as well as in measurement science.
The thesis considers a physical basis for the measurement of spatio-spectral energy distributions, integrated with the laws of light reflection. The Gaussian color model solves the fundamental problem of color and scale by integrating the spatial and color information. The differential geometry framework [10, 4] is extended to the domain of color images. As a consequence, we have given a physical basis for the opponent color theory, and a physical basis for color receptive fields. Furthermore, it was concluded that the Gaussian color model is the natural representation for investigating the scaling behavior of color image features. The Gaussian color model describes the differential structure of color images. Selection of scale enables robust and accurate measurement of color value, even under noisy circumstances.

Color perception is mainly constrained by the number of spectral measurements, the spectral resolution. Due to the limited space available on the retina, evolution was forced to trade off between the number of different spectral receptors and their spatial distribution. For humans, spectral vision is limited to three color samples, and a tremendous number of spatial samples. Therefore, the Gaussian color model measures the intensity, first order, and second order derivative of the incoming spectral energy distribution. Daylight has driven evolution to set the central wavelength of color vision at about 520 nm, with a spectral range of about 330 nm. For any colorimetric system, measurement is constrained by these parameters.

A second achievement of the thesis is the integration of the physical laws of spectral image formation into the measurement of color invariants. We define a complete framework for the extraction of photometric invariant properties. The framework for color measurement is grounded in the physics of observation. Hence it is theoretically better founded, as well as experimentally better evaluated, than existing methods for the measurement of color features in RGB images. The framework can be applied to any vision problem where the reflection laws differ from those of everyday vision. Among other imaging circumstances, application areas are satellite imaging [3], vision in bad weather [7], and underwater vision.

The physical model presented in Chapter 3 demands spatial comparison in order to achieve color constancy. The model confirms relational color constancy as a first step in color constant vision systems [2, 8]. The subdivision of human perception into edge detection based on color contrast, cooperating with a subsystem for assigning colors to the segmented visual scene, may yield an overall performance which is highly color constant. Hence, spatial edge detection based on color contrast plays an important role in color constancy. Most of the color invariant sets presented in Chapter 3 and Chapter 4 have spatial edge detection as their lowest order expression. Edge detection is confirmed by Livingstone and Hubel [5] and by Foster [2] to be of primary importance in human vision. To cite Livingstone and Hubel about one of the three visual subsystems:

Although neurons in the early stages of this system are color-selective, those at higher levels respond to color-contrast borders but do not carry information about what colors form the border.

They conclude that the subsystem is important in seeing stationary objects in great detail, given the slow time course and high resolution of the subsystem.
From a physical perspective, these results are evident given the invariants derived in Chapter 4. We show in Chapter 4 that the discriminative power of the invariants is orderable by the broadness of the group of invariance. A broad to narrow hierarchy of the invariance groups considered is given in section 4.2.6:

                            H   N   C   W   E
  viewing direction         +   +   +   –   –
  surface orientation       +   +   +   –   –
  highlights                +   –   –   –   –
  illumination direction    +   +   +   –   –
  illumination intensity    +   +   +   +   –
  illumination color        –   +   –   –   –
  inter reflection          –   –   –   –   –

Invariance is denoted by +, whereas sensitivity to the imaging condition is indicated by –. The discriminative power of the invariance groups is given in section 4.3.2:

       σx = 0.75   σx = 1   σx = 2   σx = 3
  Ê       970        983     1000     1000
  Ŵ       944        978     1000     1000
  Ĉ       702        820      949      970
  N̂       631        757      962      974
  Ĥ       436        461      452      462

The number refers to the number of colors, out of 1,000 patches, still to be distinguished by the invariant, and is an absolute number given the hardware and the spatial scale σx. For the proposed color invariants, discriminating power increases when considering a larger spatial scale σx, thereby taking a larger neighborhood into account for determining the color value. Hence, a larger spatial scale results in a more accurate estimate of color at the point of interest, increasing the accuracy of the result. The aim of the thesis is reached in that high color discrimination resolution is achieved while maintaining constancy against disturbing imaging conditions, both theoretically and experimentally. The proposed invariance groups describe the local structure of color images in a systematic, irreducible, and complete sense. The invariance groups incorporate the physics of light reflection as well as the constraints imposed by human color perception.

8.2 Geometrical Structure

Characterization of histological or pathological conditions can be based on the topographical relationship between tissue structures. Capturing the arrangement of local structure enables the extraction of global tissue architecture. Such an extraction procedure should be insensitive to distortions intrinsic to the acquisition of biological preparations. In this thesis, a graph based approach for the robust extraction of tissue architecture is established. The design issue is robustness against errors common in the preparation of biological tissues, like taking transections through a three-dimensional block, and errors in the detection of cells, blood vessels, and cell borders. Biological variety, which causes the architecture to be irregular, is taken as a design issue rather than as a source of error [1, 6, 9, 11]. As demonstrated in Chapter 6, these design considerations enabled the recognition of tissue architecture.

In both Chapter 6 and Chapter 7, the extraction of geometrical arrangements is based on local structure. Tissue architecture is derived from the local relationships between markers. Confidence in the final result is estimated by the saliency of the detected structures, and the goodness of fit to the quintessence of the architecture.

Robust extraction of tissue architecture reduces the nonbiological variation in the analysis of tissue sections and thus improves confidence in the result. The quantitative methods based on local structure enable reliable classification of areas by type of tissue. Combining the methodology as proposed in this thesis enables effective analysis and interpretation of histologically stained tissue sections.
The proposed frameworks allow for fully automatic screening of drug targets in pharmaceutical research [12].

8.3 General Conclusion

This thesis makes a contribution to the field of color vision. The constraints imposed by human color vision are incorporated in the physical measurement of spatio-spectral energy distributions. The spatial interaction between colors is derived from the physics of light reflection. Hence, the proposed framework for color measurement enables the interpretation of color images from both a physical and a perceptual viewpoint.

The second contribution of the thesis is the assessment of spatial arrangement. The methodology presented is applied to the segmentation of biological tissue sections observed by light microscopy. The proposed concepts can be utilized in other application areas.

As demonstrated by Mondriaan, the combination of color and spatial organization captures the essential visual information, in that the subsystems dealing with shape and with localization are both in effect. Hence, combining color and spatial structure, yet to follow and the way to go, resolves the perceptual organization of images: Victory Boogie Woogie.

Bibliography

[1] F. Darro, A. Kruczynski, C. Etievant, J. Martinez, J. L. Pasteels, and R. Kiss. Characterization of the differentiation of human colorectal cancer cell lines by means of Voronoï diagrams. Cytometry, 14:783–792, 1993.
[2] D. H. Foster and S. M. C. Nascimento. Relational colour constancy from invariant cone-excitation ratios. Proc. R. Soc. London B, 257:115–121, 1994.
[3] G. Healey and A. Jain. Retrieving multispectral satellite images using physics-based invariant representations. IEEE Trans. Pattern Anal. Machine Intell., 18:842–848, 1996.
[4] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[5] M. Livingstone and D. Hubel. Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240:740–749, 1988.
[6] R. Marcelpoil and Y. Usson. Methods for the study of cellular sociology: Voronoï diagrams and parametrization of the spatial relationships. J. Theor. Biol., 154:359–369, 1992.
[7] S. Narasimhan and S. Nayar. Chromatic framework for vision in bad weather. In Proceedings of the Conference on Computer Vision and Pattern Recognition, volume 1, pages 598–605. IEEE Computer Society, 2000.
[8] S. M. C. Nascimento and D. H. Foster. Relational color constancy in achromatic and isoluminant images. J. Opt. Soc. Am. A, 17(2):225–231, 2000.
[9] E. Raymond, M. Raphael, M. Grimaud, L. Vincent, J. L. Binet, and F. Meyer. Germinal center analysis with the tools of mathematical morphology on graphs. Cytometry, 14:848–861, 1993.
[10] B. M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision. Kluwer Academic Publishers, Boston, 1994.
[11] H. W. Venema. Determination of nearest neighbours in muscle fibre patterns using a generalised version of the Dirichlet tessellation. Pattern Rec. Lett., 12:445–449, 1991.
[12] K. Ver Donck, I. Maillet, I. Roelens, L. Bols, P. Van Osta, T. Bogaert, and J. Geysen. High density C. Elegans screening. In Proceedings of the 12th International C. Elegans Meeting, page 871, 1999.

Samenvatting (Summary in Dutch)

Color and Geometrical Structure in Images
Applications in microscopy

This thesis treats both color and geometrical structure. Color is approached from theoretical measurement science, where color is the result of a local, spatio-spectral aperture measurement.
Differential calculus is then used to derive features that are invariant under everyday illumination conditions. The features are extensively tested in experiments on invariance and discriminative power. The experiments show the high discriminative power of the various invariants, and at the same time demonstrate their invariant properties. New in this thesis is the coupling between the physical measurement of color and the human perception of color.

This thesis furthermore treats the quantification of geometrical structures, applied specifically in light microscopy, although the developed methodology is more broadly applicable. Graph-based mathematical morphology methods are developed for the segmentation of regular point and line patterns, such as those present in brain and heart tissue. The developed methods have been successfully applied in the quantification of morphological parameters. The methods can be employed in the search for potential drugs in pharmaceutical research.

Dankwoord (Acknowledgements)

The thesis is the conclusion of a learning process, and has therefore inherently been influenced by many people. I learned a great deal from my promotor, Arnold, who helped me through the difficult passages in the text, after he had already earned his keep with a few remarks on the research itself . . . I especially appreciated the "pushing" of the www-weekends to the Ardennes, and later the Eifel, the Wadden islands, and the Zeeland coast, and I will look forward to them in the future.

The research was largely carried out at Janssen Pharmaceutica, under the supervision of Hugo, who gave me the room to go entirely my own way, but kept up the pressure to actually follow that road to the end. I greatly appreciated this freedom; Frans also deserves praise here for the great patience he showed until "at last" something emerged that was also applicable. The collaboration with Frans and Peter has, in my opinion, led to a number of good applications of image processing, in particular of scale-space methods, in biological and pharmaceutical research, partly owing to the "technology pull" of Kris, Luc, and Rony. Although I often held on to my own modest "Dutch" opinion, the discussions with Frans and Peter have certainly changed my insights.

Although Janssen has a very large research department, as a PhD student in computer science you feel somewhat lost among the biologists, with Luc as the only exception, who many times boxed my ears with statistics. The two- to three-weekly visits to Amsterdam were therefore very welcome and an inexhaustible source of inspiration. The impact of the discussions on those days with Rein, Theo, Anuj, Geert, Dennis, Harro, Erik, and Arnold (Jonk) can clearly be traced in the thesis. My sincere thanks for this; it gave me firm support.

A clear shortcoming of a PhD student in computer science is a limited knowledge of biology. After many stories from my Belgian fellow PhD students at Janssen about NF-κB receptors, biochemical pathways, and apoptosis of NGF-deprived PC12 cell lines, much has become clear to me. Rony, Gwenda, Gerrit, Christopher, and the students of the other departments: thank you for teaching me the biological background needed to keep up in a pharmaceutical company. Naturally, I also owe a word of thanks here to Astrid, Peter, and Jos.
The graphics department of Janssen prepared the figures in this thesis, and also took care of its printing. Many times I troubled Lambert with strange image formats, formulas, and encapsulated PostScript figures. I am also indebted to Jozef and Bob for producing figures and posters, which were very successful once they were displayed on the poster board. Thanks also to Marcel and Luc for the opportunity to carry out this research at Janssen, and for arranging the funding for this book.

I have experienced both Janssen and the UvA as pleasant environments for doing research, partly thanks to the good atmosphere in both groups. My thanks to all colleagues of the former Life Sciences department, in particular Mirjam, Gerd, Eddy, Roger, Koen, Marc, Guy, and Greet, and to everyone in the ISIS group, in particular Benno, Marcel, Silvia, Carlo, Kees, Wilko, Edo, Herke-Jan, Frank, Joost, Tat, Andy, and Jeroen.

Astrid, how can I ever thank you . . .