Course content from the Institute for Computer Graphics
Transcription
Lehrveranstaltungsinhalt aus "Bildanalyse und Computergrafik" (Course Content from "Image Analysis and Computer Graphics")
Franz Leberl
28 January 2002

Contents

0 Introduction
  0.1 Using Cyber-Cities as an Introduction
  0.2 Introducing the Lecturer
  0.3 From images to geometric models
  0.4 Early Experiences in Vienna
  0.5 Geometric Detail
  0.6 Automation
  0.7 Modeling Denver
  0.8 The Inside of Buildings
  0.9 As-Built Documentation: Modeling the Inside of Things in Industry
  0.10 Modeling Rapidly
  0.11 Vegetation
  0.12 Coping with Large Datasets
  0.13 Non-optical sensing
  0.14 The Role of the Internet
  0.15 Two Systems for Smart Imaging
  0.16 International Centers of Excellence for City Modeling
  0.17 Applications
  0.18 Telecom Applications of City Models
1 Characterization of Images
  1.1 The Digital Image
  1.2 The Image as a Raster Data Set
  1.3 System Concepts
  1.4 Displaying Images on a Monitor
  1.5 Images as Raster Data
  1.6 Operations on Binary Raster Images
  1.7 Algebraic Operations on Images
2 Sensing
  2.1 The Most Important Sensors: The Eye and the Camera
  2.2 What is a Sensor Model?
  2.3 Image Scanning
  2.4 The Quality of Scanning
  2.5 Non-Perspective Cameras
  2.6 Heat Images or Thermal Images
  2.7 Multispectral Images
  2.8 Sensors to Image the Inside of Humans
  2.9 Panoramic Imaging
  2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images
  2.11 Making Images with Sound
  2.12 Passive Radiometry
  2.13 Microscope and Endoscope Imaging
  2.14 Object Scanners
  2.15 Photometry
  2.16 Data Garments
  2.17 Sensors for Augmented Reality
  2.18 Outlook
3 Raster-Vector-Raster Convergence
  3.1 Drawing a straight line
  3.2 Filling of Polygons
  3.3 Thick lines
  3.4 The Transition from Thick Lines to Skeletons
4 Morphology
  4.1 What is Morphology
  4.2 Dilation and Erosion
  4.3 Opening and Closing
  4.4 Morphological Filters
  4.5 Shape Recognition by a Hit-or-Miss Operator
  4.6 Some Additional Morphological Algorithms
5 Color
  5.1 Gray Value Images
  5.2 Color Images
  5.3 Tri-Stimulus Theory, Color Definitions, CIE Model
  5.4 Color Representation on Monitors and Films
  5.5 The 3-Dimensional Models
  5.6 CMY Model
  5.7 Using CMYK
  5.8 HSI Model
  5.9 YIQ Model
  5.10 HSV and HLS Models
  5.11 Image Processing with RGB versus HSI Color Models
  5.12 Setting Colors
  5.13 Encoding in Color
  5.14 Negative Photography
  5.15 Printing in Color
  5.16 Ratio Processing of Color Images and Hyperspectral Images
6 Image Quality
  6.1 Introduction
  6.2 Definitions
  6.3 Gray Values and Gray Value Resolution
  6.4 Geometric Resolution
  6.5 Geometric Accuracy
  6.6 Histograms as a Result of Point Processing or Pixel Processing
7 Filtering
  7.1 Images in the Spatial Domain
  7.2 Low-Pass Filtering
  7.3 The Frequency Domain
  7.4 High-Pass Filters: Sharpening Filters
  7.5 The Derivative Filter
  7.6 Filtering in the Spectral Domain / Frequency Domain
  7.7 Improving Noisy Images
  7.8 The Ideal and the Butterworth High-Pass Filter
  7.9 Anti-Aliasing
    7.9.1 What is Aliasing?
    7.9.2 Aliasing by Cutting off High Frequencies
    7.9.3 Overcoming Aliasing with an Unweighted Area Approach
    7.9.4 Overcoming Aliasing with a Weighted Area Approach
8 Texture
  8.1 Description
  8.2 A Statistical Description of Texture
  8.3 Structural Methods of Describing Texture
  8.4 Spectral Representation of Texture
  8.5 Texture Applied to Visualisation
  8.6 Bump Mapping
  8.7 3D Texture
  8.8 A Review of Texture Concepts by Example
  8.9 Modeling Texture: Procedural Approach
9 Transformations
  9.1 About Geometric Transformations
  9.2 Problem of a Geometric Transformation
  9.3 Analysis of a Geometric Transformation
  9.4 Discussing the Rotation Matrix in Two Dimensions
  9.5 The Affine Transformation in 2 Dimensions
  9.6 A General 2-Dimensional Transformation
  9.7 Image Rectification and Resampling
  9.8 Clipping
    9.8.1 Half Space Codes
    9.8.2 Trivial Acceptance and Rejection
    9.8.3 Is the Line Vertical?
    9.8.4 Computing the Slope
    9.8.5 Computing the Intersection A on the Window Boundary
    9.8.6 The Result of the Cohen-Sutherland Algorithm
  9.9 Homogeneous Coordinates
  9.10 A Three-Dimensional Conformal Transformation
  9.11 Three-Dimensional Affine Transformations
  9.12 Projections
  9.13 Vanishing Points in Perspective Projections
  9.14 A Classification of Projections
  9.15 The Central Projection
  9.16 The Synthetic Camera
  9.17 Stereopsis
  9.18 Interpolation versus Transformation
  9.19 Transforming a Representation
    9.19.1 Presenting a Curve by Samples and an Interpolation Scheme
    9.19.2 Parametric Representations of Curves
    9.19.3 Introducing Piecewise Curves
    9.19.4 Rearranging Entities of the Vector Function Q
    9.19.5 Showing Examples: Three Methods of Defining Curves
    9.19.6 Hermite's Approach
  9.20 Bezier's Approach
  9.21 Subdividing Curves and Using Spline Functions
  9.22 Generalization to 3 Dimensions
  9.23 Graz and Geometric Algorithms
10 Data Structures
  10.1 Two-Dimensional Chain-Coding
  10.2 Two-Dimensional Polygonal Representations
  10.3 A Special Data Structure for 2-D Morphing
  10.4 Basic Concepts of Data Structures
  10.5 Quadtree
  10.6 Data Structures for Images
  10.7 Three-Dimensional Data
  10.8 The Wire-Frame Structure
  10.9 Operations on 3-D Bodies
  10.10 Sweep-Representations
  10.11 Boundary-Representations
  10.12 A B-Rep Data Structure
  10.13 Spatial Partitioning
  10.14 Binary Space Partitioning (BSP)
  10.15 Constructive Solid Geometry (CSG)
  10.16 Mixing Vector and Raster Data
  10.17 Summary
11 3-D Objects and Surfaces
  11.1 Geometric and Radiometric 3-D Effects
  11.2 Measuring the Surface of an Object (Shape from X)
  11.3 Surface Modeling
  11.4 Representing 3-D Objects
  11.5 The z-Buffer
  11.6 Ray-Tracing
  11.7 Other Methods of Providing Depth Perception
12 Interaction of Light and Objects
  12.1 Illumination Models
  12.2 Reflections from Polygon Facets
  12.3 Shadows
  12.4 Physically Inspired Illumination Models
  12.5 Regressive Ray-Tracing
  12.6 Radiosity
13 Stereopsis
  13.1 Binocular Vision (Binokulares Sehen)
  13.2 Stereoscopic Vision (Stereoskopisches Sehen)
  13.3 Stereo Imaging (Stereo-Bildgebung)
  13.4 Stereo-Visualization
  13.5 Non-Optical Stereo
  13.6 Interactive Stereo-Measurements
  13.7 Automated Stereo-Measurements
14 Classification
  14.1 Introduction
  14.2 Object Properties
  14.3 Features, Patterns, and a Feature Space
  14.4 Principle of Decisions
  14.5 Bayes' Theorem
  14.6 Supervised Classification
  14.7 Real-Life Example
  14.8 Outlook
15 Resampling
  15.1 The Problem in Examples of Resampling
  15.2 A Two-Step Process
    15.2.1 Manipulation of Coordinates
    15.2.2 Gray Value Processing
  15.3 Geometric Processing Step
  15.4 Radiometric Computation Step
  15.5 Special Case: Rotating an Image by Pixel Shifts
16 About Simulation in Virtual and Augmented Reality
  16.1 Various Realisms
  16.2 Why Simulation?
  16.3 Geometry, Texture, Illumination
  16.4 Augmented Reality
  16.5 Virtual Environments
17 Motion
  17.1 Image Sequence Analysis
  17.2 Motion Blur
  17.3 Detecting Change
  17.4 Optical Flow
18 Man-Machine-Interfacing
  18.1 Visualization of Abstract Information
  18.2 Immersive Man-Machine Interactions
19 Pipelines
  19.1 The Concept of an Image Analysis System
  19.2 Systems of Image Generation
  19.3 Revisiting Image Analysis versus Computer Graphics
20 Image Representation
  20.1 Definition of Terms
    20.1.1 Transparency
    20.1.2 Compression
    20.1.3 Progressive Coding
    20.1.4 Animation
    20.1.5 Digital Watermarking
  20.2 Common Image File Formats
    20.2.1 BMP: Microsoft Windows Bitmap
    20.2.2 GIF: Graphics Interchange Format
    20.2.3 PICT: Picture File Format
    20.2.4 PNG: Portable Network Graphics
    20.2.5 RAS: Sun Raster File
    20.2.6 EPS: Encapsulated PostScript
    20.2.7 TIFF: Tag Interchange File Format
    20.2.8 JPEG: Joint Photographic Expert Group
  20.3 Video File Formats: MPEG
  20.4 New Image File Formats: Scalable Vector Graphics (SVG)
A Algorithms and Definitions (Algorithmen und Definitionen)
B Overview of Questions (Fragenübersicht)
  B.1 Group 1
  B.2 Group 2
  B.3 Group 3

Chapter 0: Introduction

0.1 Using Cyber-Cities as an Introduction

We introduce the subject of "digital processing of visual information", also denoted as "digital image processing" and "computer graphics". We introduce the subject by means of one particular application, namely the 3D computer modelling of our cities. This is part of the wider topic of the so-called "virtual habitat".

"Modelling cities": what do we mean by that? The example in Slide 0.7 shows a traditional representation of a city, in this particular example the "Eisernes Tor" in Graz. In two dimensions we see the streetcar tracks, the Mariensäule, buildings and vegetation. This is the status quo of current urban 2-D computer graphics. The new approach is to represent this in three dimensions, as shown in Slide 0.8. The two-dimensional map of the city is augmented to include the third dimension, thus the elevations, and in order to render, represent or visualise the city we add photographic texture to create as realistic a model of the city as possible. Once we have that we can stroll through the city, we can inspect the buildings, we can read the signs and derive from them what is inside the buildings.

The creation of the model of this city is a subject of "image processing". The rendering of the model is a subject of "computer graphics". These two belong together and constitute a field denoted as "digital processing of visual information". The most sophisticated recent modelling of a city was achieved for a section of Philadelphia. This employed software called "Microstation" and was done by hand with great detail. In this case the detail includes vegetation, virtual trees, water fountains and people.

I am attempting here to illustrate the concepts of "computer graphics" and "image processing" by talking about cyber-cities, namely how to create them from sensor data and how to visualise them. This is the subject of this introduction.

0.2 Introducing the Lecturer

Before we go into the material, permit me to introduce myself. I have roots both in Graz and in Boulder (Colorado, USA). Since 1992 I have been affiliated with the Technische Universität Graz, where I am a Professor of Computer Vision and Graphics. Since 1985 I have also been affiliated with a company in the United States called Vexcel Corporation. In both places, the Vexcel Corporation and the University, cyber-cities play a role in the daily work. Vexcel Corporation in the US operates in four technical fields:

1. It builds systems to process radar images.
2. It deals with satellite receiving stations, to receive the large quantities of images that are transmitted from satellites.
3. It deals with close-range photogrammetry for "as-built" documentation.
4. It deals with images from the air.

Slide 0.19 is an example showing a remote sensing satellite ground receiving station installed in Hiroshima (Japan), carried on a truck to be movable. Slide 0.20 shows a product of the Corporation, namely a software package to process certain radar images interferometrically. Towards the end of this class we will talk briefly about this interferometry.
What you see in Slide 0.20 are interferometric "fringes" obtained from images, using the phase differences between the two images. The fringes indicate the elevation of the terrain, in this particular case Mt. Fuji in Japan. Another software package models the terrain and renders realistic-looking images by superimposing the satellite images over the shape of the terrain with its mountains and valleys. Slide 0.22 shows another software package to convert aerial photography into so-called "orthophotos", a concept we will explain later in this class. Then we have an application, a software package called Foto-G, which supports the modelling of existing plants, performing a task called "as-built documentation". You take images of a facility or plant, extract from the image geometry the location and dimensions of pipes and valves, and obtain in a "reverse engineering mode" so-called CAD (computer-aided design) drawings of the facility.

0.3 From images to geometric models

We proceed to a series of sub-topics to discuss the ideas of city modeling. I would like to convey an idea of what the essence of "digital processing of visual information" is. What we see in Slide 0.25 is, on the left, part of an aerial photograph of a new housing development; on the right we see information extracted from the image on the left using a process called "stereoscopy", representing the small area that is marked in red. We are observing here a transition from images of an object to a model of that object. Images such as the one in Slide 0.26 show so-called "human scale objects" like buildings, fences, trees and roads. But images may also show our entire planet. There have been various projects in Graz to address the extraction of information from images, and there is a bundle of problems available as topics for a Diplomarbeit or a Dissertation to address the optimum geometric scale and geometric resolution needed for a specific task at hand. If I want to model a building, what is the required optimum image resolution? We review in Slide 0.29 the downtown of Denver at 30 cm per pixel. Slide 0.30 is the same downtown at 1.20 m per pixel. Finally, in Slide 0.31 we have 4 meters per pixel. Can we map the buildings, and what accuracy can we achieve in mapping them?
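To make this resolution question concrete, here is a small back-of-the-envelope sketch in Python of how many pixels fall across a single building at the three ground resolutions of Slides 0.29 to 0.31. The 15 m building width is an assumed, illustrative value, not a number taken from the slides.

    # pixels across a building facade at the three ground resolutions shown above
    building_width_m = 15.0            # assumed, illustrative building width
    for gsd_m in (0.30, 1.20, 4.00):   # ground resolution in meters per pixel
        pixels = building_width_m / gsd_m
        print(f"at {gsd_m:.2f} m/pixel: {pixels:5.1f} pixels across the building")
    # At 4 m per pixel the building is covered by fewer than 4 pixels,
    # which makes mapping individual buildings questionable.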
0.4 Early Experiences in Vienna

Our Institute at the Technical University in Graz got involved in city modelling in 1994, when we were invited by the Magistrat of Vienna to model a city block consisting of 29 buildings inside the block and another 25 buildings surrounding the block. The block is defined by 4 streets in the 7th district of Vienna. The work was performed by 2 students in two diploma theses, and the initial results were of course a LEGO-type representation of each building. The building itself cannot be recognised, as seen in the example of a generic building. It can be recognised only if we apply the photographic texture. We can take this either from a photograph taken at street level or from aerial photography taken from an airplane. The entire city block was modelled, but some photographic texture was missing. The photographic texture was missing particularly in the courtyards, and so they are shown black or grey here. When this occurs, the representation is without photographic texture and is instead in the form of a flat-shaded representation. Slide 0.37 looks at the roofscape, and we see that perhaps we should model the chimneys as shown here. However, the skylights were not modeled.

What can we do with these data? We can walk or fly through the cities. We can assess changes, for example by removing a building and replacing it by a new one. We call this "virtual reality", but scientists often prefer the expression "virtual environment", since "virtual" and "reality" represent a contradiction in terms. This differs of course from photographic reality, which is more detailed and more realistic by showing great geometric detail: wires, dirt on the road, cars, the effect of weather. There is yet another type of reality, namely "physical reality", when we are out there in a city and we feel the wetness in our shoes, we feel the cold in the air, we hear the noise of birds, the screeching of cars. So we see various levels of reality: physical, photographic and virtual reality.

0.5 Geometric Detail

What geometric detail do we need when we model a city? Let us take the example of a roof. Slide 0.44 is a roof shape extracted for the Vienna example. We have not applied photographic texture to the roof, but instead some generic computer texture. We will talk later about texture, and I will try to explain the different types of texture used in rendering for computer graphics. If we apply this kind of generic texture we lose all information about the specific characteristics of this roof. What we would like to have is the roof shown with chimneys. Maybe we need skylights as well, for the fire-guard, in order to direct people to an exit through the roof in the case of a catastrophe. There is a topic here for a Diplomarbeit or Dissertation to study the amount of geometric detail needed in the presence of photographic texture: the trade-off between photographic texture and geometric detail.

To illustrate this further, let us take a look at the same roof with its skylights and chimneys and now use photographic texture to illustrate what this roof looks like. If we take photographic texture, and if we have some chimneys, and if we render this roof from another perspective than that from which the photograph was taken, the chimneys will look very unnatural. So we need to do some work and create the geometric model of the chimneys. If we employ that model and superimpose the photographic texture over it, we see that we have sunshine casting shadows, and we have certain areas of the roof that are covered by pixels from the shadows left by the chimneys. If the sunshine is from another side, say in the morning, but the picture was taken in the afternoon, we have wrong shadows. So we need to fix this by eliminating the shadows in the texture; we introduce the shadow in a proper rendering by a computation. We also need to fill in those pixels that are covered by the perspective distortion of the chimneys, and use generic pixels of the roof to fill in the areas where no picture exists. Slide 0.50 is the final result: we have removed the shadow, we have filled in the pixels. We now have the best representation of that roof with its chimneys, and we can render it correctly in the morning and in the afternoon, with rain or with sunshine.
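A minimal sketch of the kind of texture repair just described, assuming the roof texture is a single gray-value image and that the shadow and occlusion areas are already given as masks; a real implementation would derive these masks from the sun direction and the chimney geometry.

    import numpy as np

    def repair_roof_texture(texture, shadow_mask, occlusion_mask):
        """Replace shadowed and occluded pixels by a generic roof gray value.

        texture:        2-D array of gray values
        shadow_mask:    True where the chimney shadows fall
        occlusion_mask: True where the chimneys hide the roof (no picture exists)
        """
        bad = shadow_mask | occlusion_mask
        generic = np.median(texture[~bad])   # a "generic pixel" of the roof
        repaired = texture.copy()
        repaired[bad] = generic
        return repaired

    # toy example: a 4 x 4 roof patch with one shadowed and one occluded pixel
    tex = np.full((4, 4), 180.0)
    tex[1, 1], tex[2, 2] = 60.0, 0.0
    shadow = np.zeros((4, 4), bool); shadow[1, 1] = True
    occluded = np.zeros((4, 4), bool); occluded[2, 2] = True
    print(repair_roof_texture(tex, shadow, occluded))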
0.6 Automation

All of this modeling of cities is expensive, because it is based on manual work. In order to reduce the cost of creating such models one needs to automate their creation. Automation is a large topic and is available for many Diplomarbeiten and many Dissertations. Let me illustrate automation using our city models of Graz.

There already exist two-dimensional descriptions, so the task of automation here is to achieve the transition from two to three dimensions. Slide 0.52 is a two-dimensional, so-called geographic information system (GIS) of a certain area around the Schlossberg in Graz. Let us take a look at one particular building in Slide 0.53. We have a total of five aerial photographs; 3 of them, showing that particular building, are presented in Slide 0.54. The five photographs can be converted into so-called edge images, a classical component of image processing. There are topics hidden here for more Diplomarbeiten and Dissertationen. We also convert the input GIS data into an output edge image. This edge image from the GIS vectors can now be the basis for a match between the five edge images and the two-dimensional GIS image. They will not fit, because the edges of the roof as shown here are elevated and therefore perspectively distorted, whereas the other polygon is the representation of the footprint of the building.

Algorithm 1: Affine matching
1. Read in and organize one or more digital photos with their camera information.
2. Compute an edge image for each of the photos.
3. Read in and organize the polygons of each building footprint.
4. Project the polygon into each photo's edge image.
5. Vector-to-raster convert the polygon in each edge image, creating a polygon image.
6. Compute a distance transform for each polygon image.
7. Repeat:
8.   Compute the distance between each edge image and its polygon image using the distance transform.
9.   Change the geometry of the polygon image.
10. Until the distance no longer gets reduced.

There is a process called "affine matching" which allows us to match the edge images computed from the aerial photos with the representation that originally was a vector data structure. Affine matching is a Graz innovation. Its purpose is to match two different data structures, namely raster and vector, which in addition are geometrically different: the footprint of the house is in an orthographic projection, while the roofline of the house is in a central perspective projection. Affine matching overcomes these differences and finds the best possible matches between the data structures. The result in Slide 0.58 shows how the footprint was used to match the roofline of the building using this affine matching technique. The algorithm itself is simply described (see Algorithm 1); a small code sketch of its core loop is given at the end of this section. Now, the same idea of matching vectors with images is shown in the illustration of Slide 0.59, where we see in yellow the primary position of a geometric shape, typically the footprint, and in red the roofline. We need to match the roofline with the footprint. Slide 0.60 is another example of these matches, and Slide 0.61 is the graphic representation of the roofline.
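The following is a small Python sketch of the matching loop of Algorithm 1. It uses a distance transform of the projected footprint polygon, but replaces the full affine update with a translation-only hill climb, so it is an illustrative toy under those assumptions, not the Graz implementation.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def match_footprint(edge_image, polygon_image, max_shift=5):
        """Find the polygon shift (dy, dx) that best fits the photo edges.

        edge_image:    binary edge image computed from the aerial photo
        polygon_image: binary image of the projected building footprint
        """
        # distance from every pixel to the nearest polygon pixel (step 6)
        dist = distance_transform_edt(polygon_image == 0)
        ey, ex = np.nonzero(edge_image)
        h, w = dist.shape

        def score(dy, dx):
            # mean distance of the edge pixels to the polygon shifted by (dy, dx)
            y = np.clip(ey - dy, 0, h - 1)
            x = np.clip(ex - dx, 0, w - 1)
            return dist[y, x].mean()

        best, best_score, improved = (0, 0), score(0, 0), True
        while improved:                  # "repeat until distance no longer gets reduced"
            improved = False
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    cand = (best[0] + dy, best[1] + dx)
                    if max(abs(cand[0]), abs(cand[1])) > max_shift:
                        continue
                    s = score(*cand)
                    if s < best_score:
                        best, best_score, improved = cand, s, True
        return best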
0.7 Modeling Denver

We now talk about a method to model all the buildings of a city like Denver (Colorado, USA), starting from an aerial photographic coverage of the entire city. Slide 0.63 is the downtown area of Denver. From overlapping aerial photographs we can automatically create a digital elevation model (DEM) by a process called stereo matching. A DEM is a representation of the z-elevation for each (x, y) at a regular grid mesh of points. So we have a set of regularly spaced (x, y) locations at which we know the z-value of the terrain. We invite everybody to look into a Diplomarbeit or Dissertation topic of taking this kind of digital elevation model and creating from it what is called the "Bald Earth".

One needs to create a filter which will take the elevation model and erase all the trees and all the buildings, so that the only thing that is left is the Bald Earth. What is being "erased" are towers, trees and buildings. That process needs an intelligent low-pass filter. We will talk about low-pass filters later in this class. Slide 0.67 is the result, a so-called Bald Earth DEM (das DEM der kahlen Erde). The difference between the two DEMs, namely the Bald Earth DEM and the full DEM, is of course the elevation of the vertical objects that exist on top of the Bald Earth. These are the buildings, the cars, the vegetation. This is another topic one could study. Now we need to look at the difference DEM and automatically extract the footprints of buildings. We can do that by some morphological operations, where we close the gaps and straighten the edges of buildings, and then compute the contours of the buildings. Finally we obtain the buildings and place them on top of the Bald Earth. When we have done that, we can superimpose the photographic texture over the geometric shapes of the building "boxes" (the box models). We get a photorealistic model of all of Denver, generated entirely automatically from aerial photographs. There exist multiple views of the same area of Denver.

0.8 The Inside of Buildings

City models are not only a subject of the outside of buildings, but also of their inside. Slide 0.74 is the Nationalbibliothek in Vienna, in which there is a Representation Hall (Prunksaal). If one takes the architect's drawings of that building, one can create a wire-mesh representation as illustrated in Slide 0.75, consisting of arcs and nodes. We can render this without removal of the hidden surfaces and hidden lines to obtain this example. We can go inside this structure, take pictures and use photographic texture to photo-realistically render the inside of the Prunksaal in a manner that a visitor to the Prunksaal will never see: a visitor cannot fly through the Prunksaal like a bird. We can also see the Prunksaal in the light that computer rendering permits us to create. We can even go back a hundred years and show the Prunksaal as it was a hundred years ago, before certain areas were converted into additional shelf space for books. There is a Diploma and Dissertation topic hidden in developments to produce images effectively and efficiently inside a building. An example is shown in Slide 0.80 and Slide 0.81 of the ceiling, imaging it efficiently in all its detail and colorful glory.

Yet another subject is how to model objects inside a room, like this statue of Emperor Charles VI. He is shown in Slide 0.82 as a triangulated mesh created from a point cloud. We will talk a little bit about triangulated meshes later. Slide 0.82 is based on 20,000 points that are triangulated in a non-trivial process. Slide 0.83 is a photo-realistic rendering of the triangulated point cloud, with each triangle being superimposed by the photographic texture that was created from photographs. A good scientific topic for Diplomarbeiten or Dissertationen is the transition from point clouds to surfaces. A non-trivial problem exists when we look at the hand of the emperor. We need to make sure to connect points into triangles that should topologically be connected. And we do not want the emperor to have hands like the feet of a duck.
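A minimal sketch of turning a point cloud into a triangulated mesh, using a 2-D Delaunay triangulation of the (x, y) coordinates and carrying the z values along. The statue of course needs a genuinely 3-D, topology-aware reconstruction (that is exactly the "duck feet" problem mentioned above), so treat this only as the simplest possible baseline on synthetic points.

    import numpy as np
    from scipy.spatial import Delaunay

    # synthetic stand-in for the roughly 20,000 measured surface points (x, y, z)
    rng = np.random.default_rng(0)
    points = rng.random((20000, 3))

    tri = Delaunay(points[:, :2])     # triangulate in the (x, y) plane only
    triangles = tri.simplices         # (n, 3) indices into the point array

    print(len(triangles), "triangles")
    # each triangle could now carry photographic texture, as described for Slide 0.83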
0.9 As-Built Documentation: Modeling the Inside of Things in Industry

There exist not only cultural monuments, but also industrial plants. This goes back to the idea of "inverse" or "reverse engineering": to create drawings of a facility or a building, for example of a refinery. The refinery may have been built 30 or 40 years ago and the drawings are no longer available, since there was no CAD at that time. We take pictures of the inside of a building, using perhaps thousands of pictures. We re-establish relationships between the pictures; we need to know from where they were taken. One picture overlaps with another picture. Which pictures show the same objects and which do not? That is established by developing the graph in Slide 0.89 (a small sketch of such a graph follows at the end of this section). Each node of the graph is a "postage stamp" of a picture, and the arcs between these nodes describe the relationships. If there is no arc, then there is no relationship. Any image can be called up on a monitor. Also, pairs of images can be set up. We can point to a point in one image, and a process will look for the corresponding point in the other overlapping image or images. The three-dimensional location of the point we have pointed at in only one image will be shown in the three-dimensional rendering of the object. So again, "from images to objects" means in this case "reverse engineering" or "as-built documentation". Again there are plenty of opportunities for research and study in the area of automation of all these processes.

A classical topic is the use of two pictures of some industrial structure to find correspondences of the same object in both images without any knowledge about the camera or the object. By eye we can point to the same feature in two images, but this is not trivial to do by machine if we have no geometric relationships established between the two images that would limit the search areas. One idea is to find many candidate features in both images and then determine by some logic which of those features might be identical. So we find one group of features in one image, and another group in the other image. Then we decide which points or objects belong together. The result is shown as highlighted circles. A similar situation is illustrated in Slide 0.95, however with test targets to calibrate a camera system for as-built documentation. We automatically extract all the test objects (circles) from the images. We can see a three-dimensional pattern of these calibration targets in Slide 0.96 and Slide 0.97.

The same approach can also be applied to the outside of buildings, as shown in Slide 0.98 with three photographs of a railroad station. The three images are input to an automatic algorithm to find edges; the edges get pruned and reduced so that we are only left with significant edges that represent windows, doors, awnings and the roofline of the building. This of course can also be converted into three dimensions. There is yet another research topic here, namely "automated mapping of geometric details of facades". Slide 0.100 and Slide 0.101 are the three-dimensional renderings of those edges that are found automatically in 3-D.
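A small sketch of the image-relationship graph of Slide 0.89, with images as nodes and an arc wherever two images share enough common features. The image names, tie-point counts and the threshold are invented for illustration.

    # arcs: pairs of images that show the same objects
    tie_points = {                      # number of common features per image pair (invented)
        ("img_001", "img_002"): 42,
        ("img_001", "img_003"): 3,
        ("img_002", "img_003"): 57,
    }

    MIN_TIES = 10                       # below this we assume "no relationship", hence no arc
    graph = {}
    for (a, b), n in tie_points.items():
        if n >= MIN_TIES:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    print(graph)    # which images overlap which, i.e. which pairs can be set up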
0.10 Modeling Rapidly

We not only want to create these data at a low cost, we also want to get them rapidly. Slide 0.103 is an example: a village has been imaged from a helicopter with a handheld camera; looking out to the horizon, we see an oblique, panoramic image. "Give us a model of that village tomorrow" may be the task, particularly when it concerns catastrophes, disasters, military or anti-terror operations and so forth. The topic which is hidden here is that these photos were not taken with a well-controlled camera but casually and hastily from a helicopter, with an average amateur camera. The research topic here is the "use of uncalibrated cameras". A wire-mesh representation of the geometry can be created by a stereo process. We can then place the buildings on top of the surface, much like in the Denver example discussed earlier, and we can render it in a so-called flat-shaded representation. We can now look at it and navigate in the data set, but this is not as easy to interpret visually as it would be with photography superimposed, which is the case in Slide 0.109 and Slide 0.110. Now we can rehearse an action needed because of a catastrophe or because of a terrorist attack in one of those buildings. We can fly around, move around and so forth.

0.11 Vegetation

"Vegetation" is a big and important topic in this field. Vegetation is difficult to map, difficult to render and difficult to remove. Vegetation, as in the Graz example, may obscure facades. If we take pictures to map the buildings and to get the photographic texture, then these trees, pedestrians and cars are a nuisance. What can we do? We need to eliminate the vegetation, and this is an interesting research topic. Today the vegetation is eliminated with a lot of manual work. How can we automate that? There are ways and ideas to automate this kind of separation of objects that are at a different depth from the viewer, using multiple images.

Using vegetation for rendering, like in the picture of the Schloßberg in Slide 0.115, is not trivial either. How do we model vegetation in this virtual habitat? The Schloßberg example is based on vegetation that is photographically collected and then pasted onto flat surfaces that are mounted on tree trunks. This is acceptable for a still image like Slide 0.117, but if we have some motion, then the vegetation produces a very irritating effect, because the trees move as we walk by. Another way, of course, is to really have a three-dimensional rendering of a tree, but such trees typically are either very expensive or they look somewhat artificial, like the tree in the example of Slide 0.118. Vegetation rendering is thus also an important research topic.

0.12 Coping with Large Datasets

We have a need to cope with large data sets in the administration, rendering and visualization of city data. The example of modeling Vienna with its 220,000 buildings in real time illustrates the magnitude of the challenge. Even if one compresses the 220,000 individual buildings into 20,000 "blocks", thus on average combining 10 buildings into a single building block, one still has to cope with a time-consuming rendering effort that cannot be achieved in real time. A recent doctoral thesis by M. Kofler (1998) reported on algorithms to accelerate the rendering on an unaided computer by a factor of 100, simply by using an intelligent data structure.

If the geometric data are augmented by photographic texture, then the quantity of data gets even more voluminous. Just assume that one has 220,000 individual buildings consisting of 10 facades each, each facade representing roughly 10 m × 10 m, with photographic texture at a resolution of 5 cm × 5 cm per pixel. You are invited to compute the quantity of data that results from this consideration; a small sketch of this computation follows below.
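Taking up that invitation, a rough count of the texture data volume. The numbers of buildings, facades, facade size and pixel size are the ones given above; the 3 bytes per pixel for RGB texture and the absence of any compression are my own assumptions.

    buildings = 220_000
    facades_per_building = 10
    facade_size_m = 10.0                # each facade roughly 10 m x 10 m
    pixel_size_m = 0.05                 # texture resolution of 5 cm x 5 cm

    pixels_per_facade = (facade_size_m / pixel_size_m) ** 2     # 200 x 200 = 40,000
    total_pixels = buildings * facades_per_building * pixels_per_facade
    total_bytes = total_pixels * 3      # assuming 3 bytes (RGB) per pixel, uncompressed

    print(f"{total_pixels:.1e} pixels, about {total_bytes / 1e9:.0f} GB of raw texture")
    # roughly 8.8e10 pixels, i.e. on the order of 260 GB before any compression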
Kofler's thesis proposed a clever data structure called the "LOD/R-tree". "LOD" stands for level of detail, and "R-tree" stands for rectangular tree. The author took the entire city of Vienna and defined for each building a rectangle. These rectangles are permitted to overlap. In addition, separate rectangles represent groups of buildings, and even the districts are represented by one rectangle each. Actually, the structure was generalized to 3D, so we are not dealing with rectangles but with cubes. As this is being augmented by photographic texture, one needs to select the appropriate data structure to be superimposed over the geometry. As one uses the data, one defines the so-called "frustum" as the instantaneous cone of view. At the front of the viewing cone one has high resolution, whereas in the back one employs low resolution. The idea is to store the photographic texture and the geometry at various levels of detail and then call up those levels of detail that are relevant at a certain distance from the viewer. This area of research is still rapidly evolving, and "fast visualization" is therefore another subject of ongoing research for Diplomarbeiten and Dissertationen.

The actual fly-over of Vienna using the 20,000 building blocks in real time is now feasible on a regular personal computer, producing about 10 to 20 frames per second as opposed to 10 seconds per frame prior to the LOD/R-tree data structure. Slide 0.129 and Slide 0.130 are two views computed with the LOD/R-tree. The same LOD/R-tree data structure can also be used to fly over regular DEMs; recall that these are regular grids in (x, y) to which a z-value is attached at each grid intersection to represent terrain elevations. These meshes are then associated with photographic texture, as shown in three sequential views. We generally call this "photorealistic rendering of outdoor environments". Another view of a Digital Elevation Model (DEM), superimposed with a higher-resolution aerial photograph, is shown in Slide 0.135 and Slide 0.136.
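A minimal sketch of the distance-dependent level-of-detail selection described above: near the front of the frustum the finest level is used, and each further level halves the resolution. The distance threshold and level count are invented; Kofler's LOD/R-tree of course also provides the spatial indexing, which is omitted here.

    import math

    def select_lod(distance_m, finest_at_m=50.0, num_levels=5):
        """Pick a level of detail from the viewer distance (0 = finest)."""
        if distance_m <= finest_at_m:
            return 0
        level = int(math.log2(distance_m / finest_at_m))   # halve resolution per doubling
        return min(level, num_levels - 1)

    for d in (30, 120, 500, 4000):
        print(f"{d:5d} m -> level {select_lod(d)}")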
0.13 Non-optical sensing

Non-photographic, and therefore non-optical, sensors can also be used for city modeling. Recall that we model cities from sensor data, we then render cities using the models as input, and we potentially augment those models by photographic texture. Which non-optical sensors can we typically consider?

A first example is radar imagery. We can use imagery taken with microwaves at wavelengths between 1 mm and 25 cm or so. That radiation penetrates fog, rain and clouds and is thus capable of "all-weather" operation. The terrain is illuminated actively, as with a flashlight, supporting "day & night" operation. An antenna transmits microwave radiation, this gets reflected on the ground, and echoes come back to the antenna, which is by then switched to receive. We will discuss radar imaging in a later section of this class. Let us take a look at two images. One image of Slide 0.138 has the illumination from the top, the other has the illumination from the bottom. Each image point or pixel covers 30 cm × 30 cm on the ground, representing a geometric resolution of 30 cm. Note that the illumination causes shadows and that the shadows fall differently in the two images. The radar images can be associated with a direct observation of the digital elevation of the terrain. Slide 0.139 is an example associated with the previous two images of the area of the Sandia Research Laboratories in Albuquerque (New Mexico, USA). About 6,000 people work at Sandia. The individual buildings are shown in this dataset, which is in itself rather noisy. But it becomes a very powerful dataset when it is combined with the actual images. We have found here a non-stereo way of directly mapping the shape of the Earth in three dimensions.

Another example with 30 cm × 30 cm pixels is a small village, the so-called MOUT site (Military Operations in Urban Terrain). Four looks from the four cardinal directions show shadows and other image phenomena that are difficult to understand and are the subject of later courses. We will not discuss those phenomena much further in this course. Note simply that we have four images of one and the same village and that those phenomena look very different in the four images. Just study those images in detail, consider how the shadows fall and how the roofs are being imaged, and note in particular one object, namely the church as marked. This church can be reconstructed using eleven measurements. There are about 47 measurements one can take from those four images, so that we have a set of redundant observations of the dimensions that describe the church. The model of the church is shown in Slide 0.141 and is compared to an actual photograph of the same church in Slide 0.142. This demonstrates that one can model a building not only from optical photography, but from various types of sensor data. We have seen radar images in combination with interferometry. There is ample opportunity to study "building reconstruction from radar images" in the form of Diploma and Doctoral theses.

Another sensor is the laser scanner. Slide 0.144 is an example of a laser scanner result from downtown Denver. How does a laser scanner operate? An airplane carries a laser device. It shoots a laser ray to the ground. The ray gets reflected, and the time it takes for the round trip is measured. If there is an elevation, the round-trip time is shorter than if there is a depression. The direction into which the laser "pencil" looks changes rapidly from left to right to create a "scan line". Scan lines are added up by the forward motion of the plane, and the scan lines accrue into an elevation map of the ground. The position of the airplane itself is determined using a Global Positioning System (GPS) receiver carried on the airplane. This position might have a systematic error. But by employing a second, simultaneously observed GPS position on the ground, one really observes the relative motion between the airplane GPS and the stationary GPS platform on the ground. This leads to a position error in the cm range for the airplane and to a very small error, also in the cm range, for the distance between the airplane and the ground. Laser measurements are a very hot topic in city modeling, and there are advantages as well as disadvantages vis-a-vis building models from images. To study this issue could be the subject of Diploma and Doctoral theses.

Note that as the airplane flies along, only a narrow strip of the ground gets mapped. In order to cover a large area of the ground one has to combine individual strips. Slide 0.147 illustrates how the strips need to be merged and how any discrepancies between those strips, particularly in their overlaps, need to be removed by some computational measure. In addition, one needs to know points on the ground with their true coordinates in order to remove any uncertainties that may exist in the airplane observations. So finally we have a matched, merged, cleaned-up data set, and we can now do the same thing that we did with the DEM from aerial photography, namely merge the elevation data obtained from the laser scanner with video imagery that was potentially collected simultaneously from that same airplane: we obtain a combined laser-scan and phototexture product.
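A numeric sketch of the laser ranging principle described above: the round-trip time of the pulse gives the slant range, and subtracting that range from the GPS-derived flying height gives the terrain elevation (assuming a nadir-looking pulse). The flying height and the timing value are invented for illustration.

    C = 299_792_458.0                    # speed of light in m/s

    def terrain_elevation(flying_height_m, roundtrip_time_s):
        """Elevation of the ground point hit by a nadir-looking laser pulse."""
        slant_range_m = C * roundtrip_time_s / 2.0   # the pulse travels down and back
        return flying_height_m - slant_range_m

    # airplane 1000 m above the reference surface, echo after about 6.5 microseconds
    print(terrain_elevation(1000.0, 6.5e-6))         # about 26 m above the reference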
0.14 The Role of the Internet

It is of increasing interest to look at a model of a city from remote locations. An example is so-called "armchair tourism", vacation planning and such. Slide 0.152 is an example of work done for a regional Styrian tourism group. They contracted to have a mountain-biking trail advertised on the Internet using a VRML model of the terrain. Slide 0.153 shows a map near Bad Mitterndorf in Styria and a vertical view of a mountain-biking trail. Slide 0.154 is a perspective view of that mountain-bike trail superimposed onto a digital elevation model that is augmented by photographic texture obtained from a satellite. This is actually available today via the Internet. The challenge is to compress the data without significant loss of information and to offer that information via the Internet at attractive real-time rates. Again, Diploma and Doctoral thesis topics could address the Internet and how it can help to transport more information faster, in more detail and of course in all three dimensions.

Another example of the same idea is an advertisement for the Grazer Congress on the Internet. The inside of the Grazer Congress was to be made viewable to far-away potential organizers of conferences. They obtain a VRML view of the various inside spaces. Because of the need to compress those spaces, the data are geometrically very simple, but they carry the actual photographic texture that is available through photographs taken inside the Grazer Congress.

The Internet is a source of a great variety of image information. An interesting variation of the city models relates to the so-called "orthophoto", namely photographs taken from the air or from space that are geometrically corrected to take on the geometry of a map. The example of Slide 0.158 shows downtown Washington, D.C. with the U.S. Capitol (where the parliament resides). This particular web site is called "City Scenes".

0.15 Two Systems for Smart Imaging

We have already talked about imaging by regular cameras, by radar, by non-imaging sensing and by laser. Let us go a step further: specific smart sensing developed for city mapping. As part of a doctoral thesis in Graz, a system was developed to be carried on the roof of a car, with a number of cameras that allow one to reconstruct the facades of buildings in the city. Images are produced by driving with this system along those buildings. At the core of the system is a so-called linear detector array consisting of 6,000 CCD elements in color. These elements are combined with two or three optical systems, so that 3,000 elements are exposed through one lens and another 3,000 elements through another lens. By properly arranging the lenses and the CCDs one obtains a system whereby one lens collects a straight line of the facade looking forward, and the other lens collects a straight line either looking backwards or looking perpendicularly at the building. In Slide 0.163 we see the car with the camera rig driving by a few buildings in Graz (Kopernikusgasse). Slide 0.164 shows two images, with various details from those images in Slide 0.165, in particular images collected of the Krones-Hauptschule.
Simultaneously with the linear detector array collecting images line by line as the car moves forward (this is also called "push-broom imaging"), one can take images with a square-array camera. So we have the lower-resolution square-array camera with maybe 700 × 500 pixels, augmented by the linear detector array images with 3,000 pixels in one line and an unlimited number of lines as the car drives by. The opportunity exists here as well to perform work for Diploma or Doctoral theses, to develop the advantages and disadvantages of square-array versus line-array cameras. A look at an image from a linear array shows its poor geometry, because as the car drives there are lots of motions going on. In this particular doctoral thesis, the candidate developed software and algorithms to fix the geometric deformations in the images. Use is made of the fact that many of the features are rectilinear, for example edges of windows and details on the wall. This can help to automatically produce good images. If two images are produced, one can produce a stereo rendering of the cityscape. The human observer can obtain a three-dimensional impression using stereo glasses, as we will discuss later.

That linear detector array approach, carried in a car as a rigid arrangement without any moving camera parts, was also used by the same author to create a panoramic camera. What is a panorama camera? This is a camera that sweeps (rotates) across the area of interest with an open shutter, producing a very wide angle of view, in this case 360 degrees in the horizontal dimension and maybe 90 degrees in the vertical direction. We can use two such images for stereoscopy by taking photos from two different positions. The example shown in Slide 0.172 has two images taken of an office space, to combine into a stereo pair which can be used to recreate a complete digital 3-D model of the office space. These are the two raw images in which the "panoramic sweep" across 360° is presented as a flat image.

What is the geometry of such a panoramic camera? It is rather complex. We have a projection center O that is located on a rotation axis, which in turn defines a z coordinate axis. The rotation axis passes through the center of an imaging lens. The CCD elements are arranged vertically, at positions z_CCD along the sensor line. An object point P_obj is imaged onto the imaging surface at a position z_CCD. The distance between O and the vertical line through the CCD is called the "focal distance" f_CCD. An image is created by rotating the entire arrangement around the z-axis and collecting vertical rows of pixels of the object space, and as the arrangement rotates we assemble many rows into a continuous image. One interesting topic about this type of imaging would be to find out what the most efficient and smartest ways would be to image indoor spaces (more potential topics for Diploma and Doctoral research).

To conclude, Slide 0.175 is an image of an office space with a door, an umbrella and a bookshelf that is created from the panoramic view in Slide 0.172 by geometrically "fixing" it to make it look like a photo from a conventional camera. The Congress Center in Graz has also been imaged with a panoramic sweep in Slide 0.176; a separate sweep was made to see how the ceiling looks when swept with a panoramic camera.
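A sketch of the panoramic imaging geometry just described, under the idealizing assumption that the projection center O lies exactly on the rotation (z) axis: the image column encodes the azimuth of the object point, and the row follows the ordinary perspective relation along the rotating vertical CCD line. All numbers are illustrative.

    import math

    def panorama_pixel(X, Y, Z, f_ccd=0.05, pixel_pitch=1.0e-5, cols=8000, rows=3000):
        """Map an object point (X, Y, Z), given in a frame with the rotation axis
        as z axis and the projection center O as origin, to an image (col, row).

        f_ccd:       focal distance from O to the vertical CCD line, in meters
        pixel_pitch: size of one CCD element along the vertical line, in meters
        """
        azimuth = math.atan2(Y, X)                    # rotation angle of the sweep
        col = (azimuth % (2 * math.pi)) / (2 * math.pi) * cols
        rho = math.hypot(X, Y)                        # horizontal distance to the axis
        z_ccd = f_ccd * Z / rho                       # perspective projection onto the CCD line
        row = rows / 2 - z_ccd / pixel_pitch          # row counted from the top of the line
        return col, row

    print(panorama_pixel(X=4.0, Y=3.0, Z=0.5))        # roughly (819.2, 1000.0)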
In any endeavour that is new and "hot" you always want to know who is doing what and where. In Europe there were several conferences in recent years on this subject. One of these was in Graz, one was in Ascona in Switzerland, one in Bonn. Ascona was organized by the ETH Zürich, Bonn by the University of Bonn, the Graz meeting by our Institute.

The ETH Zürich is home of considerable work in this area, so much so that some university people even started a company, Cybercity AG. The work in Zürich addresses details of residential homes and led to the organisation of two workshops in Ascona, for which books have been published by the Birkhäuser-Verlag. One can see in the examples of Slides 0.182 through 0.186 that they find edges and use those to segment the roof into its parts. They use multiple images of the same building to verify that the segmentation is correct and improve it if errors are found. The typical example from which they work is aerial photography at large scales (large scales are at 1:1,500; small scales are at 1:20,000). Large models have been made, for example of Zürich as shown in Slide 0.186.

The most significant amount of work in this area of city modeling has probably been performed at the University of Bonn. The image in Slide 0.188 is an example of an area in Munich. The method used in Bonn is fairly complex and encompasses an entire range of procedures that typically would be found in many chapters of books on image processing or pattern recognition. One calls the diagram shown in Slide 0.189 an "image processing pipeline". The data processed in Bonn are the same as used in Zürich. There exists an international data set for research, so that various institutions have the ability to practice their skill and compare the results. We will later go through the individual work steps that are listed in the pipeline. One result from Bonn using the international images shows edges, and from the edges finds match points and corners in separate images of the same object. This indicates the top of a roof. The illustration in Slide 0.190 explains the principle of the work done in Bonn. Another Bonn approach is to first create corners and then topologically connect the corners so that roof segments come into existence. Then these roof segments are merged into the largest possible areas that might represent roofs, as shown in this example.

Another approach is to start the modelling of a building not from the image itself nor from its edges and corners, but to create point clouds by stereo measurements. This represents a dense digital elevation model as we have explained earlier in the Denver example. Digital elevations are illustrated here by encoding the elevation by brightness values, with dark being low and white being high. One can now try to fit planes to the elements of the digital elevation model. Slide 0.193 is an intermediate result, where it looks as if one has found some roofs. The digital elevation model here invites one to compute planes to define roofs and the sides of buildings.

In North America the work on city modeling is typically sponsored by the Defense Advanced Research Projects Agency (DARPA). Their motivation is the military application, for example fighting urban wars, having robots move through cities, or facing terrorists. DARPA programs typically address university research labs.
The most visible ones were the University of Massachusetts in Amherst, the University of Southern Colorado, the Carnegie-Mellon University and the Stanford Research Institute (SRI), a spin-out from Stanford University. SRI is a well-known research lab that is separately organised as a foundation. In the US there are other avenues towards modeling of cities which are not defense oriented. One is architecture. In Los Angeles there is the architecture department of the University of California at Los Angeles. They are building a model of the entire city of Los Angeles using students and manual work.

0.17 Applications

Let me come to a conclusion of city modeling. Why do people create such models? The development of an answer presents another opportunity to do application studies for Diploma and Doctoral theses. Let me illustrate some of those applications of city models. These certainly include city planning, architectural design and (car) navigation; there is engineering reconstruction of buildings that have been damaged and need to be repaired; then infotainment (entertainment); there is simulation and training for fire fighters and for disaster preparedness. Applications can be found in Telecom or in the Military. A military issue is the guidance of robot soldiers, the targeting and guiding of weapons. In Telecom we may need to transmit data from roof to roof as one form of broadband wireless access systems. In infotainment we might soon have 3-dimensional phonebooks.

0.18 Telecom Applications of City Models

A particular computer graphics and image processing issue which should be of specific interest to Telematics people is "the use of building models for Telecom and how these building models are made". Slide 0.202 shows a three-dimensional model of downtown Montreal. The purpose of this model is the plan to set up antennas on top of the roofs of high buildings. Those antennas would serve as hubs to illuminate other buildings and to receive data from other buildings, in a system that is called Local Multipoint Distribution System (LMDS). This is a broadband wireless access technology that competes with fibre optics in the ground and with satellite communication. We will see how the technologies will shake out, but LMDS is evolving everywhere; it is very scalable, since one can build up the system sequentially, hub by hub, and one can increase the performance sequentially as more and more users in buildings sign up.

Slide 0.204 is a model of a large section of Vancouver, where the buildings are modeled in support of an LMDS project. In order to define where to place a hub one can go into software that automatically selects the best location for a hub. For example, if we place an antenna on a high building we can then determine which buildings are illuminated from that antenna and which are not (a small sketch of such a visibility test follows at the end of this passage). We use examples from a Canadian project to map more than 60 cities. One delivers to the Telecom company so-called "raster data", but also so-called "vector data", and also non-graphic data, namely addresses. We will talk later about raster and vector data structures, and we will discuss how they are converted into one another. The geometric accuracy of the shape of these buildings should be in the range of ±1 meter in x, y and z in order to be useful for the optimum location of antennas. How many buildings are in a square km? In Montreal this was about 1,000 buildings per sqkm in the downtown.
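As a rough illustration of this hub-placement idea (and explicitly not the software used in the Canadian project), the following Python sketch performs a minimal line-of-sight test over a raster surface model. The function name visible_from_hub and the toy 5 × 5 elevation grid are invented for the example.

import numpy as np

def visible_from_hub(dsm, hub_rc, hub_height=3.0):
    """Mark every raster cell that has a free line of sight to a hub antenna.
    dsm: 2-D array of roof/terrain elevations; hub_rc: (row, col) of the hub;
    hub_height: mast height above the roof at the hub cell."""
    rows, cols = dsm.shape
    hr, hc = hub_rc
    hub_z = dsm[hr, hc] + hub_height
    visible = np.zeros(dsm.shape, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            n = max(abs(r - hr), abs(c - hc))            # steps along the sight line
            if n == 0:
                visible[r, c] = True
                continue
            rr = np.linspace(hr, r, n + 1)[1:-1]          # intermediate sample positions
            cc = np.linspace(hc, c, n + 1)[1:-1]
            zz = np.linspace(hub_z, dsm[r, c], n + 1)[1:-1]   # sight-line elevations
            surface = dsm[rr.round().astype(int), cc.round().astype(int)]
            visible[r, c] = not np.any(surface > zz)      # any higher roof blocks the ray
    return visible

# toy example: a flat block with one 40 m building that shadows the cells behind it
dsm = np.full((5, 5), 10.0)
dsm[1, 2] = 40.0
print(visible_from_hub(dsm, hub_rc=(0, 2)))

Real propagation software would of course add antenna patterns, Fresnel-zone clearance and the vegetation mentioned later, but a raster visibility test of this kind is the core idea.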
Because the data need to be delivered quickly (Telecom companies need them "now"), one cannot always have perfect images to extract buildings from. So one must be able to mix pre-existing photography and new aerial sources and work from what is there. For this reason one needs to be robust in one's procedures vis-à-vis the type of photography. The question often is: from what altitude was that photography taken, and therefore what is the scale of the photographs? Some Telecom companies want all buildings (commercial and residential), while others only need the commercial buildings. Most of the companies want all addresses. Even multiple addresses must be provided in the case of an apartment building. There is always a need to be quick and inexpensive. Companies expect that a hundred sqkm can be modeled per week, which is a hundred thousand buildings per week. One cannot achieve this by hand. One has to do this by machine.

One challenge might be that one is faced with aerial photography that is flown at too large a scale. Slide 0.207 shows a high-rise building that looks different in one view than in the other stereoscopic view in Slide 0.208. In a high-rise building we may not even see a certain side of the building in one photograph, but we see that side in the other. Our procedure must cope with these dissimilarities. Slide 0.209 shows a set of polygons extracted from an image, and one can already see that some polygons are not visible from that particular photograph. Clearly those data were extracted from another photograph, as shown in Slide 0.210. The same situation is illustrated again in the second example of Slide 0.211. Finally we have a raster representation of the buildings in Slide 0.212. So we have an (x, y)-grid on the ground, and for each (x, y)-grid cell we have a z-elevation. The images shown before were the source of the building in the center of this particular raster representation. But we also want a vector representation of the building footprints and of the details of the roofs, as in the example of downtown Montreal. These vectors are needed because addresses can be associated with polygons describing a building, but one has a harder time associating addresses with a raster representation. However, the signal propagation computation needs raster data as shown here.

The entire area of central Montreal has 400,000 buildings as shown in Slide 0.217. Zooming in on the green segment permits one to see city blocks. Zooming in further produces individual buildings. A very complex building is the cathedral, which on an aerial photograph looks like Slide 0.220.

Let us summarize: the data sets being used for this Telecom wave-propagation modeling in the LMDS application consist first of all of vector data of the buildings (Slide 0.222), but also of the vegetation, because the vegetation may block the intervisibility of antennas; we show also the combination of both. Of course the same data are needed in a raster format of the building data, and finally a combination of raster and vector data to include the trees. And we must not forget the addresses. Again, there may be one address per building, or multiple addresses for each building. The addresses are locked to the geometric data via address locators that are placed inside the polygons. As a result the addresses are associated with the polygons and thus with the buildings.

What do such Telecom data sets go for in terms of price? A building may cost between $1 and $25.
A square km may go for $100 to $600. However, if there are 1,000 buildings per sqkm then obviously an individual building may cost less than one dollar. A metropolis such as Montreal may cover 4,000 square km, but the interest is focused on 800 sqkm. On average, of course, there are fewer than 1,000 buildings per sqkm; one might more typically find 200 or so buildings per sqkm over larger metropolitan regions.
Chapter 1 Characterization of Images

1.1 The Digital Image

Images can be generated from at least two sources. The first is creation of the image from the measurements taken by a sensor. We would call this a "natural image". In contrast, an image may also be generated by a computer describing an object or a situation that may or may not exist in the real world. Such images are "computer generated" (CGI, computer-generated images).

All digital images have a coordinate system associated with them. Slide 1.5 is an original and typical image with two dimensions; it has a rectangular (Cartesian) coordinate system with axes x and y. Therefore a location in the image can be defined by its coordinates x and y. Properties of the image can now be associated with that location. In that sense the image is an algebraic function f(x, y). When we deal with digital images we discretize this continuous function and replace the continuous image by rows and columns of image elements or pixels. A pixel is typically a square or rectangular entity. More realistically, of course, the sensor that may have created an image may have an instantaneous field of view that is not rectangular or square; it is oftentimes a circle. We present an image digitally as an arrangement of square pixels, although the machinery which creates the digital image may not produce square pixels.

Digital images are fairly simple arrangements of numbers that are associated with gray values as illustrated in Slide 1.7. It shows four different gray values between 0 and 30, with 0 being white and 30 being black. A very simple type of image is a so-called "binary image" or binary mask. That is an image whose pixels have gray values of either 0 as white or 1 as black. Such a binary image may be obtained by thresholding a gray value image. We may have a threshold that takes all pixel values between 15 and 25 to be black (or 1), while all other gray values are set to white or 0 (a small sketch of such a thresholding operation follows after this passage).

Algorithm 2 Threshold image
1: create a binary output image with the same dimensions as the input image
2: for all pixels p of the input image do
3:   retrieve gray value v of pixel p from the image
4:   find pixel p′ of the output image corresponding to p
5:   if v ≥ vt then {compare gray value v with threshold vt}
6:     set p′ to white
7:   else
8:     set p′ to black
9:   end if
10: end for

An immediate question to ask is why this technology was developed to take continuous gray values and convert them into digital pixel arrays. Let us discuss a few advantages. A very significant one is "quantification". In a digital environment we are not subject to judging an image with our opinions; one has actual measurements. This can be illustrated by an example of a gray area embedded either in a dark or in a white background. Subjectively our eye will tell us that the gray area is brighter when embedded in a dark environment and darker when embedded in a brighter environment. But in reality the two gray values are identical. The eye can objectively differentiate only a limited number of gray values.
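To make the thresholding idea from above concrete, here is a minimal Python sketch, assuming NumPy arrays of gray values. It implements the band threshold from the text (values between 15 and 25 become 1, everything else 0); Algorithm 2 shows the single-threshold variant.

import numpy as np

def band_threshold(gray, v_low=15, v_high=25):
    """Pixels whose gray value lies in [v_low, v_high] become 1 (black),
    all other pixels become 0 (white)."""
    gray = np.asarray(gray)
    return ((gray >= v_low) & (gray <= v_high)).astype(np.uint8)

example = np.array([[ 0,  5, 18],
                    [22, 30, 16],
                    [14, 25, 29]])
print(band_threshold(example))
# [[0 0 1]
#  [1 0 1]
#  [0 1 0]]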
In a chaotic image we may be able to separate only 16 to 64 gray values. Relatively, though, namely in situations where we have two areas adjacent to one another, our eyes become very sensitive to the differences. But we cannot compare a gray tone in one corner of an image to a gray tone in another corner of the same image and be certain which one is brighter or darker. That can be easily accomplished in a digital environment.

There is a whole host of other advantages that will not be discussed at the same level of detail. First, a very important one is the automation of the visual sense. We can give the computer eyes and can process the visual information by machine, thereby taking the work of interpreting various visual inputs away from the human. Examples are quality control in a factory environment or in inaccessible, dangerous areas. Second, an advantage is "flexibility". We have options that we do not have in an analog environment or with the natural visual sense in configuring very flexible sensing systems for very specific tasks. Third, the ability to store, retrieve, transfer and publish visual information at very little cost is another advantage if the information is digital. We all have experience now with multimedia information on the web, and we all know that duplication and transfer are available at almost no cost. Fourth is the advantage of enhancing the visual sense of the human by an array of sensors, for example underwater imaging, sound imaging, X-ray imaging, microwave imaging. We will address sensors in more detail. Fifth, digital processing of sensor data is essentially independent of the specifics of the sensor. We may have algorithms and software that are applicable to a variety of sensors. That is an advantage in a digital environment. Sixth is cost: digital images are inexpensive. This was mentioned already in the context of storage, transfer and publication. Expensive-looking color images can be rendered on a computer monitor, and yet we have no direct costs for those images. This is quite a difference from going to a photo lab and getting quality paper prints or diapositives.

The seventh advantage of digital images needs an example to explain. There exist numerous satellites orbiting the Earth and carrying Earth-observing sensors. One such system is from the US NASA and is called "Landsat". Slide ?? is an example of a Landsat image of the Ennstal with its rows and columns. What makes this image interesting is the color presentation of what the sensor in orbit "sees". The presentation is made from 7 separate spectral channels, not from simple red/green/blue color photography. Something that is very typical of the flexibility and versatility of digital sensors and digital image processing is this ability to extend the visual capabilities of humans and operate with many more images than a human can "see" or cope with.

Prüfungsfragen:
• What is a "threshold image", and for what purpose is it used?
• What advantages do digital images have over analog images?
• What is a multiple or multispectral image, and what is it used for?

1.2 The Image as a Raster Data Set

A digital image is an array of pixels. It was already mentioned that in principle images are continuous functions f(x, y). A very simple "image model" states that f(x, y) is the product of two separate functions.
One function is the illumination I and the other function describes the properties of the object that is being illuminated, namely the reflection R; thus f(x, y) = I(x, y) · R(x, y). The reflection function may vary between 0 and 1, whereas the illumination function may vary between 0 and ∞.

We now need to discretize this continuous function in order to end up with a digital image. We might create 800 by 1000 pixels, a very typical arrangement of pixels for the digital sensing environment. So we sample our continuous function f(x, y) into an N × M matrix with N rows and M columns. Typically our image dimensions are powers of two, 2^n, so our number of rows may be 64, 128, 512, 1024 etc. We not only discretize or sample the image (x, y)-locations; we also have to take the gray value at each location and discretize it. We do that also at 2^b levels, with b typically being small and producing 2, 4, 8, 12 or 16 bits per pixel.

Definition 1 Amount of data in an image
To calculate the amount of data of an image you need its geometric and radiometric resolution. Let's say we have an image with N columns and M rows (geometric resolution) and a radiometric resolution of R bits per pixel. The amount of data b of the image is then calculated using the formula
b = N · M · R.
For example, an image of 1000 columns by 800 rows at 8 bits per pixel holds 1000 · 800 · 8 = 6,400,000 bits, or 800,000 bytes.

A very simple question is shown in Slide 1.20. If we create an image of an object and we need to recognize from the image a certain detail of the object, say a speck of dirt on a piece of wood of 60 cm by 60 cm, and if that dirt can be as small as 0.08 mm², what is the size of the image needed to be sure that we recognize all the dirt spots?

The resolution of an image is a widely discussed issue. When we talk about the geometric resolution of an image we typically associate with this the size of the pixel on the object and the number of pixels in an image. When we talk about radiometric resolution we describe the number of bits we have per pixel. Let us take the example of geometric resolution. We have in Slide 1.22 and Slide 1.23 a sequence of images of a rose that begins with a resolution of 1000 by 1000 pixels. We go down from there to ultimately 64 by 64 or even 32 by 32 pixels. Clearly at 32 by 32 pixels we cannot recognize the rose any more. Let's take a look at the radiometric resolution. We have in Slide 1.24 a black-and-white image of that rose at 8 bits per pixel. We reduce the number of bits, and in the extreme case we have one bit only, resulting in a binary image (either black or white). In the end we may have a hard time interpreting what we are looking at, unless we know already what to expect. As we will see later, image processing at 8 bits per pixel for black & white images is very common. A radiometric resolution with more bits per black & white pixel is needed, for example, in radiology. In medicine it is not uncommon to use 16 bits per pixel. With 8 bits we obviously get 256 gray values; with 12 bits we have 4096 gray values. The color representation is more complex; we will talk about that extensively. In that case we do not have one 8-bit number per color pixel, but typically three numbers, one each for red/green/blue, thus 24 bits in total per color pixel.

Prüfungsfragen:
• In image processing there is the idea of a so-called "image model". What is meant by this, and which formula is used to represent the image model?
• Describe the process of discretization in the transition from an analog to a digital image.
• What is meant by sampling, and which problems occur in the process? You are invited to use formulas in your answer.
• What do the terms "geometric" and "radiometric" resolution of an image mean? Try to clarify your answer with a sketch.

1.3 System Concepts

We talk about image analysis, image processing or pattern recognition, and about computer graphics. What are their various basic ideas? Image processing goes from the image to a model of an object, and from there to an understanding of the object. In [GW92] an image analysis system is described in the first introductory chapter. One always begins with (a) sensors, thus with the image acquisition step, the creation of an image by a camera, a radar system, or by sound. Once the image is acquired it is, so to speak, "in the can". We now can (b) improve the image; this is called "pre-processing". Improving means fixing errors in the image, making the image look good for the eye if a human needs to inspect it. Pre-processing produces a new, improved image. We now want to decompose the image into its primitives. We would like to (c) segment it into areas or fields, edges, lines, regions. This creates from the pre-processed image, as it has been seen visually, a new image in which the original pixels are substituted by image regions, contours, edges. We denote this as "segmentation". After segmentation we need to create a (d) representation and a description of the image contents. And finally we want to use the image contents and (e) interpret their meaning. What do the objects look like? This phase is called recognition and interpretation. All of this is based on (f) knowledge about a problem domain, about the sensor, about the object, about the application of the information. So once the object information has been interpreted we can use the information extracted from the image for action. We may make a decision to, e.g., move a robot, or to dispose of a defective part, or to place an urban waste dump, and so forth.

The typical ideas at the basis of computer graphics are slightly different. We start out from the computer, in which we store data about objects and create an image as a basis for actions. So we have a database and an application model. We have a program to take the data from the database and to feed the data into a graphics system for display. The objective of computer graphics is the visual impression of a human user.

However, what may seem like two different worlds, image processing versus computer graphics, really is largely one and the same world. Image processing creates from images of the real world a model of that real world. Computer graphics takes a model of objects and creates from it an image of those objects. So in terms of a real world, computer graphics and image processing are entirely complementary. Image processing goes from the real world to a model of the real world, and computer graphics takes the objects of the real world and creates an image of them. Where those two areas do diverge is in the non-real world. There is no sensing and no image analysis of a non-real world. What is computer graphics of a non-real world? Just look at cartoons and the movies. So there is a point of view that says that image processing and computer graphics belong together.
A slightly different point of view is to say that image processing and computer graphics overlap in areas addressing the real world, and that there are areas that are separate.

Prüfungsfragen:
• Sketch the process of image recognition as a chain of processes from the scene to the scene description.

1.4 Displaying Images on a Monitor

The customary situation today is a refresh buffer in which we store numbers that represent the image. We use a display controller that manages this buffer based on data and software residing on a host computer, and we have a video controller that takes what is in the buffer and presents this information on a computer monitor. In the buffer we might have a binary image at 1 bit per pixel, or we may have a color image at 24 bits per pixel. These are the typical arrangements for refresh buffers. The refresh buffer typically is larger than the information on a computer monitor. The computer monitor may display 800 by 1000 pixels; the refresh buffer might hold 2000 by 2000 pixels.

An image is displayed on the monitor using a cathode-ray tube or an LCD arrangement. On a cathode-ray tube the image is painted line by line onto the phosphor surface, going from top to bottom. Then the ray gets turned off. So it moves from left to right with the beam on, right to left with the beam off, top down with the beam on, and bottom to top with the beam off.

An image like the one in Slide "Wiedergabe bildhafter Information" is a line drawing. How could this be represented on a monitor? In the early days this was done by a vector scan, so the cathode ray was used to actually paint vectors on the monitor. Very expensive vector display monitors were built, maybe as long as into the mid-80s. Television monitors became very inexpensive, but vector monitors remained expensive, and so a transition took place from vector monitors to raster monitors; today everything is represented in this raster format. We could have a raster display present only the contours of an object, but we can also fill the object in the raster data format.

Not all representations on a monitor always deal with the three-dimensional world. Many representations in image form can be of an artificial world or of technical data, thus of non-image information. This is typically denoted by the concept of "visualization". Slide "Polyline" is a visualization of data in one dimension. Associated with this very simple idea are concepts such as polylines (here representing a bow tie), and we have a table of points 0 to 6 representing this polyline. There are concepts such as "markers", which are symbols that represent particular values in a two-dimensional array. This was once a significant element of the computer graphics literature but no longer represents a big issue today.

Prüfungsfragen:
• Describe the components that are needed in a computer for the output and the interactive manipulation of a digital raster image.
• Describe, using a sketch, the build-up of a digital raster image on the luminous surface of a cathode-ray screen.
• What is the difference between the vector and the raster representation of a digital image? Illustrate your answer with a simple example and describe the advantages and disadvantages of both approaches.
• Explain, using a sketch, the temporal sequence of the image build-up on a cathode-ray screen.
Algorithm 3 Simple raster image scaling by pixel replication
1: widthratio ⇐ newimagewidth / oldimagewidth
2: heightratio ⇐ newimageheight / oldimageheight
3: for all y such that 0 ≤ y < newimageheight do
4:   for all x such that 0 ≤ x < newimagewidth do
5:     newimage[x, y] ⇐ oldimage[round(x / widthratio), round(y / heightratio)]
6:   end for
7: end for

Algorithm 4 Image resizing
1: widthratio ⇐ newgraphicwidth / oldgraphicwidth
2: heightratio ⇐ newgraphicheight / oldgraphicheight
3: for all points p in the graphic do
4:   p.x ⇐ p.x × widthratio
5:   p.y ⇐ p.y × heightratio
6: end for

1.5 Images as Raster Data

We deal with a continuous world of objects, such as curves or areas, and we have to convert them into pixel arrays. Slide "Rasterkonvertiertes Objekt" shows the representation of a certain figure in a raster image. If we want to enlarge this, we obtain a larger figure with the exact same shape but a larger size of the object's elements. If we enlarge the image by a factor of two, what was one pixel before now takes up four pixels. The same shape that we had before would look identical but smaller if we had smaller pixels. We make a transition to pixels that are only a quarter as large as before. If we now enlarge the image, starting from the smaller pixels, we get back the same shape we had before. However, if we reconvert from the vector to a raster format, then the original figure really will produce a different result at the higher resolution. So we need to understand what pixel size and geometric resolution do in the transition from a vector world to a raster world.

Prüfungsfragen:
• What is meant by "raster conversion", and which problems can occur in the process?

1.6 Operations on Binary Raster Images

There is an entire world of interesting mathematics dealing with binary images and operations on such binary images. These ideas have to do with neighborhoods, connectivity, edges, lines, and regions. This type of mathematics was developed in the 1970s. A very important contributor was Prof. Azriel Rosenfeld, who with Prof. Avi Kak wrote the original book on pattern recognition and image processing.

What is a neighborhood? Remember that a pixel at location (x, y) has a neighborhood of four pixels that are up and down, left and right of the pixel in the middle. We call this an N4 neighborhood or the 4-neighbors. We can also have the diagonal neighbors ND with the lower left, lower right, upper right and upper left neighbors. We add these ND and the N4 neighbors to obtain the N8 neighbors. This is further illustrated as Prof. Rosenfeld did in 1970: Slide 1.56 presents the N4-neighbors and the N8-neighbors and associates them with the movements of the king in a chess game. We may also have the oblique neighbors Nv and the springer neighbors Nsp, which are analogous to the chess movements of the knight (Springer), etc.; another diagonal neighborhood would derive from the moves of the queen (Dame).

We have neighborhoods of the first order, which are the neighbors of a pixel x. The neighbors of the neighbors are "neighbors of second order" with respect to the pixel at x. We could increase the order by having neighbors of the neighbors of the neighbors.

Definition 2 Connectivity
Two pixels are connected if they are each other's neighbors and possess the same connectivity property V.
4-connectivity:
1: if q is an N4-neighbor of p then {Def. 5}
2:   pixels p and q are connected
3: else
4:   pixels p and q are not connected
5: end if
m-connectivity:
1: if N4(p) ∩ N4(q) = ∅ then {N4(x): set of N4-neighbors of x}
2:   if (q is an N4-neighbor of p) or (q is an ND-neighbor of p) then {Def. 5}
3:     pixels p and q are connected
4:   else
5:     pixels p and q are not connected
6:   end if
7: else
8:   pixels p and q are not connected
9: end if
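A small Python sketch of these neighborhoods and of the m-connectivity test may help. It is an interpretation of Definition 2, reading the condition N4(p) ∩ N4(q) = ∅ as "the common 4-neighborhood contains no pixel with the shared property V"; the 3 × 3 foreground set in the example is invented for illustration.

def n4(p):
    """The four N4-neighbors (left, right, up, down) of pixel p = (x, y)."""
    x, y = p
    return {(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)}

def nd(p):
    """The four diagonal neighbors ND of pixel p."""
    x, y = p
    return {(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)}

def n8(p):
    """N8 is the union of N4 and ND."""
    return n4(p) | nd(p)

def m_connected(p, q, foreground):
    """m-connectivity test for two foreground pixels p and q: they are connected
    if q is a 4-neighbor of p, or a diagonal neighbor while the common
    4-neighborhood contains no foreground pixel."""
    if p not in foreground or q not in foreground:
        return False
    if q in n4(p):
        return True
    return q in nd(p) and not (n4(p) & n4(q) & foreground)

# invented 3 x 3 example: four black (foreground) pixels
fg = {(0, 0), (1, 1), (2, 1), (2, 2)}
print(m_connected((0, 0), (1, 1), fg))   # True: diagonal step, empty common 4-neighborhood
print(m_connected((1, 1), (2, 2), fg))   # False: (2, 1) already provides a 4-connected path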
Connectivity is defined by two pixels belonging together: they are "connected" if they are one another's neighbors. So we need a neighbor relationship to define connectivity. Depending on a 4-neighborhood, an 8-neighborhood or a springer neighborhood we can define various types of connectivity. We therefore say that two pixels p and q are connected if they are neighbors under a neighborhood relationship. This becomes pretty interesting and useful once we start to do character recognition and we need to figure out which pixels belong together and create certain shapes. We may have an example of three-by-three pixels of which four pixels are black and five pixels are white. We can now have connections established between those four black pixels under various connectivity rules. A connectivity with eight neighbors creates a more complex shape than a connectivity via so-called m-neighbors, where m-neighbors have been defined previously in Slide "Zusammenhaengende Pixel".

Definition 3 Distance
Given: pixels p = (x, y) and q = (s, t).
De(p, q) = sqrt((x − s)² + (y − t)²)  (Euclidean distance)
D4 distance (city block distance): D4(p, q) = |x − s| + |y − t|
D8 distance (chessboard distance): D8(p, q) = max(|x − s|, |y − t|)

The neighborhood and connectivity relationships can be used to establish distances between pixels, to define edges, lines and regions in images, to define contours of objects, to find a path between any two locations in an image, and perhaps to eliminate pixels as noise if they are not connected to any other pixels. A quick example of a distance addresses two pixels P and Q, with a distance depending on the neighborhood relationships that we have defined. The Euclidean distance, of course, is simply obtained by the Pythagorean sum of the coordinate differences. But if we take a 4-neighborhood as the basis for distance measurements then we have a "city block distance": two blocks up, two blocks over. And if we use the 8-neighborhood then we have a "chessboard type of distance" (a small sketch of these distance measures follows at the end of this passage).

Let's define an "edge". This is important because there is a mathematical definition that is a little different from what one would define an edge to be in a casual way. An edge e in an image is a property of a pair of pixels which are neighbors of one another. It is thus a property of a pair of pixels, and one needs to consider two pixels to define it. It is important that the two pixels are neighbors under a neighborhood relationship. Any pair of pixels that are neighbors of one another represents an edge. The edge has a "direction" and a "strength". Clearly the strength of the edge is what is important to us. The edge is defined on an image B, and an edge image is obtained by taking each edge value at each pixel. We can apply a threshold to the weight and the direction of the edge. All edges with a weight beyond a certain value become 1 and all edges below that value become 0. In that case we have converted our image into a binary edge image.
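The three distance measures of Definition 3 can be written down in a few lines of Python; the example point pair mirrors the "two blocks up, two blocks over" case from the text.

from math import hypot

def d_e(p, q):
    """Euclidean distance De(p, q) = sqrt((x - s)^2 + (y - t)^2)."""
    return hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    """City block distance: number of N4-steps between p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance: number of N8-steps (king moves) between p and q."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (2, 2)                    # two blocks up, two blocks over
print(d_e(p, q), d4(p, q), d8(p, q))     # 2.83..., 4, 2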
What is a line? A line is a finite sequence of edges ei, i = 1, . . . , n, where the edges need to be one another's neighbors under a neighborhood relationship; the edges must be connected. A line has a length; the length is the number of edges that form the line.

What is a region in the image? A region is a connected set R of pixels from an image B. A region has a contour. A contour is a line composed of edges, and the edges are defined by the property of two neighboring pixels P and Q: P must be part of the region R, Q must not be. This all sounds pretty intuitive, but it gets pretty complicated once one starts doing operations.

Prüfungsfragen:
• When we have to specify a "distance" between two pixels in a digital image, various distance measures are available. Please list the distance measures you know. You are invited to use formulas in your answer.
• When considering pixels, there exist "neighborhoods" of pixels. List all kinds of neighborhoods that were treated in the lecture and describe each of these neighborhoods by means of a sketch.
• Which possibilities exist to define pixels in a digital raster image as connected? Explain each definition by means of a sketch.
• For which purposes does one define neighborhood and connectivity relations between pixels in digital raster images?
• Give the definitions of the terms "edge", "line" and "region" in a digital raster image.

1.7 Algebraic Operations on Images

We can add two images, subtract, multiply or divide them, we can compare images by some logical operations, and we can look at one image using a second image as a mask. Suppose we have a source image, an operator and a destination image. Depending on the operator we obtain a resulting image. We take a particular source and destination image and make our operator the function "replace", the function "or", the function "xor" or the function "and", to then obtain different results. We may have mask operations. In this case we take an image A to obtain a resulting image "not A". We may produce from images A and B a logical combination "and". Slide "Maskenoperationen 2" is an example of the "or" and the "xor" operation; Slide "Maskenoperationen 3" shows the "not and" operation.

Algorithm 5 Logical mask operations
This is an example of a mask operation. Two images are linked with the Boolean OR operator, pixel by pixel.
1: for all i = 0, i < width, i++ do
2:   for all j = 0, j < height, j++ do
3:     x1 = source-image.value(i, j)
4:     x2 = operate-image.value(i, j)
5:     target-image.value(i, j) = x1 OR x2
6:   end for
7: end for

Operating on raster images with the help of a second image was mentioned earlier as a masking operation. This is a filter operator, also called a window operation. Let us assume that we have an image with gray values z and a second image with gray values w. We can now take the second image, a small 3 × 3 pixel image, place it over the first image, and define an operation that multiplies each pixel of the second image with the underlying pixel of the first image, adding up all the multiplied values and producing a new z-value at the pixel in the middle of that mask. All we have done here is apply a filter operation. We will address filter operations later in a separate chapter of this class on filters.
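A minimal Python sketch of this 3 × 3 window operation, assuming NumPy arrays, could look as follows; the averaging mask and the border handling are arbitrary choices for the example.

import numpy as np

def window_operation(image, mask):
    """Place the 3 x 3 mask over every interior pixel, multiply the mask weights w
    with the underlying gray values z, sum the products and write the result to
    the central pixel. Border pixels are simply copied here."""
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=float)
    out = image.copy()
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            window = image[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = float(np.sum(window * mask))
    return out

averaging_mask = np.full((3, 3), 1.0 / 9.0)      # one possible choice of weights
image = np.arange(25, dtype=float).reshape(5, 5)
print(window_operation(image, averaging_mask))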
Algorithm 6 Fast mask operations
1: Framebuffer A, B  {A . . . source image, B . . . destination image}
2: Mask =
   w1 w2 w3
   w4 w5 w6
   w7 w8 w9  {defines the filter mask}
3: B = A  {initialize frame buffers}
4: Multiply B by w5  {multiplication is performed on all pixels of B}
5: Shift A right by one pixel  {shifts the whole frame buffer to the right}
6: Add w4 · A to B
7: Shift A down by one pixel
8: Add w1 · A to B
9: Shift A left by one pixel
10: Add w2 · A to B
11: Shift A left by one pixel
12: Add w3 · A to B
13: Shift A up by one pixel
14: Add w6 · A to B
15: Shift A up by one pixel
16: Add w9 · A to B
17: Shift A right by one pixel
18: Add w8 · A to B
19: Shift A right by one pixel
20: Add w7 · A to B
21: Shift A left by one pixel
22: Shift A down by one pixel  {now A is in its original position again}

At this point let us close out with the basic idea of a parallel operation on an image using a second image as a mask, filter or window. What is interesting is that these operations can run very quickly. We do not necessarily need to go sequentially through the image and do these multiplications and additions on one pixel at a time. We can instead do a fast operation with an entire mask. For this we may have an input frame buffer A and an output frame buffer B. We may be able to process everything that is in these two buffers in 1/30 of a second, so we can do an operation on N times M pixels in (N × M)/30 seconds, as illustrated in Slide "Operationen".

Prüfungsfragen:
• Given the two binary images in Figure ??: which result is obtained by a logical combination of the two images with an "xor" operation? Please use a sketch.
• Explain, using a few examples, what is meant by algebraic operations with two images.
• Explain the terms "mask", "filter" and "window" in connection with algebraic operations with two images. Illustrate your answer with a sketch.

Chapter 2 Sensing

2.1 The Most Important Sensors: The Eye and the Camera

The eye is the primary sensor of a human. It is certainly important to understand how it operates, in order to understand how a computer can mimic the eye and how certain new ideas in computer vision and also in computer graphics have developed taking advantage of the specificities of the eye. In Slide 2.5 we show an eye and define an optical axis of the eye's lens. This optical axis intersects the retina at a place called the fovea, which is the area of highest geometric and radiometric resolution.
The lens can change its focal length using muscles that pull on the lens and change its shape. As a result the human can focus on objects that are nearby, for example at a 25 cm distance, which is typical for reading a newspaper or book, or it can focus at infinity, looking out into the world. The light that is projected from the world through the lens onto the retina gets converted into signals that are then fed by nerves into the brain. The place where the nerve leaves the eye is called the blind spot. That is a location where no image can be sensed. The optical system of the eye consists, apart from the lens, of the so-called vitreous humor (in German: Glaskörper); in front of the lens is a protective layer called the cornea (Hornhaut), and between the lens and the cornea is a space filled with liquid called the anterior chamber. Therefore the optical system of the eye consists of essentially four optically active bodies: 1. the cornea, 2. the anterior chamber, 3. the lens and 4. the vitreous humor.

The conversion of light into nerve signals is accomplished by means of rods and cones that are embedded in the retina. The rods (Stäbchen) are black-and-white sensors. The eye has about 75 million of them, and they are distributed widely over the retina. If there is very little light, the rods will still be able to receive photons and convert them into recognizable nerve signals. If we see color, we need the cones (Zäpfchen). We have only 6 million of those, and they are not as evenly distributed as the rods are. They are concentrated at the fovea, so that the fovea has about 150,000 of those cones per square millimeter. That number is important to remember for a discussion of resolution later on.

We take a look at the camera as an analog of the eye. A camera may produce black-and-white or color images, or even false-color images. The slide shows a typical color image taken from an airplane of a set of buildings (see these images also in the previous Chapter 0). This color photograph is built from three component images. First is the red channel, second is the green channel, followed by the blue channel. We can combine those red/green/blue channels into a true-color image.

In terms of technical imaging, a camera is capable of producing a single image or an entire image sequence. When we have multiple images or image sequences, we typically denote them as multi-images. A first case may be in the form of multi-spectral images: if we break up the entire range of electromagnetic radiation from ultraviolet to infrared into individual bands and produce a separate image for each band, we call the sum of those images multi-spectral. If we have many of those bands we might call the images hyper-spectral. Typical hyper-spectral cameras produce 256 separate images simultaneously, not just red/green/blue! A second case is to have the camera sit somewhere and make images over and over, always in the same color but observing changes in the scene. We call that multi-temporal. A third case is to observe a scene or an object from various positions. A satellite may fly over Graz and take images once as the satellite barely arrives over Graz, and a moment later as the satellite already leaves Graz. We call these multi-position images.
And then finally, a fourth case might have images taken not only by one sensor but by multiple sensors, not just by a regular optical camera, but perhaps also by radar or other sensors as we will discuss later. That approach will produce multi-sensor images. This multiplicity of images presents a very interesting challenge in image processing, particularly when we need to merge images that are taken at separate times, from separate positions and with different sensors, and when we want to automatically extract information about an object from many images of that object.

Multiple digital images of a particular object location result in multiple pixels per given location. Those pixels can be stacked on top of one another and then represent "a vector", with the actual gray values in the individual images being the "elements" of that vector. We can now apply the ideas of vector algebra to these multi-image pixels. Such a vector may be called a feature vector, with the features being the color values of the pixel to which the vector belongs.

Prüfungsfragen:
• What is meant in sensing by single and multiple images? Give some examples of multiple images!

2.2 What is a Sensor Model?

So far we have only talked about one particular sensor, the camera, as an analog of the eye. In image processing we describe each sensor by a so-called sensor model. What does a sensor model do? It replaces the physical image and the process of its creation by a geometric description of the image's creation. We stay with the camera: this is designed to reconstruct the geometric ray passing through the perspective center of the camera, from there through the image plane and out into the world. Slide 2.11 illustrates that in a camera's sensor model we have a perspective center O, we have an image plane P, we have image coordinates ξ and η, and we have an image H of the perspective center at the location that is obtained by dropping a line perpendicularly from the perspective center onto the image plane. We find that our image coordinate system ξ, η and its origin M does not necessarily have to coincide with location H. So what is now a sensor model? It is a set of techniques and of mathematical equations that allow us to take an image point P′ as shown in Slide 2.11 and define a geometric ray going from location O (the perspective center) through P′ into the world.

Definition 4 Perspective camera
(Modeling a perspective camera, see Section 2.2.) Goal: to establish a relationship between the perspective center and the world. Tool: the perspective transformation (projects 3-D points onto a plane); this is a non-linear transformation.
Description of Slide 2.12: We work with two coordinate systems: 1. the image coordinate system (x, y, z), 2. the world coordinate system (X, Y, Z). A ray from a point w in 3-D object space hits the image plane (x, y) at the image point c. The center of this image plane is the coordinate origin, from which an additional z-axis runs perpendicular to the plane; it is identical to the optical axis of the camera lens. Where the ray intersects this z-axis lies the so-called lens center, which has the coordinates (0, 0, L); for focused cameras, L is comparable to the focal length. Condition: Z > L, i.e., all points of interest lie beyond the lens.
The vector w0 gives the position of the rotation axes in 3-D space, from the origin of the world coordinate system to the center of the camera mount. The vector r defines where the image origin is, taking into account the rotation axes (X0, Y0, Z0) that can rotate the camera up and down; it runs from the center of the mount to the center of the image plane, r = (r1, r2, r3)^T.
Perspective transformation: the relationship between (x, y) and (X, Y, Z). Tool: similar triangles.
x : L = (−X) : (Z − L) = X : (L − Z)
y : L = (−Y) : (Z − L) = Y : (L − Z)
The signs of −X and −Y indicate that the image points appear inverted (geometry). Hence
x = L · X / (L − Z)
y = L · Y / (L − Z)
Homogeneous coordinates of a point given in Cartesian coordinates:
w_kar = (X, Y, Z)^T
w_hom = (kX, kY, kZ, k)^T = (w_hom,1, w_hom,2, w_hom,3, w_hom,4)^T, with the constant k ≠ 0.
Conversion back to Cartesian coordinates:
w_kar = (w_hom,1 / w_hom,4, w_hom,2 / w_hom,4, w_hom,3 / w_hom,4)^T
Perspective transformation matrix:
P =
[ 1   0    0    0 ]
[ 0   1    0    0 ]
[ 0   0    1    0 ]
[ 0   0  −1/L   1 ]
(A small numerical sketch of this transformation follows at the end of this section.)

What the sensor model does not tell us is where the camera is and how this camera is oriented in space. So we do not, from the sensor model, find the world point P in three-dimensional space (x, y, z). We only take a camera and an image with its image point P′, and from that we can project a ray back into the world; but where that ray intersects the object point in the world needs something that goes beyond the sensor model. We need to know where the camera is in a world system and how it is oriented in 3-D space.

In computer vision and in computer graphics we do not always deal with cameras that are carried in aircraft, looking vertically down and therefore having a horizontal image plane. Commonly, we have cameras that are in a factory environment or a similar situation, and they look horizontally or obliquely at something that is nearby. Slide 2.12 illustrates the relationships between a perspective center and the world. We have an image plane which is defined by the image coordinate axes x and y (was ξ and η before), and a ray from the object space point denoted as W will hit the image plane at location C. The center of the image plane is defined by the coordinate origin. Perpendicular onto the image plane (which was defined by x and y) is the Z-axis, which may in this case be identical to the optical axis of the lens. In this particular case we would not have a difference between the image point of the perspective center (was H before) and the origin of the coordinate system (was M before). Now, in this robotics case we have two more vectors that define this particular camera. We have a vector r that defines where the image origin is with respect to the rotation axis that would rotate the camera, and we have a vector W0 that gives us the position of that particular rotation axis in 3-D space. We still need to define for that particular camera its rotation axis that will rotate the camera up and down and that is oriented in a horizontal plane. We will talk about angles and positions of cameras later in the context of transformations; let us therefore not pursue this subject here. All we need to say at this point is that a sensor model relates to the sensor itself, and in robotics one might understand the sensor model to include some or all of the exterior paraphernalia that position and orient the camera in 3-D space (the pose).
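A short Python sketch, taken directly from the equations of Definition 4, shows the perspective transformation via homogeneous coordinates; the numbers in the example are invented.

import numpy as np

def perspective_project(w, L):
    """Map a world point w = (X, Y, Z) to image coordinates (x, y) using the
    homogeneous perspective transformation matrix P of Definition 4, with the
    lens center at (0, 0, L) on the optical axis."""
    X, Y, Z = w
    P = np.array([[1.0, 0.0,  0.0,     0.0],
                  [0.0, 1.0,  0.0,     0.0],
                  [0.0, 0.0,  1.0,     0.0],
                  [0.0, 0.0, -1.0 / L, 1.0]])
    w_hom = np.array([X, Y, Z, 1.0])          # homogeneous coordinates with k = 1
    c_hom = P @ w_hom
    c = c_hom[:3] / c_hom[3]                  # back to Cartesian coordinates
    return c[0], c[1]                         # equals L*X/(L - Z), L*Y/(L - Z)

# invented example: a point 10 length units beyond a lens with L = 0.05
x, y = perspective_project((2.0, 1.0, 10.0), L=0.05)
print(x, y)    # both negative: the image appears inverted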
In photogrammetry, such pose data are part of the so-called exterior orientation of the camera.

Prüfungsfragen:
• Explain the term "sensor model"!

2.3 Image Scanning

Images on film need to be stored in a computer, but before they can be stored they need to be scanned. On film an image is captured in an emulsion. The emulsion contains chemistry, and as light falls onto the emulsion the material gets changed under the effect of photons. Those changes are very volatile; they need to be preserved by developing the film. The emulsion is protected from the environment by supercoats. The emulsion itself is applied to a film base. So the word "film" really applies just to the material on which the emulsion is fixed. There is a substrate that holds the emulsion onto the film base, and the film base on its back often has a backing layer. That describes a black-and-white film. With color film we have more than one emulsion: we have three of those layers on top of one another.

We are dealing mostly with digital images, so analog film, photo labs and chemical film development are not of great interest to us. But we need to understand a few basic facts about film and the appearance of objects on film. Slide 2.15 illustrates that appearance. The ordinate of the diagram records the density that results from the reflections of the world onto the emulsion. The density is 0 where the film is very bright: no light has fallen there and the film is totally transparent (negative film!). As more and more light falls onto the film, the film gets more exposed and the density gets higher, until the negative is totally black. This negative film is exposed by the light that is emitted from the object through a lens onto the film. Typically, the relationship between the density recorded on film and the light emitted from an object is a logarithmic one. As the logarithm of the emitted light increases along the abscissa, the density typically increases linearly; that is the most basic relationship between the light falling onto a camera and the image recorded on film, except in the very bright and the very dark areas. When there is almost no light falling on the film, the film will still show what is called a gross fog. So film typically will never be completely unexposed; there will always appear to be an effect as if a little bit of light had fallen onto the film. When a lot of light comes in, we lose the linear relationship again and we come to the "shoulder" of the gradation curve. As additional light comes in, the density of the negative does not increase any more. Note that the slope of the linear region is denoted here by tan(α) and is called the gamma of the film (a small illustrative sketch of this curve follows at the end of this section). This distinguishes more and less sensitive films: the sensitivity has to do with the slope of that linear region. If a lot of light is needed to change the density, we call this a slow or "low sensitivity" film. If a small change in light causes a large change in density, then we call this a "very sensitive film", and the linear region is steeper.

The density range that we can record on film is often perhaps between 0 and 2. However, in some technical applications, or in the graphic arts and the printing industry, densities may go up to 3.6, and in medicine X-ray film densities go up as high as 4.0. Again, we will talk more about density later, so keep in mind those numbers: note that they are dimensionless. We will interpret them later. We need to convert film to digital images.
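As a rough illustration of the gradation curve just described (gross fog, linear region with slope gamma, shoulder), here is a small Python sketch. All parameter values are invented; real films follow measured characteristic curves rather than this idealized clamp.

import numpy as np

def film_density(exposure, gamma=0.8, fog=0.2, d_max=2.0):
    """Idealized gradation curve: density rises linearly with log10(exposure)
    at slope gamma = tan(alpha), clamped below by the gross fog and above by
    the shoulder (maximum density d_max)."""
    d = fog + gamma * np.log10(np.maximum(exposure, 1e-9))
    return np.clip(d, fog, d_max)

exposures = np.array([0.01, 1.0, 10.0, 100.0, 1e5])
print(film_density(exposures))   # toe, linear region, and saturation at the shoulder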
We need to convert film to digital images. This is based on one of three basic technologies.

First, so-called drum scanners have the transparent film mounted on a drum; inside the drum is a fixed light source, the drum rotates, the light source illuminates the film, and the light that comes through the film is collected by a lens and put on a photo detector (a photo-multiplier5). The detector sends electric signals which get A/D converted and produce, at rapid intervals, a series of numbers per rotation of the drum. We thus get one row of pixels per drum rotation. That approach has been very popular but has recently been made obsolete, because the device needs sensitive and rapid mechanical movements and it is difficult to keep such systems calibrated.

5 in German: Sekundärelektronenvervielfacher

Second, a much simpler way of scanning is to use not a single dot but a whole array of dots, namely a CCD (charge-coupled device). We put it in a scan head and collect light that is, for example, coming from below the table and shining through the film; it gets collected through the lens and projected onto a series of detectors. There may be 6000, 8000, 10,000 or even 14,000 detectors, and these detectors collect the information about one single line of film. The detector charges are read out, and an A/D converter produces one number for each detector element. Again, the entire row of detectors creates in one instant a row of pixels. How do we get a continuous image? Of course by moving the scan head, collecting the charges row by row and building them up into an image (push-broom technology).

Third, we can have a square array detector field. The square CCD is mounted in the scan head and the scan head "grabs" a square. How do we get a complete image that is much larger than a single square? By stepping the camera, stopping it, staring at the object, collecting 1000 by 1000 pixels, reading them out, storing them in the computer, moving the scan head, stopping it again, taking the next square, and so on. That technology is called step and stare. An array CCD is used to cover a large document by individual tiles, and the tiles are then assembled into a seamless image.

We get the push-broom single-path linear CCD array scanner typically in desktop-, household-, H.P.-, Microtec-, Mostec-, UMAX-type products. Those create an image in a single swath and are limited by the length of the CCD array. If we want to create a larger image than the length of the CCD array allows, then we need to assemble image segments. So we create a swath by one movement of the scan head, step the scan head over, and repeat the swath in the new location. This is called the multiple-path linear CCD scanner. Another name for this is xy-stitching: the scan head moves in x and y, individual segments are collected, and they are then "stitched" together.

Prüfungsfragen:

• Skizzieren Sie drei verschiedene Verfahren zum Scannen von zweidimensionalen Vorlagen (z.B. Fotografien)!

2.4 The Quality of Scanning

People are interested in how accurate scanning is geometrically. The assessment is typically based on scanning a grid and comparing the grid intersections in the digital image with the known grid intersection coordinates of the film document. A second issue is the geometric resolution. We check that by imaging a pattern. Slide 2.22 shows what is called a US Air Force Resolution Target; each of its patterns has a very distinct width of the black lines and of the intervals between those black lines.
As those black lines get smaller and narrower, we challenge the imaging system more and more. If we look at an area that is beyond the resolution of the camera, then we will see that we cannot resolve the individual bars anymore. The limiting case that we can just resolve is used to describe the resolution capability of the imaging system. That may describe the performance of a scanner, but it may just as well describe the resolution of a digital camera. These resolution targets come with tables that describe what each element resolves. For example, we have groups (they are called Group 1, 2, 3, 4, 5, 6), and within each group we find six elements. In the example shown in Slide 2.24 one sees how the resolution is designated in line pairs per millimeter. However, we have pixels, and pixels have a side length. How do we relate line pairs per millimeter to pixel diameter? We will discuss this later. The next subject for evaluating a digital image and developing a scanner is the gray value performance. We have a Kodak gray wedge that has been scanned. On the bright end the density is 0, on the dark end the density is 3.4. We now have individual steps of 0.1 and we can judge whether those steps get resolved both in the bright and in the dark area. On a monitor like this we cannot really see all thirty-four individual steps in intervals of 0.1 D from 0 to 3.4. We can use Photoshop and apply a function called histogram equalization, whatever that means, to each segment of this gray wedge. As a result we see that all the elements have been resolved in this particular case.

Prüfungsfragen:

• Wie wird die geometrische Auflösung eines Filmscanners angegeben, und mit welchem Verfahren kann man sie ermitteln?

2.5 Non-Perspective Cameras

Cameras per se have been described as having a lens projecting light onto film, and then we scan the film. We might also have, instead of film, a digital square-array CCD in the film plane to get a direct digital image. In that case we do not go through a film scanner. We can also have a camera on a tripod with a linear array, moving the linear array while the light is falling on the image plane and collecting the pixels in a sequential motion, much like a scanner would. There also are stranger cameras yet which do not have a square array in the film plane and avoid a regular perspective lens. These are non-perspective cameras. First let us look at a linear CCD array: one slide shows a CCD array with 6000 elements that are arranged side by side, each element having a surface of about 12 µm × 12 µm. These are read out very rapidly, so that a new line can be exposed as the array moves forward. For example, an interesting arrangement with two lenses is shown in Slide 2.28: the two lenses expose one single array in the back. Half of the array looks in one direction, half in the other direction. By moving the whole scan head we can now assemble two digital strip images. Such a project to build this camera was completed as part of a PhD thesis in Graz. The student built a rig on top of his car, mounted this camera, and drove through the city collecting images of building facades, as we have seen earlier (see Chapter 0).

Prüfungsfragen:

• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras?
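The strip-image idea behind both the push-broom scanner and the moving linear-array camera just described can be sketched in a few lines; this is my own illustration, the array length, the number of steps and the fake detector read-out are all assumptions:

import numpy as np

def read_ccd_line(step, n_detectors=6000):
    """Stand-in for reading out one line of CCD detectors (fake data here)."""
    return np.random.randint(0, 256, size=n_detectors, dtype=np.uint8)

def sweep(n_steps):
    """Move the scan head (or the vehicle) and accrue one row of pixels per step."""
    rows = [read_ccd_line(step) for step in range(n_steps)]
    return np.stack(rows)            # n_steps x n_detectors strip image

image = sweep(100)
print(image.shape)                   # (100, 6000): one swath of the final image

A multiple-path (xy-stitching) scanner would simply produce several such swaths side by side and merge them into one seamless array.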
2.6 Heat Images or Thermal Images

Heat images collect electromagnetic radiation in the middle to far infrared, not in the near infrared. So it is not next to visible light in the electromagnetic spectrum. That type of sensing can be accomplished by a mirror that would illuminate (look at) essentially one small instantaneous field-of-view (IFOV), in the form of a circular area on the ground, collect the light from there, project it onto a detector, and make sure that in a rapid sequence one can collect a series of those circular areas on the ground. What we have here is an instantaneous angle-of-view α. We have a cone that relates the sensor to the ground, and the axis of the cone is at an angle from the vertical called "A". In the old days, say in the sixties and seventies, often-times the recording was not digital but on film. Slide 2.35 illustrates the old-fashioned approach. We have infrared light coming from the ground. It is reflected off a mirror and goes through an optical system that focuses it on the IR-detector, which converts the incoming photons into an electric signal; that signal is then used to modulate the intensity of a light beam which is projected via another lens and a mirror onto a piece of curved film. Slide 2.36 was collected in 1971 or 1972 in Holland. These thermal images were taken from an airplane over regularly patterned Dutch landscapes. What we see here is the geometrical distortion of fields, a result of the airplane wobbling in the air as the individual image lines are collected. Each image line is accrued to its previous one by the sequential motion of the airplane. A closer look shows that there are areas that are bright and others that are dark. If it is a positive, then the bright things are warm and the dark things are cold.

Prüfungsfragen:

• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras?

2.7 Multispectral Images

We already saw the concept of multispectral images. In principle they get, or in the past have been, collected by a rotating mirror that reflects the light from the ground onto a refraction prism. The refraction prism splits the white light coming from the ground into its color components. We have a detector for each color: this could be three for red/green/blue, or 226 for hyper-spectral systems. The detectors convert the incoming light into an electric signal, and the signals get either A/D converted or directly recorded. In the old days recording was onto a magnetic tape unit; today we record everything on a digital disc with a so-called direct capture system (DCS). When one does these measurements with sensors, one really is into a lot of open-air physics. One needs to understand what light, i.e. electromagnetic radiation, is. When energy comes from the sun, a lot of it is in the visible range, somewhat less in the ultraviolet, somewhat less in the infrared. The sun's energy is augmented by energy that the Earth itself radiates off as an active body; however, that energy is at the longer wavelengths. The visible light goes, of course, from blue via green to red. The infrared goes from the near infrared to the middle and far infrared. As the wavelengths get longer we leave the infrared and go into the short waves, microwaves, long microwaves and radio waves.
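As an illustrative aside (not from the lecture), Wien's displacement law makes the split between solar and terrestrial radiation concrete and explains why thermal sensing happens around 10 µm:

def peak_wavelength_um(temperature_kelvin):
    """Wien's displacement law: wavelength of maximum emission, in micrometers."""
    b = 2898.0                      # Wien's constant in um*K
    return b / temperature_kelvin

print(peak_wavelength_um(5800))     # sun (~5800 K): about 0.5 um, visible light
print(peak_wavelength_um(300))      # Earth's surface (~300 K): about 10 um, far infrared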
When we observe in a sensor the radiation that comes in from the surface of the Earth, we do not get an even distribution of the energy as the sun has sent it to the Earth; we get the reflection off the surface, and those reflections depend on what is on the ground, but also on what the atmosphere does to the radiation. A lot of that radiation gets blocked by the atmosphere, in particular from the infrared on. There are a few windows, at 10 micrometers and at 14 micrometers wavelength, where the energy gets blocked less and we can obtain infrared radiation. In the visible and near infrared the atmosphere lets the radiation through unless, of course, the atmosphere contains a lot of water in the form of clouds, rain or snow: that will block the visible light just as it blocks a lot of the longer wavelengths. The blocking of the light in the atmosphere is also a measure of the quality of the atmosphere. In imaging the Earth's surface, the atmosphere is a "nuisance": it reduces the ability to observe the Earth's surface. However, the extent to which we have blockage by the atmosphere tells us something about pollution, moisture etc. So something that is a nuisance to one application can be useful in another. We are really talking here about the ideas that are at the base of a field called remote sensing. A typical image of the Earth's surface is shown in Slide 2.42. A color photograph has no particular problem with the atmosphere: we have the energy from the sun illuminating the ground, we have the red/green/blue colors of a film image, it can be scanned and put into the computer, and the computer can use the colors to assess what is on the ground.

Prüfungsfragen:

• Skizzieren Sie das Funktionsprinzip eines multispektralen Abtastsystemes („Multispectral Scanner“). Sie sind eingeladen, in der Beantwortung eine grafische Skizze zu verwenden.

2.8 Sensors to Image the Inside of Humans

Sensors cover a very wide field, and imaging is a subset of sensing (think also of acoustics, temperature, salinity and things like that). Very well known are so-called CAT scans (computer aided tomography). CAT scanning was invented in the early 1970s, and in 1979 the two inventors, Hounsfield and Cormack, received the Nobel prize. It was one of the fastest recognitions of a breakthrough ever. It revolutionized medicine because it allowed medical people to look at the inside of humans at a resolution and accuracy that was previously unavailable, without having to open up that human. Slide 2.44 illustrates the idea of the CAT scan, which represents the transmissivity of little cubes of tissue inside the human. While a pixel is represented in two dimensions, here each gray value represents how much radiation was transmitted through a volume element. Therefore those gray values do not associate with a 2D pixel but with a 3D voxel, or volume element. A typical CAT image that may appear in 2D really reflects in x and y a 1 mm × 1 mm base, but in z it may reflect a 4 mm depth.

Prüfungsfragen:

• Erklären Sie, wie man mit Hilfe der Computertomografie ein dreidimensionales Volumenmodell vom Inneren des menschlichen Körpers gewinnt.

2.9 Panoramic Imaging

We talked in Chapter 0 about the increasingly popular panoramic images. They used to be produced by spy satellites, spy airplanes, and spacecraft imaging other planets or the Earth.
The reason why we are interested in these images is that we would like to have a high geometric resolution and a very wide swath, thus a wide field of view at high resolution. Those two things are in conflict. A wide-angle lens gives an overview image; a tele-lens gives a very detailed image, but only of a small part of the object. How can we have both the very high resolution of a tele-lens and still have the coverage of a wide-angle lens? That is obtained by moving the tele-lens, by sweeping it to produce a panoramic image (compare the material from Chapter 0).

Prüfungsfragen:

• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras?

2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images

Slide 2.49 is an image taken from a European Space Agency (ESA) satellite called ERS-1, of an area in Tirol's Ötztal. There exists a second image, so that the two together permit us to see a three-dimensional model in stereo; we will talk about this topic of stereo later. How is a radar image produced? Let us assume we deal with an aircraft sensor. Because we are making images with radiation that is way beyond the infrared, namely in the microwaves (wavelengths from one millimeter to two meters, but typically 3 to 5 cm), we cannot use glass lenses to focus that radiation. We need to use something else, namely antennas. So a wave gets generated and travels through a waveguide to an antenna. The antenna transmits a small burst of energy, a pulse. That travels through the atmosphere to the ground. It illuminates an area on the ground with a footprint that is a function of the shape of the antenna. The ground reflects it back, the antenna goes into listening mode and "hears" the echo. The echo comes from the nearby objects first and from the far-away objects last. It gets amplified, A/D converted and sampled, and produces a row of pixels, in this case radar image pixels. The aircraft moves forward, and the same repeats itself 3000 times per second. One obtains a continuous image of the ground. Since we illuminate the ground by means of the sensor, we can image day and night. Since we use microwaves, we can image through clouds, snow and rain (all weather).

Prüfungsfragen:

• Beschreiben Sie das Prinzip der Bilderfassung mittels Radar! Welche Vor- und Nachteile bietet dieses Verfahren?

• Mit Hilfe von Radarwellen kann man von Flugzeugen und Satelliten aus digitale Bilder erzeugen, aus welchen ein topografisches Modell des Geländes (ein Höhenmodell) aus einer einzigen Bildaufnahme erstellt werden kann. Beschreiben Sie jene physikalischen Effekte der elektromagnetischen Strahlung, die für diese Zwecke genutzt werden!

2.11 Making Images with Sound

There is a very common technique to map the floor of the oceans; there really exists only one technique right now that is widely applicable: under-water SONAR. SONAR stands for sound navigation and ranging. It is a total analogy to radar, except that instead of antennas and electromagnetic energy we use vibrating membranes and sound pulses, and we need water for the sound to travel. The sound pulse travels through the water, hits the ground and gets reflected; the membrane goes into a listening mode for the echoes. These get processed and create one line of pixels. As the ship moves forward, line by line gets accrued into a continuous image.
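Both radar and sonar turn echo delay into distance; a minimal sketch of that conversion (my own illustration, the delay values are made up):

def echo_ranges(echo_delays_s, wave_speed_m_s):
    """Convert round-trip echo delays into distances (one value per pixel)."""
    return [wave_speed_m_s * t / 2.0 for t in echo_delays_s]   # /2: out and back

# Radar: the electromagnetic pulse travels at the speed of light.
print(echo_ranges([20e-6, 40e-6], 3.0e8))    # [3000.0, 6000.0] meters

# Sonar: sound in water travels at roughly 1500 m/s.
print(echo_ranges([0.1, 0.2], 1500.0))       # [75.0, 150.0] meters

Sorting the returns by arrival time is exactly what orders the pixels of one image line from near to far.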
The medical ultrasound technology is similar to under-water imaging, but there are various different approaches. Some methods of sound imaging employ the Doppler effect. We will not discuss medical ultrasound in this class, but defer to later classes in the "image processing track".

Prüfungsfragen:

• Nennen Sie Anwendungen von Schallwellen in der digitalen Bildgebung!

2.12 Passive Radiometry

We mentioned earlier that the Earth is active, transmitting radio waves without being illuminated by the sun. This can be measured by passive radiometry. We have an antenna, not a lens. It "listens" to the ground. The antenna receives energy which comes from a small circular area on the ground. That radiation is collected by the antenna and processed, and it creates an image point. By moving the antenna we can move that point on the ground and thereby have a scanning motion producing an image scan that gets converted into a row of pixels. By moving the aircraft forward we accumulate rows of pixels for a continuous image. Passive radiometry is the basis of weather observations from space where large areas are being observed, for example the arctic regions.

Prüfungsfragen:

• Was versteht man unter „passiver Radiometrie“?

2.13 Microscope and Endoscope Imaging

The most popular microscopes for digital imaging are so-called scanning electron microscopes (SEM) or X-ray microscopes. Endoscopes are optical devices using light to look "inside things". Most users are in medicine, to look into humans. There is a lens system and light to illuminate the inside of the human. The lens collects the light and brings it back out; the signal goes into the computer, and on the monitor the medical staff can see the inside of the human: the inside of the heart, the inside of arteries and so forth. Endoscopes often take on the shape of thick "needles" that can be inserted into a human. The same approach is used in mechanical engineering to inspect the inside of engines, for example to find out what happens while an explosion takes place inside a cylinder chamber of an engine.

Prüfungsfragen:

• Beschreiben Sie mindestens zwei Verfahren oder Geräte, die in der Medizin zur Gewinnung digitaler Rasterbilder verwendet werden!

2.14 Object Scanners

The task is to model a 3D object: a head, a face, an engine, a chair. We would like to have a representation of that object in the computer. This could already be the result of a complete image processing system, of which the sensor is only a component, as is suggested in Slide 2.58. The sensor produces a 3D model from images of the entire object. This could be done in various ways. One way is to do it with a linear array camera that is moved over the object and obtains a strip image. This is set up properly in the scanner to produce a surface patch. Multiple patches must be assembled, which is done automatically by making various sweeps of the camera over the object as it gets rotated. We can also have a person sit down on a rotating chair, and a device will optically (by means of an infrared laser) scan the head and produce a 3D replica of the head; or the object is fixed and the IR laser is rotating. The next technique would be to scan an object by projecting a light pattern onto the surface. That is called structured light6.

6 in German: Lichtschnitte
Finally we can scan an object by touching it with a touch-sensitive pointer, where the pointer is under a force that keeps its tip on the object as it moves; another approach is to have a pointer move along the surface and track the pointer with one of many tracking technologies (optical, magnetic, sound; see also Augmented Reality later on).

Prüfungsfragen:

• Welchem Zweck dient ein sogenannter „Objektscanner“? Nennen Sie drei verschiedene Verfahren, nach denen ein Objektscanner berührungslos arbeiten kann!

2.15 Photometry

We are now already at the borderline between sensors and image processing/image analysis. In photometry we do not only talk about sensors; however, photometry is a particular type of sensor arrangement. We image a 3D object with one camera taking multiple images, like in a time series, but each image is taken with a different illumination. So we may have four or ten lamps at different positions. We take one image with lamp 1, a second image with lamp 2, a third image with lamp 3, etc. We collect these multiple images, thereby producing a multi-illumination image dataset. The shape reconstruction is based on a model of the surface reflection properties: from those properties and the radiometry of the images, the object shape is derived.

2.16 Data Garments

Developments attributed to computer graphics concern so-called data garments. We need to sense not only properties of the objects of interest, but also where an observer is, because we may want to present him or her with a view of an object in the computer from specific places and directions. The computer must know in these cases where we are. This is achieved with data gloves and head-mounted displays (HMD). For tracking the display's pose, we may have magnetic tracking devices to track where our head is and in which direction we are looking. There is also optical tracking, which is more accurate and less sensitive to electric noise, and there may be acoustic tracking of the position and attitude of the head using ultrasound.

Prüfungsfragen:

• Was versteht man unter „data garments“ (Datenkleidung)? Nennen Sie mindestens zwei Geräte dieser Kategorie!

2.17 Sensors for Augmented Reality

In order to understand what the sensor needs to do for augmented reality, we first need to understand what augmented reality is. Let us take a simple view. Augmented reality is the simultaneous visual perception by a human being of the real environment, of course by looking at it, and of virtual objects and visual data superimposed onto that real environment which are not physically present in it. How do we do this? We provide the human with transparent glasses which double as computer monitors. So we use one monitor for the left eye, another monitor for the right eye. The monitors show a computer-generated image, but they are transparent (or better, semi-transparent): we not only see what is on the monitor, we also see the real world. The technology is called head-mounted displays or HMDs. Now, for an HMD to make any sense, the computer needs to know where the eyes are and in what direction they are looking. Therefore we need to combine the HMD with a way of detecting the exterior orientation or pose. That is usually accomplished by means of magnetic positioning. Magnetic positioning, however, is fairly inaccurate and heavily affected by magnetic fields that might exist in a facility with computers.
Therefore we tend to augment magnetic positioning with optical positioning, as suggested in Slide 2.63. A camera is looking at the world, mounted rigidly to the HMD. Grabbing an image, one derives from the image where the camera is and in which direction it is pointed, and one thereby also detects where the eyes are and in which direction they are looking. Now we have the basis for the computer to feed into the glasses the proper object in the proper position and attitude, so that the objects are where they should be. As we review augmented reality, we immediately see the option of viewing the real world via the cameras and feeding the eyes not with the direct view of reality, but indirectly with the camera's views. This reduces the calibration effort in optical tracking.

Prüfungsfragen:

• Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter Trackingverfahren und erläutern Sie deren Vor- und Nachteile!

Antwort:

Tracking    | Vorteile        | Nachteile
magnetisch  | robust, schnell | kurze Reichweite, ungenau
optisch     | genau           | Anforderung an Umgebung, aufwändig

2.18 Outlook

The topic of imaging sensors is wide. Naturally we have to skip a number of items. However, some of these topics will be visited in other classes for those interested in image processing or computer graphics. They also appear in other courses of our school. Two examples might illustrate this matter. The first is interferometry, a sensing technology combined with a processing technology that allows one to make very accurate reconstructions of 3D shapes by making two images and measuring the phase of the radiation that gave rise to each pixel. We will deal with this off and on throughout "image processing". Second, there is the large area of medical imaging, with a dedicated course. This is a rapidly growing area where today there are ultrafast CAT scanners producing thousands of images of a patient in a very short time. It becomes a real challenge for the doctor to take advantage of these images and reconstruct the objects of which those images were taken. This very clearly needs a sophisticated level of image processing and computer graphics to help human analysts understand what is in the images and to reconstruct the relevant objects in 3D. A clear separation of the field into Image Processing/Computer Vision and Computer Graphics/Visualization is not really useful or feasible.

(Slides 2.1 through 2.64 accompany this chapter.)

Chapter 3

Raster-Vector-Raster Convergence

Algorithm 7 Digital differential analyzer
1: dy = y2 − y1
2: dx = x2 − x1
3: m = dy/dx
4: y = y1
5: for x = x1 to x2 do
6:   draw(x, round(y))
7:   y = y + m    {Step y by slope m}
8: end for
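A runnable Python version of the DDA listing above, as a hedged sketch (the draw step is replaced by collecting coordinates; it assumes x2 > x1 and a slope between 0 and 1):

def dda_line(x1, y1, x2, y2):
    """Digital differential analyzer: pixels approximating the line (x1,y1)-(x2,y2)."""
    dy = y2 - y1
    dx = x2 - x1
    m = dy / dx            # slope; assumes dx != 0 and |m| <= 1
    y = y1
    pixels = []
    for x in range(x1, x2 + 1):
        pixels.append((x, round(y)))   # the rounding step that makes the DDA slow
        y = y + m                      # step y by the slope m
    return pixels

print(dda_line(0, 0, 5, 2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]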
3.1 Drawing a straight line

We introduce the well-known Bresenham Algorithm from 1965. The task is to draw a straight line on a computer monitor, i.e. to replace a vector representation of a straight line that goes from a beginning point to an end point by a raster representation in the form of pixels. Obviously, as we zoom in on a straight line shown on a computer monitor, we notice that we are really looking at the irregular edge of an area that represents the straight line. The closer we look, the more we see that the edge of that straight line is not straight at all. Conceptually, we need to find those pixels in a raster representation that will represent the straight line, as shown in Slide 3.4. The simplest method of assigning pixels to the straight line is the so-called DDA algorithm (Digital Differential Analyzer). Conceptually we intersect the straight line with the columns that pass through the centers of the pixels. The intersection coordinates are (xi, yi), and at the next column of pixels they are (xi + 1, yi + m). The DDA algorithm (see Algorithm 7) finds the nearest pixel simply by rounding the y-coordinates. Slide ?? illustrates graphically the operations of the DDA algorithm; Slide 3.6 is a conventional procedure doing what was just described graphically. Obviously the straight line's beginning point is defined by (x0, y0) and the end point by (x1, y1); for simplicity's sake we say that x is an integer value. We then define auxiliary values dx, dy, y and m as real numbers and go through a loop, column by column of pixels, doing rounding operations to find those pixels that will represent the straight line. The DDA algorithm is slow because it uses rounding operations. In 1965 Bresenham proposed his algorithm, which was exceedingly fast and outperformed the DDA algorithm by far, and Pitteway in 1967 proposed the Midpoint Line Algorithm (see Algorithm ??). These algorithms avoid the rounding operations and simply operate with decision variables only. For a long time, the vector-to-raster conversion implemented by Bresenham and Pitteway was only applicable to straight lines; it was as late as 1985 that this idea of very fast conversion from vector to raster was extended to circles and ellipses. In this class we will not go beyond straight lines. The next six illustrations address the Bresenham Algorithm. We begin by defining the sequence of pixels that are visited by the algorithm as East (E) and North-East (NE) of the previous pixel, and we find an auxiliary position M which is halfway between the NE and the E pixel. The actual intersection of the straight line with a line through the column of pixels to be visited is denoted by Q. Essentially Bresenham now says: "Given that we know the previous pixel, we must make a decision whether we should assign to the straight line the pixel NE or the pixel E." You can of course immediately see that the approach here is applicable to straight lines that progress between the angles of 0 and 45 degrees; for directions between 45 and 90 degrees and so forth the same ideas apply with minimal modifications. Slide 3.10 and Slide 3.11 describe the procedure used for the Midpoint Line Algorithm, which takes the beginning point (x0, y0) and end point (x1, y1) and comes back with the set of raster pixels that describe the straight line.
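Before walking through the derivation, here is a hedged Python sketch of the integer midpoint/Bresenham loop for the first octant (slope between 0 and 1); this is my own compact illustration of the decision-variable idea, not the lecture's listing:

def midpoint_line(x0, y0, x1, y1):
    """Midpoint line algorithm: integer arithmetic only, slope between 0 and 1."""
    dx = x1 - x0
    dy = y1 - y0
    d = 2 * dy - dx            # initial decision variable (times 2 to avoid the 1/2)
    inc_e = 2 * dy             # added when the East pixel is chosen
    inc_ne = 2 * (dy - dx)     # added when the North-East pixel is chosen
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        if d <= 0:             # midpoint above the line: stay on the same row (E)
            d += inc_e
        else:                  # midpoint below the line: step up one row (NE)
            d += inc_ne
            y += 1
        x += 1
        pixels.append((x, y))
    return pixels

print(midpoint_line(0, 0, 5, 2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]  - same pixels as the DDA, no rounding

Only additions and sign tests appear in the loop, which is exactly the point made in the text that follows.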
Again we have a dx and a dy, increments for E and for NE, an auxiliary decision variable d, and variables x and y. The algorithm itself is self-explanatory; we really do not need much text to explain it, and the reader is invited to work through it. The next two slides, Slide 3.12 and Slide 3.13, explain the basic idea behind the midpoint line algorithm. Note that we have introduced an auxiliary point M into the approach; its coordinates are (xp + 1, yp + 1/2). The implicit equation of a straight line is a·x + b·y + c = 0; a point that is not on the straight line produces, when inserted into the left-hand side, a value that is either greater or less than zero. Values greater than zero correspond to points on one side of the line, values with a negative sign to points on the other side. We can also write the equation of a straight line as

y = (dy/dx) · x + b.

This can be rearranged as shown in Slide 3.13, and we can ultimately write down a variable d, which can be greater than zero, equal to zero or less than zero:

d = dy · (xp + 1) − dx · (yp + 1/2) + c.

If d is greater than zero, then the pixel of interest is NE, otherwise the pixel of interest is E. If E is selected as the next pixel, then we have to compute a new value for d, a d_new, by putting into the equation of the straight line the coordinates of the new midpoint M, which we would then have to call (xp + 2, yp + 0.5); if we look at it, this is really nothing else but the old value d + dy. But if we select NE as the next pixel, then our midpoint has the coordinates (xp + 2, yp + 1.5), which is nothing else but the old value d + dy − dx. Once we realize that, we see that the initial value of d at the first midpoint comes out as a + b/2 = dy − dx/2; since we do not want to divide anything by two, we simply multiply everything by a factor of 2 and end up with 2d = 2·dy − dx as the starting value. So Bresenham's trick was to avoid multiplications and divisions and simply to decide whether a value is greater or smaller than zero: if it is greater than zero, one adds one increment; if it is less than zero, one adds another increment; and one works one's way along the straight line from pixel to pixel. This was a pretty creative way of making the algorithm fast. There is a problem, though. A horizontal line has a sequence of pixels that are spaced one pixel diameter apart; see in Slide 3.15 that line a would be a dark line. However, if we incline that line by 45 degrees, then the pixels assigned to the line have a distance that is the diameter of a pixel times the square root of 2. Therefore we have fewer pixels across the entire length of the straight line, and the same line appears less dark. We will address this and related subjects later in Section 3.3.

Prüfungsfragen:

• Beschreiben Sie in Worten die wesentliche Verbesserungsidee im Bresenham-Algorithmus gegenüber dem DDA-Algorithmus.

• Zeichnen Sie in Abbildung B.9 jene Pixel ein, die vom Bresenham-Algorithmus erzeugt werden, wenn die beiden markierten Pixel durch eine (angenäherte) Gerade verbunden werden. Geben Sie außerdem die Rechenschritte an, die zu den von Ihnen gewählten Pixeln führen.

• Das Quadrat Q in normalisierten Bildschirmkoordinaten aus Beispiel B.2 wird in ein Rechteck R mit den Abmessungen 10 × 8 in Bildschirmkoordinaten transformiert.
Zeichnen Sie die Verbindung der zwei Punkte p01 und p02 in Abbildung B.20 ein und bestimmen Sie grafisch jene Pixel, die der Bresenham-Algorithmus wählen würde, um die Verbindung diskret zu approximieren!

3.2 Filling of Polygons

Another issue when converting from the vector world to the raster world is dealing with areas that have boundaries in the form of polygons. Such polygons could be convex or concave, they could intersect themselves, they could have islands. It very quickly becomes a non-trivial problem to take a polygon from the vector world, create from it a raster representation, and fill the area inside the polygon. Slide 3.17 illustrates the issue. Instead of finding pixels along the polygon we now have the task of finding pixels that are inside the polygon represented by a sequence of vectors. We define a scan line as a row of pixels going from left to right. The illustrations in Slide 3.17 show that the first pixel would be assigned where the scan line intersects the first vector, and every time we find along the scan line an intersection with a vector of the polygon, we change from assigning pixels to not assigning pixels, and vice versa. A second approach, shown in Slide 3.18, is to use the Bresenham algorithm to rasterize all the vectors defining the polygon and then, after that, to go along the scan lines, take the pairs of pixels produced by the Bresenham algorithm, and fill the intermediate spaces with additional pixels. As we can see in this example, that approach may produce pixels whose center lies outside of the actual polygon. There is yet another algorithm we could use, which takes only those pixels that lie on the inside of the polygon; that is different from the previous application of the Bresenham algorithm. Slide 3.21 illustrates for the first time a concept which we will address in a moment: if we have a very narrow polygon, a thin triangle, we might get a very irregular pattern of pixels, and when we look at this kind of pattern we notice that we have a severe case of aliasing. Aliasing is a topic of interest in computer graphics.

Prüfungsfragen:

• Gegeben sei ein Polygon durch die Liste seiner Eckpunkte. Wie kann das Polygon ausgefüllt (also mitsamt seinem Inneren) auf einem Rasterbildschirm dargestellt werden? Welche Probleme treten auf, wenn das Polygon sehr „spitze“ Ecken hat (d.h. Innenwinkel nahe bei Null)?

3.3 Thick lines

A separate subject is the various ways one can use to plot thick lines: not simply applying a Bresenham algorithm to a mathematically infinitely thin line, but drawing a fat line. One way of doing that is to apply a Bresenham algorithm and then replicate the pixels along the columns, saying that where Bresenham found one pixel we now make, say, five pixels out of it. If we do that, the thickness of the line becomes a function of the slope of the straight line. A second way of plotting a thick line is to take the Bresenham pixels and apply at each pixel location a rectangular pen: that is, as in the example of Slide 3.23, a pen size of 5 × 5, i.e. 25 pixels (see Algorithm 8).
Algorithm 8 Thick lines using a rectangular pen
1: procedure drawThickLine2(x1,y1,x2,y2,thickness,color);
2: var x,i: integer;
3:     p1x,p1y,p2x,p2y: integer;
4:     dx,dy,y,m: real;
5: begin
6:   dy := y2-y1;
7:   dx := x2-x1;
8:   m := dy/dx;
9:   y := y1;
10:  for x := x1 to x2 do
11:    p1x := x-(thickness div 2);                  {upper left point}
12:    p1y := Round(y)+(thickness div 2);
13:    p2x := x+(thickness div 2);                  {lower right point}
14:    p2y := Round(y)-(thickness div 2);
15:    drawFilledRectangle(p1x,p1y,p2x,p2y,color);  {rectangle with p1 and p2}
16:    y := y+m;
17:  end for;
18: end; {drawThickLine2}

{Note: drawFilledRectangle draws a rectangle given by the upper left and the lower right point. If you want to use a circular pen, simply replace the rectangle with drawFilledCircle(x,y,(thickness div 2),color). Syntax: drawFilledCircle(mx,my,radius,color)}

The difficulty of fat lines becomes evident if we have circles. Let us assume, as in Slide 3.25, that we use pixel replication as the method: we use Bresenham to assign pixels to the circle and then we add one pixel above and one pixel below each pixel. What we can very quickly see is that the thickness of the line describing the circle is good at zero and ninety degrees, but is narrower at 45 degrees, where the same thickness, which was t at 0 and 90 degrees, reduces to t divided by the square root of two. This problem goes away if we use a moving pen of 3 × 3 pixels; in that case the variation in thickness goes away. Yet another approach is to apply a vector-to-raster conversion algorithm to two contours obtained by changing the radius of the circle, and then to fill the area described by the two contours with pixels; again we avoid the change in thickness of the line.

Prüfungsfragen:

• Nennen Sie verschiedene Techniken, um „dicke“ Linien (z.B. Geradenstücke oder Kreisbögen) zu zeichnen.

Definition 5 Skeleton
The skeleton of a region R contains all points p which have more than one nearest neighbour on the border line of R. The points p are the centers of discs which touch the border line b in two or more points. The detection of skeletons is useful for shape recognition and runs in O(n²) for concave polygons and O(n log n) for convex polygons.

3.4 The Transition from Thick Lines to Skeletons

The best-known algorithm to make the transition from a thick line or an area to a representation by the area's skeleton is by Blum from the year 1967. We define a region R and its border line b. The basic idea of the medial axis transform (see Definition 5) is to take a region as shown in Slide 3.30 and replace this region by those pixels (a string of individual pixels) which have more than one single nearest neighbor along the boundary of the area. When we look at the area in example (a) of the slide, we can very quickly recognize that every point along the dashed lines has two nearest points along the border, either on the left and right border or on the left and top border etc. When we create a pattern like that and there is a disturbance, as we see in image (b) of that slide, we immediately get a branch of the center line leading towards the disturbance. Example (c) shows the pattern this basic idea of finding pixels which have two nearest neighbors along the border line will create when the area itself is not rectangular, but has an L-shape.
Slide 3.31 summarizes in words how we go from a region R to its boundary line b, and from the boundary line to pixels p which have more than a single nearest neighbor on the boundary line b. As a result, the pixels p form the so-called medial axis of region R. This basic way of finding the medial axis is expensive, because distances need to be computed between all the pixels within the region R and all the pixels on the boundary line b, and a lot of sorting would go on. For this reason, Blum considered a different approach: the transition from the region to the skeleton, or medial axis, is better achieved by means of a thinning algorithm. We therefore start from the edge of the region and delete contour pixels. What is a pixel on the contour? A pixel on the contour is part of the region R, has a value of 1 in a binary representation, and has at least one zero among its eight neighbors, i.e. at least one neighbor that does not belong to region R. Slide 3.32 explains the basic idea of a thinning algorithm. We have a pixel p1 and its eight neighbors p2 through p9. We can associate with pixel p1 the number N(p1) of non-zero neighbors, by simply adding up the values of the eight neighbors. We compute a second auxiliary number S(p1), which is the number of transitions from zero to one in the ordered, circular sequence of values p2, p3, ..., p9 (and back to p2). The decision whether a pixel p1 gets deleted or not depends on the outcome of four conditions (also shown in Slide 3.34).

Pixel p1 is deleted if:
1. 2 ≤ N(p1) ≤ 6
2. S(p1) = 1
3. p2 · p4 · p6 = 0
4. p4 · p6 · p8 = 0

Pixel p1 is also deleted if conditions 1 and 2 hold as above and:
3'. p2 · p4 · p8 = 0
4'. p2 · p6 · p8 = 0

The workings of the algorithm are further illustrated in Slide 3.35 for a particular pattern of pixels; there we compute N(p1) and S(p1) to document the interpretation of these two numbers. Slide 3.36 illustrates the workings of the iterative algorithm by going from a letter "H" to its skeleton: we can see which pixels have been deleted after the initial iteration through all pixels, and after five iterations the result shown in Slide 3.36 is obtained. We have now dealt with the issue of converting a given vector to a set of binary pixels and have denoted that as vector-raster conversion; this is also called scan conversion, and it occurs in the representation of vector data in a raster monitor environment. What we have not yet talked about is the inverse issue: given a raster pattern, we would like to obtain vectors from it. We have touched upon replacing a raster pattern by a medial axis or skeleton, but we have not yet really come out of that conversion with a set of vectors. Yet the raster-vector conversion is an important element in dealing with object recognition. A particular example has been hinted at in Slide 3.36, because it clearly represents an example from character recognition. The letter H in a binary raster image is described by many pixels. Recognizing the raster H might be based on a conversion to a skeleton, a replacement of the skeleton by a set of vectors, and then on submitting those vectors to a set of rules that tell us which letter we are dealing with.

Prüfungsfragen:

• Wenden Sie die „medial axis“-Transformation von Blum auf das Objekt in Abbildung B.39 links an! Sie können das Ergebnis direkt in Abbildung B.39 rechts eintragen.
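A hedged Python sketch of one pass of the thinning conditions just described, assuming the binary image is given as a dictionary mapping (x, y) to 0 or 1 (my own compact illustration, not the lecture's exact listing):

def neighbors(img, x, y):
    """p2..p9: the 8 neighbors of p1 = (x, y), in circular order starting at north."""
    return [img.get((x, y-1), 0), img.get((x+1, y-1), 0), img.get((x+1, y), 0),
            img.get((x+1, y+1), 0), img.get((x, y+1), 0), img.get((x-1, y+1), 0),
            img.get((x-1, y), 0), img.get((x-1, y-1), 0)]

def thinning_pass(img, first_subiteration=True):
    """Delete contour pixels according to the four conditions listed above."""
    to_delete = []
    for (x, y), v in img.items():
        if v != 1:
            continue
        p = neighbors(img, x, y)              # p2..p9
        n = sum(p)                            # N(p1): number of non-zero neighbors
        s = sum(1 for i in range(8)           # S(p1): 0->1 transitions in p2..p9,p2
                if p[i] == 0 and p[(i + 1) % 8] == 1)
        p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
        if first_subiteration:
            ok = (p2 * p4 * p6 == 0) and (p4 * p6 * p8 == 0)
        else:
            ok = (p2 * p4 * p8 == 0) and (p2 * p6 * p8 == 0)
        if 2 <= n <= 6 and s == 1 and ok:
            to_delete.append((x, y))
    for key in to_delete:                     # delete only after the whole pass
        img[key] = 0
    return len(to_delete)

Alternating the two sub-iterations until no pixel is deleted anymore yields the skeleton, as in the letter "H" example above.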
(Slides 3.1 through 3.38 accompany this chapter.)

Chapter 4

Morphology

Prüfungsfragen:

• Gegeben sei die in Abbildung B.56 dargestellte Pixelanordnung. Beschreiben Sie grafisch, mittels Formel oder in Worten einen Algorithmus zur Bestimmung des Schwerpunktes dieser Pixelanordnung.

4.1 What is Morphology

This is an interesting subject. It is not very difficult, yet it is also not to be underestimated. We talk about the shape and the structure of objects in images. It is a topic that has to do with binary image processing. Recall that binary images have pixels that are only either black or white. Objects typically are described by a group of black pixels, and the background consists of all white pixels. So one has a two-dimensional space of integer numbers, to which we apply set theory in morphology. Let us take an object - we call it A - and that object is hinged at a location designated in Slide 4.5 by a little round symbol. Morphology now says that A is a set of pixels in this two-dimensional space. A separate object B is also defined by a set of pixels. We now translate A by a distance x and obtain a new set called (A)x. The translation is described by two numbers, x1 and x2, for the two dimensions of the translation. We can write the expression in Slide 4.6 to define the result after the translation: (A)x consists of all pixels c such that c = a + x, where a runs over all pixels of set A. Geometrically and graphically we can illustrate the translation very simply by the two distances x1 and x2 of Slide 4.7: instead of A we have (A)x. A very simple concept for humans becomes a somewhat complex equation in the computer. Morphology also talks about "reflection". We have a pixel set B and reflect it into a set B̂, which is the set of all pixels x such that x = −b, where b is each pixel from pixel set B. The interpretation of −b is needed: geometrically, B̂ is the mirror reflection of B, mirrored over the hinge point (the point of reflection). The next concept we look at is "complementing" a set A into a set A^C. A^C is the set of all pixels x that do not belong to set A. If an object composed of all the pixels inside a contour is called A, then A^C is the background. Next we can take two objects A and B and build the difference A − B. The difference is the set of all pixels x such that x belongs to set A but not to set B. We can describe this with a new symbol and say it is the intersection of two sets, namely of set A and the complement B^C of B.

Definition 6 Difference
Given two objects A and B as sets of pixels (points of the 2D integer space), the difference of the two sets A and B is defined as
A − B = {x | x ∈ A, x ∉ B} = A ∩ B^C.

Slide 4.14 shows A, B and A − B; the difference is A reduced by the area of B that covers part of A.

Prüfungsfragen:

• Was ist „Morphologie“?
Antwort: die Anwendung nichtlinearer Operatoren auf die Form eines Objekts
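The set definitions above translate almost literally into code; a minimal sketch with pixel sets represented as Python sets of (x, y) tuples (my own illustration, the example sets are made up):

A = {(1, 1), (1, 2), (2, 1), (2, 2)}      # object: a small square of set pixels
B = {(0, 0), (1, 0)}                      # a second pixel set

def translate(A, x):
    """(A)x = {c | c = a + x, a in A}"""
    return {(a[0] + x[0], a[1] + x[1]) for a in A}

def reflect(B):
    """B-hat = {x | x = -b, b in B}"""
    return {(-b[0], -b[1]) for b in B}

def complement(A, universe):
    """A^C relative to a finite universe of pixels (the full Z^2 is infinite)."""
    return universe - A

def difference(A, B):
    """A - B = {x | x in A, x not in B} = A intersected with B^C"""
    return A - B

print(translate(A, (3, 0)))   # the square shifted by x = (3, 0)
print(reflect(B))             # {(0, 0), (-1, 0)}
print(difference(A, {(1, 1)}))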
4.2 Dilation and Erosion

"Dilation" means that we make something bigger (in German: Blähung). The symbol we use to describe the dilation of a set A using a "structure element" B is shown in Slide ??. A dilated by B is the collection of all pixels x for which the reflected and translated structure element B̂, placed at x, has a non-empty intersection with A. This sounds pretty difficult, but when we look at it geometrically it is very simple. Let A be a square with a side length d and B another square with side length d/4. If we reflect B around a reflection point that is in the center of the square, then the reflection is the same as the original structure element. Thus we reflect (with no effect) and shift B to pixel x. As we go to each pixel x of set A, we place the (reflected) structure element there - we translate B̂ to the location x - and we collect all pixels covered by the union of set A and the structure element B̂. What we do is add a little fringe around area A, obtained by moving the structure element over all pixels of A: B̂ extends beyond the edge of A, so we make A a little larger. If our structure element is not a square but a rectangle of dimension d in one direction and d/4 in the other, then we obtain an enlargement of our area A that is significant in one direction and less significant in the other direction.

Algorithm 9 Dilation
1: for all x do
2:   Y = Translate(Reflect(B), x)
3:   for all y ∈ Y do
4:     if (y ∈ A) and (x ∉ X) then
5:       Insert(X, x)
6:     end if
7:   end for
8: end for
9: return X

Dilation has a sister operation called "erosion" (Abmagerung). The erosion is the opposite of a dilation, and the symbol designating an erosion is a little circle with a minus in it, shown in Slide 4.18. The result of an erosion consists of all those positions x of the structure element B such that the shifted structure element lies completely within set A. What does this look like geometrically?

Definition 7 Erosion
X ⊖ B = {d ∈ E² : B_d ⊆ X}
B ... structure element (binary erosion matrix)
B_d ... B translated by d
X ... binary image matrix
Starting from this equation we get the following equivalent expression:
X ⊖ B = ∩_{b ∈ B} X_{−b}

In Slide 4.19 we have subtracted from set A a fringe, deleted as if with an eraser of the size of B. Doing this with a non-square, rectangular structure element, we receive a result that in the particular case of Slide 4.19 reduces set A to merely a linear element, because there is only one row of pixels that satisfies the erosion condition with this type of structure element of dimensions d and d/4. There is a duality of erosion and dilation: we can express an erosion of set A by structure element B through a dilation, by taking the complement A^C of A and dilating it with the reflection B̂ of B. This is demonstrated in Slide 4.21, where we go through the erosion definition of A by structure element B and say that the complement of the eroded object equals the complement of the set of all pixels x such that B placed over x lies completely within A. Going through our previous definitions, we can show in Slide 4.21 that we end up with a dilation of the complement A^C of set A with the reflection B̂ of structure element B.
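A minimal runnable sketch of dilation and erosion on the same set representation as above (my own illustration; the small search window stands in for the infinite Z², and the shapes are made up):

# Structure element: a 2x2 square with its reference point at the origin.
B = {(0, 0), (1, 0), (0, 1), (1, 1)}

# Object A: a 4x4 square of foreground pixels.
A = {(x, y) for x in range(4) for y in range(4)}

def dilate(A, B):
    """A ⊕ B: every pixel of A is replaced by a copy of B shifted there (adds a fringe)."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

def erode(A, B):
    """A ⊖ B: keep only those positions d where B shifted by d fits completely inside A."""
    candidates = {(x, y) for x in range(-2, 6) for y in range(-2, 6)}   # bounded window
    return {d for d in candidates
            if all((d[0] + b[0], d[1] + b[1]) in A for b in B)}

print(len(A), len(dilate(A, B)), len(erode(A, B)))   # 16, 25, 9

The duality stated above, (A ⊖ B)^C = A^C ⊕ B̂, could be verified with these two functions as well, as long as the finite window is chosen large enough that border effects do not interfere.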
Prüfungsfragen:

• Erläutern Sie die morphologische „Erosion“ unter Verwendung einer Skizze und eines Formelausdruckes.

• Auf das in Abbildung B.65 links oben gezeigte Binärbild soll die morphologische Operation „Erosion“ angewandt werden. Zeigen Sie, wie die Dualität zwischen Erosion und Dilation genutzt werden kann, um eine Erosion auf eine Dilation zurückzuführen. (In anderen Worten: statt der Erosion sollen andere morphologische Operationen eingesetzt werden, die in geeigneter Reihenfolge nacheinander ausgeführt das gleiche Ergebnis liefern wie eine Erosion.) Tragen Sie Ihr Ergebnis (und Ihre Zwischenergebnisse) in Abbildung B.65 ein und benennen Sie die mit den Zahlen 1, 2 und 3 gekennzeichneten Operationen! Das zu verwendende Formelement ist ebenfalls in Abbildung B.65 dargestellt. Hinweis: Beachten Sie, dass das gezeigte Binärbild nur einen kleinen Ausschnitt aus der Definitionsmenge Z² zeigt!

Antwort: Die morphologische Erosion kann durch eine Abfolge der folgenden Operationen ersetzt werden (siehe Abbildung 4.1):
1. Komplement
2. Dilation
3. Komplement

Figure 4.1: Morphologische Erosion als Abfolge Komplement → Dilation → Komplement

• Die Dualität von Erosion und Dilation betreffend Komplementarität und Reflexion lässt sich durch die Gleichung (A ⊖ B)^C = A^C ⊕ B̂ formulieren. Warum ist in dieser Gleichung die Reflexion (B̂) von Bedeutung?

• Nehmen Sie an, Sie müssten auf ein Binärbild die morphologischen Operationen „Erosion“ bzw. „Dilation“ anwenden, haben aber nur ein herkömmliches Bildbearbeitungspaket zur Verfügung, das diese Operationen nicht direkt unterstützt. Zeigen Sie, wie die Erosion bzw. Dilation durch eine Faltung mit anschließender Schwellwertbildung umschrieben werden kann! Hinweis: die gesuchte Faltungsoperation ist am ehesten mit einem Tiefpassfilter zu vergleichen.

Antwort: Man betrachtet den gewünschten Kernel für die morphologischen Operationen als Filtermaske (mit „1“ für jedes gesetzte Pixel im Kernel, „0“ sonst) und faltet das Binärbild mit dieser Maske. Im Ergebnisbild stehen nun Werte g(x, y), wobei
– g(x, y) ≥ 1, wenn mindestens ein Pixel der Maske mit dem Inputbild in Deckung war (Dilation), bzw.
– g(x, y) ≥ K, wenn alle Pixel der Maske mit dem Inputbild in Deckung waren (Erosion),
wobei K die Anzahl der gesetzten Maskenpixel ist.
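A hedged sketch of this convolution-plus-threshold idea, using scipy's 2D convolution (assumed to be available; the small test image is made up):

import numpy as np
from scipy.signal import convolve2d

image = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]])            # binary test image: a 3x3 block

kernel = np.ones((3, 3), dtype=int)            # structure element as a filter mask
K = kernel.sum()                               # number of set mask pixels

g = convolve2d(image, kernel, mode='same')     # g(x, y): count of covered mask pixels

dilation = (g >= 1).astype(int)                # at least one mask pixel in coverage
erosion = (g >= K).astype(int)                 # all mask pixels in coverage

print(dilation)                                # the block grown by one pixel all around
print(erosion)                                 # only the single center pixel survives

For a symmetric mask the kernel flip built into convolution makes no difference; for an asymmetric structure element that flip corresponds exactly to the reflection B̂ discussed above.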
4.3 Opening and Closing

We have a more complex operation that is a sequence of previously defined operations; we call these "opening" and "closing". Let us take first the question of opening. We may have two objects, one to the left and the other one to the right, connected by a thin bridge, perhaps because of a mistake in sensing and preprocessing of the data. We can separate those two objects by an operation called "opening". Opening a set A by means of a structure element B is defined by the symbol shown in Slide 4.24, namely an open little circle. It begins with the erosion of A using structure element B, and subsequently dilates the result with the structure element B as well. So we first shrink, then we enlarge again; but in shrinking we get rid of certain things that are not there anymore when we enlarge.

Definition 8 Opening
A ◦ B = (A ⊖ B) ⊕ B
◦ ... opening, ⊖ ... erosion, ⊕ ... dilation; B is a circular structure element

Slide 4.25 shows the circular structure element B and the original object A. As we now erode object A and obtain a shrunk situation, object A is broken up into two eroded smaller objects. The bridge between the two parts of the original set A is narrower than the size of the structure element, so the structure element will, like an eraser, erase that bridge. Now we want to go back to the original size, so we dilate with the structure element B again, and what we obtain is the separation of the thinly connected objects. Slide 4.27 and Slide 4.28 are a summary of the opening operation.

We proceed to the "closing" operation. Closing set A with the help of structure element B is denoted by a little filled circle. We first dilate A by B and then erode the result by structure element B; we do the opposite of opening. The process will remove little holes in things. One will not break up, but connect; one will fill in and remove noise.

Definition 9 Closing
A • B = (A ⊕ B) ⊖ B
⊕ Dilation: fill all holes and gaps smaller than the structure element B
⊖ Erosion: restore the original size except for the filled structures
Closing set A with structure element B means to first dilate A by B and afterwards erode the result by structure element B.

Slides 4.30 through 4.33 feature a complex shape. The shape seems to break apart when it really should not. We take the original figure and dilate it (make it larger). As it grows, this will reduce small details; the resulting object is less sophisticated, less detailed than before. Closing an object A using the structure element B can again be shown to be the dual of opening, concerning complementarity and reflection: closing an object A with respect to structure element B and creating the complement of the result is the same as opening the complement of A with the mirror reflection of structure element B.

Prüfungsfragen:

• Erläutern Sie das morphologische „Öffnen“ unter Verwendung einer Skizze und eines Formelausdruckes.

Figure 4.2: morphologisches Öffnen

• Um den Effekt des morphologischen Öffnens (A ◦ B) zu verstärken, kann man1 die zugrundeliegenden Operationen (Erosion und Dilation) wiederholt ausführen. Welches der folgenden beiden Verfahren führt zum gewünschten Ergebnis:
1. Es wird zuerst die Erosion n-mal ausgeführt und anschließend n-mal die Dilation, also
(((A ⊖ B) ⊖ ... ⊖ B) ⊕ B) ⊕ ... ⊕ B   (n-mal ⊖, dann n-mal ⊕)
2. Es wird die Erosion ausgeführt und anschließend die Dilation, und der Vorgang wird n-mal wiederholt, also
(((A ⊖ B) ⊕ B) ⊖ ... ⊖ B) ⊕ B   (n-mal abwechselnd ⊖/⊕)
Begründen Sie Ihre Antwort und erklären Sie, warum das andere Verfahren versagt!
Antwort: Verfahren 1 ist richtig; bei Verfahren 2 bleibt das Objekt nach der ersten ⊖/⊕-Iteration unverändert.

1 abgesehen von einer Vergrößerung des Maskenelements B

• Wenden Sie auf das Binärbild in Abbildung B.31 links die morphologische Operation „Öffnen“ mit dem angegebenen Formelement an! Welcher für das morphologische Öffnen typische Effekt tritt auch in diesem Beispiel auf? Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.31 eintragen.
Antwort: siehe Abbildung 4.2; typischer Effekt: Trennung von Regionen, die durch eine schmale „Brücke“ verbunden sind

4.4 Morphological Filter

Definition 10 Morphological filter
A morphological filter consists of one or more morphological operations such as dilation, erosion, opening, closing, and hit-or-miss that are applied sequentially to an input image.

A very simple application is morphological filtering. Say we have an object such as an ice floe on the ocean and we have several little things floating around it. We would like to recognize and map the large ice floe.
We would like to isolate this object, measure its surface and its contour, and see where it is. In an automated process we need to remove all the clutter around it, fill the holes, and get rid of the extraneous detail on the open water. Morphological filtering is illustrated in Slide 4.38 and Slide 4.39. We choose a structure element that has to be a little larger than the elements we would like to remove. We first let that structure element run over the image and perform an erosion: every object that is smaller than the structure element disappears, but the holes inside the object get bigger. We follow with a dilation after the erosion; that combination is the opening operation. We have now removed all the small items outside the object, but the holes inside the object are still there. We then do the opposite operation, namely the closing: we take the opening result and dilate it, which increases the size of the object in such a way that it also closes up the holes, and then we shrink it again by eroding with the structure element B. This dilation followed by an erosion is the “closing”. The sequence of an opening followed by a closing produces a clean object without extraneous detail: we have applied morphological filtering.

Prüfungsfragen:

• Abbildung B.55 zeigt ein rechteckiges Objekt und dazu einige kleinere Störobjekte. Erläutern Sie bitte ein Verfahren des morphologischen Filterns, welches die Störobjekte eliminiert. Verwenden Sie bitte dazu Formelausdrücke und zeigen Sie mit grafischen Skizzen den Verfahrensablauf. Stellen Sie auch das Ergebnisbild dar.

• Erklären Sie anhand eines Beispiels den Vorgang des morphologischen Filterns!

4.5 Shape Recognition by a Hit-or-Miss Operator

Morphology can recognize shapes in an image with the hit-or-miss operator. Assume we have three small objects X, Y and Z, and we would like to find object X, as shown in Slide 4.41. The union of X, Y and Z is denoted as the auxiliary object A. Now we define a structure element W, and from it a second structure element as the difference of W and the shape X that we are looking for. That gives an interesting structure element which in this case looks like the frame of a window. We build the complement A^C of A, which is the background without the objects X, Y and Z. If we erode A with X, then any object smaller than X is wiped out, any object larger than X shows up as an area resulting from the erosion by X, and for X itself we obtain a single pixel, as in Slide 4.42. The automated process has thus produced pixels that are candidates for the object of interest X, and we need to know which pixel to choose. We go through this operation again, but use A^C as the object and W − X as the structure element. The erosion of A^C by the structure element W − X produces the background with an enlarged hole for the three objects X, Y and Z, plus two auxiliary patterns: the single pixel where our X is located and a pattern of several pixels for the small object, as in Slide 4.43. We intersect the two erosion results, the one from eroding A with X and the one from eroding A^C with W − X. The intersection produces a single pixel at the location of our object X. This is the so-called hit-or-miss method of finding an instance where object X exists.
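A minimal sketch of this hit-or-miss composition, assuming SciPy; the three objects and the window size are invented for the example, with X taken as a 3 × 3 square and W as the surrounding 5 × 5 window.

import numpy as np
from scipy import ndimage

A = np.zeros((15, 15), dtype=bool)
A[2:5, 2:5]   = True                 # X: the 3 x 3 shape we are looking for
A[6:11, 8:13] = True                 # Y: a larger object
A[12, 2]      = True                 # Z: a smaller object

X = np.ones((3, 3), dtype=bool)      # shape to detect
W = np.ones((5, 5), dtype=bool)      # local window around X
frame = W.copy()
frame[1:4, 1:4] = False              # W - X: the "window frame"

hit   = ndimage.binary_erosion(A, X)        # candidates at least as large as X
miss  = ndimage.binary_erosion(~A, frame)   # background ring of shape W - X present
found = hit & miss                          # a single True pixel, at the centre of X

# ndimage.binary_hit_or_miss(A, X, frame) combines the same two erosions.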
All other objects that are either bigger or smaller will disappear. The process and the formula are shown in Slide 4.46, which summarizes the hit-or-miss procedure illustrated in the previous paragraph. The operator uses a symbol with a circle and a little asterisk in it. Again: A is eroded by X, and the complement of A is eroded by W − X; the two results are intersected. We thus work with two structure elements, W1 = X and W2 = W − X.

Definition 11 Hit-or-Miss Operator
A ⊛ W = (A ⊖ W1) ∩ (A^C ⊖ W2)
where W1 = X is the shape we are looking for and W2 = W − X is its local background (the “window frame”).

Slide 4.46 shows that the equation can be rewritten in various forms.

Prüfungsfragen:

• Wie ist der „Hit-or-Miss“-Operator A ⊛ B definiert? Erläutern Sie seine Funktionsweise zur Erkennung von Strukturen in Binärbildern!

Antwort: Es gilt
A ⊛ B = (A ⊖ B) ∩ (A^C ⊖ (W − B)),
wobei W ein Strukturelement größer als B ist. Bei Erosion von A mit B verschwinden alle Teile von A, die kleiner sind als B; ein Teil in der Form von B bleibt als isoliertes Pixel zurück. Bei Erosion von A^C mit W − B werden alle Löcher von A^C, die größer sind als B, aufgeweitet, während Teile der Form B wieder ein einzelnes Pixel ergeben. Der Mengendurchschnitt liefert also genau dort ein gesetztes Pixel, wo ein Teil von A mit B identisch ist.

4.6 Some Additional Morphological Algorithms

Commonly used morphological algorithms deal with finding the contour of an object, finding the skeleton of an object, filling regions, and cutting off branches from skeletons. The whole world of morphological algorithms is clearly applicable in character recognition, particularly in dealing with handwriting.
It is always applied in those cases where the object of interest can be described in a binary image, where we do not need color nor gray values. Instead we simply have object or non-object. 4.6. SOME ADDITIONAL MORPHOLOGICAL ALGORITHMS 87 Given an object A in Slide 4.48 and Slide 4.49, we are looking for the contour of A as b(A). We use a structure element B to find the contour. The contour of region A is obtained by subtracting from A an eroded version of A. The erosion should just be by one pixel. Structure element B is a 3 × 3 window. Definition 12 Contour We present the formal definition of a contour. It is the digital counterpart of a boundary of an analog set. We are looking for the contour of A as b(A). b(A) = A − (A B) (4.1) We use a structure element B to find the contour. The contour of region A is obtained by subtracting from A an eroded version of A. The erosion should just be by one pixel. Structure element B is a 3x3 window which looks like : a11 a12 a13 .. .. B = ... (4.2) . . a31 a32 a33 The contour of a connected set of points R is defined as the points of R having at least one neighbor not in R. The contour is the outline or visible edge of a mass, form or object. Slide 4.49 shows the erosion of region A and the difference from region A to get to the contour pixels. Region filling is the opposite operation, starting from a binary representation of a contour. We want to fill the interior of the contour. This particular contour is continuous, non-interrupted, under an 8-neighborhood relationship (recall: up, down, left, right plus all oblique relationships). We build the complement AC of contour A. The structure element B is again a 3 × 3 matrix but only using the 4-neighbors. Region filling is an iterative process according to Slide 4.51. We get a running index k, which increases as we go through the iterations, create at each step an intermediate result that always looks back at the complement AC of A, using the structure element B and applying a dilation of the previous iteration by the structure element and the union with the complement AC of A, and repeat this step by step, until such time that we do not get any new pixels added. The issue is the starting point X0 , which is an arbitrary pixel inside the contour from which we start the process. A final illustration of the usefulness of morphology deals with the automated recognition of zip codes that are hand-written. Slide 4.52 and Slide 4.53 presents a hand-written address that is being imaged. Through some pre-processing that hand-writing has been converted into a binary image. The first step might be to threshold the gray-tone image to convert to a binary image. After having thresholded the address we need to find the area with the zip-codes. Let us address the task of extracting all connected components in the area that comprises the address field. From a segmentation into components, one finds rectangular boxes containing a connected object. One would assume to have now each digit separate from all the other digits. However, if two digits are connected like in this example with a digit 3 and a digit 7, then we misread this to be one single digit. We can help ourselves considering the shape of the rectangular box, plus using knowledge about how many digits one has in a zip-code. It is five basically in the United States, so one needs to have five digits and so one can look for joined characters by measuring the relative widths of the boxes that enclose the characters. 
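The contour formula b(A) = A − (A ⊖ B) and the iterative region filling described above can be sketched as follows, assuming SciPy; the seed pixel must be supplied by the user, as in the text.

import numpy as np
from scipy import ndimage

B8 = np.ones((3, 3), dtype=bool)               # 3 x 3 element, 8-neighbourhood
B4 = ndimage.generate_binary_structure(2, 1)   # cross element, 4-neighbourhood

def contour(A):
    # pixels of A that have at least one neighbour outside A
    return A & ~ndimage.binary_erosion(A, B8)

def fill_region(contour_img, seed):
    # X_k = (X_{k-1} dilated by B4) intersected with the complement of the contour
    X = np.zeros_like(contour_img)
    X[seed] = True
    while True:
        X_next = ndimage.binary_dilation(X, B4) & ~contour_img
        if (X_next == X).all():
            return X | contour_img             # filled interior plus the contour itself
        X = X_next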
We must expect certain dimensions of the box surrounding a digit. Opening and closing operations can separate digits that should be separate, or merge broken elements that should describe a single digit. Actual character recognition (OCR for “Optical Character Recognition”) then takes each binary image 88 CHAPTER 4. MORPHOLOGY window with one digit and seeks to find which value between 0 and 9 this could be. This can be based on a skeleton of each segment, and a count of the structure with nodes and arcs. We will address this topic later in this class. As a short outlook beyond morphology of binary images, let’s just state that there is a variation of morphology applied to gray value images. Gray-tone images can be filtered with morphology, and an example is presented in Slide 4.55. Prüfungsfragen: • Gegeben sei die in Abbildung ?? dargestellte Pixelanordnung. Beschreiben Sie grafisch und mittels Formel das Verfahren der morphologischen Ermittlung des Umrisses des dargestellten Objektes mit einem von Ihnen vorzuschlagenden Strukturelement. • Beschreiben Sie mit Hilfe morphologischer Operationen ein Verfahren zur Bestimmung des Randes eines Region. Wenden Sie dieses Verfahren auf die in Abbildung B.23 eingezeichnete Region an und geben Sie das von Ihnen verwendete 3 × 3-Formelement an. In Abbildung B.23 ist Platz für das Endergebnis sowie für Zwischenergebnisse. 4.6. SOME ADDITIONAL MORPHOLOGICAL ALGORITHMS 89 90 CHAPTER 4. MORPHOLOGY Slide 4.1 Slide 4.2 Slide 4.3 Slide 4.4 Slide 4.5 Slide 4.6 Slide 4.7 Slide 4.8 Slide 4.9 Slide 4.10 Slide 4.11 Slide 4.12 Slide 4.13 Slide 4.14 Slide 4.15 Slide 4.16 Slide 4.17 Slide 4.18 Slide 4.19 Slide 4.20 Slide 4.21 Slide 4.22 Slide 4.23 Slide 4.24 Slide 4.25 Slide 4.26 Slide 4.27 Slide 4.28 4.6. SOME ADDITIONAL MORPHOLOGICAL ALGORITHMS 91 Slide 4.29 Slide 4.30 Slide 4.31 Slide 4.32 Slide 4.33 Slide 4.34 Slide 4.35 Slide 4.36 Slide 4.37 Slide 4.38 Slide 4.39 Slide 4.40 Slide 4.41 Slide 4.42 Slide 4.43 Slide 4.44 Slide 4.45 Slide 4.46 Slide 4.47 Slide 4.48 Slide 4.49 Slide 4.50 Slide 4.51 Slide 4.52 Slide 4.53 Slide 4.54 Slide 4.55 92 CHAPTER 4. MORPHOLOGY Chapter 5 Color 5.1 Gray Value Images A precursor to color images is of course a black & white image. Some basic issues can be studied with black & white images before we proceed to color. A regular gray value image is shown in Slide 5.3. We need to characterize a gray value image by its densities, the way it may challenge our eyes, the manner in which it captures the physics of illumination and reflection, and how it is presented to the human viewer. We have discussed such concepts as the density of film, the intensity of the light that is reflected from objects, and the quality of an image in terms of its histogram. Intensity describes the energy, light or brightness. When an intensity value is zero, we are talking about darkness, no light. If the intensity is bright, then we should have a large value describing it. The opposite is true for film. A film with a density zero is completely transparent, whereas a film at a density 4 is totally opaque and will not let any light go through. A negative film that is totally transparent is representing an object that does not send any light through the optical system. A negative that is totally opaque had been brightly illuminated. The opposite is true for a positive film. The darker the positive film, the less light it represents. In Chapter 2 we already talked about the eye, but we did not address sensitivity of the eye to brightness, energy and light. 
A notable characteristic of the eye is that it is very sensitive to ratios. If we present to the eye two different brightnesses, say densities of 0.11 and 0.10, then the eye perceives this much as it perceives the pair 0.55 and 0.50: both pairs differ by 10% from one another. The sensitivity of the eye to differences ΔI of the intensity of light I is expressed by the Weber ratio ΔI/I.

What, then, is the interval, expressed as a ratio r, when presenting an image with n discrete gray values? Let us define n intensity steps:

I_n = r^n · I_0

If intensity I_n is the maximum and intensity I_0 is the minimum, we have to compute the value of r that breaks the interval from I_0 to I_n into n steps. Slide 5.5 illustrates the issue:

r = (I_n / I_0)^(1/n)

If n = 3, we have four different levels of intensity, namely 1/8, 1/4, 1/2 and 1. The eye needs steps of roughly one percent (a Weber ratio of about 0.01, i.e. r ≈ 1.01), or the differences between two intensities will not be recognizable; conceptually this corresponds to a capability of resolving on the order of 100 different gray values.

A monitor presents an intensity I that is a function of N, the number of electrons creating the intensity on the monitor; Slide 5.6 presents the relationship. Film has a density that relates linearly to the logarithm of the energy of light falling onto the film. The dynamic range is the ratio of the highest and lowest intensity that a medium can represent: for a monitor that value might be 200, for film 1000, for paper 100. Note that the density range d is the base-10 exponent the medium can support, so the dynamic range is 10^d. For film the ratio of brightest and darkest intensity is about 1000, and therefore film typically has a density range d = 3, whereas paper lies at d < 2.

Continuous-tone photography cannot be printed directly. Instead one creates so-called half-tone images by means of a raster pattern. These images make use of the spatial integration that human eyes perform. A half tone is a representation of a gray tone: the image is resolved into discrete points, each point associated with an area on paper, and at each point one places a small dot proportional in size to the density of the object. If the object is bright the dots are small, if it is dark the dots are large. One also denotes this as screening of a gray-tone image. Note that the screen is typically arranged at an angle of 45°. Slide 5.7 shows such a half-tone image.

Slide 5.8 makes the transition to the digital world. Gray tones can be obtained in a digital environment by substituting for each pixel a matrix of subpixels. With 2 × 2 subpixels we can represent five gray values, as shown in Slide 5.9; similarly, a 3 × 3 pattern permits us to represent ten different gray values. We call the matrix into which we subdivide pixels a dither matrix: a D2 dither matrix means that 2 × 2 pixels are used to represent one gray value of a digital image. The basic principle is demonstrated in Algorithm 10. An example of a 3 × 3 dither matrix would be:

D =
6 8 4
1 0 3
5 2 7

An image gray value is checked against each element of the dither matrix, and only those pixels are set where the gray value is larger than the value in the dither matrix. For a gray value of 5, the matrix D above produces the pattern

P =
0 0 1
1 1 1
0 1 0

A dither matrix of size n × n defines n² + 1 different patterns. It should be chosen wisely in order not to define patterns that produce artefacts.
For instance, a dither matrix whose smallest values all lie in the same row creates horizontal lines when applied over larger areas: with

D =
5 3 6
1 0 2
8 4 7

a gray value of v = 3 produces the pattern

P =
0 0 0
1 1 1
0 0 0

Algorithm 10 Halftone image (by means of a dither matrix)
dm = createDitherMatrix(n, n)   {create an n × n dither matrix}
for all pixels (x, y) of the image do
  v = getGrayValueOfPixel(x, y)
  for all elements (i, j) of dm do   {check the value against the matrix}
    if v > dm(i, j) then
      setPixel(OutputImage, x·n + i, y·n + j, black)   {apply the pattern}
    else
      setPixel(OutputImage, x·n + i, y·n + j, white)
    end if
  end for
end for

Prüfungsfragen:

• Was versteht man unter dem „dynamischen Bereich“ eines Mediums zur Wiedergabe bildhafter Informationen, und in welchem Zusammenhang steht er mit der Qualität der Darstellung? Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen Bereiches!

• Gegeben sei ein Druckverfahren, welches einen Graupunkt mittels eines Pixelrasters darstellt, wie dies in Abbildung B.5 dargestellt wird. Wie viele Grauwerte können mit diesem Raster dargestellt werden? Welcher Grauwert wird in Abbildung B.5 dargestellt?

• Skizzieren Sie das Histogramm des digitalen Grauwertbildes aus Abbildung B.29, und kommentieren Sie Ihre Skizze!

Antwort: Das Histogramm ist bimodal, wobei die Spitze im Weiß-Bereich etwas flacher ist als im Schwarz-Bereich, da das Bild im hellen Bereich mehr Struktur aufweist als im dunklen Bereich (siehe Abbildung 5.1).

Figure 5.1: Histogramm von Abbildung B.29

5.2 Color Images

Of course computer graphics and digital image processing are significantly defined by color. Color has been a mysterious phenomenon throughout the history of mankind, and there are numerous models that explain color and how it works. Slide 5.12 does this with a triangle: the three corners represent white, black and pure color, so that the edges of the triangle represent values of gray, tints between white and pure color, or shades between pure color and black. The concept of tones fills the area of the triangle. A color is judged against existing color tables. A very widely used system is by Munsell, organized along three ordering schemes: hue (color), value (lightness) and saturation. These three entities can serve as the coordinate axes of a 3D space; we will revisit the three-dimensional idea in Section 5.5.

Color in image processing presents us with many interesting phenomena. The example in Slide 5.16 is a technical image, a so-called false color image. In this case a film is used that is not sensitive to blue, but instead to green, red and infrared. In this particular film, the infrared light falling onto the emulsion activates the red layer, the red light activates the green layer, and the green light activates the blue layer. As a result, the image shows infrared as red. Slide 5.16 is a vegetated area. We recognize that vegetation reflects a considerable amount of infrared light, much more so than red or green light: healthy vegetation will look red, while sick vegetation reflects less infrared light and therefore looks whitish. Color images not only serve to represent the natural colors of our environment, or the electromagnetic radiation as we receive it with our eyes or by means of sensors; color may also be used to visualize things that are totally invisible to humans.
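Returning to the dither-matrix halftoning of Algorithm 10: a minimal Python sketch of the same idea, reproducing the 3 × 3 example with the gray value 5, might look like this (the function name is of course arbitrary).

import numpy as np

D = np.array([[6, 8, 4],
              [1, 0, 3],
              [5, 2, 7]])

def halftone(image, dm):
    # replace every pixel by an n x n binary pattern driven by the dither matrix
    n = dm.shape[0]
    h, w = image.shape
    out = np.zeros((h * n, w * n), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            out[y*n:(y+1)*n, x*n:(x+1)*n] = image[y, x] > dm   # set where value exceeds dm
    return out

print(halftone(np.array([[5]]), D))
# [[0 0 1]
#  [1 1 1]
#  [0 1 0]]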
Slide 5.18 is an example of a terrain elevation in the form of color, looking at the entire world. Similarly, Slide 5.19 illustrates the rings of planet Saturn and uses color to highlight certain segments of those rings to draw the human observer’s attention. The colors can be used to mark or make more clearly visible to a human interpreter a physical phenomenon or particular data that one wants the human to pay attention to. This is called pseudo-color . Prüfungsfragen: • Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild (pseudo color image)? Nennen Sie je einen typischen Anwendungsfall! 5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model The eye has color sensitive cones around the fovea, the area of highest color sensitivity in the eye. It turns out that these cones are not equally sensitive to red, green and blue. Slide 5.22 shows that we have much less sensitivity to blue light than we have to green and red. The eye’s cones can see the electromagnetic spectrum from 0.4 to 0.7 µm wavelength (or 400 to 700 nanometers). We find that the eye’s rods are most sensitive in the yellow - green area. Sensitivity luminance is best in that color range. Slide 5.23 illustrates the concept of the tri-stimulus idea. The tri-stimulus theory is attractive since it explains that all colors can be made from only 3 basic colors. If one were to create all spectral colors from red, green, blue, our cones in the eye would have to respond at the levels shown in Slide 5.13. The problem exists that one would have to allow for negative values in red, which is not feasible. So those colors cannot be created. Such colors are being falsified by too much red. The physics of color is explained in Slide 5.25. White light from the sun is falling onto an optical prism, breaking up the white light into the rainbow colors from ultraviolet via blue, green, yellow, orange to red and on to infrared. These are the spectral colors first scientifically explained by Sir Isaac Newton in 1666. We all recall from elementary physics that the electromagnetic spectrum is ordered by wavelength or frequency and goes from cosmic rays via gamma rays and X rays to ultraviolet, then on to the visible light, from there to near infrared, far infrared, microwaves, television and radio frequencies. Wavelengths of visible light range between 0.35 µm to 0.7 µm. Ultraviolet has shorter wavelengths in the range of 0.3 µm, infrared goes from 0.7 to perhaps several 300 µm. We would like to create color independent of natural light. We have two major ways of doing this. One is based on light, the other on pigments. We can take primary colors of light and mix them up. These primary colors would be green, blue and red, spectrally clean colors. As we mix equal portions of those three, we produce a white color. If we mix just two of them each we get yellow, cyan and magenta. 5.3. TRI-STIMULUS THEORY, COLOR DEFINITIONS, CIE-MODEL 97 In contrast to additive mixing of light there exist subtractive primaries of pigments. If we want to print something we have colors to mix. Primary colors in that case are magenta, yellow and cyan. As we mix equal parts we get black. If we mix pairs of them, we get red, green and blue. We call yellow, magenta and cyan primary colors, green, red and blue secondary colors of pigment. To differentiate between subtractive and additive primaries, we talk about pigments and light. 
An important difference between additive and subtractive colors is the manner in which they are being generated. A pigment absorbs a primary color of light and reflects the other two. Naturally then, if blue and green get reflected but red is absorbed, that pigment appears cyan, and represents the primary pigment “cyan”. The primary colors of light are perceived by the eye’s cones on the retina as red, green and blue, and combinations are perceived as secondary colors. The Commission Internationale of Éclairage (CIE) has been responsible for an entire world of standards and definitions. As early as 1931, CIE confirmed the spectral wavelenghts for red with 100 nm, green with 546.1 nm and blue with 435.8 nm. So far we have not yet been concerned about the dimensions of the color issue. But Munsell defined concepts such as hue1 , intensity (value or lightness), and saturation or chroma2 . We can build from such concepts a three dimensional space and define chromaticity, thus color, as a 2-dimensional subspace. The necessity of coping with negative color as one builds spectral colors from RGB has led the Commission Internationale l’Éclairage (CIE) to define 3 primary colors X, Y and Z. CIE defined their values to form the spectral colors as shown in Slide 5.27 The Y -curve was chosen to be identical to the luminous efficiency function of the eye. The auxiliary values X, Y and Z are denoted as tri-stimulus values, defining tri-chromatic coefficients x, y, z as follows: x = y = z = X X +Y +Z Y X +Y +Z Z X +Y +Z and x + y + z = 1. A 3-dimensional space is defined by X, Y, Z and by x, y, z. X, Y, Z are the amounts of red, green, and blue to obtain a specific color; whereas x, y, z are normalized tri-chromatic coefficients. One way of specifying color with the help of the tri-chromatic coefficients is by means of a CIE chromaticity diagram. A two dimensional space is defined by the plane x + y + z = 1 with an x-and a y-axis, whereby the values along the x-axis represent red, and y is green. The values vary between 0 and 1. The z-value (blue) results from z = 1 − x − y. There are several observations to be made about the CIE chromaticity diagram: 1. A point is marked as “green”, and is composed of 62% green, 25% red and from z = 1−x−y, 13% blue. 2. Pure spectral colors from a prism or rainbow are found along the edge of the diagram, with their wavelength in nm. 1 in 2 in German: Farbton German: Sättigung 98 CHAPTER 5. COLOR 3. Any point inside the tongue-shaped area represents a color that cannot only be composed from x, y and z, but also from the spectral colors along the edge of the tongue. 4. There is a point marked that has 33% of x, 33% of y and 33% of z and in the CIE-value for white light. 5. Any point along the boundary of the chromaticity chart represents a saturated color. 6. As a point is defined away from the boundary of the diagram we have a desaturated color by adding more white light. Saturation at the point of equal energy is 0. 7. A straight line connecting any 2 colors defines all the colors that can be mixed addditively from the end points. 8. From the white point to the edge of the diagram, one obtains all the shades of a particular spectral color. 9. Any three colors I, J, K define all other colors that can be mixed from them, by looking at the triangle by I, J, K. Definition 13 Conversion from CIE to RGB To device-specifically transform between different monitor RGB-spaces we can use transformations from a particular RGBmonitor -space to CIE XYZ-space. 
The general transformation can be written as:

X = X_r · R_m + X_g · G_m + X_b · B_m
Y = Y_r · R_m + Y_g · G_m + Y_b · B_m
Z = Z_r · R_m + Z_g · G_m + Z_b · B_m

Under the assumption that equal RGB voltages (1, 1, 1) should lead to the colour white, and specifying the chromaticity coordinates of a monitor with long-persistence phosphors as

         x      y
red    0.620  0.330
green  0.210  0.685
blue   0.150  0.063

we have for example:

X = 0.584 · R_m + 0.188 · G_m + 0.179 · B_m
Y = 0.311 · R_m + 0.614 · G_m + 0.075 · B_m
Z = 0.047 · R_m + 0.103 · G_m + 0.939 · B_m

The inverse transformation is:

R_m =  2.043 · X − 0.568 · Y − 0.344 · Z
G_m = −1.036 · X + 1.939 · Y + 0.043 · Z
B_m =  0.011 · X − 0.184 · Y + 1.078 · Z

Prüfungsfragen:

• Gegeben sei der CIE-Farbraum. Erstellen Sie eine Skizze dieses Farbraumes mit einer Beschreibung der Achsen und markieren Sie in diesem Raum zwei Punkte A, B. Welche Farbeigenschaften sind Punkten, welche auf der Strecke zwischen A und B liegen, zuzuordnen, und welche den Schnittpunkten der Geraden durch A, B mit dem Rand des CIE-Farbraumes?

• Können von einem RGB-Monitor alle vom menschlichen Auge wahrnehmbaren Farben dargestellt werden? Begründen Sie Ihre Antwort anhand einer Skizze!

5.4 Color Representation on Monitors and Films

The CIE chromaticity diagram describes more colors than the subset that is displayable on film, on a monitor, or on a printer. The subset of colors displayable on a medium can be represented from its primary colors in an additive system; a monitor uses the RGB model. In order for the same color to appear on a printer that was perceived on a monitor, and that might come from scanning color film, the proper mix of that color can be assessed via the corresponding triangles in the CIE chromaticity diagram.

Prüfungsfragen:

• Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz?

5.5 The 3-Dimensional Models

The tri-stimulus values x, y, z define a 3D space, as shown in Slide 5.33 with the plane x + y + z = 1 marked. If a color monitor builds its colors from the three primaries RGB, it will be able to display a subset of the CIE colors. The xyz-space is shown in Slide 5.35 in three views. We extend our model to a three-dimensional coordinate system with the red, green and blue color axes, the origin at black, and a diagonal extending away from the origin at 45 degrees to each axis giving us the gray values until we reach the white point. The red-blue plane defines the magenta colors, the red-green plane defines yellow, and the green-blue plane defines cyan. The resulting color model is shown in Slide 5.36 and illustrated in Slide 5.37. The RGB values range between 0 and 1. The RGB model is the basis of remote sensing and of displaying color images on media such as monitors.

How does one modify the histogram of an RGB image? Clearly, changing the intensity of each component image separately will change the resulting color; this needs to be avoided. We will discuss other color models that help here.

Prüfungsfragen:

• Was versteht man unter einem dreidimensionalen Farbraum (bzw. Farbmodell)? Nennen Sie mindestens drei Beispiele davon!

5.6 CMY-Model

Definition 14 CMY color model
CMY stands for: C … cyan, M … magenta, Y … yellow. The three-dimensional geometric representation of the CMY model can be done in the same way as for the RGB model, i.e. as a cube. In contrast to the RGB model, the CMY model uses the principle of subtractive colors. Subtractive colors are seen when pigments in an object absorb certain wavelengths of white light while reflecting the rest. We see examples of this all around us: any colored object, whether natural or man-made, absorbs some wavelengths of light and reflects or transmits others; the wavelengths left in the reflected or transmitted light make up the color we see. Some examples:
• White light falling onto a cyan pigment is reflected as a mix of blue and green, since red gets absorbed.
• White light falling onto a magenta pigment is reflected as a mix of red and blue, since green gets absorbed.
• White light falling onto a yellow pigment is reflected as a mix of red and green, since blue gets absorbed.
The conversion of RGB to CMY is therefore supported by the physics of light and pigments. This leads to the following conversion formulas:

C = 1 − R        R = 1 − C
M = 1 − G        G = 1 − M
Y = 1 − B        B = 1 − Y

The CMY model is not used on monitors but in printing.

Prüfungsfragen:

• Gegeben sei ein Farbwert C_RGB = (0.8, 0.5, 0.1)^T im RGB-Farbmodell.
1. Welche Spektralfarbe entspricht am ehesten dem durch C_RGB definierten Farbton?
2. Finden Sie die entsprechende Repräsentation von C_RGB im CMY- und im CMYK-Farbmodell!

Antwort:
C_CMY = (1, 1, 1)^T − C_RGB = (0.2, 0.5, 0.9)^T
K = min(C, M, Y) = 0.2
C_CMYK = (0, 0.3, 0.7, 0.2)^T
Der gegebene Farbton entspricht etwa Orange.

5.7 Using CMYK

Definition 15 CMYK color model
CMYK is a scheme for combining primary pigments. The C stands for cyan (aqua), M stands for magenta (pink), Y is yellow, and K stands for black. The CMYK pigment model works like an “upside-down” version of the RGB (red, green, and blue) color model. The RGB scheme is used mainly for computer displays, while the CMYK model is used for printed color illustrations (hard copy).

K is defined as the minimum of C′, M′ and Y′, so that C is redefined as C′ − K, M as M′ − K, and Y as Y′ − K. Conversion from RGB to CMYK:

C′ = 1 − R
M′ = 1 − G
Y′ = 1 − B
K  = min(C′, M′, Y′)
C  = C′ − K
M  = M′ − K
Y  = Y′ − K

Deriving K (black) from CMY is called undercolor removal. Images become darker than they would be as CMY alone, and there is less need for the expensive printing colors C, M and Y, which also need time to dry on paper.

Prüfungsfragen:

• Entsprechend welcher Formel wird eine CMYK-Farbdarstellung in eine RGB-Darstellung übergeführt?

• Geben Sie die Umrechnungsvorschrift für einen RGB-Farbwert in das CMY-Modell und in das CMYK-Modell an und erklären Sie die Bedeutung der einzelnen Farbanteile! Wofür wird das CMYK-Modell verwendet?

• Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz?

• Im Vierfarbdruck sei ein Farbwert durch 70% Cyan, 0% Magenta, 50% Gelb und 30% Schwarz gegeben. Rechnen Sie den Farbwert in das RGB-Farbmodell um und beschreiben Sie den Farbton in Worten!

Antwort: Es ist
C_CMYK = (0.7, 0.0, 0.5, 0.3)^T
C_CMY = (1, 0.3, 0.8)^T
C_RGB = (0, 0.7, 0.2)^T
Die Farbe entspricht einem leicht bläulichen Grünton.

5.8 HSI-Model

The hue-saturation-intensity color model derives from a transformation of the RGB color space that is rather complicated. The HSI model is useful when analyzing images in which color and intensity are each important in their own right.
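A small sketch checking the CMY and CMYK formulas above against the two worked examples from the Prüfungsfragen (plain Python, exact up to floating-point rounding):

def rgb_to_cmyk(r, g, b):
    c, m, y = 1 - r, 1 - g, 1 - b        # RGB -> CMY
    k = min(c, m, y)                     # undercolor removal
    return c - k, m - k, y - k, k

def cmyk_to_rgb(c, m, y, k):
    return 1 - (c + k), 1 - (m + k), 1 - (y + k)

print(rgb_to_cmyk(0.8, 0.5, 0.1))        # approx. (0, 0.3, 0.7, 0.2), an orange tone
print(cmyk_to_rgb(0.7, 0.0, 0.5, 0.3))   # approx. (0, 0.7, 0.2), a slightly bluish green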
Also one may do an improvement of the image in its HSI-version, not the natural RGB-representation. Slide 5.44 introduces the transition from RGB to HSI. A color located at P in the RGB triangle has its hue H described by the angle with respect to the red axis. Saturation S is the distance from the white point, thus from the point of equal RGB at the center of the triangle. Intensity is not within the triangle of Slide 5.44, but is perpendicular to the triangle in plane, Slide 5.45 explains. The HSI-model is thus a pyramid - like shape. It is visualized in Slide 5.46. Conversion of RGB to HSI has been explained in concept, but it is based on one elaborate algorithm. The easiest element is intensity I which simply is I = 1/3(R + B + G). We do not detail H and S, nor do we address the inverse conversion from HSI to RGB. 5.9 YIQ-Model Prüfungsfragen: • Zum YIQ-Farbmodell: 1. Welche Bedeutung hat die Y -Komponente im YIQ-Farbmodell? 2. Wo wird das YIQ-Farbmodell eingesetzt? • Ein Farbwert CRGB = (R, G, B)T im RGB-Farbmodell wird in den entsprechenden Wert CYIQ = (Y, I, Q)T im YIQ-Farbmodell gemäß folgender Vorschrift umgerechnet: 0.299 0.587 0.114 CYIQ = 0.596 −0.275 −0.321 · CRGB 0.212 −0.528 0.311 Welcher biologische Sachverhalt wird durch die erste Zeile dieser Matrix ausgedrückt? (Hinweis: Überlegen Sie, wo das YIQ-Farbmodell eingesetzt wird und welche Bedeutung in diesem Zusammenhang die Y-Komponente hat.) 5.10 HSV and HLS -Models Variations on the HSI-Models are available. The HSV model (Hue-Saturation-Value) is also called the HSB model with B for brightness. This responds to the intuition of an artist, who thinks 5.10. HSV AND HLS -MODELS 103 Definition 16 YIQ color model This model is used in U.S. TV broadcasting. The RGB to YIQ transformation is based on a well-known matrix M . 0.299 0.587 0.114 M = 0.596 −0.275 −0.321 0.212 −0.523 0.311 The Y -component is all one needs for black & white TV. Y has the highest bandwidth, I and Q get less. Transmission of I, Q are separate from Y , where I, Q are encoded in a complex signal. RGB to YIQ Conversion: Y = 0.299 · R + 0.587 · G + 0.114 · B I = 0.596 · R − 0.275 · G − 0.321 · B Q = 0.212 · R − 0.523 · G + 0.311 · B YIQ to RGB Conversion: R G B = 1 · Y + 0.956 · I + 0.621 · Q = 1 · Y − 0.272 · I − 0.647 · Q = 1 · Y − 1.105 · I + 1.702 · Q Again, simple image processing such as histogram changes can take place with only Y . Color does not get affected since that is encoded in I, Q. in terms of tint, shade and tone. We introduce a cylindrical coordinate system, and the model defines a hexagon. In the coordinates with Slide 5.44. The hue is again measured as an angle around the vertical axis, in this case with intervals of 120 degrees going from one primary color to the next (Red at 0 degrees, green at 320. Blue at 240 and the intermediate degrees are then yellow, cyan and magenta). The value of saturation S is a ratio going from 0 at the center of the pyramid to one at the side of the hexagon. The values for V are varying between 0 for black and one for white. Note that the top of the hex? can be obtained by looking at the RGB cube along the diagonal axis from white to black. This is illustrated in Slide 5.45. This also provides the basic idea of converting an RGB input into an HSV color model. The HSL (hue-lightness-saturation) model of color is defined by a double hex-cone shown in Slide 5.49. The HLS model is essentially obtained as a deformation of the HSV model by pulling up from the center of the base of the hex-cone (the V = 1 plane). 
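A minimal sketch of the RGB-to-YIQ conversion with the matrix from Definition 16; note that the I and Q rows sum to zero, so a neutral gray carries no chrominance.

import numpy as np

M = np.array([[0.299,  0.587,  0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.523,  0.311]])

def rgb_to_yiq(rgb):
    return M @ np.asarray(rgb, dtype=float)

print(rgb_to_yiq([1.0, 1.0, 1.0]))   # white: Y = 1, I and Q are (numerically) zero
print(rgb_to_yiq([0.0, 1.0, 0.0]))   # pure green: Y = 0.587 carries the brightness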
Therefore a transformation of an RGB into an HLS color model is similar to the RGB to HSV transformation. The HSV color space is visualized in Slide 5.52. Similarly Slide 5.53 illustrates an entire range of color models in the form of cones and hex-cones. Prüfungsfragen: • Gegeben sei ein Farbwert CRGB = (0.8, 0.4, 0.2)T im RGB-Farbmodell. Schätzen Sie grafisch die Lage des Farbwertes CHSV in Abbildung B.32 (also die Entsprechung von CRGB im HSV0 Modell). Skizzieren Sie ebenso die Lage eines Farbwertes CHSV , der den gleichen Farbton und die gleiche Helligkeit aufweist wie CHSV , jedoch nur die halbe Farbsättigung! 104 CHAPTER 5. COLOR Algorithm 11 Conversion from RGB to HSI 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: Input, real R, G, B, the RGB color coordinates to be converted. Output, real H, S, I, the corresponding HSI color coordinates. float Z, n, Hf, Sf, delta Z = ((R-G)+(R-B))*0.5 n = sqrt((R-G)*(R-G)+(R-B)*(G-B)) if n! = 0 then delta=acos(Z/n) else delta=0.0 end if if B <= G then Hf=delta else Hf=2.0*PI-delta end if {Default assignment} 18: 19: 20: 21: // Assignment to H and normalization to values between 0-255 H = (int) ((Hf*255.0)/(2.0*PI)) SUM = R+G+B 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: MIN = min(R,G,B) if SU M ! = 0 then Sf=1.0-3.0*(MIN/SUM) else Sf=255.0 end if {calculate minimum} S = (int) (Sf*255.0) I = (int) (SUM/3.0) if !(0 <= H, S, I <= 255) then Set H, S, I beetween 0 and 255 end if {prevent artifacts} 5.10. HSV AND HLS -MODELS 105 Algorithm 12 Conversion from HSI to RGB 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: Input, real H, S, I, the HSI color coordinates to be converted. Output, real R, G, B, the corresponding RGB color coordinates. float H, S, I float rt3, R, G, B, hue if S = 0 then R=I, G=I, B=I else rt3=1/sqrt(3.0) end if 22: 23: 24: 25: 26: 27: 28: 29: if 0.0 <= H < 120.0 then B=((1.0-S)*I) h=rt3*tan((H-60.0)*PI/180) G=(1.5+1.5*h)*I-(0.5+1.5*h)*B R=3.0*I-G-B else if 120.0 <= H < 240.0 then R=((1.0-S)*I) h=rt3*tan((H-180.0)*PI/180) B=(1.5+1.5*h)*I-(0.5+1.5*h)*R G=3.0*I-B-R else G=((1.0-S)*I) hue=rt3*tan((H-300.0)*PI/180) R=(1.5+1.5*h)*I-(0.5+1.5*h)*G B=3.0*I-R-G end if end if 30: 31: 32: 33: if !(0 <= R, G, B <= 255) then Set R, G, B beetween 0 and 255 end if 13: 14: 15: 16: 17: 18: 19: 20: 21: {prevent artifacts} 106 CHAPTER 5. COLOR Algorithm 13 Conversion from GRB to HSV 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: Input, real R, G, B, the RGB color coordinates to be converted. Output, real H, S, V, the corresponding HSV color coordinates. real B, bc, G, gc, H, R, rc, rgbmax, rgbmin, rmodp, S, V rgbmax = max ( R, G, B ) rgbmin = min ( R, G, B ) V = rgbmax Compute the saturation. if rgbmax/ = 0.0 then S = ( rgbmax - rgbmin ) / rgbmax else S = 0.0 end if Compute the hue. if S = 0.0 then H = 0.0 else rc = ( rgbmax - R ) / ( rgbmax - rgbmin ) gc = ( rgbmax - G ) / ( rgbmax - rgbmin ) bc = ( rgbmax - B ) / ( rgbmax - rgbmin ) if R = rgbmax then H = bc - gc else if G = rgbmax then H = 2.0 + rc - bc else H = 4.0 + gc - rc end if H = H * 60.0 Make sure H lies between 0 and 360.0 H = rmodp ( H, 360.0 ) end if end if 5.10. HSV AND HLS -MODELS Algorithm 14 Conversion from HSV to RGB 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: Input, real H, S, V, the HSV color coordinates to be converted. Output, real R, G, B, the corresponding RGB color coordinates. 
real B, f, G, H, hue, i, p, q, R, rmodp, S, t, V if s = 0.0 then R = V, G = V, B = V else Make sure HUE lies between 0 and 360.0 hue = rmodp ( H, 360.0 ) hue = hue / 60.0 i = int ( hue ) f = hue - real ( i ) p = V * ( 1.0 - S ) q = V * ( 1.0 - S * f ) t = V * ( 1.0 - S + S * f ) end if if i = 0 then R = V, G = t, B = p else if i = 1 then R = q, G = V, B = p else if i = 2 then R = p, G = V, B = t else if i = 3 then R = p, G = q, B = V else if i = 4 then R = t, G = p, B = V else if i = 5 then R = V, G = p, B = q end if end if end if end if end if end if 107 108 CHAPTER 5. COLOR Algorithm 15 Conversion from RGB to HLS 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: Input, real R, G, B, the RGB color coordinates to be converted. Output, real H, L, S, the corresponding HLS color coordinates. real B, bc, G, gc, H, L, R, rc, rgbmax, rgbmin, rmodp, S Compute lightness. rgbmax = max ( R, G, B ) rgbmin = min ( R, G, B ) L = ( rgbmax + rgbmin ) / 2.0 Compute saturation. if rgbmax = rgbmin then S = 0.0 else if L <= 0.5 then S = ( rgbmax - rgbmin ) / ( rgbmax + rgbmin ) else S = ( rgbmax - rgbmin ) / ( 2.0 - rgbmax - rgbmin ) end if end if Compute the hue. if rgbmax = rgbmin then H = 0.0 else rc = ( rgbmax - R ) / ( rgbmax - rgbmin ) gc = ( rgbmax - G ) / ( rgbmax - rgbmin ) bc = ( rgbmax - B ) / ( rgbmax - rgbmin ) if r = rgbmax then H = bc - gc else if g = rgbmax then H = 2.0 + rc - bc else H = 4.0 + gc - rc end if H = H * 60.0 Make sure H lies between 0 and 360.0. H = rmodp ( H, 360.0 ) end if end if 5.10. HSV AND HLS -MODELS Algorithm 16 Conversion from HLS to RGB 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: Input, real H, L, S, the HLS color coordinates to be converted. Output, real R, G, B, the corresponding RGB color coordinates. real B, G, H, hlsvalue, L, m1, m2, R, S if L <= 0.5 then m2 = L + L * S else m2 = L + S - L * S end if m1 = 2.0 * L - m2 if S = 0.0 then R = L, G = L, B = L else R = hlsvalue ( m1, m2, H + 120.0 ) G = hlsvalue ( m1, m2, H ) B = hlsvalue ( m1, m2, H - 120.0 ) end if Algorithm 17 hlsvalue(N1,N2,HLSVALUE) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: Input, real N1, N2, H. Output, real HLSVALUE. real H, HLSVALUE, hue, N1, N2 rmodp Make sure HUE lies between 0 and 360. hue = rmodp ( H, 360.0 ) if hue < 60.0 then hlsvalue = N1 + ( N2 - N1 ) * hue / 60.0 else if hue¡180.0 then HLSVALUE = N2 else if hue < 240.0 then HLSVALUE = N1 + ( N2 - N1 ) * ( 240.0 - hue ) / 60.0 else HLSVALUE = N1 end if end if end if 109 110 CHAPTER 5. COLOR grün gelb cyan rot weiß blau magenta Figure 5.2: eine Ebene im HSV-Farbmodell Antwort: Es gilt (siehe Abbildung 5.2): CHSV 0 CHSV 0 CRGB = (20◦ , 75%, 0.8) = (20◦ , 37.5%, 0.8) = (0.8, 0.6, 0.5) Halbierung der Sättigung im HSV-Modell bedeutet Halbierung der Entfernung vom Zentrum. Die Komponenten des entsprechenden Punktes im RGB-Modell liegen näher beinander, die Ordnung bleibt aber erhalten. • Welche Farbe liegt in der Mitte“, wenn man im RGB-Farbraum zwischen den Farben gelb ” und blau linear interpoliert? Welcher Farbraum wäre für eine solche Interpolation besser geeignet, und welche Farbe läge in diesem Farbraum zwischen gelb und blau? 5.11 Image Processing with RGB versus HSI Color Models An RGB color test pattern is shown in Slide 5.51. This test pattern is being used to calibrate printers, monitors, scanners, image color through a production system that is based on color. 
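The HSV example from the Prüfungsfrage above can be reproduced with Python's standard colorsys module (hue is returned as a fraction of a full turn, hence the factor 360):

import colorsys

h, s, v = colorsys.rgb_to_hsv(0.8, 0.4, 0.2)
print(h * 360, s, v)                      # approx. 20 degrees, 0.75, 0.8

# same hue and value, half the saturation, converted back to RGB
print(colorsys.hsv_to_rgb(h, s / 2, v))   # approx. (0.8, 0.6, 0.5)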
This particular test pattern is a digital and offers 8 bits of red, green and blue. This pattern is symmetric from top to bottom, consisting of one black band on top, bands two, three and four are the primary colors. For the RGB model 5, 6, 7 are the secondary colors, band 8 should be white, band 9 then is a continuous variation from blue to red. Band 9 is a gray wedge. The manner in which the band of rainbow colors is shown in Slide 5.52 obtained is by continuously varying from left to right the intensity of blue through values of 1 to 0, of red from 0 to full intensity, and then green goes from 0 to full and back to 0 across the band. Using the process we have conceptually hinted at in the HSI-model converts this RGB image into an HSI image. The easy part is the computation of the intensity I, the complex process is the computation of hue and saturation. In Slide 5.46, we are looking at the same pattern in terms of hue: we see that we have lost all sense of color and essentially have a bright image on the left and a dark image on the right in the color band. Looking at the saturation the variation in the various colors has also disappeared and the variation of saturation going in the color band from left to center to right. Most of the information is in the intensity band although some differences in colors have disappeared here. 5.12. SETTING COLORS 111 The advantage of the HSI model is that we can optimize an image by just optimizing the intensity segment of the HSI presentation. It is not uncommon that one goes from the RGB into the HSI color model, modifies the intensity band only and then does the transformation back into RGB. This typically will apply for histogram modifications of color images. As stated earlier this optimization will preserve the color and saturation and it will only change the contrast as we perceive it through the intensity of the image. Doing the optimization on each color band separately will give us unpredictable color results. Slide 5.53 illustrates the approach by means of an underexposed RGB original of a Kakadu bird. The result obtained by an HSI transformation and histogram equalization of just the intensity band produces the result shown next. We do have a much improved and satisfactory image. A similar ideology is used when one creates color products from multiple input sources: an example might be a high resolution black and white satellite image at one meter pixel size that is being combined with a lower resolution color image in RGB at 4 meter resolution. A process to combine those two image sources takes the RGB low resolution image and converts it into an HSI-model. The I component is then removed and for it one inserts the higher resolution black and white satellite image. The result is transformed back into RGB space. The entire operation requires of course that all images have the same pixel size and are a perfect geometric match. 5.12 Setting Colors We have now found that a great number of different color models exist that allow us to define colors in various ways. Slide 5.60 is a pictorial summary of the various color models. The models shown are those that are common in the image processing and technical arena. The most popular color model in the very large printing and graphic arts industry is not shown here, and that is the CMYK model. Setting a color on a monitor or printer requires that the color be selected on a model that the output device uses. 
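A sketch of the intensity-only optimization idea of Section 5.11, not taken from the notes: because hue and HSI saturation are unchanged when R, G and B are all scaled by the same factor, one can stretch the intensity channel and rescale each pixel accordingly (a simple linear stretch is used here in place of a full histogram equalization).

import numpy as np

def stretch_intensity(rgb):
    # rgb: float array of shape (h, w, 3) with values in [0, 1]
    I = rgb.mean(axis=2)                        # HSI intensity I = (R + G + B) / 3
    lo, hi = I.min(), I.max()
    I_new = (I - lo) / (hi - lo + 1e-12)        # contrast stretch of the intensity only
    scale = I_new / (I + 1e-12)                 # per-pixel scale factor
    return np.clip(rgb * scale[..., None], 0.0, 1.0)   # clipping may affect very bright pixels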
Let us assume that we choose the red, green, blue model for presentation of an image on a color monitor. It is customary to change the red, green and blue channels in order to obtain a desired output color. Inversely, an output color could be selected and the RGB components from which that output color is created are being set automatically. If we were to choose the HSV color model we would create a sample color by selection of an angle for the hue we would shift saturation on a slider between 0 and 1, we would set the value also between 0 and 1 and in the process obtain the result in color. Inversely, a chosen color could be converted into its HSV components. Finally, the HSI and RGB models can be looked at simultaneously: as we change the HSI values, the system instantaneously computes the RGB output and vice versa. In the process, the corresponding colors are being shown as illustrated in Slide 5.63. Optical illusions are possible in comparing colors: Slide 5.64 shows the same color appearing differently when embedded in various backgrounds. 5.13 Encoding in Color This is the topic of pseudo-color in image processing where we assign color to gray values in order to highlight certain phenomena and make them more easily visible to a human observer. Slide 5.67 illustrates a medical X-ray image, initially in a monochrome representation. The gray values can be “sliced” into 8 different gray value regions which then can be encoded in color. The concept of this segmentation into gray value regions is denoted as intensity slicing, sometimes density slicing and is illustrated in Slide 5.66. The medical image may be represented by Slide 5.66 where the 112 CHAPTER 5. COLOR gray values are encoded as f (x, y). A plane is defined that intersects the gray values at a certain level li, one can assign now all pixels with a value greater than li to one color. All pixels below the slicing plane can be assigned to another color, and by moving the slicing plane we can see very clearly on a monitor which pixels are higher and lower than the slicing plane. This becomes a much more easily interpretable situation than one in which we see the original gray values only. Another matter of assigning color to a black and white image is illustrated in Slide 5.68. The idea is to take an input value f (x, y) and apply it, three different transformations, one into a red image, one into a green image and one into a blue image, so that the to three different images are assigned to the red, green and blue guns of a monitor. A variety of transformations would be available to obtain from a black and white image the colorful output. Of course, the matter of Slide 5.68 is nothing but a more general version of the specialized slicing plane applied in the previous Slide 5.66. Slide 5.69 illustrates the general transformation from a gray level to color with the example of an X-ray image obtained from a luggage checking system at an airport. We can see in the example how various color transformations enhance a luggage with and without explosives such that the casual observer might notice the explosive in the luggage very quickly. We skip the discussion of the details of a very complex color transformation but refer to [GW92, chapter 4.6]. Prüfungsfragen: • Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild (pseudo color image)? Nennen Sie je einen typischen Anwendungsfall! 
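A minimal sketch of intensity slicing as described in Section 5.13: every gray value above the slicing level is mapped to one color, everything below to another; a lookup table generalizes this to several slices.

import numpy as np

def intensity_slice(gray, level, above=(255, 0, 0), below=(0, 0, 255)):
    # gray: uint8 array of shape (h, w); returns an RGB pseudo-color image
    out = np.empty(gray.shape + (3,), dtype=np.uint8)
    out[gray > level] = above
    out[gray <= level] = below
    return out

# eight slices instead of two: index a lookup table of eight colors with gray >> 5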
5.14 Negative Photography The negative black and white photograph of Slide 5.71 is usually converted to positive by inverting the gray values as in Slide 5.72. This is demonstrated in Algorithm ??. Well we take a transformation that simply inverts the values 0 to 255 into 255 to 0. This trivial approach will not work with color photography. As shown in Slide 5.73 color negatives typically are masked with a protective layer that has a brown-reddish color. If one were to take an RGB scan of that color negative and convert it into a positive by inverting the red, green and blue components directly one would obtain a fairly unattractive result as shown in Slide 5.74. One has first to eliminate the protective layer, that means one has to go to the edge of the photograph and find an area that is not part of the image to determine the RGB components that represent that protective layer and then we have to subtract the R component from all pixel R-values, similarly in the B component and in green G. As a result we obtain a clean negative as shown in Slide 5.75. If we now convert that slide we obtain a good color positive as shown in Slide 5.76. Again, one calls this type of negative a masked negative (compare Algorithm 18). There have been in the past developments of color negative film that is not masked. However, that film is for special purposes only and is not usually available. Algorithm 18 Masked negative of a color image 1: 2: 3: 4: 5: 6: 7: 8: locate a pixel p which color is known in all planes {e.g. the black film border} for all planes plane do diff = grayvalue(p, plane) - known grayvalue(p, plane) {calculate the “masking layer”} for all pixel picture do grayvalue(pixel,plane) = grayvalue(pixel, plane) - diff {correct the color} Invert(pixel) {invert the corrected negative pixel to get the positive} end for end for 5.15. PRINTING IN COLOR 113 Prüfungsfragen: • Abbildung B.62 zeigt ein eingescanntes Farbfilmnegativ. Welche Schritte sind notwendig, um daraus mittels digitaler Bildverarbeitung ein korrektes Positivbild zu erhalten? Berücksichtigen Sie dabei, dass die optische Dichte des Filmes auch an unbelichteten Stellen größer als Null ist. Geben Sie die mathematische Beziehung zwischen den Pixelwerten des Negativund des Positivbildes an! 5.15 Printing in Color As we observe advertisement spaces with their posters, we see colorful photographs and drawings which, when we inspect them from a short distance, are really the sum of four separate screened images. We have said earlier that for printing the continuous tone images are being converted into half tones and we also specified in a digital environment that each pixel is further decomposed by a dithering matrix into subpixels. When printing a color originally, one typically uses the four color approach and bases this on the cyan, magenta, yellow and black pigments which are the primary colors from which the color images are being produced. Each of these separates of the four components is screened and the screen has an angle with respect to the horizontal or vertical. However, in order to avoid a Moiree effect, by interference of the different screens with one another, the screens themselves are slightly rotated with respect to one another. This type of printing is used in the traditional offset printing industry. If printing is then directly from a computer onto a plotter paper, then the dithering approach is used instead. 
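Going back to the masked color negative of Section 5.14 (Algorithm 18), a minimal sketch might look as follows; the border coordinates used to sample the protective layer are invented for the example.

import numpy as np

def negative_to_positive(neg, border=(slice(0, 10), slice(0, 10))):
    # neg: float RGB scan of the negative, values in [0, 1]
    mask = neg[border].mean(axis=(0, 1))     # mean RGB of the unexposed (masked) area
    clean = np.clip(neg - mask, 0.0, 1.0)    # subtract the mask per color channel
    return 1.0 - clean                       # invert the cleaned negative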
If we look at a poster that is printed directly with a digital output device and not via an offset press, we can see how the dithering matrix is responsible for each of the dots on the poster. Again, each dot is encoded in one of the four basic pigment colors: cyan, magenta, yellow or black.

Prüfungsfragen:

• Beschreiben Sie die Farberzeugung beim klassischen Offsetdruck! Welches Farbmodell wird verwendet, und wie wird das Auftreten des Moiré-Effekts verhindert?

Antwort: Vier separate Bilder (je eines für die Komponenten Cyan, Magenta, Yellow und Black) werden übereinander gedruckt (CMYK-Farbmodell). Jede Ebene ist ein Halftone-Bild, wobei die Ebenen geringfügig gegeneinander rotiert sind, um den Moiré-Effekt zu verhindern.

5.16 Ratio Processing of Color Images and Hyperspectral Images

We start out from a color image and, for simplicity, make the assumption that we only have two color bands, R and G, so that we can explain the basic idea of ratio imaging. Suppose a satellite is imaging the terrain in those two colors. As the sun shines onto the terrain, we will have a stronger illumination on terrain slopes facing the sun than on slopes that face away from the sun. Yet the trees may have the exact same color on both sides of the mountain. When we look at the image of the terrain, we will therefore see differences between the slopes facing the sun and the slopes facing away from it.

In Slide 5.81, let us take three particular pixels, one from the front slope, one from the back slope, and perhaps a third pixel from flat terrain, all showing the same type of object, namely a tree. We now enter these three pixels into a feature space defined by the green and red color axes. Not surprisingly, the three pixel locations lie on a straight line through the origin: the color of all three pixels is the same, but the intensity differs. We are back with the ideology of the HSI model. We can now create two images from the one color input image, both of them black and white. In one case, we place at each pixel its ratio R/G, which corresponds to the angle that the pixel's vector forms with the abscissa; in the other image we place at each pixel the distance of the pixel from the origin of the feature space. As a result, we obtain one black and white image, Slide 5.82, that is clean of color and essentially shows the variations in intensity as a function of slope. The other image, Slide 5.83, shows the scene clean of variations in intensity, as if the terrain were flat, so that only the variations of color remain. Conceptually, one image is the I component of an HSI transformation and the other one is the H component. Such ratio images have in the past been used to take satellite images and estimate the slope of the terrain, assuming that the terrain cover is fairly uniform; that is clearly the case on glaciers, in the Arctic or Antarctic, or in heavily wooded areas.
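A minimal sketch of the two derived images for the two-band case: the ratio (or, equivalently, the angle in the feature space) suppresses the illumination differences, while the distance from the origin keeps them.

import numpy as np

def ratio_and_intensity(red, green, eps=1e-6):
    # red, green: float arrays holding the two spectral bands
    ratio = red / (green + eps)          # same tree -> same value on both slopes
    angle = np.arctan2(red, green)       # the angle carries the same information
    intensity = np.hypot(red, green)     # distance from the origin: the slope effect
    return ratio, angle, intensity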
Prüfungsfragen:
• Was ist ein „Ratio-Bild“?
• Zu welchem Zweck würde man als Anwender ein sogenanntes „Ratio-Bild“ herstellen? Verwenden Sie bitte in der Antwort die Hilfe einer Skizze zur Erläuterung eines Ratiobildes.

Chapter 6 Image Quality

6.1 Introduction As image quality we generally denote an overall impression of the crispness, the color, the detail and the composition of an image. Slide 6.2 is an example of an exciting image with a lot of detail, crispness and color. Slide 6.3 adds the excitement of motion and a sentiment of activity and cold. Generally in engineering we do not deal with these concepts, which are more artistic and aesthetic. We deal instead with technical definitions.

6.2 Definitions In images we define quality by various components. Slide 6.5 illustrates radiometric concepts of quality that relate to density and dynamic range. Density 0 means that the light can go through the image unhindered, density 4 means that the image blocks the light. Intensity is the concept associated with the object: greater intensity means that more light is coming from the object. The dynamic range of an image is the greatest density value divided by the least density value in the image, the darkest value divided by the brightest value. The dynamic range is typically encoded logarithmically.

Prüfungsfragen:
• Was versteht man unter dem „dynamischen Bereich“ eines Mediums zur Wiedergabe bildhafter Informationen, und in welchem Zusammenhang steht er mit der Qualität der Darstellung? Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen Bereiches!

6.3 Gray Value and Gray Value Resolutions We have already described in earlier presentations the idea of resolving gray values. Chapter 3 introduced the concept of a gray wedge and how a gray wedge gets scanned to assess the quality of a scanning process. Similarly we can assess the quality of an image by describing how many different gray values the image can contain. Slide ?? illustrates the resolution of a gray value image. Note again that in this case we talk about the gray values in an image, whereas in the previous chapter we talked about the quality of the conversion of a given continuous tone image into a digital rendition in a computer in the process of scanning.

Resolving great radiometric detail means that we can recognize objects in the shadow while we also can read writing on a bright roof. Resolution of the gray values in the low density bright areas does not compromise the resolution in the high density dark areas. Slide ?? is a well resolved image.

Prüfungsfragen:
• Was versteht man unter der Grauwerteauflösung eines digitalen Rasterbildes?
Antwort: Die Anzahl der verschiedenen Grauwerte, die in dem Bild repräsentiert werden können.

6.4 Geometric Resolution Again, just as in the process of scanning an image, we can judge the image itself independent of its digital or analog format. I refer to an earlier illustration which essentially describes, again by means of the US Air Force (USAF) resolution target, how the quality of an image can be described by means of how well it shows small objects on the ground or in the scene. We recall that the USAF target, when photographed, presents to the camera groups of line patterns and, within each group, several elements. So in this particular case of Slide 6.10, group 6, element 1 is the element still resolved. We know from an accompanying table that that particular element in group 6 represents a resolution of 64 line pairs per mm. We can see in the lower portion of the slide that element 6 in group 4 represents 28 line pairs per mm.

We have now in Slide 6.11 a set of numbers typical of the geometrical resolution of a digital image. One measure we typically deal with is dots per inch, for example when something is printed. A high resolution is 3000 dots per inch, a low resolution is 100 dots per inch. Note that at 3000 dots per inch each point is about 8 micrometers; recall that 1000 dots per inch corresponds to 25 micrometers per pixel. This leads us to the second measure of geometric resolution: the size of a pixel. When we go to a computer screen, we have a third measure and we say the screen can resolve 1024 by 1024 pixels, irrespective of the size of the screen. Recall the observations about the eye and the fovea. We said that we had about 150 000 cone elements per mm on the fovea. So when we focus our attention on the computer monitor, those 1000 by 1000 pixels would represent the resolution of about 3 by 3 mm on the retina. We may really not have any use for a screen with more resolution, because we wouldn't be able to digest the information on the screen in one glance; it would overwhelm our retina.

A further measure of resolution is line pairs per mm, mentioned earlier. A value of 25 line pairs per mm is a good average resolution for photography on a paper print, and 50 line pairs per mm is a very good resolution on film. The best resolutions can be obtained with spy photography, which uses very slow film that needs long exposure times but is capable of resolving great detail. In that case we get in excess of 75 line pairs per mm. It might be of interest to define the geometric resolution of the unaided eye: that is 3 to 8 pixels per mm at a distance of 25 cm. Again, when a human sits in front of a monitor and starts seeing the image as a continuous pattern, not recognizing individual pixels, at 3 pixels per mm the screen could have a dimension of 300 by 300 mm. For an eagle-eyed person at 8 pixels per mm, the same surface that the human can resolve would be about a 12 by 12 cm square.

It is of interest to relate these resolutions to one another; this is shown in Slide 6.15. Film may have n line pairs per mm. This represents 2.8 × n pixels per mm (see below). If we had film of 25 line pairs per mm, we would have to represent this image at 14 micrometers per pixel under this relationship. On a monitor with a side length of 250 mm and 1024 pixels, one pixel has the dimension of 0.25 mm.
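These unit conversions can be collected in a small sketch. The helper functions below are my own (hypothetical) illustration of the numbers quoted above, with the factor 2.8 anticipating the 2·√2 Kell factor discussed next:

def dpi_to_micrometers(dpi):
    # Pixel size in micrometers for a given dots-per-inch value (1 inch = 25 400 micrometers).
    return 25400.0 / dpi

def line_pairs_to_pixel_size(lp_per_mm, factor=2.8):
    # Required pixel size in micrometers to preserve n line pairs per mm,
    # using the 2*sqrt(2) sampling factor from the text.
    pixels_per_mm = factor * lp_per_mm
    return 1000.0 / pixels_per_mm

print(dpi_to_micrometers(3000))        # about 8.5 micrometers per dot
print(line_pairs_to_pixel_size(25))    # about 14 micrometers per pixel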
We can again confirm that if each pixel on a monitor occupies 250 micrometers (equal to 0.25 mm), then we have 4 pixels per mm, and people with normal vision typically perceive this as a continuous tone image; the actual range is 125 to 300 micrometers per pixel.

The Kell factor, proposed during World War II in the context of television, suggests that resolving a single line pair of a black and a white line by 2 pixels will be insufficient, because statistically we cannot be certain that those pixels fall directly on each dark line and on each bright line; they may fall halfway in between. If they do, the line pairs will not be resolved. Therefore Kell proposed that the proper number of pixels per mm needed to resolve each line pair under all circumstances is 2·√2 times the number of line pairs per mm.

Prüfungsfragen:
• Ein sehr hochauflösender Infrarotfilm wird mit einer geometrischen Auflösung von 70 Linienpaaren pro Millimeter angepriesen. Mit welcher maximalen Pixelgröße müsste dieser Film abgetastet werden, um jedweden Informationsverlust gegenüber dem Filmoriginal zu vermeiden?
• Welches Maß dient der Beschreibung der geometrischen Auflösung eines Bildes, und mit welchem Verfahren wird diese Auflösung geprüft und quantifiziert? Ich bitte Sie um eine Skizze.

6.5 Geometric Accuracy An image always represents the object with a certain geometric accuracy. Again, we have already taken a look at the basic idea when we talked in an earlier chapter about the conversion of a given analog picture into digital form. The geometric accuracy of an image is described by the sensor model, a concept mentioned in the chapter on sensors. We have deviations between the geometric locations of object points in a perfect camera and the geometric locations in our real camera. Those discrepancies can be described in a calibration procedure. Calibrating imaging systems is a big issue and has earned many diploma engineers and doctoral students their degrees in vision. The basic idea is illustrated in Slide 6.17.

Prüfungsfragen:
• Was versteht man unter der geometrischen Genauigkeit (geometric accuracy) eines digitalen Rasterbildes?

6.6 Histograms as a Result of Point Processing or Pixel Processing The basic element of analyzing the quality of any image is a look at its histogram. Slide 6.19 illustrates a color input image that is somewhat dark and for which we want to build the histogram: we find many pixels in the darker range and fewer in the brighter range. We can now change this image by redistributing the histogram in a process called histogram equalization. We see however in Slide 6.20 that we have a histogram for each of the color component images, while we are only showing a composite of the colors denoted as luminosity. The summary of this manipulation is shown in Slide 6.22.

A very common improvement of an image's quality is a change of the assignment of gray values to the pixels of an image. This is based on the histogram.

Algorithm 19 Histogram equalization
1: For an N × M image of G gray-levels (often 256), create an array H of length G, initialized with 0 values.
2: Form the image histogram: scan every pixel and increment the relevant member of H: if pixel p has intensity gp, perform H[gp] = H[gp] + 1.
3: Form the cumulative image histogram Hc: Hc[0] = H[0], Hc[p] = Hc[p − 1] + H[p] for p = 1, 2, ..., G − 1.
4: Set T[p] = round((G − 1)/(N·M) · Hc[p]).
5: Rescan the image and write an output image with gray-levels gq, setting gq = T[gq].

Definition 17 Histogram stretching Stretching or spreading of a histogram means mapping the gray value of each pixel of an image, or of part of an image, through a piecewise continuous function T(r). Normally the gradation curve T(r) is monotonically increasing and maps a small range of gray values of the input image over the entire range of available values, so that the resulting image looks as if it had a lot more contrast.

Let us assume we have an 8-bit input image. We may have individual gray values, say the gray value 67, which may have 10 000 pixels; gray value 68 may have none, gray value 69 may have none, but gray value 70 may again have 7 000 pixels. We can change this image by allocating input gray values to new output values depending on their frequency as seen in the histogram. We aim for a histogram that is as uniform as possible. Slide 6.23 shows a detail of the previous slide, in one case with the input histogram and in a second case with the equalized histogram: we have attempted to distribute the gray values belonging to the input pixels such that the histogram is as uniform as possible.

Slide 6.24 shows how we can change the gray values from an input image B into an output image C. Geometrically we describe the operation by a 2-d diagram with the abscissa for the input gray values and the ordinate for the output gray values. The relationship between input and output pixels is shown as the curve in the slide, representing a look-up table. Slide 6.25 illustrates again how an input image can be changed and how a certain area of the input image can be highlighted in the output image. We simply set all input pixels below a certain threshold A and above a certain threshold B to zero, and then take the intermediate range and spread it over the output range. Another method of highlighting is to take an input image and convert it one-on-one to an output image, with the exception of a gray value range from a lower gray value A to an upper gray value B which is set to one output gray value, thereby accentuating this part of the input image.

Another analysis is shown in Slide 6.26 where we represent the 8 bits of a gray value image as 8 separate images, and in each bit plane we see the bit that is set in the byte of the image. Bit plane 7 is the most significant, bit plane zero the least significant. We obtain information about the contents of an image as shown in Slide 6.27 where we see the 8 levels of an image and note that we have a thresholded type of image at level 7 and basically no information in level 0. We see there is very little information in the lower three bits of that image. In reality we may not deal with an 8-bit image, but with a 5-bit image.

Histograms let us see where all the pixels are aggregated. Low digital numbers represent a dark image, pixels clustered in the high digital numbers show a bright image. A narrow histogram with a single peak is a low contrast image because it does not have many different gray values.
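Algorithm 19 can be written very compactly; the following NumPy sketch is my own illustration and assumes an 8-bit grayscale image stored as a 2-D array:

import numpy as np

def equalize_histogram(image, levels=256):
    # Histogram equalization along the lines of Algorithm 19 (assumes uint8 input).
    h, _ = np.histogram(image, bins=levels, range=(0, levels))     # histogram H
    hc = np.cumsum(h)                                              # cumulative histogram Hc
    t = np.round((levels - 1) * hc / image.size).astype(np.uint8)  # look-up table T
    return t[image]                                                # output gray values T[g]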
However, if an image has all of its gray values occupied with pixels, and if those are equally distributed, we obtain a high contrast, high quality image. How do we change the histogram of an image and spread it or equalize it? We think of the image gray values in the input on the abscissa of a 2D diagram and translate them to output gray values on the ordinate. We use a curve that relates the input to the output pixels. The curve is denoted as the gradation curve or t(r). Let's take an example of an image with very low contrast, as signified by a histogram that has only values in the range around 64, with some 10 gray values to the left and to the right. We now spread this histogram by a curve t(r) that takes the input values where we have many pixels and spreads them over many values in the output image. As a result we now have pixel values spread over the entire range of available values, so that the image looks as if it had a lot more contrast. We may not really be able to change the basic shape of the histogram, but we can certainly stretch it as shown in slide ??. Equalization is illustrated in slide ??.

We may want to define a desired histogram and try to approach this histogram given an input image which may have a totally different histogram. How does this work? Slide 6.36 explains. Let's take a thermal image of an indoor scene. We show the histogram of this input image, and for comparison we also illustrate the result of equalization. We would like, however, to have a histogram as shown in the center of the histogram display of the slide. We change the input histogram to approach the designed histogram as closely as possible, obtaining the third image. The resulting image permits one to see chairs in the room.

Slide 6.36 summarizes that enhancement is the improvement of the image by locally processing each pixel separately from the other pixels. This may not only concern contrast but could address noise as well. A noisy input image can become worse if we improve the histogram, since we may increase the noise. If we do some type of histogram equalization that changes locally, we might get an improvement of the image structure and increased ease of interpretability.

An example is shown in Slide 6.38. We have a very noisy image, and embedded in the image are 5 targets of interest. With a global histogram process we may not be able to resolve the detail within those 5 targets. We might enhance the noise that already exists in the image and still not see what is inside the targets. However, when we go through the image, look at individual segments via a small window and improve the image locally, moving the window from place to place and computing new parameters at each location, we might obtain the result shown in the next component of the slide. We find that the detail within each target consists of a point and a square around that point.

Algorithm 20 Local image improvement
g(x, y) ... Ergebnisbild
f(x, y) ... Ausgangsbild
g(x, y) = A(x, y) · [f(x, y) − m(x, y)] + m(x, y) und A(x, y) = k · M / σ(x, y)
wobei
k ... Konstante
M ... globaler Mittelwert
m(x, y) ... lokaler Mittelwert
σ(x, y) ... Standardabweichung der Grauwerte

We have taken an input image f(x, y) and created a resulting image g(x, y) by the formula shown in Slide 6.39. There is a coefficient A(x, y) involved and a local mean m(x, y).
So in a window we compute a mean gray value m(x, y) as the average gray value, we subtract it from each gray value in the image f, we multiply the difference by a multiplication factor A(x, y) and then add back the mean m(x, y). What is this A(x, y)? It is itself a function of (x, y): in each window we compute the mean m(x, y) and the standard deviation σ(x, y) of the gray values. We also compute a global average M, separate from the average of each small window, and we have some constant k.

These improvements of images according to the formula, and similar approaches, are heavily used in medical imaging and many other areas where images are presented to the eye for interactive analysis. We are processing images here before we analyze them; therefore, we call this preprocessing. A particular example of preprocessing is shown in the medical image of Slide 6.40, illustrating how a bland image with no details reveals its detail after some local processing.

Another idea is the creation of difference images, for example from X-ray images of a brain taken before and after the injection of a contrast agent. We then have an image of the brain before and after the contrast agent has entered the blood stream. The two images can be subtracted and will highlight the vessels that contain the contrast material.

How else can we improve images? We can take several noisy images and average them. For example we can take a microscopic image of some cells: a single image may be very noisy, but by repeating the image and computing the average of the gray values of each pixel we eliminate the noise and obtain a better signal. Slide 6.44 shows the effect of averaging 128 images.

Prüfungsfragen:
• Gegeben sei das Grauwertbild in Abbildung B.59. Bestimmen Sie das Histogramm dieses Bildes! Mit Hilfe des Histogramms soll ein Schwellwert gesucht werden, der geeignet ist, das Bild in Hintergrund (kleiner Wert, dunkel) und Vordergrund (großer Wert, hell) zu segmentieren. Geben Sie den Schwellwert an sowie das Ergebnis der Segmentierung in Form eines Binärbildes (mit 0 für den Hintergrund und 1 für den Vordergrund)!

Figure 6.1: Histogramm eines Graukeils

• Abbildung B.33 zeigt einen Graukeil, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen, die Breite beträgt 50 Pixel. Zeichnen Sie das Histogramm dieses Bildes und achten Sie dabei auf die korrekten Zahlenwerte! Der schwarze Rand in Abbildung B.33 dient nur zur Verdeutlichung des Umrisses und gehört nicht zum Bild selbst.
Antwort: siehe Abbildung 6.1
• Abbildung B.74(a) zeigt das Schloss in Budmerice (Slowakei), in dem alljährlich ein Studentenseminar¹ und die Spring Conference on Computer Graphics stattfinden. Durch einen automatischen Prozess wurde daraus Abbildung B.74(b) erzeugt, wobei einige Details (z.B. die Wolken am Himmel) deutlich verstärkt wurden. Nennen Sie eine Operation, die hier zur Anwendung gekommen sein könnte, und kommentieren Sie deren Arbeitsweise!
• Skizzieren Sie das Histogramm eines 1. dunklen, 2. hellen, 3. kontrastarmen, 4. kontrastreichen monochromen digitalen Rasterbildes!
Antwort: Siehe Abbildung 6.2, man beachte, dass die Fläche unter der Kurve immer gleich groß ist.

¹ Für interessierte Studenten aus der Vertiefungsrichtung Computergrafik besteht die Möglichkeit, kostenlos an diesem Seminar teilzunehmen und dort das Seminar/Projekt oder die Diplomarbeit zu präsentieren.
Figure 6.2: Histogramme. (a) dunkel, (b) hell, (c) kontrastarm, (d) kontrastreich

Chapter 7 Filtering

7.1 Images in the Spatial Domain We revisit the definition of an image space with its cartesian coordinates x and y to denote the columns and rows of pixels. We define a pixel at location (x, y) and denote its gray value with f(x, y). Filtering changes the gray value f of an input image into an output gray value g(x, y) in accordance with Slide 7.3. The transformation g(x, y) = T[f(x, y)] is represented by an operator T which acts on the pixel at location (x, y) and on its neighbourhood. The neighbourhood is defined by a mask, which may also be denoted as template, window or filter mask. We can therefore state in general terms that a filter is an operation that produces from an input image and its pixels f(x, y) an output image with pixels g(x, y) by a filter operator T. This operator uses in the transformation the input pixel and its neighbourhood to produce a value in the output pixel.

We will see later that filtering is a concept encompassing many different types of operations to which this basic definition applies. It may be of interest to note that some of the operations we have previously discussed can be classified as filter operations, namely transformations of the image where the operation addresses a neighbourhood of size 1 × 1. Those transformations produce an output pixel from an input pixel via the transfer function T; one calls them “point operations” or transformations of individual pixels. We have the special case of contrast enhancement in Slide 7.4, and of “thresholding”. Similarly, these operations on single pixels include the inversion of a negative to a positive as shown in Slide 7.5. The same type of operation is shown in Slide 7.6. Astronomical imaging sensors at times produce a very high density range that challenges the capabilities of film and certainly of monitors. In an 8-bit image we may not really appreciate the detail that a star may provide through a high resolution telescope. To do better justice to a high density range image, a single pixel operation is applied that non-linearly transforms the input gray values into the output gray values. Again, in the narrow sense of our definition of filtering, this is a “filter operation”. However, we have previously discussed the same transformation under the name of contrast stretching. In this particular case, the contrast stretch is logarithmic.

Prüfungsfragen:
• In der Vorlesung wurden die Operationen „Schwellwert“ und „Median“, anzuwenden auf digitale Rasterbilder, besprochen. Welcher Zusammenhang besteht zwischen diesen beiden Operationen im Kontext der Filterung?

7.2 Low-Pass Filtering Let us define a mask of 3 by 3 pixels in Slide 7.8. We enter into that mask values w1, w2, . . .
, w9. We call these values w the “weights”. We now place the 3 by 3 pixel mask on top of the input image, which has gray values denoted as zi. Let us assume that we center the 3 by 3 mask over the pixel z5, so that w5 is on top of z5. We can now compute a new gray value g5 as the sum of the products wi · zi in accordance with the slide. This describes an operation on an input image without specifying the values in the filter mask. We need to assign such values to the 3 by 3 mask: a low-pass filter is filled with the set of values shown in the slide; in this example we assign the value 1/9. The sum of all values is 1. Similarly, a larger mask of 5 × 5 values may be filled with 1/25, a 7 × 7 filter mask with 1/49. The three examples are typical low-pass filters.

Slide 7.10 illustrates the effect of low-pass filter masks filled with weights of 1/k, with k being the number of pixels in the filter mask. Slide 7.10 shows the image of a light bulb and how the blur produced by the low-pass filter increases as the size of the filter mask increases from 25 via 125 to 625 values, i.e. windows with side lengths from 5 up to 25 pixels.

We will next consider the analogy between “filtering” and “sampling”. Slide 7.11 shows an image and the gray value profile along a horizontal line (row of pixels). The continuous gray value trace needs to be “sampled” into discrete pixels. We show in Slide 7.12 the basic concept of a transition from the continuous gray value trace to a set of pixels. If we reconstruct the original trace from the discrete pixels, we will obtain a new version of the continuous gray value trace. It turns out that the reconstruction is nothing else but a filtered version of the original. Sampling and signal reconstruction are thus an analogy to filtering, and sampling theory is related to filter theory.

Slide 7.13 illustrates one particular and important low-pass filter: the sinc filter. A sinc function is

sinc(f) = sin(πf) / (πf)

and represents the Fourier transform of a rectangular “pulse” in Fourier space (see below). Slide 7.13 illustrates how a filtered value of the input function is obtained from the large filter mask representing the sinc function. By shifting the sinc function along the abscissa and computing a filter value at each location, we obtain a smoothed version of the input signal. This is analogous to sampling the input signal and reconstructing it from the samples.

Next we consider the median filter. This is a popular and frequently used operator. It inspects the gray values under the filter window and picks that gray value for which half of the pixels under the window have larger and the other half smaller gray values. Essentially the gray values under the filter window are sorted and the median value is chosen. Where would this be superior to an arithmetic mean? Clearly, the median filter does suppress high frequency information, i.e. rapid changes in the image. Thus it suppresses salt-and-pepper noise. Salt-and-pepper noise results from irregularities where individual pixels are corrupted. They might be either totally black or totally white. By applying a median filter one will throw out these individual pixels and replace them by one midrange pixel from the neighbourhood. The effect can sometimes be amazing. Slide 7.16 illustrates this with a highly corrupted image of a woman, where about 20% of the pixels are corrupted.
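A small SciPy/NumPy sketch (my own, not from the lecture) makes the comparison that follows concrete by applying a 3 × 3 mean filter and a 3 × 3 median filter to an artificially corrupted image:

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = np.full((64, 64), 128, dtype=np.uint8)          # a flat gray test image
noisy = image.copy()
noise = rng.random(image.shape)
noisy[noise < 0.10] = 0                                  # "pepper": 10% black pixels
noisy[noise > 0.90] = 255                                # "salt": 10% white pixels

mean_filtered = ndimage.uniform_filter(noisy, size=3)    # 3x3 arithmetic mean (low-pass)
median_filtered = ndimage.median_filter(noisy, size=3)   # 3x3 median

# The mean only smears the outliers; the median removes almost all of them.
print(np.abs(mean_filtered.astype(int) - 128).mean())
print(np.abs(median_filtered.astype(int) - 128).mean())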
Computing the arithmetic mean will produce a smoother image but will not do away with the effect of the noise. Clusters of corrupted pixels will result in persistent corruptions of the image. However, the median filter will work a miracle. An image almost as good as the input image, without many corruptions, will result. A median filter also has a limitation: if we have fine details in an image, say individual narrow linear features (an example would be telegraph wires in an aerial photo), then those pixels marking such a narrow object will typically get suppressed and replaced by the median value of their environment. As a result the fine linear detail would no longer show in the image.

Figure 7.1: Anwendung eines Median-Filters

Prüfungsfragen:
• Gegeben sei Abbildung B.57 mit den angegebenen linienhaften weißen Störungen. Welche Methode der Korrektur schlagen Sie vor, um diese Störungen zu entfernen? Ich bitte um die Darstellung der Methode und die Begründung, warum diese Methode die Störungen entfernen wird.
• Was ist ein Medianfilter, was sind seine Eigenschaften, und in welchen Situationen wird er eingesetzt?
• Wenden Sie ein 3 × 3-Median-Filter auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.14 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.14 eintragen.
Antwort: Siehe Abbildung 7.1
• Skizzieren Sie die Form des Filterkerns eines Gaussschen Tiefpassfilters. Worauf muss man bei der Wahl der Filterparameter bzw. der Größe des Filterkerns achten?
• Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass 1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals unverändert lässt, 2. in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals vollständig unterdrückt!
Antwort: siehe Abbildung 7.3

Figure 7.2: Tief- und Hochpassfilter. (a) Tiefpass, (b) Hochpass

7.3 The Frequency Domain We have so far looked at images represented as gray values in an (x, y) cartesian coordinate system. We call this the spatial-domain representation. There is another representation of images using sine and cosine functions, called the spectral representation. The transformation of the spatial-domain image f(x, y) into a spectral-domain representation F(u, v) is via a Fourier transform:

F{f(x, y)} = F(u, v) = ∫∫ f(x, y) · e^(−2jπ(ux+vy)) dx dy

The spectral representation has the independent variables u, v, which are the frequencies in the coordinate directions. The spectral representation can be converted back into a spatial representation by the inverse transform:

f(x, y) = ∫∫ F(u, v) · e^(+2jπ(ux+vy)) du dv

In the discrete world of pixels, the double integral ∫∫ is replaced by a double summation ΣΣ. A filter operation can be seen as a convolution (Faltung) in accordance with Slide 7.18. The convolution is defined and graphically illustrated in the following slides. In this case the two functions f(x) and g(x) are one-dimensional functions for simplicity. They are convolved using an operation denoted by the symbol ∗:

f(x) ∗ g(x) = ∫ f(t) · g(x − t) dt, integrated over t from −∞ to +∞

We define the function f(t) as a simple rectangle on the interval 0 ≤ t ≤ 1. The second function g(t) is also defined in the same space as a box on the interval 0 ≤ t ≤ 1.
We illustrate the function g at location −t and shifted to x − t, and form the product f(t) · g(x − t), shown in Slide 7.24 as the shaded area. We illustrate this at x = x1 and x = x2. The convolution now is the integral of all these areas as we move g(x − t) into the various positions along the axis x. When there is no overlap between the two functions, the product f · g is zero. As a result, the integral produces values that increase monotonically from 0 to c and then decrease from c back to 0 as the coordinate x goes from 0 through 1 to the value of 2. This produces a “smoothed” version of the input function f.

It is now of interest to appreciate that a convolution in the spatial domain is a multiplication in the spectral domain. This was previously explained in Slide 7.18. We can thus execute a filter operation by transforming the input image f and the filter function h into the spectral domain, resulting in F and H. We multiply the two spectral representations and obtain G as the spectral representation of the output image. After an inverse Fourier transform of G we have the viewable output image g.

This would be the appropriate point in this course to interrupt the discussion of filtering and insert a “tour d'horizon” of the Fourier transform. We will not do this in this class, and reserve that discussion for a later course as part of the specialization track in “image processing”. However, the Fourier transform of an image is but one of several transforms in image processing. There are others, such as the Hadamard transform, the Cosine transform, Walsh transforms and similar.

Of interest now is the question of filtering in the spatial domain, representing a convolution, versus filtering in the spectral domain, representing a multiplication. At this point we only state that with large filter masks, at sizes greater than 15 × 15 pixels, it may be more efficient to use the spectral representation. We do have the cost of three Fourier transforms (f → F, h → H, and G → g), but the actual convolution is replaced by a simple multiplication F · H.

Slide 7.25 now introduces certain filter windows and their representation in both the spatial and spectral domains, presenting one-dimensional filter functions. We are, therefore, looking at a row of pixels in the spatial domain or a row of pixels in the spectral domain through the center of the 2D function. The 2D functions themselves are rotationally symmetric. A typical low-pass filter in the spatial domain will have a Gaussian shape. Its representation in the spatial domain is similar to its representation in the spectral domain. In the spectral domain it is evident that the filter rapidly approaches a zero value, therefore suppressing higher frequencies. In the spectral domain a high-pass filter has a large value as frequencies increase and is zero at low frequencies. Such a high-pass filter looks like the so-called “Mexican hat” if presented in the spatial domain. A band-pass filter in two dimensions is a ring with a “donut shape”, and in the one-dimensional case it is a Gaussian curve that is displaced with respect to the origin. In the spatial domain the band-pass filter shape is also similar to a “Mexican hat”. However, the values in the high-pass filter are negative outside the central area in the spectral domain, whereas in the band-pass filter the shape goes first negative, then positive again.
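The equivalence of spatial convolution and spectral multiplication can be checked numerically. The following NumPy sketch (my own illustration; the 9-value box filter is an arbitrary choice) compares a directly computed circular convolution with the product of the Fourier transforms:

import numpy as np

rng = np.random.default_rng(1)
n = 256
f = rng.random(n)                     # a one-dimensional "row of pixels"
h = np.zeros(n)
h[:9] = 1.0 / 9.0                     # a simple 9-value low-pass (box) filter

# Filtering via the spectral domain: multiply the transforms, transform back.
via_spectrum = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

# Circular convolution computed directly in the spatial domain.
direct = np.zeros(n)
for k in range(n):
    for m in range(n):
        direct[k] += f[m] * h[(k - m) % n]

print(np.allclose(via_spectrum, direct))   # True: convolution corresponds to multiplication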
Prüfungsfragen:
• Beschreiben Sie anhand einer Skizze das „Aussehen“ folgender Filtertypen im Frequenzbereich: 1. Tiefpassfilter 2. Hochpassfilter 3. Bandpassfilter

7.4 High Pass-Filter - Sharpening Filters We are now ready to visit the effect of a high-pass filter. In the spatial domain, the shape of the high-pass filter was presented in Slide 7.26. With actual numerical values such a filter is shown in Slide 7.28. The filter window is normalized such that the sum of all values equals zero. Note that we have a high positive value in the center and negative values at the edge of the window. The pixel at the center of the window in the image will be emphasized and the effect of neighbouring pixels reduced. Therefore small details will be accentuated and the background will be suppressed. It is as if we only had the high-frequency detail left and the low frequency variations disappeared. The reason is obvious: in areas where pixel values don't change very much, the output gray values will become 0, because there are no differences among the gray values. The input pixels will be replaced by the value 0 because we are subtracting from the gray value the average value of the surrounding pixels.

This high-pass filter can be used to emphasize (highlight) the geometric detail. But if we do not want to suppress the background, as we have seen with the pure high-pass filter, we need to re-introduce it. This leads to a particular type of filter that is popular in the graphic arts: unsharp masking or USM. The high-pass filtered image really is the difference between the original image and a low-pass version of the image, so that only the high frequency content survives. In USM we would like to have the high-pass version of the image augmented with the original image. We obtain this by taking a high-pass filtered version of the image and adding to it the original image multiplied by a factor A − 1, where A ≥ 1. If A = 1 we have a standard high-pass filter. As we increase A, we add more and more of the original image back. The effect is shown in Slide 7.30. In that slide we have a 3 by 3 filter window and the factor A is shown variably as 1.1, 1.15, and 1.2. As A increases, the original image gets more and more added back in, to a point where we get overwhelmed by the amount of very noisy detail.

Prüfungsfragen:
• Gegeben sei eine Filtermaske entsprechend Abbildung ??. Um was für eine Art Filter handelt es sich hier?
• Gegeben sei ein Bild nach Abbildung ??. Was sind die Ergebnispixel im Ergebnisbild an den markierten drei Orten nach Anwendung der Filtermaske aus Abbildung ???
• Eines der populärsten Filter heißt „Unsharp Masking“ (USM). Wie funktioniert es? Ich bitte um eine einfache formelmäßige Erläuterung.
• In Abbildung B.61 ist ein digitales Rasterbild gezeigt, das durch eine überlagerte Störung in der Mitte heller ist als am Rand. Geben Sie ein Verfahren an, das diese Störung entfernt!
• Das in Abbildung B.66 gezeigte Foto ist kontrastarm und wirkt daher etwas „flau“. 1. Geben Sie ein Verfahren an, das den Kontrast des Bildes verbessert. 2. Welche Möglichkeiten gibt es noch, die vom Menschen empfundene Qualität des Bildes zu verbessern? Wird durch diese Methoden auch der Informationsgehalt des Bildes vergrößert? Begründen Sie Ihre Antwort.
• Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass 1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals unverändert lässt, 2.
in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals vollständig unterdrückt!
Antwort: siehe Abbildung 7.3

Figure 7.3: Tief- und Hochpassfilter. (a) Tiefpass, (b) Hochpass

7.5 The Derivative Filter A very basic image processing function is the creation of a so-called edge image. Recall that we had one definition of “edge” that related to the binary image early on in this class (Chapter 1). That definition of an edge will now be revisited and we will learn about a second definition of an edge.

Let us first define what a gradient image is. We apply a gradient operator to the image function f(x, y). The gradient of f(x, y) is shown in Slide 7.32, denoted as ∇ (Nabla). A gradient is thus a multidimensional entity; in a two-dimensional image we obtain a two-dimensional gradient vector with a length and a direction. We have to associate with each location x, y in the image these two entities. The length of the gradient vector is of course the Pythagorean sum of its elements, namely of the derivatives of the gray-value function with respect to x and y. We typically use the magnitude of the gradient vector and ignore its direction. However, this is not true in every instance.

We are not dealing with continuous tone images but with discrete renditions in the form of pixels and discrete matrices of numbers. We can approximate the computation of the gradient by means of a three by three matrix, as the slide explains. The 3 × 3 matrix has nine values z1, z2, . . . , z9. We approximate the derivatives by means of first differences, namely z5 − z8, z5 − z6, and so forth. The magnitude of the gradient is approximated by the expression shown in Slide 7.34. In this way we define a way of computing the gradient in a discrete, sampled digital image avoiding squares and square roots. We can even further simplify the approximation, as shown in Slide 7.34, namely as the sum of the absolute values of the differences between pixel gray values. We can also use gradient approximations by means of cross-differences, thus not by horizontal and vertical differences along rows and columns of pixels in the window. Some of these approximations are associated with their inventors. The gradient operator

∇f ≈ |z5 − z9| + |z6 − z8|

is named after Roberts. Prewitt's approximation is a little more complicated:

∇f ≈ |(z7 + z8 + z9) − (z1 + z2 + z3)| + |(z3 + z6 + z9) − (z1 + z4 + z7)|

Slide 7.36 shows how the computation is implemented. Two filter functions are sequentially applied to the input image, and the two resulting output images are added up. The Roberts operator works with two windows of dimension 2 × 2. Prewitt uses two windows with dimensions 3 × 3, and a third gradient approximation by Sobel also uses two 3 × 3 windows.

Let's take a look at an example: Slide 7.37 shows a military fighter plane and the gradient image derived from it using the Prewitt operator. These gradient images can then be post-processed, e.g. by removing the background details, simply by reassigning gray values above a certain level to zero or one, or by assigning gradients of a certain value to a particular colour such as white or black. This will produce from an original image the contours of its objects, as seen in Slide 7.37. We call the resulting image, after a gradient operator has been applied, an edge image. However, in reality we don't have any edges yet. We still have a gray-tone image that visually appears like an image of edges and contours.
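As a sketch of how such a gradient image can be computed (my own illustration using SciPy, here with the Sobel windows mentioned above and the square-root-free magnitude approximation):

import numpy as np
from scipy import ndimage

def gradient_magnitude(image):
    # Approximate |grad f| with the two 3x3 Sobel windows and |gx| + |gy|.
    img = image.astype(float)
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])
    sobel_y = sobel_x.T                      # the second, rotated window
    gx = ndimage.convolve(img, sobel_x)      # horizontal differences
    gy = ndimage.convolve(img, sobel_y)      # vertical differences
    return np.abs(gx) + np.abs(gy)           # sum of absolute values instead of the Pythagorean sum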
To convert this truly into an edge image we need to threshold the gradient image so that only the highest-valued pixels get the value one, and all lower-valued pixels are set to 0 (black) and are called “background”. This means that we have produced a binary image where the contours and edges of objects are marked as binary elements. We now need to remove the noise, for example in the form of single pixels, by using a morphological filter. We also have to link up the individual edge pixels along the contours so that we obtain contour lines. Linking up these edges is an operation that has to do with “neighbourhoods” (Chapter 1); we also need to obtain skeletons and connected sequences of pixels as discussed previously (Chapter 3).

Prüfungsfragen:
• Definieren Sie den Sobel-Operator und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.13 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.13 eintragen.

Figure 7.4: Roberts-Operator

• Zu dem digitalen Rasterbild in Abbildung B.21 soll das Gradientenbild gefunden werden. Geben Sie einen dazu geeigneten Operator an und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Rechtecks an. Sie können das Ergebnis direkt in Abbildung B.21 eintragen. Führen Sie außerdem für eines der Pixel den Rechengang vor.
• Wenden Sie auf den fett umrandeten Bereich in Abbildung B.34 den Roberts-Operator zur Kantendetektion an! Sie können das Ergebnis direkt in Abbildung B.34 eintragen.
Antwort: Siehe Abbildung 7.4

7.6 Filtering in the Spectral Domain / Frequency Domain We define a filter function H(u, v) in the spectral domain as a rectangular function, a so-called box function. Multiplying the Fourier transform of an image by H(u, v) produces the spectral representation of the final image G(u, v) as the product of H and F. We have a transfer function of the shape of the filter function H as shown in Slide 7.39, which has the value 1 from the origin out to a value D0, and we assume that H is rotationally symmetric. In the frequency domain the value D0 is denoted as the cut-off frequency. Any frequency beyond D0 will not be permitted through the filter function.

Let us take a look at how this works. In Slide 7.40 we have the image of the head of a bee; of course, in the spectral domain we would not be able to judge what the image shows. We can create a spectral representation by applying a Fourier transform to the image, and we can now define circles in the spectral representation, with their centre at the origin of the spectral domain and a radius that contains 90%, 93% or more of the image frequencies, also denoted as the “energy”. Now if we apply a filter function H as shown before, which only lets the frequencies within 90% of the energy pass, and then transform the resulting function G back from the spectral into the spatial domain to obtain an image g, we obtain a blurred version of the original image. As we let more frequencies go through, the blur becomes less and less. What we have obtained is a series of low-pass filtered images of the head of the bee, and we have also indicated how much of the image content we have filtered out and how much we have let go through the low-pass filter. If we transform the function H from the spectral domain into the spatial domain, we obtain Slide 7.41.
If we apply this filter function to an image that contains nothing but 2 white points, we will obtain an image g that appears corrupted, presenting us with ghost images. We should therefore be careful with that type of box filter (in the spectral domain). The ghost images of high contrast content in our input image will be disturbing. It is advisable not to use such a box filter, which is sometimes also called an ideal filter. We should use an approximation instead. We introduce the Butterworth filter, which is represented in the spectral domain by the curve shown in the slide:

H(u, v) = 1 / (1 + [D(u, v)/D0]^(2n))

In two dimensions this is a volcano-like shape. Applying that type of filter to the bee produces a series of low-pass filtered images without ghost images. A direct example of the difference between applying the box filter and the Butterworth filter is shown in the slides.

Of course this entire discussion of spectral and spatial domains and of convolutions and filters requires space, time and effort and is related to a discussion of the Fourier transform, and of the effect of these transforms and of the suppression of certain frequencies on the appearance of functions. Typically throughout an engineering program the signals are mostly one-dimensional, whereas in image processing the typical signals are two-dimensional. A quick view of Fourier transforms of certain functions illustrates some of what one needs to be aware of. Slide 7.47 presents one function F(u) in the spectral domain in the form of a rectangular pulse. Its transform into the spatial domain gives us the sinc function, as previously discussed, f(x) = sin(πx)/(πx). Now if we cut off the extremities of f(x) and then transform that function back into the spectral space, we obtain a so-called ringing of the function. Giving up certain frequencies in one domain can therefore lead to a certain noisiness of the signal in the other domain.

Prüfungsfragen:
• Geben Sie die Transferfunktion H(u, v) im Frequenzbereich eines idealen Tiefpassfilters mit der „cutoff“-Frequenz D0 an! Skizzieren Sie die Transferfunktion!

7.7 Improving Noisy Images There are many uses of filters. We have already found the use of filters to enhance edges, and pointed out that filters transform individual pixels. We may use filters also to remove problems in images. Let us assume that we have compressed an image from 8 bits to 4 bits and therefore have reduced the number of available grey values to 16. We have an example in Slide 7.49 where the low number of gray values creates artefacts in the image in the form of gray value contours. By applying a low-pass filter we can suppress the unpleasant appearance of these false density contours. Another example, also in Slide 7.49, is an image with some corruption by noise. A low-pass filter will produce a new image that is smoother and therefore more pleasant to look at. Finally, we want to revisit the relationship between “filter” and “sampling”. Slide 7.51 illustrates again the monkey face: smoothing an image by a low-pass filter may be equivalent to sampling the image, then reconstructing it from the samples.

Prüfungsfragen:
• Es besteht eine Analogie zwischen der Anwendung eines Filters und der Rekonstruktion einer diskretisierten Bildfunktion. Erklären Sie diese Behauptung!

7.8 The Ideal and the Butterworth High-Pass Filter
For visual inspection we may want to use high-pass filters, because our eye likes crisp, sharp edges and a high level of energy in an image. Slide 7.53 introduces such a high-pass filter in the spectral domain. The ideal high-pass filter lets all high frequencies go through and suppresses all low frequencies. This “ideal” filter has the same problems as we have seen in the low-pass case. Therefore we may prefer the Butterworth high-pass filter, which is not a box but a monotonic function. Of course, in the 2-dimensional domain the ideal and Butterworth high-pass filters appear like a brick with a hole in it. The high-pass filter is applied to enhance the contrast and bring out the fine detail of the object, as shown in the slides. The high-pass filter improves the appearance of the image and suppresses the background. If we add the original image back in to a high-pass filtered version, we come again to a type of “emphasis filter” that we have seen earlier under the name “unsharp masking” (USM). The resulting image can be processed into an equalized histogram for optimum visual inspection. Again, high-pass filters can be studied in both the spatial and the spectral domains. We have the sinc function in the spectral domain, which represents in the spatial domain a box function, a pulse. The sinc² function corresponds to a triangular function in the spatial domain. And a Gaussian function will remain a Gaussian function in both the spectral and the spatial domains.

Prüfungsfragen:
• Skizzieren Sie die Übertragungsfunktion eines idealen und eines Butterworth-Hochpassfilters und vergleichen Sie die Vor- und Nachteile beider Filtertypen!

7.9 Anti-Aliasing

7.9.1 What is Aliasing? Recall the rasterization or scan-conversion of straight lines and curves, and the resulting aliasing. Suppose we work with a trigonometric function of some sort. This function is sampled at certain widely spaced intervals. Reconstruction of the function from the samples will produce a particular function that is not really there. What is shown in Slide 7.58 is a high frequency function, whereas the samples describe a low frequency sine curve. We denote the falsification of the original function into one of a different frequency as “aliasing”. This type of aliasing is a widely reviewed subject of sampling theory and signal processing and is not particular to image processing or graphics. Aliasing is a result of our need to sample continuous functions, both in the creation of images and in the creation of visualizations of objects in computer graphics.

7.9.2 Aliasing by Cutting-off High Frequencies The slides explain the issue further with an excursion into sampling theory. We have an input image f(x) in the spatial domain that needs to be sampled. As we go into the spectral domain we cannot use all frequencies. We cut them off at w and we lose all frequencies outside the interval −w ≤ u ≤ w. Let us now define a sampling function in the spatial domain as s(x), consisting of a series of Dirac functions at an interval ∆x. The multiplication of f(x) with s(x) produces the sampled function in the spatial domain. As we go into the spectral domain we also obtain a set of discrete frequencies S(u) at 1/∆x, 2/∆x, and so on. If we now convolve (in the spectral domain) the sampling function S(u) with the original function F(u), we get the spectral view of the sampled function f(x) · s(x). We see the original function F(u) repeated at locations −1/∆x, +1/∆x, . . .
Transforming this back into the spatial domain produces samples from which the original function f(x) can only be incompletely reconstructed. What is now the effect of changing ∆x? If we make it smaller, we get a more accurate sampling of the input function, in accordance with the slide. We see in the spectral domain that the repetitions of the original function F(u) in F(u) ∗ S(u) are spaced apart at wider intervals 1/∆x as ∆x gets smaller. Slide 7.62 illustrates that we could isolate the spectrum of our function f(x) by multiplying F(u) ∗ S(u) by a box filter G(u), producing F(u), and we can then fully reconstruct f(x) from the samples. If w is the highest frequency in our function f(x) or F(u), then we have no loss if the sampling interval ∆x is smaller than 1/(2w):

∆x ≤ 1/(2w)   (Whittaker-Shannon theorem)

In turn we can define a cut-off frequency w, denoted the Nyquist frequency, that is fully represented by a sampling interval ∆x if w = 1/(2∆x).

7.9.3 Overcoming Aliasing with an Unweighted Area Approach Of course the implementation is again as smart as possible to avoid multiplications and divisions, and replaces them by simpler operations. The approach is shown in Slide 7.63. Aliasing occurs if ∆x violates the Whittaker-Shannon theorem. Anti-aliasing by means of a low-pass filter occurs in the rasterization or scan conversion of geometric elements in computer graphics. We have discussed this effect in the context of scan conversion by means of the Bresenham approach. Slide 7.63 explains another view of the issue, using the scan-conversion of a straight line. We can assign grey values to those pixels that are touched by the area representing the “thin line”. This produces a different approach from Bresenham because we are not starting out from a binary decision that certain pixels are in and all others are out: we instead select pixels that are “touched” by the straight line, and assign a brightness proportional to the area that the overlap takes up.

7.9.4 Overcoming Aliasing with a Weighted Area Approach

Algorithm 21 Weighted Antialiasing
1: set currentX to the x-value of the start of the line
2: while currentX smaller than the x-value of the end of the line do
3: apply Bresenham's Line Algorithm to get the appropriate currentY value
4: consider three cones (each with a diameter of 2 pixels and volume normalized to 1) erected over the grid positions (currentX, currentY + 1), (currentX, currentY) and (currentX, currentY − 1)
5: for all cones do
6: determine the intersection of the cone's base with the line
7: calculate the volume above the intersection
8: multiply the obtained volume with the desired gray value
9: take the result and set it as the pixel's gray value
10: end for
11: increase currentX
12: end while

In weighted area sampling we also decrease a pixel's brightness as it has less overlap with the area of the “thin line”. But not all overlap areas are treated equally! We introduce a “distance” from the center of a pixel for the overlap area. With this basic idea in mind we can revisit unweighted area sampling, which treats all overlap areas equally, implementing a “box filter” as shown in the slide. Each overlap area is multiplied by the same value, represented as the height of the box, normalized to 1. A weighted area sampling approach is shown in the next slide. The “base” of the filter (its support) is circular and larger than a pixel, typically with a diameter of twice the pixel's side length. The height of the filter is chosen such that its volume is 1.
The slide illustrates the effect that a moving small triangle would have on pixels as it moves across an image. The triangle is smaller than a pixel.

Getting Antialiased Lines by Means of the Gupta-Sproull Approach

Algorithm 22 Gupta-Sproull-Antialiasing
1: dx := x2 − x1;
2: dy := y2 − y1;
3: d := 2 ∗ dy − dx;
4: incrE := 2 ∗ dy;
5: incrNE := 2 ∗ (dy − dx);
6: two_v_dx := 0;
7: invDenom := 1/(2 ∗ Sqrt(dx ∗ dx + dy ∗ dy));
8: two_dx_invDenom := 2 ∗ dx ∗ invDenom;
9: x := x1;
10: y := y1;
11: IntensifyPixel(x, y, 0);
12: IntensifyPixel(x, y + 1, two_dx_invDenom);
13: IntensifyPixel(x, y − 1, two_dx_invDenom);
14: while x < x2 do
15: if d < 0 then
16: two_v_dx := d + dx;
17: d := d + incrE;
18: x := x + 1;
19: else
20: two_v_dx := d − dx;
21: d := d + incrNE;
22: x := x + 1;
23: y := y + 1;
24: end if
25: IntensifyPixel(x, y, two_v_dx ∗ invDenom);
26: IntensifyPixel(x, y + 1, two_dx_invDenom − two_v_dx ∗ invDenom);
27: IntensifyPixel(x, y − 1, two_dx_invDenom + two_v_dx ∗ invDenom);
28: end while
29: intensity := Filter(Round(Abs(distance)));
30: WritePixel(x, y, intensity);

Using the weighted area method, we can pre-compute a table for lines at different distances from a pixel's center. A line will typically intersect the cones centered on three pixels, as shown in Slide 7.69, but it may also intersect only 2 or at most 5 such cones. The look-up table is filled with values computed as a function F(D, t) of two variables: t, the line's thickness, and D, the distance from a pixel center. Gupta and Sproull, two early pioneers of computer graphics, introduced the table look-up for a 4-bit display device. Only 16 values of D are needed, since a 4-bit display only has 16 different gray values. The Bresenham method (the midpoint line algorithm) now needs to be modified to not only decide on the E or NE pixel, but also to assign a grey value. However, we not only set a grey value for the single pixel at E or NE, but also for its two neighbours above and below. Slide 7.70 illustrates how the distance D is computed using simple trigonometry:

D = v · dx / √(dx² + dy²)

And we need two additional distances Dabove and Dbelow:

Dabove = (1 − v) · dx / √(dx² + dy²)   (7.1)
Dbelow = (1 + v) · dx / √(dx² + dy²)   (7.2)

Prüfungsfragen:
• Erklären Sie, unter welchen Umständen „Aliasing“ auftritt und was man dagegen unternehmen kann!
• In Abbildung B.72 sehen Sie ein perspektivisch verzerrtes schachbrettartiges Muster. Erklären Sie, wie die Artefakte am oberen Bildrand zustandekommen, und beschreiben Sie eine Möglichkeit, deren Auftreten zu verhindern!
Chapter 8 Texture

8.1 Description

Texture is an important subject in the analysis of natural images of our environment and in the computer generation of images if we want to achieve photo-realism. Slide 8.3 illustrates three different sets of textures. The first may be of pebbles on the ground, the second of a quarry to mine stones, and the third is a texture of fabric. We can describe texture (a) pictorially, by means of a photograph of the surface, or (b) by a set of mathematical methods: these may be statistical, structural or spectral. And finally we will present a procedural approach to modeling and using texture.

Prüfungsfragen:
• Nennen Sie drei Arten der Texturbeschreibung und führen Sie zu jeder ein Beispiel an.

8.2 A Statistical Description of Texture

Recall the image function as z = f(x, y) with the image gray values z. We can compute so-called moments of the image gray values as shown in Slide 8.7. The moments are denoted as µn(z). The first moment, µ1(z), is the mean of the gray values. The second moment, µ2(z), represents the variance of the gray values with respect to the mean, see definition 8.2. Moments include the probability of the gray value, p(z). In a discrete context probability is represented by the histogram of the gray values. Obviously, if a gray value is very unlikely to occur, its column in the histogram will be very low or empty.

µn(z) = Σ [(zi − m)^n · p(zi)]
z ... the gray values of the image, zi the gray value of the i-th pixel in the image
m ... mean value of z (average intensity)
µn ... n-th moment of z about the mean

R = 1 − 1/(1 + σ²(z))
R ... a measure of the relative smoothness
σ² ... variance

The measure of texture can be a function of these moments. A very simple one is the value R. If there is no variation in gray value, then its variance σ², its standard deviation σ and its second moment µ2(z) are 0 or close to 0. In that case the value of R is 0 as well. R therefore represents a measure of the smoothness of the image. We can associate a separate value R with each pixel i by computing it for a window around that pixel i.

There are other statistical measures of texture, for example associated with the "edginess" of an area. In this case we would produce an edge value associated with each pixel, for example representing the number, direction and strength of the edges in small windows surrounding a pixel. Nominally we obtain a different texture parameter at each pixel. However, we are looking to describe an extended image by regions of similar texture. Therefore we will classify the texture parameters into a few groups. We may create an equidensity image as discussed previously. If we do that, we might be able to describe a quarry with only two texture parameters as shown in Slide 8.8. While the quarry itself has been delineated manually by a human operator, a texture parameter is computed within this delineated polygon, and the equidensity method applied to the texture parameter will define two different textures in this quarry.

A very frequently used texture measure is the so-called co-occurrence matrix, which seeks to describe the occurrence of similar patterns in an image. We do not discuss this here other than to mention the name. A small computational sketch of the moment-based texture measures follows the review question below.

Prüfungsfragen:
• Welche statistischen Eigenschaften können zur Beschreibung von Textur herangezogen werden? Erläutern Sie die Bedeutung dieser Eigenschaften im Zusammenhang mit Texturbildern!
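As announced above, here is a small Python sketch of the moment-based texture measures (not part of the lecture notes; gray values are used unnormalized here, so R saturates quickly for strongly varying windows): it computes the mean, the second moment and the smoothness measure R from the histogram of a window.

from collections import Counter

def moments_and_R(window):
    """window: flat list of gray values from a small image window."""
    n_pixels = len(window)
    hist = Counter(window)                                   # histogram of gray values
    p = {z: count / n_pixels for z, count in hist.items()}   # p(z)
    m = sum(z * p_z for z, p_z in p.items())                 # first moment: mean
    mu2 = sum((z - m) ** 2 * p_z for z, p_z in p.items())    # second moment: variance
    R = 1 - 1 / (1 + mu2)
    return m, mu2, R

smooth = [100] * 16                  # constant window -> R = 0
rough  = [0, 255] * 8                # strongly varying window -> R close to 1
print(moments_and_R(smooth))         # (100.0, 0.0, 0.0)
print(moments_and_R(rough))          # variance 16256.25, R almost 1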
8.3 Structural Methods of Describing Texture

In order to understand the concept of a structural texture description we refer to Slide 8.11. We define a rule that replaces a small window in an image by a pattern; for example, we replace a window by a pattern "aS", where "a" may represent a circle. If we now apply the same operation multiple times, we get an arrangement of repetitive patterns located adjacent to one another in a row and column pattern, "a a a S". We might denote the neighbourhood relationship between adjacent areas by different symbols; Slide 8.12 introduces "b" for the position below the current location and "c" for the position to the left. We can now describe a certain pattern by a certain sequence of "a", "b" and "c" operations. The texture primitive, which is shown here as a circle, could be any other kind of pattern. We set up our texture by repeating the pattern. Note again that at this point we are concerned with describing texture as we find it in a natural image. We are not, at this time, generating a texture for an object that we want to visualise.

Prüfungsfragen:
• Erläutern Sie die strukturelle Methode der Texturbeschreibung!

8.4 Spectral Representation of Texture

We have previously discussed the technical means of representing an image in a computer. In the spatial domain we use the rows and columns of pixels, in the spectral domain we use the frequencies. The description of texture using the spectral representation of an image is therefore described next.

Slide 8.14 illustrates a typical texture pattern. Its spectral representation shows that there are distinct patterns that are repeated in the image. These appear as dominant frequencies in the image. We call the two-dimensional function in the spectral domain the spectral function s(r, φ), where r is the radius of a certain spectral location from the origin and φ is the angle from the x axis in a counter-clockwise direction. Any location in the spectral representation of the image therefore has the coordinates (r, φ). Slide 8.15 explains this further. We can simplify the spectral representation of the image into two functions: one is a plot of s as a function of the radius r for a given angle φ, the other is a plot of s as a function of the angle φ for a given radius r. Slide 8.16 illustrates two different patterns of textures and the manifestation of those patterns in the φ-curve.

A texture parameter can now be extracted from the spectral representation, for example by counting the number of peaks or the average distance between the peaks in the spectral domain. We can also set up a texture vector with several values that consider the number of peaks as a function of the radius r. The aim is to associate with a pixel or a window in the image a simple number or vector that is indicative of the type of texture one finds there. Therefore we have here a case of classification, where we could take a set of known textures and create from those a feature space in two or more dimensions (see Chapter 14). If we now have an unknown texture, we might try to describe it in terms of the known textures, using the feature space and looking for the nearest texture given the texture numbers of the unknown texture. In this manner we can replace an input image by a texture image which indicates at each location the kind of texture which exists there. A small sketch of how such spectral texture numbers can be computed follows below.
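The following Python sketch indicates how such spectral texture numbers could be computed; it assumes NumPy, a toy stripe pattern and arbitrarily chosen bin counts, and the function name spectral_signatures is illustrative only. The Fourier magnitude is summed along circles to obtain a profile over the radius r and along rays to obtain a profile over the angle φ; peaks in these profiles indicate the repetitive structure of the texture.

import numpy as np

def spectral_signatures(image, n_r=16, n_phi=18):
    F = np.fft.fftshift(np.fft.fft2(image))
    mag = np.abs(F)
    h, w = image.shape
    y, x = np.indices((h, w))
    cy, cx = h // 2, w // 2
    r = np.hypot(y - cy, x - cx)
    phi = np.degrees(np.arctan2(y - cy, x - cx)) % 180      # angle from the x axis
    S_r = [mag[(r >= i) & (r < i + 1)].sum() for i in range(n_r)]   # profile over r
    S_phi = [mag[(phi >= a) & (phi < a + 180 / n_phi)].sum()        # profile over phi
             for a in np.arange(0, 180, 180 / n_phi)]
    return S_r, S_phi

# toy texture: vertical stripes with a period of 4 pixels
stripes = np.tile(np.array([0, 0, 255, 255]), (32, 8))
S_r, S_phi = spectral_signatures(stripes)
print(np.argmax(S_r[1:]) + 1)      # radius of the dominant peak (ignoring the DC term at r = 0)
print(np.argmax(S_phi))            # angular bin containing the dominant energy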
In classifying areas of similar texture as one area we will replace a large number of pixels by a small numbers of textures and a description of the contour of an area of uniform texture. Prüfungsfragen: • Welche Eigenschaften weist eine (sich regelmäßig wiederholende) Textur im Spektralraum auf? Welche Aussagen können über eine Textur anhand ihres Spektrums gemacht werden? • Das digitalen Rasterbild aus Abbildung B.71 soll segmentiert werden, wobei die beiden Gebäude den Vordergrund und der Himmel den Hintergrund bilden. Da sich die Histogramme von Vorder- und Hintergrund stark überlappen, kann eine einfache Grauwertsegmentierung hier nicht erfolgreich sein. Welche anderen Bildeigenschaften kann man verwenden, um dennoch Vorder- und Hintergrund in Abbildung B.71 unterscheiden zu können? 8.5 Texture Applied to Visualisation To achieve photorealism in the visualisation of two- or three-dimensional objects we employ descriptions of texture rather than texture itself. We may apply artificial texture, also denoted as synthetic texture and place this on the geometric polygons describing the surface shape of an object. The texture itself may consist of texture elements which are also denoted as texels. Slide 8.19 is an example of some simple objects showing a wire-frame rendering of an indoor scene and illustrates how unrealistic this type of representation appears. Slide 8.20 is a result of placing photographic texture on top of the objects. We obtain a photorealistic representation of those objects. Basic concepts in Slide 8.21 are illustrated by various examples of a two-dimensional flag, a three-dimensional indoor-scene, a two dimensional representation of symbols, phototexture of wood, texture of some hand-written material. How is this photographic texture applied to a geometrically complex object? This is illustrated in Slide 8.22, see also 23. We deal with three different coordinate systems. At first we have the representation on a monitor or display medium. And a window on this display contains a segment 154 CHAPTER 8. TEXTURE Algorithm 23 Texture mapping 1: 2: 3: 4: 5: 6: 7: 8: surround the object with a virtual cylinder for all pixels of the texture do make a coordinate transformation from carthesian to cylindric coordinates {to wrap the texture on the cylinders surface} end for for all points of the object do project the point perpendicularly from the midpoint of the cylinder to the cylinders surface where the projection cuts the edge of the object, assign the object point the color of the corresponding cylinder point end for of a three dimensional object which is represented in a world-coordinate system. The surface of this object needs to be photo-textured and it receives that photo-texture from a texture map with its third coordinate system. Essentially we a projecting the texture map onto the curved surface of an object and than render the curved surface on the display medium using a transformation that results from a synthetic camera and with a camera pose consisting of attitude and angle orientation. Prüfungsfragen: • Erklären Sie, wie in der Visualisierung die Qualität eines vom Computer erzeugten Bildes durch den Einsatz von Texturen verbessert werden kann. Nennen Sie einige Oberflächeneigenschaften (insbesondere geometrische), die sich nicht zur Repräsentation mit Hilfe einer Textur eignen. • In Aufgabe B.1 wurde nach geometrischen Oberflächeneigenschaften gefragt, die sich nicht zur Visualisierung mittels Textur eignen. 
Nehmen Sie an, man würde für die Darstellung solcher Eigenschaften eine Textur unsachgemäß einsetzen. Welche Artefakte sind für solche Fälle typisch? 8.6 Bump Mapping In order to provide a realistic appearance to a surface which is not smooth but bumpy, there exists a concept called bump-mapping. This applies a two dimensional texture to a three-dimensional object and making the two dimensional texture appear as if it were three dimensional. Slide 8.24 and Slide ?? explain the concept with a donut and a strawberry. Note that the texture really is two dimensional. The third dimension is introduced by some 2D-picture of a shadow and detail that is not available in the third dimension. This is visible in the contours of the object where the bumps on the texture are not reflected in the geometry of the object. In this case we do not apply the photographic texture we did use in the previous chapter, but we deal with a computed texture. Prüfungsfragen: • Was beschreibt der Begriff Bump-Mapping“? ” • In Abbildung B.77 ist ein Torus mit strukturierter Oberfläche gezeigt, wobei sich die Lichtquelle einmal links (Abbildung B.77(a)) und einmal rechts (Abbildung B.77(b)) vom Objekt befindet. Zur Verdeutlichung sind in den Abbildungen B.77(c) und B.77(d) vergrößerte Ausschnitte dargestellt. Welche Technik wurde zur Visualisierung der Oberflächenstruktur eingesetzt, und was sind die typischen Eigenschaften, anhand derer man das Verfahren hier erkennen kann? 8.7. 3D TEXTURE 8.7 155 3D Texture Another concept of texture is three dimensional. In this case we do not texture a surface but an entire three dimensional body. An example is shown in Slide 8.27 where the surface results from the intersection of the three dimensional texture body with the surface geometry. Prüfungsfragen: • Was ist eine 3D Textur“? ” 8.8 A Review of Texture Concepts by Example Slide 8.29 illustrates from an animated movie an example of the complexities of applying photorealistic textures to three dimensional objects. We begin with basic shapes of the geometric entity and apply to it some basic colours. We superimpose on these colours an environment map. This is again modified by a bump map and the appropriate illumination effect. The intermediate result is shown in Slide 8.31 adding dirt specks for additional realism. Slide 8.32 adds further details: We want to add details by creating a near-photographic texture, by adding more colour, the effect of water troplets, mirror and spectral reflections. We should not be surprised that the creation of such animated scenes consumes growing computing power and therefore takes time the complete. The final result is in Slide 8.33. 8.9 Modeling Texture: Procedural Approach As previously discussed we process natural images to find a model of texture and we use those models to create images. Slide 8.35 details the method of analysing existing texture. We have a real scene of an environment and we do understand from the image model that the intensity in the image is a function of the material property fr and the illumination Ei . Material property is unknown and needs to be determined from the raster image. Illumination is known. We estimate the model parameters for the material property and we use it to approximate objects. The photo texture use this and the virtual scene with the unknown material and illumination properties to compute the density per pixel and thereby obtain a synthetic image. An issue is now to find a method of model an unknown texture by simple curves. 
Slide 8.36 explains how a reference surface, a light source, a camera and a texture to be analysed can be set up into a sensor system. The resulting image is illustrated in Slide 8.37 with the known reference surface and the unknown texture. We need the reference texture so that we can calibrate the differences in illumination. As seen in the previous slide, we have an image of texture and an effect of illumination; in particular we may have mirror or specular reflection. We do not discuss models for reflection at this time but just show a given model in Slide 8.38 for illustration purposes. We have for each pixel a known gray value f, and we know the angle θi under which a pixel is being illuminated and how the reflection occurs. We will discuss the parameters of the illumination model in a later chapter. We need to compute the parameters of the reflections that are marked.

In Slide 8.39 we study a particular column of pixels that represents a gray value curve of an unknown photo texture. The question is: what is "texture" here? Slide 8.40 explains. We have the actual brightness along this column of pixels plotted, and we model the change of brightness as a function of the illumination with an average that we can calibrate with our reference pattern. The deviation from the average is then the actual texture, in the form of an irregular signal. We now need to describe that signal statistically by means of a few simple numbers. How to do this is a topic of "statistical signal analysis", for example in a spectral representation of the signal as previously discussed in Section 8.4.

Let us review the basic idea in a different way. We have an image of a surface and we can take a little window for analysis. We can create a texture surface by projecting that window multiple times onto the surface, and we may obtain in the process some type of "tiling effect". The procedural texture discussed before models the surface texture by mathematics and avoids the seaming effect of the individual tiles. We can create any kind of shape in our synthetic surface as shown in Slide 8.43. We can illustrate in Slide 8.44 that those shapes can be fairly complex even in three dimensions. A small sketch of such a procedurally generated texture follows the review question below.

Prüfungsfragen:
• Was versteht man unter „prozeduralen Texturen“, wie werden sie erzeugt und welche Vorteile bringt ihr Einsatz?
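As announced above, here is a small Python sketch of a procedural texture (a simple "value noise"; the lattice size, the seed and the helper names are arbitrary choices, not part of the lecture material). Because the texture is computed directly from the coordinates (x, y), it can be evaluated at any point and does not show the tiling seams of a repeated image window.

import random

random.seed(42)
GRID = [[random.random() for _ in range(17)] for _ in range(17)]   # random values on a coarse lattice

def smoothstep(t):
    return t * t * (3 - 2 * t)          # smooth interpolation weight

def value_noise(x, y):
    """Smoothly interpolated lattice noise; x, y in [0, 16)."""
    x0, y0 = int(x), int(y)
    tx, ty = smoothstep(x - x0), smoothstep(y - y0)
    a = GRID[y0][x0] * (1 - tx) + GRID[y0][x0 + 1] * tx
    b = GRID[y0 + 1][x0] * (1 - tx) + GRID[y0 + 1][x0 + 1] * tx
    return a * (1 - ty) + b * ty

# evaluate an 8x8 patch of the texture as gray values 0..255
patch = [[round(255 * value_noise(i * 0.4, j * 0.4)) for i in range(8)] for j in range(8)]
print(patch[0])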
Chapter 9 Transformations

9.1 About Geometric Transformations

We will discuss in this chapter the transformation of objects in a fixed coordinate system, the change of coordinate systems with a fixed object, and the deformation of objects, so that from an input object a geometrically changed output object results. We will discuss projections of the 3D world into a 2D display plane, and finally we will discuss, under the heading of "transformations", the change in the representation of an object if we approximate it by simple functions; we denote this as approximation and interpolation.

Geometric transformations apply when objects move in a fixed world coordinate system, but they also apply when we need to look at objects and have to create images, or use images of objects to reconstruct them. In that case we need to understand the projection of the object into an image or display medium.

A very important application of geometric transformations is in robotics. This can be unrelated to the processing of digital visual information, but it employs the same sets of formulae and the same ideas of "transformation". A simple robot may have associated with it numerous coordinate systems which are attached to its rigid elements. Each coordinate system is related to each other coordinate system by a coordinate transformation. Slide 9.3 and Slide 9.4 explain how a world coordinate system is home to the robot's body, which in turn is the reference for the robot arm. The arm holds the hand, the hand holds the fingers, and the fingers seek to relate to an object or box which itself is represented in the world coordinate system. Slide 9.4 illustrates these six coordinate systems in a simplified presentation in two dimensions.

Our interest is in geometric transformations concerning the use of imagery. Slide 9.5 illustrates an early video image of the surface of planet Mercury, from NASA's Mariner 10 mission in the mid-1970s. We need to relate each image to images taken from other orbits, and we need to place each image into a coordinate reference frame that is defined by the meridians, the equator and the poles of the planet. Slide 9.6 represents a geometric rectification of the previous image. The transformation is into a Mercator or Stereographic projection. We can see the geometric correction of the image if we note that the craters, which were of elliptical shape in the original image, now approximate circles, as they would appear from an overhead view straight down. Such a view is also denoted as an orthographic projection.

9.2 Problem of a Geometric Transformation

The geometric transformation applies typically to the 2-dimensional space of the plane, the 3-dimensional space of the natural human environment, and more generally to n-dimensional space. In the processing of digital visual information, most of our geometric transformations address the 3-dimensional space of our environment and the 2-dimensional space of a display medium.

Slide 9.8 illustrates the transformation of objects in a rigid coordinate system (x, y) in 2-dimensional space. We have in this example two objects, 1 and 2, before the transformation, and 1', 2' after the transformation. The general model of a rigid body transformation in 2-D space is shown in Slide 9.9: the equation takes the input coordinates (x, y) and produces from them the output coordinates (x', y') using transformation parameters a0, a1, a2 and b0, b1, b2.
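A small Python sketch may help to fix the notation (it is not part of the lecture notes; the parameter values are arbitrary): the six parameters a0, a1, a2 and b0, b1, b2 are here instantiated as a rotation by 30 degrees plus a translation and applied to the corners of a unit square.

import math

def transform(points, a, b):
    """Apply x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y to a list of 2-D points."""
    return [(a[0] + a[1] * x + a[2] * y,
             b[0] + b[1] * x + b[2] * y) for (x, y) in points]

alpha = math.radians(30)
a = (2.0, math.cos(alpha), -math.sin(alpha))   # a0 = tx, a1 = cos(alpha), a2 = -sin(alpha)
b = (1.0, math.sin(alpha),  math.cos(alpha))   # b0 = ty, b1 = sin(alpha), b2 =  cos(alpha)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(transform(square, a, b))   # the unit square, rotated and shifted; angles are preserved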
Slide 9.10 illustrates the usefulness of this formulation, if we have given objects before and after the transformation and we need to determine (“estimate”) the unknown parameters of the transformation. Given are therefore: x1 , y1 , x2 , y2 , x01 , y10 , x02 , and y20 , and we seek to compute a0 , a1 , a2 , b0 , b1 , b2 . We may also know the transformation parameters and need to compute for each given input coordinate pair (x, y) its associated output coordinate pair x0 , y 0 as illustrated in Slide 9.11. This concludes the introduction of the basic ideas of transformations using the example of 2 dimensional space. 9.3 Analysis of a Geometric Transformation We will use the example of a 2-dimensional object that is transformed in 2D-space under a socalled conformal transformation which does not change the angles of the object. The following illustrates in Slide 9.13, Slide 9.14, Slide 9.15 through Slide 9.16 the elements from which a geometric transformation in 2D-space is assembled. A very basic element of a transformation always is the translation. We add to each pair (x, y) an object’s translational component tx and ty to produce the output coordinates x0 , y 0 . Definition 18 Conformal transformation x0 y0 = s · cos(α) · x − s · sin(α) · y + tx = s · sin(α) · x + s · cos(α) · y + ty A second important transformational element is scaling. An object gets reduced or enlarged by a scale factor s ( see definition 18 ), and more generally we might use 2 different scale factors in the x coordinate direction denoted sx and in the y coordinate direction denoted sy . As a result, we may obtain a squished thus deformed object. We call a deformation by means of 2 different scale factors an affine deformation and will discuss this later. Finally we have rotations and rotate an object by an angle α. The transformation equation representing the rotation is shown in Slide 9.15. For a rotation we need a point around which we rotate the object. Normally this is the origin of the coordinate system. The general expression for a rotation using a rotation angle α produces the output x0 , y 0 coordinates from the input (x, y) coordinates by multiplying those coordinates with cos α and sinα in accordance with Slide 9.16. This can also be presented in matrix notation, resulting in the expression p0 = R · p, and we call R the rotation matrix. What makes now a transformation in 2D-space specifically a conformal transformation? We already stated that this does not change any angles. Obviously this requires that our body not be changed in shape. Instead it may be enlarged or reduced, it may be translated and it may be rotated, but right angles before the transformation will be right angles after the transformation as well. Slide 9.17 explains that we combine the three elements of the 2D-transformation that we denoted as scaling by factor s, rotating by angle α and translating by the translation elements tx 9.3. ANALYSIS OF A GEOMETRIC TRANSFORMATION 163 and ty . We call this a four parameter transformation since we have four independent elements of the transformation: s, α, tx , ty . In matrix notation this transformation is x0 = s · Rx + t, and s · R can be replaced by the transformation matrix M. We have described a transformation by means of Cartesian coordinates (x, y). One could use polar coordinates (r, φ). A point with coordinates (x, y) receives the coordinates (r, φ). A rotation becomes a very simple operation, changing the angle φ by the rotation angle ω. 
The relationships between (x, y) and (r, φ) are fairly obvious: x = r cos φ, y = r sin φ. A rotated point p0 will have the coordinates r cos(φ + ω) and r sin(φ + ω). When performing a transformation we may have a fixed coordinate system and rotate the object or we may have a fixed object and rotate the coordinate system. In Slide 9.20 we explain how a point p with coordinates (x, y) obtains coordinates (X, Y ) as a result of rotating the coordinate system by an angle α. Note that the angle α is the angle subtended between the input and output axes. We can therefore interpret that the rotation matrix is not only filled with the elements cos α, sin α, but we can interpret the rotation matrix to be filled with the angle subtended between the rotation axes before and after rotation and we have the angles xX, xY , yX, yY and have them all enter the rotation matrix with a cos(xX), cos(xY ) etc. We have thus found in Slide 9.20 a second definition for the contents of the rotation matrix: first was the interpretation of R with cos α and sin α of the rotation angle α. The second now is, that the elements of the rotation matrix are the cosinus of the angles subtended by the input and output coordinates. Prüfungsfragen: • In Abbildung B.12 ist ein Objekt A gezeigt, das durch eine lineare Transformation M in das Objekt B übergeführt wird. Geben Sie (für homogene Koordinaten) die 3 × 3-Matrix M an, die diese Transformation beschreibt (zwei verschiedene Lösungen)! Antwort: Zwei verschiedene Lösungen ergeben sich, weil das Objekt symmetrisch ist und um die y-Achse gespiegelt werden kann, ohne verändert zu werden. 2 0 4 M1 = 0 0.5 3 0 0 1 −2 0 12 M2 = 0 0.5 3 0 0 1 • Berechnen Sie jene Transformationsmatrix M, die eine Rotation um 45◦ im Gegenuhrzeiger√ sinn um den Punkt R = (3, 2)T und zugleich eine Skalierung mit dem Faktor 2 bewirkt (wie in Abbildung B.27 veranschaulicht). Geben Sie M für homogene Koordinaten in zwei Dimensionen an (also eine 3 × 3-Matrix), sodass ein Punkt p gemäß p0 = Mp in den Punkt p0 übergeführt wird. Hinweis: Sie ersparen sich viel Rechen- und Schreibarbeit, wenn Sie das Assoziativgesetz für die Matrixmultiplikation geeignet anwenden. Antwort: √ M = T(3, 2) · S( 2) · R(45◦ ) · T(−3, −2) 164 CHAPTER 9. TRANSFORMATIONS 1 = 0 0 1 = 0 0 1 = 0 0 0 1 0 0 1 0 0 1 0 3 2 · 1 3 2 · 1 3 2 · 1 √ 2 √0 0 cos 45◦ 0 2 0 · sin 45◦ 0 0 0 1 1 −1 0 1 0 −3 1 1 0 · 0 1 −2 0 0 1 0 0 1 1 −1 −1 1 −1 1 1 −5 = 1 1 0 0 1 0 0 − sin 45◦ cos 45◦ 0 0 1 0 · 0 1 0 0 −3 1 −2 0 1 2 −3 1 • Im praktischen Teil der Prüfung wird bei Aufgabe B.2 nach einer Transformationsmatrix (in zwei Dimensionen) gefragt, die sich aus einer Skalierung und einer Rotation um ein beliebiges Rotationszentrum zusammensetzt. Wie viele Freiheitsgrade hat eine solche Transformation? Begründen Sie Ihre Antwort! Antwort: Rotationszentrum (rx , ry ), Rotationswinkel (ϕ) und Skalierungsfaktor (s) ergeben vier Freiheitsgrade. • Gegeben sei ein zweidimensionales Objekt, dessen Schwerpunkt im Koordinatenursprung liegt. Es sollen nun gleichzeitig“ eine Translation T und eine Skalierung S angewandt ” werden, wobei 1 0 tx s 0 0 T = 0 1 ty , S = 0 s 0 . 0 0 1 0 0 1 Nach der Tranformation soll das Objekt gemäß S vergrößert erscheinen, und der Schwerpunkt soll gemäß T verschoben worden sein. Gesucht ist nun eine Matrix M, die einen Punkt p des Objekts gemäß obiger Vorschrift in einen Punkt p0 = M · p des transformierten Objekts überführt. Welche ist die richtige Lösung: 1. M = T · S 2. M = S · T Begründen Sie Ihre Antwort und geben Sie M an! 
Antwort: Antwort 1 ist richtig, da durch die Skalierung der Schwerpunkt genau dann unverändert bleibt, wenn er im Koordinatenursprung liegt. Die anschließende Translation verschiebt das Objekt (und damit den Schwerpunkt) an die gewünschte Position. Es ist also 1 0 tx s 0 0 s 0 tx M = T · S = 0 1 ty · 0 s 0 = 0 s ty 0 0 1 0 0 1 0 0 1 • Gegeben seien eine 3 × 3-Transformationsmatrix 3 4 2 M = −4 3 1 0 0 1 sowie drei Punkte a = b = c = (2, 0)T , (0, 1)T , (0, 0)T 9.4. DISCUSSING THE ROTATION MATRIX IN TWO DIMENSIONS 165 im zweidimensionalen Raum. Die Matrix M beschreibt in homogenen Koordinaten eine konforme Transformation, wobei ein Punkt p gemäß p0 = Mp in einen Punkt p0 übergeführt wird. Die Punkte a, b und c bilden ein rechtwinkeliges Dreieck, d.h. die Strecken ac und bc stehen normal aufeinander. 1. Berechnen Sie a0 , b0 und c0 durch Anwendung der durch M beschriebenen Transformation auf die Punkte a, b und c! 2. Da M eine konforme Transformation beschreibt, müssen auch die Punkte a0 , b0 und c0 ein rechtwinkeliges Dreieck bilden. Zeigen Sie, dass dies hier tatsächlich der Fall ist! (Hinweis: es genügt zu zeigen, dass die Strecken a0 c0 und b0 c0 normal aufeinander stehen.) Antwort: 1. a0 b0 c0 = (8, −7)T = (6, 4)T = (2, 1)T 2. a0 − c0 = (6, −8)T b0 − c0 = (4, 3)T (a0 − c0 ) · (b0 − c0 ) = 6 · 4 + (−8) · 3 = 0 9.4 Discussing the Rotation Matrix in two Dimensions A rotation matrix R is filled with four elements it if concerns rotations in two dimensions. Definition 19 Rotation in 2D x0 y0 = x · cos θ − y · sin θ = x · sin θ + y · cos θ written in matrix-form: x0 y0 R cos θ − sin θ sin θ cos θ x = R· y = As shown in 9.1 two elements can be combined into a unit vector, namely unit vectors i and j. The rotation matrix R consists of i, j, which are the unit vectors in the direction of the rotated coordinate system. We can show that the rotation matrix has some interesting properties, namely that the multiplications of the unit vectors with themselves are 1, and that the cross-products of the unit vectors are zero. ( see also Slide 9.22 ) 166 CHAPTER 9. TRANSFORMATIONS Definition 20 2D rotation matrix A point of an object is rotated about the origin by multiplying it with a so called rotation matrix. When dealing with rotations in two dimensions the rotation matrix R consists of four elements. These elements can be combined into two unit vectors i and j. i x0 y0 R = cos α sin α , − sin α cos α j= cos θ − sin θ = = (i, j) sin θ cos θ x = R· y Starting from a given coordinate system with axes X and Y the vectors i and j correspond to the unit vectors in the direction of the rotated coordinate system (see Figure 9.1). Figure 9.1: rotated coordinate system We have now found a third definition of the rotation matrix element, namely the unit vectors along the axes of the rotated coordinate system as expressed in the input coordinate system. Slide 9.23 summarizes the 3 interpretations of the elements of a rotation matrix. Let’s take a look at the inverse of a rotation matrix. Note that if we premultiply a rotation matrix by its inverse we get the unit matrix (obviously). But we also learn very quickly, that premultiplying the rotation matrix with the transposed of the rotation matrix also produces the unit vector, which very quickly proves to us in accordance with Slide 9.24 that the inverse of a rotation matrix is nothing else but the transposed rotation matrix. We now take a look at the forward and backward rotation. 
Suppose we have rotated a coordinate system denoted by x into a new coordinate system of X. If we now premultiply the new coordinate system with the transposed rotation matrix, we obtain the inverse relationship and see that we obtain, in accordance with Slide 9.25, the original input coordinates. Therefore we know that the transposed of a rotation matrix serves to rotate back the rotated coordinate system into its input state. Let’s now take a look at multiple sequential rotations. We first rotate input coordinates x into output coordinates x1 and then we rotate the output coordinates x1 further into coordinates x2 . 9.5. THE AFFINE TRANSFORMATION IN 2 DIMENSIONS 167 We see very quickly that x2 is obtained from the product of two rotation matrixes R1 and R2 . However, it is also very quickly evident that multiplying two rotation matrixes produces nothing else but a third rotation matrix. Definition 21 Sequenced rotations x1 x2 x2 R = = = = R1 x R2 x1 R2 R1 x = Rx R2 R1 It is important, however, to realize that matrix multiplications are not commutative: R2 · R1 is not necessarily identical to R1 · R2 ! Prüfungsfragen: • In der Vorlesung wurde darauf hingewiesen, dass die Matrixmultiplikation im Allgemeinen nicht kommutativ ist, d.h. für zwei Transformationsmatrizen M1 und M2 gilt M1 ·M2 6= M2 · M1 . Betrachtet man hingegen im zweidimensionalen Fall zwei 2 × 2-Rotationsmatrizen R1 und R2 , so gilt sehr wohl R1 ·R2 = R2 ·R1 . Geben Sie eine geometrische oder mathematische Begründung für diesen Sachverhalt an! Hinweis: Beachten Sie, dass das Rotationszentrum im Koordinatenursprung liegt! Antwort: Bei der Drehung um eine fixe Rotationsachse addieren sich die Rotationswinkel, die Reihenfolge der Rotationen spielt daher keine Rolle. 9.5 The Affine Transformation in 2 Dimensions Slide 9.28 is an example of an Affine Transformation created with the help of a letter “F”. We see a shearing effect as a characteristic feature of an Affine Transformation. Similarly, Slide 9.29 illustrates how a unit square will be deformed for example by squishing it only along the axis x but not along the axis y, or by shearing the square in one direction or in the other direction. All these are effects of an Affine Transformation. Slide 9.30 provides us with the equation for a general Affine Transformation in 2 dimensions. We see that this is a six parameter transformation, defined by transformation parameters a, b, c, d, tx , ty . We may again ask the question of estimating the unknown transformation parameters if we have given a number of points both before and after the transformation. Question: How many points do we need at a minimum to be able to solve for the unknown six transformation parameters. Obviously we need three points, because each point provides us with two equations, so that three points provide us with six equations suitable of solving for the six unknown equation parameters. But be aware: those three points cannot be colinear! Let us now analyze the elements of an Affine Transformation and let us take a look at Slide 9.32, Slide 9.33 however recalling what we saw from Slide 9.13 to Slide 9.15. First, we see a scaling of the input coordinates, in this case denoted as px and py , independently by scaling factors sx and sy to obtain output coordinates qx and qy . We can denote the scaling operations by means of a 2 × 2 scaling matrix Msc as shown in Definition ??. 
Secondly, we have a shearing deformation which adds to each coordinate x an increment that is proportional to y and we add in y an augmentation that is proportional to the x coordinate using a 168 CHAPTER 9. TRANSFORMATIONS proportionality factor g. That shearing transformation can be described by a matrix Msh shearing (see Definition ?? ). Thirdly, we can introduce a translation adding to each x and y coordinate the translational element tx and ty ( see Definition ?? ). Finally, we can rotate the entire object identical to the rotation that we saw earlier using a rotation angle α and producing a rotation matrix MR ( see Chapter 9.4 ). An Affine Transformation is now the sum total of the transformations, thus the product of three transformations: Msc for scale, Msh for shearing and MR for rotation and adding on the translation as discussed previously. Slide 9.34 further explains how the transformation of the input coordinate vector p into an output coordinate q is identical to the earlier two equations, converting the input coordinate pair (x, y) into an output coordinate pair (x0 , y 0 ) via a six parameter affine transformation. Definition 22 Affine transformation with 2D homogeneous coordinates x0 sx y0 = 0 w0 0 0 sy 0 0 x x 0 · y = Msc y 1 w w x0 1 y 0 = hy w0 0 hx 1 0 0 x x 0 · y = Msh y 1 w w x0 1 0 tx x x y 0 = 0 1 ty · y = Mtr y w0 0 0 1 w w r11 x0 y 0 = r21 w0 0 r12 r22 0 tx x ty · y s w Definition 22 shows an example of how to construct a Affine Transformation that rotates, translates and scales it in one step. The transformation is done in 2D using homogeneous coordinates ( see Chapter 9.9 ). The parameters ri specify the rotation, ti specify the translational element and s is a scaling factor ( which in this case scales equally in both directions x, and y ). Prüfungsfragen: • Es seien zwei Punktwolken“ entsprechend Abbildung ?? gegeben. Stelle zunächst die ” geeignete Transformation der einen Punktgruppe auf die zweite Punktgruppe unter Verwendung des dazu einzusetzenden Formelapparates (ohne Verwendung der angebenen Koordinaten) dar, sodass die markierten drei Punkte im linken Bild (jene drei, welche als Kreisflächen markiert sind) nach der Transformation mit den drei Punkten im rechten Bild (die ebenfalls als Kreisflächen markiert sind) zur Deckung gebracht werden. • Stellen Sie bitte für die in der Frage ?? gesuchte Berechnung der unbekannten Transformationsparameter die Koeffizientenmatrix auf, wobei die Koordinaten aus Abbildung ?? nur ganzzahling verwendet werden. 9.6. A GENERAL 2-DIMENSIONAL TRANSFORMATION 9.6 169 A General 2-Dimensional Transformation We begin the consideration of a more general 2-dimensional transformation by a look at the bilinear transformation ( see Definition 23 ), which takes the input coordinates (x, y) and converts them into an output coordinate pair (X, Y ) via a bilinear expression which has a term with a product (x, y) of the input x and input y coordinates. This transformation is called bilinear because if we freeze either the coordinate x or the coordinate y we obtain a linear expression for the transformation. Such a transformation has 8 parameters as we can see from Slide 9.36. Each input point (x, y) produces 2 equations as shown in that slide. We need four points to compute the transformation parameters a, b, c, e, f , g, and the translational parameters d and h. 
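A small Python sketch (assuming NumPy; the point coordinates and the names K, u, X are illustrative choices) shows how the eight bilinear parameters follow from four input/output point pairs: each pair contributes two equations, giving eight equations in the eight unknowns, which can be solved exactly.

import numpy as np

src = [(0, 0), (1, 0), (1, 1), (0, 1)]                 # input points (x, y)
dst = [(0, 0), (2, 0.2), (2.5, 1.3), (0.3, 1.0)]       # observed output points (x', y')

K = np.zeros((8, 8))
X = np.zeros(8)
for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
    K[2 * i]     = [x, y, x * y, 1, 0, 0, 0, 0]        # row for x' = a*x + b*y + c*x*y + d
    K[2 * i + 1] = [0, 0, 0, 0, x, y, x * y, 1]        # row for y' = e*x + f*y + g*x*y + h
    X[2 * i], X[2 * i + 1] = xp, yp

u = np.linalg.solve(K, X)            # u = (a, b, c, d, e, f, g, h)
print(u.round(3))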
By means of a bilinear transformation we can match any group of four input points into any group of four output points and thereby achieve a perfect fit by means of that transformation. Definition 23 Bliniear transformation 0 x 0 y = a ∗ x + b ∗ y + c ∗ xy + d = e ∗ x + f ∗ y + g ∗ xy + h A more general transformation would be capable of taking a group of input points as shown in Slide 9.37, in this example with an arrangement of 16 points, into a desired output geometry as shown in Slide 9.38. We suggest that the randomly deformed arrangements of that slide be converted into a rigidly rectangular pattern: How can we achieve this? Obviously, we need to define a transformation with 16 × 2 = 32 parameters for all 32 coordinate values. Slide 9.39 illustrates the basic concept. We are setting up a polynomial transformation to take the input coordinate pair (x, y) and translate it into an coordinate pair (X, Y ) by means of two 16-parameter polynomials. These polynomial coefficients a0 , a1 , a2 , . . . and b0 , b1 , b2 , . . . may initially be unknown, but if we have 16 input points with their input locations (x, y) and we know their output locations (X, Y ), then we can set up an equation system to solve the unknown transformation parameters a0 , a1 , . . . , a15 , and b0 , b1 , . . . , b15 . Slide 9.40 illustrates the type of computation we have to perform. Suppose we had given in the input coordinate system 1 the input coordinates (xi , yi ) and we have n such points. We also have given in the output coordinate system 2 the output coordinates (Xj , Yj ) and we have the same number of output points n. We can now set up the equation system that translates the input coordinates (xi , yi ) into output coordinates (Xj , Yj ). What we ultimately obtain is an equation system: x=K·u In this equation, x are the known output coordinates, u is the vector of unknown transformation parameters, and this may be 4 in the conformal transformation, 6 in the affine, 8 in the bilinear or, as we discussed before, 32 for a polynomial transformation that must fit 16 points from an input to 16 output locations. What is in the matrix K? It is the coefficient matrix for the equations and is filled with the input coordinates as shown in the polynomial or other transformation equations. How large is the coefficient matrix K? Obviously for an affine transformation, the coefficient matrix K is filled with 6 by 6 elements, and in the polynomial case discussed here the coefficient matrix K has 36 by 36 elements. What happens if we have more points given in system 1 with their transformed coordinates in system 2 than we need to solve for the unknowns? Suppose we had ten input and ten output points to compute the unknown coefficients of a conformal transformations where we would only need 2 points producing 4 equations to allow us to solve for the 4 unknowns? We have an overdetermined equation system and our matrix K is rectangular. We can not invert a rectangular matrix. So what do we do? 170 CHAPTER 9. TRANSFORMATIONS There is a theory in statistics and estimation theory which is called Least Squares Method . Slide 9.41 explains: we can solve an over-determined equation system which has a rectangular and not a square coefficient matrix by premultiplying the left and the right side of the equation system by a transposed of the coefficient matrix, KT . We obtain in this manner a square matrix KT · K on the right hand side and we call this a normal equation matrix . 
It is square in the shorter of the two dimensions of the rectangular matrix K and it can be inverted. So the unknown coefficient u results of an inverse of the product KT · K as shown in Slide 9.41. This is but a very simple glimpse at the matters of “Least Squares”. In reality, this is a concept that can fill many hundreds of pages of textbooks, but the basic idea is that we estimate the unknown parameters u using observations that are often erroneous, and to be robust against such errors, we provide more points (xi , yi ) in the input system and (Xi , Yi ) in the output system than needed as a minimum. Because of these errors the equations will not be entirely consistent and we will have to compute transformation parameters that will provide a best approximation of the transformation. “Least squares” solutions have optimality properties if the errors in the coordinates are statistically normally distributed. Prüfungsfragen: • Im 2D Raum sei ein bilineare Transformation gesucht, und die unbekannten Transformationsparameter seien zu berechnen. Es seien dafür N Punkte mit ihren Koordinaten vor und nach der Transformation bekannt, wobei N > 4. Welcher Lösungsansatz kommt hier zur Anwendung? Antwort: Methode der kleinsten Quadrate: X = K·u K · X = KT ·K · u −1 u = KT · K · KT · X T • In der Vorlesung wurden zwei Verfahren zur Ermittlung der acht Parameter einer bilinearen Transformation in zwei Dimensionen erläutert: 1. exakte Ermittlung des Parametervektors u, wenn genau vier Input/Output-Punktpaare gegeben sind 2. approximierte Ermittlung des Parametervektors u, wenn mehr als vier Input/OutputPunktpaare gegeben sind ( Least squares method“) ” Die Methode der kleinsten Quadrate kann jedoch auch dann angewandt werden, wenn genau vier Input/Output-Punktpaare gegeben sind. Zeigen Sie, dass man in diesem Fall das gleiche Ergebnis erhält wie beim ersten Verfahren. Welche geometrische Bedeutung hat diese Feststellung? Hinweis: Bedenken Sie, warum die Methode der kleinsten Quadrate diesen Namen hat. Antwort: u = KT K −1 KT X = K−1 KT −1 KT X = K−1 X Diese Umformungen sind möglich, da K hier eine quadratische Matrix ist. Da das Gleichungssystem nicht überbestimmt ist, existiert eine exakte Lösung (Fehler ε = 0). Diese Lösung wird auch von der Methode der kleinsten Quadrate gefunden, indem der Fehler (ε ≥ 0) minimiert wird. • Beschreiben Sie eine bilineare Transformation anhand ihrer Definitionsgleichung! 9.7. IMAGE RECTIFICATION AND RESAMPLING 9.7 171 Image Rectification and Resampling We change the geometry of an input image as illustrated in Slide 9.43, Slide 9.44, Slide 9.45 showing a mesh or a grid superimposed over the input image. We similarly show a different shape mesh in the output image. The task is to match the input image onto the output geometry so that the meshes fit one another. We have to establish a geometric relationship between the image in the input and the output using a transformation equation from the input to the output. If we now do a geometric transformation of an image we have essentially two tasks to perform. First we need to describe the geometric transformation between the input image and the output image by assigning to every input image location the corresponding location in the output image. This is a geometric operation with coordinates. Second, we need to produce an output gray level for the resulting image based on the input gray levels. We call this second process a process of resampling as shown in Slide 9.46, and use operations on gray values. 
Again, what we do conceptually is to take an input image pixel at location (x, y) and to compute by a spatial transform the location in the output image at which this input pixel would fall and this location has the coordinates (x0 , y 0 ) in accordance with Slide 9.46. However, that location may not perfectly coincide with the center of a pixel in the output image. Now we have a second problem and that is to compute the gray value at the center of the output pixel by looking at the area in which the input image corresponds to that output location. One method is to assign the gray value we find in the input image to the specific location in the output image. If we use this method, we have used a so-called nearest neighbor -method. An application of this matter of resampling and rectification of images is illustrated in Slide 9.47. We have a distorted input image which would show an otherwise perfectly regular grid with some distortions. In the output image that same grid is reconstructed with reasonably perfect vertical and horizontal grid lines. The transition is obtained by means of a geometric rectification and this rectification includes as an important element the function of resampling. Slide 9.113 is again the image of planet Mercury before the rectification and Slide 9.49 after the rectification performing a process as illustrated earlier. Let us hold right at this point and delay a further discussion of resampling to a separate later (Chapter 15). Resampling and image rectification was only mentioned at this point to establish the relationship of this task to the idea of 2-dimensional transformations from an input image to an output image. Prüfungsfragen: • Wird eine reale Szene durch eine Kamera mit nichtidealer Optik aufgenommen, entsteht ein verzerrtes Bild. Erläutern Sie die zwei Stufen des Resampling, die erforderlich sind, um ein solches verzerrtes Bild zu rektifizieren! Antwort: 1. geometrisches Resampling: Auffinden von korrespondierenden Positionen in beiden Bildern 2. radiometrisches Resampling: Auffinden eines geeigneten Grauwertes im Ausgabebild 9.8 Clipping As part of the process of transforming an object from a world coordinate system into a display coordinate system on a monitor or on a hardcopy output device we are faced with an interesting problem: We need to take objects represented by vectors and figure out which element of each vector is visible on the display device. This task is called clipping. An algorithm to achieve 172 CHAPTER 9. TRANSFORMATIONS clipping very efficiently is named after Cohen-Sutherland. Slide 9.51 illustrates the problem a number of objects is in world coordinates and a display window will only show part of those objects. On the monitor the objects will be clipped. Slide 9.52 algorithm. The task is to receive on the input side a vector defined by the end points p1 and p2 and computing auxiliary points C, D where this vector intersects the display window which is defined by a rectangle. 9.8.1 Half Space Codes In order to solve the clipping problem Cohen and Sutherland have defined so-called half-space codes in Slide 9.54 and relate to the half spaces defined by the straight lines delineating the display window. These half-space codes designate spaces to the right, to the left, to the top and to the bottom of the boundaries of the display window, say with subscripts cr , cl , ct , and cb . 
For example if a point is to the right of the vertical boundary, the point’s half-space code is set to “true”, but if it is to the left the code per is set to “false”. Similar a location above the window gets a half-space code ct “true” and below gets “false”. We now need to define a procedure called “Encode” in Slide 9.55 which takes an input point p and produces for it the associated four half-space codes assigning the half-space codes to variable c, a Boolean variable. We obtain 2 values for each of the 2 coordinates px and py of point p, obtaining a value of true or false depending on where px falls with respect to the vertical boundaries of the display window, and on where py falls with respect to the horizontal boundaries. 9.8.2 Trivial acceptance and rejection Slide 9.56 is a picture of the first part of the procedure clip, as it is presented in [FvDFH90, Section 3.12.3]. Procedure “Encode” is called up for the beginning and end points of a straight line, denoted as P1 and P2 and the resulting half-space codes are denoted as C1 and C2 . We now have to take a few decisions about the straight line depending on where P1 and P2 fall. We compute 2 auxiliary Boolean variables, |In|1 and |In|2 . We can easily show that the straight line is entirely within the display window if |In|1 and |In|2 are “true”. This is called trivial acceptance, shown in Slide 9.57 for points A, B. Trivial rejection is also shown in Slide 9.57 for a straight line connecting points C and D. 9.8.3 Is the Line Vertical? We need to proceed in the “Clipping Algorithm” in Slide 9.58, if we do not have a trivial acceptance nor a trivial rejection. We differentiate among cases where at least one point is outside the display window. The first possibility is that the line is vertical. That is considered first. 9.8.4 Computing the slope If the line is not vertical we compute its slope. This is illustrated in Slide 9.59. 9.8.5 Computing the Intersection A in the Window Boundary With this slope we compute the intersection of the straight line with the relevant boundary lines of the display window at wl , wr , wt and wb . We work our way through a few decisions to make sure that we do find the intersections of our straight line with the boundaries of the display window. 9.9. HOMOGENEOUS COORDINATES 9.8.6 173 The Result of the Cohen-Sutherland Algorithm The algorithm will produce a value starting that either the straight line is entirely outside of the window or it returns with the end points of the straight line. These are the end points from the input if the entire line segment is within the window, they are the intersection points of the input line with the window bountaries if the line intersects them. Prüfungsfragen: • Welche Halbraumcodes“ werden im Clipping verwendet, und welche Rolle spielen sie? ” • Erklären Sie die einzelnen Schritte des Clipping-Algorithmus nach Cohen-Sutherland anhand des Beispiels in Abbildung B.18. Die Zwischenergebnisse mit den half-space Codes sind darzustellen. Es ist jener Teil der Strecke AB zu bestimmen, der innerhalb des Rechtecks R liegt. Die dazu benötigten Zahlenwerte (auch die der Schnittpunkte) können Sie direkt aus Abbildung B.18 ablesen. • Wenden Sie den Clipping-Algorithmus von Cohen-Sutherland (in zwei Dimensionen) auf die in Beispiel B.2 gefundenen Punkte p01 und p02 an, um den innerhalb des Quadrats Q = {(0, 0)T , (0, 1)T , (1, 1)T , (1, 0)T } liegenden Teil der Verbindungsstrecke zwischen p01 und p02 zu finden! 
Sie können das Ergebnis direkt in Abbildung B.19 eintragen und Schnittberechnungen grafisch lösen. Antwort: 9.9 p01 p02 cl true false cr false true ct false false cb false true Homogeneous Coordinates A lot of use of homogenous coordinates is made in the world of computer graphics. The attraction of homogenous coordinates is that in a 2- or 3- dimensional transformation of an input x coordinate system or object described by x into an output coordinate system x0 or changed object we do not have to split our operation into a part with a multiplication for the rotation matrix and scale factor, and separately have an addition for the translation vector t. Instead we simply employ only a matrix multiplication having a simple homogeneous coordinate X for a point and output coordinates X0 for the same point after the transformation. Slide 9.62 explains the basic idea of homogenous coordinates. Instead of working in 2 dimensions in a 2-dimensional Cartesian coordinate system (x, y) we augment the coordinate system by a third coordinate w, and any point in 2D-space with locations (x, y) receives a third coordinate and therefore is at location (x, y, w). If we define w1 = 1 we have defined a horizontal plane for the location of a point. Again Slide 9.63 states that Cartesian coordinates in 2 dimensions represent a point p as (x, y) and homogeneous coordinates in 2 dimensions have that same point represented by the three element vector (x, y, 1). Let us try to explain how we use homogeneous coordinates staying with 2 dimensions only. In Slide 9.64 we have another view of a translation in Cartesian coordinates. Slide 9.65 describes scaling, in this particular case an affine scaling occurs with separate scale factors in the two different coordinate directions (Slide 9.66 illustrates a rotation). Slide 9.67 illustrates the translation by means of a translation vector and scaling by means of a scaling matrix. Slide 9.68 introduces the relationship between a Cartesian and a homogenous coordinate system. Slide 9.69 uses homogeneous coordinates for a translation by means of a multiplication of the input coordinate into an output coordinate system. The same operation is used for scaling in Slide 9.70 and for rotation in Slide ??, Slide ?? summarizes. As Slide 9.73 reiterates that translation and scaling are described by matrix multiplication and of course rotation and scaling have previously also been matrix multiplications in the Cartesian 174 CHAPTER 9. TRANSFORMATIONS coordinate system. If we now combine these three transformations of translation, scaling and rotation we obtain a single transformation matrix M which describes all three transformations without separation into multiplication and additions as is necessary in the Cartesian case. The simplicity of doing everything in matrix form is the appeal that leads computer graphics software to heavily rely on homogeneous coordinates. In image analysis homogeneous coordinates are not as prevalent. One may assume that because we often times have in image processing to estimate transformation parameters using for this over-determined equation systems and the method of least squares. That approach typically is better applicable with Cartesian geometry than with the homogeneous system. Prüfungsfragen: • Erklären Sie die Bedeutung von homogenen Koordinaten für die Computergrafik! Welche Eigenschaften weisen homogene Koordinaten auf? 
• Geben Sie für homogene Koordinaten eine 3 × 3-Matrix M mit möglichst vielen Freiheitsgraden an, die geeignet ist, die Punkte p eines starren Körpers (z.B. eines Holzblocks) gemäß q = M p zu transformieren (sog. rigid body transformation“)! ” Hinweis: In der Fragestellung sind einfache geometrische Zusammenhänge verschlüsselt“ ” enthalten. Wären sie hingegen explizit formuliert, wäre die Antwort eigentlich Material der Gruppe I“. ” • Gegeben seien die Transformationsmatrix 0 2 0 0 0 0 2 0 M = 1 0 0 −5 −2 0 0 8 und zwei Punkte 3 p1 = −1 , 1 2 p2 = 4 −1 in Objektkoordinaten. Führen Sie die beiden Punkte p1 und p2 mit Hilfe der Matrix M in die Punkte p01 bzw. p02 in (normalisierten) Bildschirmkoordinaten über (beachten Sie dabei die Umwandlungen zwischen dreidimensionalen und homogenen Koordinaten)! Antwort: 0 2 0 0 1 0 −2 0 3 0 0 −1 2 0 · 0 −5 1 1 0 8 0 2 0 0 2 0 0 2 0 4 1 0 0 −5 · −1 −2 0 0 8 1 9.10 −2 −1 2 0 = 1 −2 ⇒ p1 = −1 2 8 2 −2 0 = −0.5 −3 ⇒ p2 = −0.75 4 A Three-Dimensional Conformal Transformation In three dimensions things become considerably more complex and more difficult to describe. Slide 9.75 shows that a 3-dimensional conformal transformation rotates objects or coordinate axes, scales 9.10. A THREE-DIMENSIONAL CONFORMAL TRANSFORMATION 175 Definition 24 Rotation in 3D The three-dimensional rotation transforms an input point P with coordinates (x,y,z) into an output coordinate system (X,Y,Z) by means of a rotation matrix R. The elements of this rotation matrix can be interpreted following: - as the cosines of the angles subtended by the coordinate axes xX,yX,zX,...zZ - as the assembly of the three unit vectors directed along the axes of the rotated coordinate systems but described in terms of the input. R cos(xX) cos(yX) cos(zX) = cos(xY ) cos(yY ) cos(zY ) cos(xZ) cos(yZ) cos(zZ) or R r11 = r21 r31 P0 = R·P r12 r22 r32 r13 r23 r33 A 3D rotation can be considered as a composition of three individual 2-D rotations around the coordinate axes x,y,z. It is easy to see that rotating around one axis will also affect the two other axis. Therefore the sequence of the rotations is very important. Changing the sequence of the rotations may result in a different output image. them and translates, just as we had in 2 dimensions. However, the rotation matrix now needs to cope with three coordinate axes. In analogy to the 2-dimensional case we now know that the rotation matrix takes an input point P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation matrix R. The elements of this rotation matrix are again first: the cosines of the angles subtended by the coordinate axes xX, yX, zX, . . . , zZ; second is the assembly of three unit vectors directed along the axes of the rotated coordinate system but described in terms of the input coordinate systems (Slide 9.76 the multiplication of three 2-D rotation a matrices as shown in Slide 9.77. The composition of the rotation matrix by three individual 2-D rotations around the three coordinate axes x, y and z is the most commonly used approach. Each rotation around an axis needs to consider that that particular axis may already have been rotated by a previous rotation. Note as we rotate around a particular axis first, that will move the other two coordinate axes. We then rotate around the rotated second axis, affecting the third one again and then we rotate around the third axis. The sequence of rotations is of importance and will change the ultimate outcome if we change the sequence. 
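A short numerical check (Python with NumPy; the 90-degree angles are an arbitrary choice for illustration) makes the dependence on the rotation sequence visible: applying the rotation about z first and then the rotation about x sends the point (1, 0, 0) to a different position than applying them in the opposite order.

import numpy as np

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

p = np.array([1.0, 0.0, 0.0])
a = np.radians(90)
print((Rx(a) @ Rz(a) @ p).round(3))   # rotate about z first, then about x: result (0, 0, 1)
print((Rz(a) @ Rx(a) @ p).round(3))   # rotate about x first, then about z: result (0, 1, 0)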
Slide 9.79 illustrates how we might define a three-dimensional rotation and translation by means of three points P1, P2, P3 which represent two straight line segments P1P2 and P1P3. We begin by translating P1 into the origin of the coordinate system. We proceed by rotating P2 into the z-axis and complete the rotation by rotating P3 into the yz-plane. We thereby obtain the final position. If we track this operation we see that we have applied several rotations: we have first rotated P1P2 into the xz-plane, then we have rotated the result around the y-axis into the z-axis, and finally we have rotated P1P3 around the z-axis into the yz-plane. Slide 9.80 and Slide 9.81 explain in detail the sequence of the three rotations by three angles, denoted in this case as the first angle Θ, the second angle φ, and the third angle α.

Generally, a three-dimensional conformal transformation will be described by a scaling l, a rotation matrix R and a translation vector t. Note that l is a scalar value, the rotation matrix is a 3 by 3 matrix containing three angles, and the translation vector t has three elements with translations along the directions x, y, and z. This type of transformation contains seven parameters in three dimensions, as opposed to four parameters in the 2D case: the rotation matrix contains 3 angles, the scale factor is a fourth value, and the translation vector has three values, resulting in a total of seven parameters to define this transformation.

Prüfungsfragen:

• Was versteht man unter einer „konformen Transformation“?

9.11 Three-Dimensional Affine Transformations

Definition 25 Affine transformation with 3D homogeneous coordinates

case 'translation':
                  ( 1   0   0   tx )
    tr matrix =   ( 0   1   0   ty )
                  ( 0   0   1   tz )
                  ( 0   0   0   1  )

case 'rotation x':
                  ( 1     0        0       0 )
    rotationx =   ( 0   cos φ   -sin φ     0 )
                  ( 0   sin φ    cos φ     0 )
                  ( 0     0        0       1 )

case 'rotation y':
                  (  cos φ   0   sin φ   0 )
    rotationy =   (    0     1     0     0 )
                  ( -sin φ   0   cos φ   0 )
                  (    0     0     0     1 )

case 'rotation z':
                  ( cos φ   -sin φ   0   0 )
    rotationz =   ( sin φ    cos φ   0   0 )
                  (   0        0     1   0 )
                  (   0        0     0   1 )

case 'scale':
                  ( sx   0    0    0 )
    scale matrix =( 0    sy   0    0 )
                  ( 0    0    sz   0 )
                  ( 0    0    0    1 )

By using homogeneous coordinates, all transformations are 4 × 4 matrices. The transformations can therefore be combined easily by multiplying the matrices. This results in a speedup, because every point is multiplied with only one matrix and not with all transformation matrices.

The three-dimensional transformation may change the object shape. A simple change results from shearing or squeezing, and produces an affine transformation. Generally, the affine transformation does not have a single scale factor; we may have up to three different scale factors along the x, y, and z axes, as illustrated in Slide 9.83. Another interpretation of this effect is to state that a coordinate X is obtained from the input coordinates (x, y, z) by means of shearing elements hyx and hzy which are really part of the scaling matrix Msc. Ultimately, a cube will be deformed into a fairly irregular shape, as shown in Slide 9.84 with the example of a building shape. A three-dimensional affine transformation now has 12 parameters, so that the transformations of the x, y, and z coordinates are independent of one another. The transformation will still maintain straight lines as straight lines; however, right angles will not remain right angles.

9.12 Projections

From a higher dimensional space, projections produce images in a lower dimensional space.
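Referring back to Definition 25 before we look at the individual projection types: the following is a minimal sketch in Python with NumPy (our own illustration, not part of the lecture material; the particular angles and offsets are invented) of how the individual 4 × 4 matrices are multiplied into one combined matrix M, so that each homogeneous point is transformed by a single matrix–vector product. The order of the factors matters here just as it did for the pure rotations in Section 9.10.

import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]      # translation goes into the last column
    return T

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

# One combined matrix: first scale, then rotate about z, then translate.
M = translation(4, -2, 1) @ rotation_z(np.radians(90)) @ scaling(2, 2, 2)

p = np.array([1, 0, 0, 1])       # homogeneous point (x, y, z, 1)
print(M @ p)                     # approximately [4, 0, 1, 1]: all three steps in one product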
We have projection lines in projectors that connect input to output points, we have projection centers and we have a projection surface onto which the high-dimensional space is projected. In the real world we basically project 3-dimensional spaces onto 2 dimensional projection planes. The most common projections are the perspective projections as used by the human eye and by optical cameras. We differentiate among a multitude of projections. The perspective projections model what happens in a camera or the human eye. However, engineers have long used parallel projections. These are historically used also in the arts and in cartography and have projection rays (also called projectors or projection lines) that are parallel. If they are perpendicular onto the projection plane we talk about an orthographic projection. If they are not perpendicular but oblique to the projection plane, we talk about an oblique projection (see Slide 9.86). A special case of the orthographic projection results in the commonly used presentations of three dimension space in a top view, front view and side view (Slide 9.87) case. Heavy use in architecture and civil engineering of top views, front views and side views of a 3-D space is easy to justify: from these 3 views we can reconstruct the 3 dimensions of that space. Another special is the axonometric projection where the projection plane is not in one of the three coordinate planes of a three-dimensional space.. Yet another special case in the isometric projection which occurs if the projection plane is chosen such that all three coordinate axes are changed equally much in the projection (the projection rays are directed along the vector with elements (1,1,1). We highlight particular oblique projections which are the cavalier and the cabinet projection. The cavalier projection produces no scale reduction along the coordinate axes because it projects perfectly under 45◦ . In the cabinet projection we project under an angle α = 63.4◦ since from tan α = 2, this projection shrinks an object in one direction by factor of 1/2. 9.13 Vanishing Points in Perspective Projections In order to construct a perspective projection we can take advantage of parallel lines. In the natural world they meet of course at infinity. In the projection they meet at a so-called vanishing point 1 . This is a concept of descriptive geometry, a branch of mathematics. Slide 9.91 is the example of a perspective projection as produced by a synthetic camera. Note how parallel lines converge at a point which typically is outside the display area. The vanishing point is the image of the object point at infinity. Because there exists an infinity of directions for bundles of parallel lines in 3D space, there exists an infinity of vanishing points. However, special vanishing points are associated with bundles of lines that are parallel with the coordinate axes. Such vanishing points are called principal. 1 in German: Fluchtpunkt 178 CHAPTER 9. TRANSFORMATIONS If we may have only one axis producing a finite vanishing point since the other two axes are themselves parallel to the projection plane and their vanishing points are at infinity. Therefore such a perspective projection is called a one-point perspective in which a cube aliguid with the coordinate axes will only have one vanishing point. Analogously, Slide 9.93 and Slide 9.94 present a 2-point and a general 3-point perspective. 
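The convergence of parallel lines towards a vanishing point can also be demonstrated in a few lines of code. The sketch below (Python, our own illustration; the projection plane z = 1 and the two sample lines are assumptions chosen for the example) projects points on two parallel 3-D lines through a projection center at the origin and shows that both image sequences approach the same image point (dx/dz, dy/dz).

import numpy as np

def project(p, f=1.0):
    # Central projection onto the plane z = f, projection center at the origin.
    x, y, z = p
    return np.array([f * x / z, f * y / z])

d    = np.array([1.0, 0.0, 1.0])     # common direction of two parallel 3-D lines
p0_a = np.array([0.0, 1.0, 2.0])     # a point on line a
p0_b = np.array([3.0, -1.0, 4.0])    # a point on line b

for t in [0, 10, 100, 1000]:
    print(t, project(p0_a + t * d), project(p0_b + t * d))
# Both image sequences approach (1, 0): the vanishing point of direction d,
# i.e. the image of the object point at infinity.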
9.14 A Classification of Projections Slide 9.96 presents the customary hierarchy of projections as they are commonly presented in books about architecture, art and engineering. In all cases, these projections are onto a plane and are thus planar projections. The differenciation between perspective and parallel projections is somewhat artificial if one considers that with a perspective center at infinity, one obtains the parallel projection. However, the projections are grouped into parallel and perspective projections, the perspective axes are then subdivided into single point, two point and three point perspective projections and the parallel projections are classified into orthographic and oblique ones, the oblique have the cavalier and cabinet projection as special cases. The orthographic projections have the axonometry on one hand and the multi-view orthographic on the other hand and within the axonometric projection we have one special case we discussed, the isometric projection. We do not discuss the world of more complex projections, for example to convert the surface of a sphere into a plane: this is the classical problem of cartography with its need to present a picture of the Earth on a flat sheet of paper. Prüfungsfragen: • In der Vorlesung wurde ein Baum“ für die Hierarchie diverser Projektionen in die Ebene ” dargestellt (Planar Projections). Skizzieren Sie bitte diesen Baum mit allen darin vorkommenden Projektionen. 9.15 The Central Projection This is the most important projection of all the ones we discuss in this class. The simple reason for this is that it is the geometric model of a classical camera. Slide 9.98 explains the geometry of a camera and defines three coordinate systems. The first is the world coordinate system with X, Y and Z. In this world coordinate system we have a projection center O at location (X0 , Y0 , Z0 ). The projection center is the geometric model of a lens. All projection lines are straight lines going from the object space, where there is an object point P at the location (x, y, z) through the projection center O and intersecting the image plane. We know at this point that the central projection is similar to the perspective projection. There is a small difference, though. We define the projection center with respect to an image plane and insist on some additional parameters that describe the central projection that we do not typically use in the perspective projection. Note that we have a second coordinate system that is in the image plane that is denoted in Slide 9.98 by ξ and η. This is a rectangular 2-dimensional Cartesian coordinate system with an origin at point M . The point P in object space is projected onto the image location P 0 = (x, y). Third, we have the location of point O of the perspective center defined in a sensor coordinate system. The sensor coordinate system has its origin at the perspective center O, is a three-dimensional coordinate system, its x and y axes are nominally parallel to the image coordinate system (ξ, η)and the z-axis is perpendicular to the image plane. 9.15. THE CENTRAL PROJECTION 179 We do have an additional point H defined in the central projection which is the intersection of the line perpendicular to the image plane and passing through the projection center. Note that this does not necessarily have to be identical to point M . M simply is the origin of the image coordinate system and is typically the point of symmetry with respect to some fiducial marks as shown in Slide 9.98. 
In order to describe a central projection we need to know the image coordinate system with its origin M , we need to know the sensor coordinate system with its origin O and we need to understand the relationship between the sensor coordinate system and the world coordinate system (X, Y, Z). Let us take another look at the same situation in Slide 9.99 where the coordinate systems are again illustrated. We have the projection center O as in the previous slide and we have two projection rays going from object point P1 to image point P10 or object point P2 to image point P20 . We also do have an optical axis or the direction of the camera axis which passes through point O and is perpendicular to the image plane. In Slide 9.99 there are two image planes suggested. One is between the perspective center and the object area and that suggests the creation of a positive image. In a camera, however, the projection center is typically between the object and the image plane and that leads geometrically to a negative image. Slide 9.99 also defines again the idea of an image coordinate system. In this case it is suggested that the image is rectangular and the definition of the image coordinates is by some artificial marks that are placed in the image plane. The marks are connected and define the origin M . We have also again the point H which is the intersection of the line perpendicular to the image plane but passing through the projection center. He also have some arbitrary location for a point P 0 that is projected into the image. We will from here on out ignore that M and H may be 2 locations. Typically the distance between M and H is small, and it is considered an error of a camera if M and H don’t coincide. Normal cameras that we use as amateurs do not have those fiducial marks and therefore they are called non-metric cameras because they do not define an image coordinate system. Users of non-metric cameras who want to measure need to help themselves by some auxiliary definition of an image coordinate system and they must make sure that the image coordinate system is the same from picture to picture if multiple pictures show the same object. Professional cameras that are used for making measurements and reconstructing 3D objects typically will have those fiducial marks as fixed features of a camera. In digital cameras the rows and columns of a CCD array will provide an inherent coordinate system because of the numbering of the pixels. Slide 9.100 is revisiting the issue of a 3-dimensional rotation. We have mentioned before that there are three coordinate systems in the camera: Two of those are 3-dimensional and the third one is 2-dimensional. The sensor coordinate system with its origin at projection center O and the world coordinate system (X, Y, Z) need to be related via a 3-dimensional transformation. Slide 9.100 suggests we have 3 angles that define the relationship between the 2 coordinate systems. Each of those angles represents a 2-dimensional rotation around 1 of the 3 world coordinate axes. Those are angles in 3D space. Recall that we have several definitons of rotations matrixes and that we can define a rotation matrix by various geometric entities. These can be rotations around axes that rotate themselves, or they can be angles in 3-D space subtended by the original axes and the rotated axis. Slide 9.100 describes the first case. Θ is the angle of rotation around the axis Z, but in the process we will rotate axes X and Y . 
φ is rotating around the axis X and will take with it obviously the axes Z and Y . And A then is a rotation around the rotated axis Z. Conceptually, everything we said earlier about 3-dimensional transformations, rotations and so forth applies here as well. Our earlier discussions of a 3D conformal transformation applies to the central projection and the central projection really is mathematically modeled by the 3-dimensional conformal which elevates that particular projection to a particularly important role. 180 9.16 CHAPTER 9. TRANSFORMATIONS The Synthetic Camera We have various places in our class in which we suggest the use of a synthetic camera. We have applications in computer graphics in order to create a picture on a display medium, on a monitor, for augmented or virtual reality. We have it in image processing and photogrammetry to reconstruct the world from images and we have terminology that has developed separately as follows. What is called a projection plane or image plane is in computer graphics called a View Plane. What in image processing is a projection center is in computer graphics a View Reference Point VRP. And what in image processing is the optical axis or the camera axis is in computer graphics the View Plane Normal VPN. Slide 9.102 and Slide 9.103 explain this further. We do have again a lens center and an image plane and an optical axes Z that is perpendicular to the image plane which itself is defined by coordinate axis X and Y . An arbitrary point in object space (X, Y, Z) is projected through the lens center on to an image plane. Note that in a synthetic camera we do not worry much about fine points such as an image coordinate system defined by fiducial marks or the difference between the points M and H (M being the origin of the image coordinate system and H being the intersection point of the line normal to the image plane and passing through the lens center). In robotics we typically use cameras that might use rotations around very particular axes. Slide 9.103 defines the world coordinate system with (X, Y, Z) and defines a point or axis of rotation in the world coordinate system at the end of vector w0 at location (X0 , Y0 , Z0 ). That location of an axis of rotation then defines the angle under which the camera itself is looking at the world. The camera has coordinate axes (x, y, z) and an optical axis in the direction of coordinate axis z. The image coordinates are 2-dimensional with an origin at the center of the image and that point itself is defined by an auxiliary vector r with respect to the point of rotation. So we see that we have various definitions of angles and coordinate systems and we always need to understand these coordinate systems and convert them into one another. Slide 9.104 explains this further: We do have a camera looking at the world, again we have an image coordinate system (x, y), and a sensor system (x, y, z) that are defined in the world coordinate system (X, Y, Z). As we want to define where a camera is in the world coordinate system and in which direction its optical axis is pointing we have to build up a transformation just as we did previously with the 3-dimensional conformal transformation. Let us assume that we start out with a perfect alignment of our camera in the world coordinate system so that the sensor coordinate axes x, y, z and the world coordinate axis X, Y , Z are coinciding. 
We now move the camera into an arbitrary position which represents the translation in 3-D space defined, if you recall, by the translational vector t. Then we orient the camera by rotating it essentially around 3 axes into an arbitrary position. First rotation may be as suggested in Slide 9.105 around the z axis which represents the angle A in Slide ??. In this slide it is suggested that the angle is 135o . Next we roll the camera around the x axis, also again by an angle of 135o and instead of having the camera looking up into the sky we now have it look down at the object. Obviously we can apply a third rotation around the rotated axis y to give our camera attitude complete freedom. We now have a rotation matrix that will be defined by those angles of rotation that we just described, we have a translation vector as described earlier. Implied in all of this is also a scale factor. We have not discussed yet the perspective center and the image plane. Obviously, as the distance grows, we go from a wide-angle through a normal-angle to a tele-lens and that will affect the scale. So the scale of the image is affected by the distance of the camera from the object and also by the distance of the projection center from the image plane. Note that we need 7 elements to describe the transformation that we have seen in Slide 9.105. We need 3 elements of translation, we have 3 angles of rotation and we have one scale factor that is defined by the distance of the projection center from the image plane. That are the exact same 7 transformation parameters that we had earlier in the 3-dimensional conformal transformation. Prüfungsfragen: 9.17. STEREOPSIS 181 • Gegeben seien eine 4 × 4-Matrix 8 0 M= 0 0 0 8 0 0 8 −24 8 8 0 24 1 1 sowie vier Punkte p1 p2 p3 p4 = = = = (3, 0, 1)T (2, 0, 7)T (4, 0, 5)T (1, 0, 3)T im dreidimensionalen Raum. Die Matrix M fasst alle Transformationen zusammen, die zur Überführung eines Punktes p in Weltkoordinaten in den entsprechenden Punkt p0 = M · p in Gerätekoordinaten erforderlich sind (siehe auch Abbildung B.36, die Bildschirmebene und daher die y-Achse stehen normal auf die Zeichenebene). Durch Anwendung der Transformationsmatrix M werden die Punkte p1 und p2 auf die Punkte p01 p02 = (4, 8, 12)T = (6, 8, 3)T in Gerätekoordinaten abgebildet. Berechnen Sie in gleicher Weise p03 und p04 ! Antwort: es gilt p̃01 p̃02 p̃03 p̃04 9.17 = (8, 16, 24, 2)T = (48, 64, 24, 8)T = (48, 48, 24, 6)T = (8, 32, 24, 4)T ⇒ ⇒ ⇒ ⇒ p01 p02 p03 p04 = (4, 8, 12)T = (6, 8, 3)T = (8, 8, 4)T = (2, 8, 6)T Stereopsis This is a good time to introduce the idea of stereopsis although we will have a separate chapter later in this class. The synthetic camera produces an image that we can look at with one eye and if we produce a second image and show it to the other eye we will be able to “trick” the eye into a 3-dimensional perception of the object that was imaged. Slide 9.107. We model binocular vision by two images: we compute or present to the eyes two existing natural images of the object, separately one image to one eye and the other image to the other eye. Those images can be taken by one camera placed in two locations or there can be synthetic images computed with a synthetic camera. Slide 9.108 explains further that our left eye is seeing point Pleft , the right eye is seeing point Pright , and in the brain those 2 observations are merged in a 3-dimensional location P . Slide 9.109 illustrates that a few rules need to be considered when creating images for stereoscopic viewing. 
Image planes and the optical axis for the 2 images should be parallel. Therefore, one should not create two images with converging optical axes. This would be inconsistent with natural human viewing. Only people who squint2 will have converging optical axes. Normal stereoscopic viewing would create a headache if the images were taken with converging optical axes. We call the distance between the two lens centers for the two stereoscopic images the stereobase B. Slide 9.110 shows the same situation in a top view. We have the distance from the lens center 2 in German: schielen 182 CHAPTER 9. TRANSFORMATIONS to the image plane which is typically noted as the camera constant or focal length and an object point W which is projected into image locations (X1 , Y1 ) and (X2 , Y2 ), and we have the two optical axes Z parallel to one another and perpendicular to XY . Note that we call the ratio of B/Distance-to-W also the Base/Heigth ratio, this being a measure of quality for the stereo-view. If we compute a synthetic image from a 3-dimensional object for the left and the right eye we might get a result as shown in Slide 9.111 which indeed can be viewed stereoscopically under a stereoscope. To make matters a little more complicated yet, it turns out that a human can view stereoscopically two images that do not necessarily have to be made by a camera under a central perspective projection. As long as the two images are similar enough in radiometry and if the geometric differences are not excessive, the human will be able to merge the two images into a 3-dimensional impression. This factor has been used in the past to represent measurements in 3 dimensions, for example, temperature. We could encode temperature as a geometric difference in 2 otherwise identical images and we would see a 2-dimensional scene and temperature would be shown as height. This and similar applications have in the past been implemented by various researchers. 9.18 Interpolation versus Transformation One may want to transfer an object such as a distorted photo in Slide 9.113 into an output geometry. This can be accomplished by a simplified transformation based for example on 4 points. This will reveal errors (distortions) in other known points (see Slide 9.114 and Slide 9.115). These errors can be used to interpolate a continuous error function dx (x, y), dy (x, y) which must be applied to each (x, y) location: x0 y0 = x + dx (x, y) = y + dy (x, y) We have replaced a complicated transformation by a much simpler transformation plus an interpolation. Question: What is the definition of interpolation? 9.19 Transforming a Representation 9.19.1 Presenting a Curve by Samples and an Interpolation Scheme We may want to represent an object in various ways. We may have a continuous representation of an object or we might sample that object and represent the intervals between samples by some kind of interpolation and approximation technique. So we have conceptually something similar to a transformation because we have two different ways of representing an object. Slide 9.117 introduces the basic idea that is described by a set of points p1 , p2 , . . . , pn . If we are in a 2dimensional space we may want to represent that object not by n points but by a mathematical curve. In 3-dimensional space it may be a surface to represent a set of points: We transform from one representation into another. A second item is that an object may not be given by points, but by a set of curves x = fx (t), y = fy (t), and z = fz (t). 
We would like to replace this representation by another mathematical representation which may be more useful for certain tasks. Again while we are going to look at this basically in 2 dimensions or for curves, a generalization into 3 dimensions and to surfaces always applies. 9.19. TRANSFORMING A REPRESENTATION 9.19.2 183 Parametric Representations of Curves We introduce the parametric representation of a curve. We suggest in Slide 9.120 that the 2dimensional curve Q in an (x, y) Cartesian coordinate system can be represented by two curves Q = x(t), y(t). We note this as a parametric representation. The parameter t typically can be the length of the curve and as we proceed along a curve, the coordinate x and the coordinate y will change as the function of the curve length t. More typically, t may be “time” for a point to move along the curve. The advantage of a parametric representation is described in Slide 9.120. dy(t) The tangent is replaced by a tangent vector Q0 (t) = ( dx(t) dt , dt ). That vector has a direction and length. 9.19.3 Introducing Piecewise Curves We may also not use a representation of the function x(t) or y(t) with a high order polynomial but instead we might break up the curve into individual parts, each part being a polynomial of third order (a cubic polynomial). We connect those polynomials at joints by forcing continuity at the joints. If a curve is represented in 3D-space by the equations x(t), y(t), and z(t) as shown in Slide 9.121, we can request that at the joints those polynomial pieces be continuous in the function but are also continuous in the first derivative or tangent. We may even want to make it continuous in the curvature or second derivative (the length of the tangent). However, this type of geometric continuity is narrower than the continity in “speed” as acceleration, if t is interpreted as time. One represents such a curve by a function Q(t) which is really a vector function (x(t), y(t), z(t)). 9.19.4 Rearranging Entities of the Vector Function Q In accordance with the equation of Slide 9.121, Q(t) can be represented as a multiplication of a (row) vector T and a coefficient matrix C where T contains the independent parameter t as the coefficient of the unknowns ax , bx , and cx . Matrix C can now be decomposed into M · G. As a result we can write that Q(t) = T · M · G and we call now G a geometry vector and M is called a basis matrix . We can introduce a new entity, a function B = T · M and those are the cubic polynomials or the so-called blending functions. Prüfungsfragen: • Was sind der Geometrievektor“, die Basisfunktion“ und die Blending Funktionen“ einer ” ” ” parametrischen Kurvendarstellung? Antwort: x(t) y(t) z(t) Q(t) T = = = = = ax t3 + bx t2 + cx t + dx ay t3 + by t2 + cy t + dy az t3 + bz t2 + cz t + dz (x(t), y(y), z(t))T = T · C (t3 , t2 , t, 1) Man zerlegt C in C = M · G, sodass Q(t) = T · C = T · M · G mit G als Geometrievektor und M als Basismatrix. Weiters sind B=T·M 184 CHAPTER 9. TRANSFORMATIONS kubische Polynome, die Blending Functions. 9.19.5 Showing Examples: Three methods of Defining Curves Slide 9.122 introduces three definitions of curves that are frequently used in engineering. Let’s take a look at an example in Slide 9.123. In that slide we have a continuous curve represented by 2 segments S and C. They are connected at a joint. Depending on the tangent vector at the joint we may have different curves. Illustrated in Slide 9.123 are 3 examples C0 , C1 , and C2 . 
C0 is obtained if we simply enforce at the joint that the function be continuous, but we don’t worry about the tangent vectors to be have the same direction. C1 results if we say that the function has to have the same derivative. C2 further defines that also the length must be identical at the joint. So we have 3 different types of continuity at the joint: function, velocity, acceleration. This type of continuity is narrower than mere geometric continuity with function, slope and curneture. In computer graphics one describes the type of continuity by the direction and the length of the tangent vector. Slide 9.124 again illustrates how a point P2 is the joint between curve segments Q1 and Q2 , two curves passing through P1 , P2 , Pi and P3 . Defining two different lengths for the tangent (representing velocity) leads to two different curve segments Q2 , Q3 . Slide 9.125 describes a curve with two segments joining at point P . We indicate equal time intervals, showing a “velocity” that reduces as we approach point P . At point P we change direction and accelerate. In this case of course, the function is continuous but as shown in that example, the tangent is not continuous. We have a discontinuity in the first derivative at point P . 9.19.6 Hermite’s Approach There is a concept in the representation of curves by means of cubic parametric equations called the Hermite’s Curves. We start out with the beginning and end point of a curve and the beginning and end tangent vector of that curve, and with those elements we can define a geometry vector G as discussed earlier in Slide 9.121. Slide 9.127 explains several cases where we have a beginning and end point of a curve defined, associated with a tangent vector and as a result we can now describe a curve. Two points and two tangent vectors define four elements of a curve. In 2D space this is a third order or cubic curve with coefficients a, b, c and d. Slide 9.128’s curves are basically defined by 2 points and 2 tangent vectors. Since the end point of one curve is identical to the beginning point of the next, we obtain a continuous curve. The tangent vectors are parallel but point into opposite directions. Geometrically we are continuous in the shape, but the vertices are opposing one another. This lends itself to describing curves by placing points interactively on a monitor with a tangent vector. This is being done in constructing complex shapes, say in the car industry where car bodies need to be designed. A particular approach to accomplishing this has been proposed by Bezier. 9.20 Bezier’s Approach Pierre Bezier worked for a French car manufacturer and invented an approach of designing 3dimensional shapes, but we will discuss this in 2 dimensions only. He wanted to represent a smooth curve by means of 2 auxiliary points which are not on the curve. Note that so far we have had curves go through our points, and Bezier wanted a different approach. So he defined 2 auxiliary points for a curve and the directions of the tangent vectors. Slide 9.130 defines the beginning and end points, P1 and P4 and the tangent at P1 using an auxiliary point P2 and the tangent at P4 by using an auxiliary point P3 . By moving P2 and P3 one can obtain various shapes as one pleases, passing through P1 and P4 . 9.21. 
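The Q(t) = T · M · G machinery of Section 9.19.4, and the Hermite form just described, can be made concrete with a short sketch (Python with NumPy, our own illustration using the standard Hermite basis matrix; the particular points and tangent vectors are invented for the example). The same pattern, with a different geometry vector and basis matrix, underlies the Bezier curves defined next.

import numpy as np

# Standard Hermite basis matrix; the geometry vector is G = (P1, P4, R1, R4).
M_H = np.array([[ 2, -2,  1,  1],
                [-3,  3, -2, -1],
                [ 0,  0,  1,  0],
                [ 1,  0,  0,  0]], dtype=float)

P1 = np.array([0.0, 0.0])    # start point
P4 = np.array([4.0, 0.0])    # end point
R1 = np.array([1.0, 4.0])    # tangent vector at P1
R4 = np.array([1.0, -4.0])   # tangent vector at P4
G = np.vstack([P1, P4, R1, R4])   # geometry vector, one row per entity

def hermite(t):
    T = np.array([t**3, t**2, t, 1.0])
    B = T @ M_H              # the four blending functions evaluated at t
    return B @ G             # Q(t) = T * M * G

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, hermite(t))
# Q(0) = P1 and Q(1) = P4; the curve leaves P1 in the direction R1
# and arrives at P4 in the direction R4.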
SUBDIVIDING CURVES AND USING SPLINE FUNCTIONS 185 Definition 26 Bezier-curves in 2D Sind definierte Punkte P0 bis Pn gegeben, die durch eine Kurve angenähert werden sollen, dann ist die dazugehörige Bézierkurve: P (t) = n X Bin (t)Pi 0≤t≤1 1.0 i=0 Die Basisfunktionen, Bernsteinpolynaome genannt, ergeben sich aus: Bin (t) = n i ti (1 − t)n−i mit n i = n! i!(n − i)! 2.0 Sie können auch rekursiv berechnet werden. Bézierkurven haben die Eigenschaften, dass sie: • Polynome (in t) vom Grad n sind, wenn n+1 Punkte gegeben sind, • innerhalb der konvexen Hülle der definierenden Punkte liegen, • im ersten Punkt P0 beginnen und im letzten Punkt Pn enden und • alle Punkte P0 bis Pn Einfluss auf den Verlauf der Kurve haben. Slide 9.131 illustrates the mathematics behind it. Obviously, we have a tangent at P1 denoted as R1 , which is according to Bezier 3 · (P2 − P1 ). The analogous applies to tangent R4 . If we define tangents in that way, we then obtain a third order parametric curve Q(t) as shown in Slide 9.131. Slide 9.132 recalls what we have discussed before, how these cubic polynomials for a parametric representation of a curve or surface can be decomposed into a geometric vector and a basis matrix and how we define a blending function. Slide 9.133 illustrates geometrically some of those blending functions for Bezier. Those particular ones are called Bernstein-curves. Now let’s proceed in Slide 9.134 to the construction of a complicated curve that consists of 2 polynomial parts. We therefore need the beginning and end point for the first part, P1 and P4 , and the beginning and end point for the second part which is P4 and P7 . We then need to have auxiliary points P2 , P3 , P5 and P6 to define the tangent vectors at P1 , P4 , P7 . P3 defines the tangent at P4 for the first curve segment and P5 defines the tangent at point P4 for the second segment. We are operating here with piece-wise functions. If P3 , P4 , and P5 are colinear, then the curve is geometrically continuous. Study Slide 9.134 for details. Prüfungsfragen: • Was ist die Grundidee bei der Konstruktion von 2-dimensionalen Bezier-Kurven“? ” • Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von Kurven, und erläutern Sie anhand einer Skizze ein Approximationsverfahren Ihrer Wahl! 9.21 Subdividing Curves and Using Spline Functions We can generalize the ideas of Bezier and other people and basically define spline functions3 as functions that are defined by a set of data points P1 , P2 , . . . , Pn to describe an object and we approximate the object by piecewise polynomial functions that are valid on certain intervals. In the general case of splines the curve does not necessarily have to go through P1 , P2 , . . . , Pn . 3 in German: Biegefunktionen 186 CHAPTER 9. TRANSFORMATIONS Algorithm 24 Casteljau {Input: array p[0:n] of n+1 points and real number u} {Output: point on curve, p(u)} {Working: point array q[0:n]} 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: for i := 0 to n do q[i] := p[i] end for for k := 1 to n do for i := 0 to n - k do q[i] := (1 - u)q[i] + uq[i + 1] end for end for return q[0] {save input} We need to define the locations of the joints, and the type of continuity we want. Note that we abondon here the used for a parametric representation. Let us examine the idea that our points describing an object may be in error, for example those points may be reconstructions from photographs taken of an object using a stereo reconstruction process. 
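Algorithm 24 above can also be turned into a short runnable routine. The sketch below is our own Python rendering of it (not the lecture's reference implementation); it evaluates a point on a Bezier curve by repeated linear interpolation, the same operation that is used when a Bezier curve is subdivided.

import numpy as np

def casteljau(points, u):
    # Evaluate a Bezier curve at parameter u by de Casteljau's construction.
    q = [np.asarray(p, dtype=float) for p in points]   # working copy of the control points
    n = len(q) - 1
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            q[i] = (1.0 - u) * q[i] + u * q[i + 1]      # repeated linear interpolation
    return q[0]

# Cubic example with control points P1..P4 (P2 and P3 lie off the curve):
ctrl = [(0, 0), (1, 3), (3, 3), (4, 0)]
print(casteljau(ctrl, 0.0))     # start point (0, 0)
print(casteljau(ctrl, 1/3))     # interior curve point, the ratio 1/3 : 2/3 at every step
print(casteljau(ctrl, 1.0))     # end point (4, 0)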
Because the points may be in error and therefore be noisy, we do not want the curve or surface to go through the points. We want an approximation of the shape. In that case we need to have more points than we have unknown parameters of our function. In the Least Squares approach discussed earlier, we would get a smooth spline going nearly through the points. Slide 9.136 illustrates the idea of a broken-up curve and defines a definition area for each curve between joints P2 , P3 and P3 , P4 . We enforce continuity of the curves at the joints, for example by saying that the tangent has to be identical. A spline that goes exactly through the data points is different from the spline that approximates the data points only. Note that the data points are called control points 4 . Of course the general idea of a spline function can be combined with Bezier as suggested in Slide 9.137 curve. For added flexibility we want to replace a single Bezier curve by two Bezier curves which are defined on a first and second part of the original Bezier curve. We solve this problem by finding auxiliary points and tangents such that the conditions apply, by propertionally segmenting distance as shown. Slide 9.139 illustrates the process. The technique is named after a French engineer Casteljeau. The single curve defined by P1 , P4 (and auxiliary points P2 , P3 ) is broken into two smaller curves defined by L1 , . . . , L4 and another curve defined by R1 , . . . , R4 . Spline functions of a special kind exist if we enforce that the tangents at the joint are parallel to the line going through adjacent neighboring joints. Slide 9.140 explains. The technique is named after Catmull-Rom. Prüfungsfragen: • In Abbildung B.26 sehen Sie vier Punkte P1 , P2 , P3 und P4 , die als Kontrollpunkte für eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 31 , also x( 13 ), und erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26 eintragen, eine skizzenhafte Darstellung ist ausreichend. Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird. Antwort: Die Strecken sind rekursiv im Verhältnis 4 in German: Pass-Punkte 1 3 : 2 3 zu teilen (siehe Abbildung 9.2). 9.22. GENERALIZATION TO 3 DIMENSIONS 187 Figure 9.2: Konstruktion einer Bezier-Kurve nach Casteljau 9.22 Generalization to 3 Dimensions Slide 9.142 suggests a general idea of taking the 2-dimensional discussions we just had and transporting them into 3 dimensions. Bezier, splines and so forth, all exist in 3-D as well. That in effect is where the applications are. Instead of having coordinates (x, y) or parameters t we now have coordinates (x, y, z) or parameters t1 , t2 . Instead of having points define a curve we now have a 3-dimensional arrangement of auxiliary points that serve to approximate a smooth 3D-surface. 9.23 Graz and Geometric Algorithms On a passing note, a disproportional number of people who have been educated at the TU Graz have become well-known and respected scientists in the discussion of geometric algorithms. Obviously, Graz has been a hot bed of geometric algorithms. Look out for classes on “Geometric Algorithms”. 
Note that these geometric algorithms we have discussed are very closely related to mathematics and really are associated with theoretical computer science, and less so with computer graphics and image processing. The discussion of curves and surfaces is also a topic of descriptive geometry. In that context one speaks of “free-form curves and surfaces”. Look out for classes on that subject as well!

(Slides 9.1 through 9.143 are reproduced here in the printed notes.)

Chapter 10 Data Structures

10.1 Two-Dimensional Chain-Coding

Algorithm 25 Chain coding
resample the boundary by selecting a larger grid spacing
starting from the top left, search the image rightwards until a pixel P[0] belonging to the region is found
initialize the orientation d with 1 to select northeast as the direction of the previous move
initialize isLooping with true
initialize i with 1
while isLooping do
  search the neighbourhood of the current pixel for another unvisited pixel P[i] in a clockwise direction, beginning from (d + 7) mod 8 and increasing d at every search step
  if no unvisited pixel is found then
    set isLooping to false
  else
    print d
  end if
  increase i
end while

We start from a raster image of a linear object. We are looking for a compact and economical representation by means of vectors. Slide 10.3 illustrates the 2-dimensional raster of a contour image, which is to be encoded by means of a chain code. We have to make a decision about the level of generalization or elimination of detail. Slide 10.4 describes the 4- and 8-neighborhood for each pixel and indicates by a sequence of numbers how each neighbor is labeled as 1, 2, 3, 4, . . . , 8.
Using this approach, we can replace the actual object by a series of pixels and in the process obtain a different resolution. We have resampled the contour of the object. Slide 10.6 shows how a 4-neighborhood and an 8-neighborhood will serve to describe the object by a series of vectors, beginning at an initial point. The encoding itself is represented by a string of integer numbers. Obviously we obtain a very compact representation of that contour. Next we can think of a number of normalizations of that coding scheme. We may demand that the sum of all codes be minimized. Instead of recording the codes themselves to indicate in which direction each vector points, we can look at code differences only, which would have the advantage that they are invariant under rotations. Obviously the object will look different if we change the direction of the grid at which we resample 195 196 CHAPTER 10. DATA STRUCTURES the contour. An extensive theory of chain codes has been introduced by H. Freeman and one of the best-known coding schemes is therefore also called the Freeman-Chain-Code. Prüfungsfragen: • Gegeben sei eine Punktfolge entsprechend Abbildung ?? und ein Pixelraster, wie dies in Abbildung ?? dargestellt ist. Geben Sie bitte sowohl grafisch als auch numerisch die kompakte Kettenkodierung dieser Punktfolge im Pixelraster an, welche mit Hilfe eines 8-Codes erhalten wird. 10.2 Two-Dimensional Polygonal Representations Algorithm 26 Splitting 1: 2: 3: 4: 5: 6: Splitting methods work by first drawing a line from one point on the boundary to another. Then, we compute the perpendicular distance from each point along the segment to the line. If this exceeds some threshold, we break the line at the point of greatest error. We then repeat the process recursively for each of the two new lines until we don’t need to break any more. For a closed contour, we can find the two points that lie farthest apart and fit two lines between them, one for one side and one for the other. Then, we can apply the recursive splitting procedure to each side. Let us assume that we do have an object with an irregular contour as shown in Slide 10.9 on the left side. We describe that object by a series of pixels and the transition from the actual detailed contour to the simplification of a representation by pixels must follow some rules. One of those is a minimum parameter rule which takes the idea of a rubber band that is fit along the contour pixels as shown on the right-hand side of Slide 10.9. At issue is many times the simplification of a shape in order to save space, while maintaining the essence of the object. Slide 10.10 explains how one may replace a polygonal representation of an object by a simplified minimum quadrangle. One will look for the longest distance that can be defined from points along the contour of the object. This produces a line segment ab. We then further subdivide that shape by looking for the longest line that is perpendicular to the axis that we just found. This produces a quadrangle. We can now continue on and further refine this shape by a simplifying polygon defining a maximum deviation between the actual object contour and its simplification. If the threshold value is set at 0.25 then we obtain the result shown in Slide 10.10. The process is also denoted as splitting (algorithm 26). 
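The splitting idea of Algorithm 26 can be sketched compactly in code. The following is our own Python formulation (the contour points and the threshold of 0.25 are invented for the example): the polyline is broken recursively at the point of greatest perpendicular distance from the chord, until every deviation lies below the threshold.

import numpy as np

def point_line_distance(p, a, b):
    # Perpendicular distance of point p from the line through a and b.
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    d = b - a
    if np.allclose(d, 0.0):
        return float(np.linalg.norm(p - a))
    # The 2-D cross product gives twice the triangle area; divide by the base length.
    return abs(d[0] * (p - a)[1] - d[1] * (p - a)[0]) / float(np.linalg.norm(d))

def split(points, threshold):
    if len(points) <= 2:
        return list(points)
    a, b = points[0], points[-1]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    k = int(np.argmax(dists))
    if dists[k] <= threshold:
        return [a, b]                        # the chord is close enough: drop the interior points
    i = k + 1                                # index of the worst point in the full list
    left = split(points[:i + 1], threshold)
    right = split(points[i:], threshold)
    return left[:-1] + right                 # avoid duplicating the break point

contour = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2), (5, 4.05), (6, 6)]
print(split(contour, threshold=0.25))
# -> [(0, 0), (3, 0), (6, 6)]: small deviations are dropped, the corner at (3, 0) is kept.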
Prüfungsfragen: • Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie einen Schritt des Algorithmus im Detail anhand Ihrer Zeichnung! Wählen Sie den Schwellwert so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung B.35 einzeichnen. 10.3. A SPECIAL DATA STRUCTURE FOR 2-D MORPHING 197 Definition 27 2D morphing for lines Problems with other kinds of representation can be taken care of by the parametric representation. In Parametric representation a single parameter t can represent the complete straight line once the starting and ending points are given. In parametric representation x = X(t), y = Y (t) For starting point (x1, y1) and ending point (x2, y2) (x, y) (x, y) = (x1, y1) = (x2, y2) if if t=0 t=1 Thus any point (x, y) on the straight line joining two points (x1, y1) and (x2, y2) is given by x = x1 + t(x2 − x1) y = y1 + t(y2 − y1) 10.3 A Special Data Structure for 2-D Morphing Suppose the task is defined as in Slide 10.13 where an input figure, in this particular case a cartoon of President Bush, needs to be transformed into an output figure, namely the cartoon of President Clinton. The approach establishes a relationship between the object contour points of the input and output cartoons. Each point on the input cartoon will correspond to one or no point on the output cartoon. In order to morph the input into the output one needs now to take these vectors which link these points. We introduce a parametric representation x = fx (t), y = fy (t). We gradually increase the value of the parameter t from 0 to 1. At a value of the parameter t = 0 one has the Bush cartoon, at the parameter t = 1, one has the Clinton cartoon. The transition can be illustrated in as many steps as one likes. The basic concept is shown in Slide ?? and Slide 10.14 and the result is shown in Slide 10.15. Prüfungsfragen: • In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen werden benötigt? Erläutern Sie Ihre Antwort anhand einer beliebigen Strecke aus Abbildung B.3! 10.4 Basic Concepts of Data Structures For a successful data structure we would like to have a direct access to data independent of how big a data base is. We would like to have simple arrays, our data should be stored sequentially and we might use pointer lists, thus pointers, chains, trees, and rings. This all is applicable in geometric data represented by coordinates. Slide 10.17 illustrates how we can build a directed graph of some geometric entities that are built from points in 3-dimensional space with coordinates x, y, z at the base. From those points, we produce lists of edges which combine two points into an edge. From the edges, one builds regions or areas which combine edges into contours of areas. Slide 10.18 shows that we request an ease of dynamic changes in the data, so we can insert or delete points and objects or areas. We will also like to be able to change dynamically a visualization: if we delete an object we should not be required to completely recompute everything. We would 198 CHAPTER 10. 
DATA STRUCTURES like to have support for a hierarchical approach so that we can look at an overview as well as at detail. And we would like to be able to group objects into hyper-objects and we need to have a random access to arbitrary objects independent of the number of objects in the data base. Let us now examine a few data structures. Prüfungsfragen: • Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch seine (polygonale) Oberfläche genutzt werden kann! 10.5 Quadtree Algorithm 27 Quadtree 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: {define datastructure quadtree} quadtree=(SW,SE,NW,NE:Pointer of quadtree,value) {SW south-western son, SE south-eastern son} {NW north-western son, NE north-eastern son} {value holds e.g. brightness} init quadtree = (NULL,NULL,NULL,NULL,0) while the entire image has not been segmented do segment actually processed area into 4 squares if there is no element of the object left in a subdivided square then link a leaf to the quadtree according to the actually processed square {leaf = quadtree(NULL,NULL,NULL,NULL,value)} else link new node to (SW or SE or NW or NE) of former quadtree according to the actually processed square {node = quadtree (SW,SE,NW,NE,0)} if node holds four leafs containing the same value then replace node with leaf containig value end if end if end while A quadtree is a tree data structure for 2-dimensional graphical data, where we subdivide the root, the 2-dimensional space, into squares of equal size, so we subdivide an entire area into 4 squares, we subdivide those 4 squares further into 4 squares and so forth. We number each quadrant as shown in Slide 10.20. Now if we have an object in an image or in a plane we describe the object by a quadtree by breaking up the area sequentially, until such time that there is no element of the object left in a subdivided square. In this case we call this a leaf of the tree structure, an empty leaf. So we have as a node a quadrant, and each quadrant has four pointers to its sons. The sons will be further subdivided until such time that there is either the entire quadrant filled with the object or it is entirely empty. A slight difference to the quadtree is the Bin-tree. In it, each node has only two sons and not four like in the quadtree. Slide 10.21 explains. If there is a mechanical part available as shown in Slide 10.22 then a pixel representation may be shown on the left and the quadtree representation at right. The quadtree is more efficient. There is an entire literature on geometric operations in quadtrees such as geometric transformations, scale changes, editing, visualization, Boolean operations and so forth. Slide 10.23 represents the mechanical part of Slide 10.24 in a quadtree representation. 10.6. DATA STRUCTURES FOR IMAGES 199 A quadtree has “levels of subdivisions”, obviously, and its root is at the highest level, with a single node. The next level up is shown in Slide 10.24 and has one empty and three full nodes which are further subdivided into a third level with some empty and some full leaves and some nodes that are further subdivided into a fourth level. The leafs are numbered sequentially from north-west to south-east. Slide 10.25 again illustrates how a raster image with pixels of equal area is converted into a quadtree representation. It is more efficient since there are fewer leafs in a quadtree than there are pixels in an image, except when the image is totally chaotic. 
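A runnable sketch of the idea behind Algorithm 27 (our own, simplified Python version; it returns a nested structure instead of pointer records and assumes a 2^n × 2^n image): a square binary image is split recursively into four quadrants, a quadrant that is uniformly empty or uniformly full becomes a leaf, and a mixed quadrant becomes a node with four sons.

import numpy as np

def build_quadtree(img):
    # Return 0 or 1 for a uniform leaf, otherwise a node with four sons.
    if img.min() == img.max():
        return int(img[0, 0])                     # uniform quadrant -> leaf
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    return {'NW': build_quadtree(img[:h2, :w2]),
            'NE': build_quadtree(img[:h2, w2:]),
            'SW': build_quadtree(img[h2:, :w2]),
            'SE': build_quadtree(img[h2:, w2:])}

def count_leaves(node):
    if isinstance(node, dict):
        return sum(count_leaves(son) for son in node.values())
    return 1

img = np.zeros((8, 8), dtype=int)
img[0:4, 0:4] = 1             # a filled square covering the north-west quadrant
img[4:6, 4:6] = 1             # a smaller filled square in the south-east

tree = build_quadtree(img)
print(count_leaves(tree), "leaves instead of", img.size, "pixels")
# 7 leaves represent the same content as 64 pixels.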
One may want to store all leaves whether they are empty or full or one stores only the full leafs, thereby saving storage space. Typically this may save 60 percent as in the example of Slide 10.26. Prüfungsfragen: • Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung dieses Bildes. Ich bitte Sie, einen sogenannten traditionellen“ Quadtree der Abbildung ” B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen. • Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen? 10.6 Data Structures for Images So far we have looked at data structures for binary data, showing objects by means of their contours, or as binary objects in a raster image. In this chapter, we are looking at data structures for color and black and white gray value images. A fairly complete list of such data structures can be seen in PhotoShop (Slide 10.28 and Slide 10.29). Let us review a few structures as shown in Slide 10.30. We can store an image by storing it pixel by pixel, and all information that belongs to a pixel is stored sequentially, or we store row by row and we repeat say red, green, blue for each row of images or we can go band sequential which means we store a complete image, one for the red, one for the green, one for the blue channel. Those forms are called BSSF or BIFF (Band Sequential File Format or similar). The next category is the TIFF-format, a tagged image file format, another one is to store images in tiles, in little 32 by 32 or 128 by 128 windows. The idea of hexagonal pixels has been proposed. An important idea is that of pyramids, where a single image is reproduced at different resolutions, and finally representations of images by fractals or wavelets and so forth exist. Slide 10.31 illustrates the idea of an image pyramid. The purpose of pyramids is to start an image analysis process on a much reduced version of an image, e.g. to segment it into its major parts and then guide a process which refines the preliminary segmentation from resolution level to resolution level. This increases the robustness of an approach and also reduces computing times. At issue is how one takes a full resolution image and creates from it reduced versions. This may be by simple averaging or by some higher level processes and filters that create low resolutions from neighborhoods of higher resolution pixels. Slide 10.32 suggests that data structures for images are important in the context of image compression and we will address that subject under the title “Compression” towards the end of this class. Prüfungsfragen: • In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht 200 CHAPTER 10. DATA STRUCTURES nur mehr aus einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo wird sie eingesetzt (nennen Sie mindestens ein Beispiel)? • In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge des vorhergehenden Bildes. 
Leiten Sie eine möglichst gute obere Schranke für den gesamten Speicherbedarf einer solchen Repräsentation her, wobei – das erste (größte) Bild aus N × N Pixeln besteht, – alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden, – eine mögliche Komprimierung nicht berücksichtigt werden soll! Hinweis: Benutzen Sie die Gleichung Antwort: P∞ i=0 S(N ) < qi = N2 · 1 1−q für q ∈ R, 0 < q < 1. ∞ i X 1 i=0 4 1 1 − 14 1 4 = N2 · 3 = N2 3 4 = N2 · 10.7 Three-Dimensional Data The requirements for a successful data structure are listed in Slide 10.34. Little needs to be added to the contents of that slide. Prüfungsfragen: • Nennen Sie allgemeine Anforderungen an eine Datenstruktur zur Repräsentation dreidimensionaler Objekte! 10.8 The Wire-Frame Structure Definition 28 Wireframe structure The simplest three-dimensional data structure is the wire-frame. A wireframe model captures the shape of a 3D object in two lists, a vertex list and an edge list. The vertex list specifies geometric information: where each corner is located. The edge list provides conectivity information, specifying (in arbitrary order) the two vertices that form the endpoints of each edge. The vertex-lists are used to build edges, the edges build edge-lists which then build faces or facets and facets may build objects. In a wire-frame, there are no real facets, we simply go from edges to objects directly. The simplest three-dimensional data structure is the wire-frame. At the lowest level we have a list of three-dimensional coordinates. The point-lists are used to build edges, the edges build edge-lists which then build faces or facets and facets may build objects. In a wire-frame, there are no real facets, we simply go from edges to objects directly. Slide 10.36 shows the example of a cube with 10.9. OPERATIONS ON 3-D BODIES 201 the object, the edge-lists and the point-lists. The edges or lines and the points or vertices are again listed in Slide 10.37 for a cube. In Slide 10.38 the cube is augmented by an extra-plane and represented by two extra vertices and three extra lines. Prüfungsfragen: • In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken! 10.9 Operations on 3-D Bodies Assume that we have 2 cubes, A and B, and we need to intersect them. A number of Boolean operations can be defined as an intersection or a union of 2 bodies, subtracting B from A or A from B leading to different results. 10.10 Sweep-Representations A sweep-representation creates a 3-D object by means of a 2-D shape. An object will be created by moving the 2-D representation through 3-D space denoting the movement as sweep. We may have a translatory or a rotational sweep as shown in Slide ?? and Slide 10.43. A translatory sweep can be obtained by a cutting tool. A rotational sweep obviously will be obtained by a rotational tool. We have in Slide 10.43 the cutting tool, the model of a part and the image of an actual part as produced in a machine. Prüfungsfragen: • Was versteht man unter einer Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese ” Art der Objektrepräsentation? • In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben! 10.11 Boundary-Representations A very popular representation of objects is by means of their boundaries. Generally, these representations are denoted as B-reps. 
10.11 Boundary-Representations
A very popular representation of objects is by means of their boundaries. Generally, these representations are denoted as B-reps. They are built from faces with vertices and edges. Slide 10.45 illustrates an object and asks how many objects we are facing here, how many faces, how many edges, and so forth. A B-rep system makes certain assumptions about the topology of an object. In Slide 10.46 we show a prism that is formed from 5 faces, 6 vertices and 9 edges. A basic assumption is that differentially small pieces on the surface of the object can be represented by a plane, as shown in the left and central elements of Slide 10.46. On the right-hand side of Slide 10.46 is a body that does not satisfy the demands of a 2-manifold topology, and that is the type of body we may have difficulties with in a B-rep system.
A boundary representation takes advantage of Euler's formula. It relates the number of vertices V, edges E and faces F to one another as shown in Slide 10.47: for a simple polyhedron, V − E + F = 2. A simple polyhedron is a body that can be deformed into a sphere and therefore has no holes. In this case, Euler's formula applies; the prism above, for example, satisfies 6 − 9 + 5 = 2. Slide 10.48 shows three examples that confirm the validity of Euler's formula. Slide 10.49 illustrates a body with holes; in that case, Euler's formula needs to be modified.
Prüfungsfragen:
• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“!
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!

10.12 A B-Rep Data Structure
Definition 29 Boundary representation
A B-rep structure describes the boundary of an object with the help of 3-dimensional polygon surfaces. The B-rep model consists of three different object types: vertices, edges and surfaces. The B-rep structure is often organized in:
• V: a set of vertices (points in 3D space)
• E: a set of edges; each edge is defined by 2 points referenced from V
• S: a set of surfaces; each surface is defined by a sequence of edges from E (at least 3 edges define a surface)
The direction of the normal vector of a surface is usually given by the order of its edges (clockwise or counterclockwise). Due to this referencing, the B-rep permits a redundancy-free management of the geometric information.

A B-rep structure is not unlike a wire-frame representation, but it represents an object with pointers to polygons and lists of polygons with pointers to edges, and one differentiates between spaces that are outside and inside the object by taking advantage of the sequence of edges. Slide 10.52 illustrates a body that is represented by 2 faces in 3-D; we show the point-list, the list of edges and the list of faces. Slide 10.53 illustrates a B-rep representation of a cube with the list of faces, the list of edges, the point-lists, and the respective pointers. Slide 10.54 explains the idea of inside and outside directions for each face. The direction of the edges defines the direction of the normal vector onto a face. As shown in Slide 10.54, A would be inside of B in one case, and outside of B in the other, depending on the direction of the normal onto face B.

10.13 Spatial Partitioning
An entirely different approach to 3-dimensional data structures is the idea of spatial partitioning. In Slide 10.56 we choose the primitives to be prisms and cubes. They build the basic cells for a decomposition.
From those basic elements we can now build up various shapes as shown in that slide. A special case occurs if the primitive is a cube of given size as shown in Slide 10.57. Slide 10.58 introduces the idea of the oct-tree which is the 3-dimensional analogon to the quadtree. Slide 10.59 explains how the 3 dimensional space as a root is decomposed into 8 sons, which then are further decomposed until there is no further decomposition necessary because each son is either empty or full. The example of Slide 10.59 has 2 levels and therefore the object can be created from 2 types of cubes. Slide 10.60 illustrates the resulting representation in a computer that takes 10.14. BINARY SPACE PARTITIONING BSP 203 the root, subdivides it into 8 sons, calls them either white or black and if it needs to be further subdivided then substitutes for the element another expression with 8 sons and so forth. Slide 10.61 illustrates an oct-tree representation of a coffee cup. We can see how the surface, because of its curvature, requires many small cubes to be represented whereas on the inside of the cup the size of the elements increases. The data structure is very popular in medical imaging because there exist various sensor systems that produce voxels, and those voxels can be generalized into oct-trees, similar to pixels that can be generalized into quadtrees in 2 dimensions. Prüfungsfragen: • Erklären Sie den Begriff spatial partitioning“ und nennen Sie drei räumliche Datenstruk” turen aus dieser Gruppe! 10.14 Binary Space Partitioning BSP Definition 30 Cell-structure An example for a 3-dimensional data structure is the idea of spatial partitioning. Therefor some primitives like prisms or cubes are choosen. These primitives build the ”CELLS” for a decomposition of an object. Every geometrical object can be build with these cells. A special case occurs, if the primitive is an object of a given size. A very common datastructure to find the decomposition is the oct-tree. The root (3-dimensional space) of the oct-tree is subdivided into 8 cubes of equal size and these resulting cubes are subdivided themselve again until there is no further decomposition necessary. A son in the tree is marked as black or white (represented or not) or is marked as gray. Then a further decomposition is needed. This type of datastructure is very popular in medical imaging. The different sensor systems, like ”Computer Aided Tomography”, are producing voxels. These voxels can be generalized into oct-trees. A more specific space partitioning approach is the Binary Space Partitioning or BSP. We subdivide space by means of planes that can be arbitrarily arranged. The Binary Space Partition is a tree in which the nodes are represented by the planes. Each node has two sons which are the spaces which result on the two sides of a plane, we have the inner and the outer half space. Slide 10.63 illustrates the basic idea in 2 dimensions where the plane degenerates into straight lines. The figure on the left side of Slide 10.63 needs to be represented by a BSP structure. The root is the straight line a, subdividing a 2-D space into half spaces, defining an outside and an inside by means of a vector shown on the left side of the slide. There are two sons and we take the line b and the line j as the two sons. We further subdivide the half-spaces. We go on until the entire figure is represented in this manner. A similar illustration representing the same idea is shown in Slide 10.64. 
At the root is line 1; the outside half-space is empty, the inside half-space contains line 2, again with an empty outside space and with the inside space containing line 3, and so the structure repeats. If we start out with line 3 at the root, we obtain a different description of the same object: we have line 4 in the outside half-space and line 2 in the inside half-space. The half-space defined by line 4 within the half-space defined by line 3 contains only the line segment 1b, and the other half-space as seen from line 3, which is further subdivided by line 2, contains line segment 1a. The straight line 1 in this case appears twice, once in the form of 1a and another time in the form of 1b.

Algorithm 28 Creation of a BSP tree {procedure makeTree(polyList)}
polygon root;                   {current root polygon}
polygon *backList, *frontList;  {polygons in the current half-spaces}
polygon p, backPart, frontPart; {temporary variables}
if (polyList == NULL) then
  return NULL;  {no more polygons in this half-space}
else
  root = selectAndRemovePolygon(&polyList);  {prefer polygons defining planes that don't intersect with other polygons}
  backList = NULL;
  frontList = NULL;
  for (each remaining polygon p in polyList) do
    if (polygon p in front of root) then
      addToList(p, &frontList);
    else if (polygon p in back of root) then
      addToList(p, &backList);
    else  {polygon p must be split}
      splitPoly(p, root, &frontPart, &backPart);
      addToList(frontPart, &frontList);
      addToList(backPart, &backList);
    end if
  end for
  return new BSPTREE(root, makeTree(frontList), makeTree(backList));
end if

Prüfungsfragen:
• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!

10.15 Constructive Solid Geometry, CSG
This data structure takes 3D primitives as input and uses Boolean operations together with translation, scaling and rotation operators to construct 3-dimensional objects from the primitives. Slide 10.66 and Slide 10.67 explain. A complex object as shown in Slide 10.67 may be composed of a cylinder with an indentation and a rectangular body of which a corner is cut off. The cylinder itself is obtained by subtracting a smaller cylinder from a larger cylinder. The cut-off is obtained by subtracting another rectangular shape from a fully rectangular shape. So we have 2 subtractions and one union to produce our object. In Slide 10.67 we have again 2 primitives, a block and a cylinder; we can scale them, so we start out with two types of blocks and two types of cylinders. By operations of intersection, union and difference we obtain a complicated object from those primitives. Slide 10.68 explains how Constructive Solid Geometry can produce a result in two different ways: we can take two blocks and subtract them from one another, or we can take two blocks and form their union to obtain a particular shape. We cannot say generally that those two operations are equivalent, because if we change the shapes of the two blocks, the same two operations may not result in the same object, as shown in Slide 10.68.
Prüfungsfragen:
• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der Konstruktion des Objektes (ohne Lampe).
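As a small illustration of the CSG idea just described — Boolean combinations of primitives — here is a minimal Python sketch that evaluates a CSG tree by point membership. The function names and the particular primitives and values are invented for illustration; they are not an API from the lecture.

# Minimal CSG sketch: primitives are point-membership tests, Boolean nodes combine them.

def cylinder(cx, cy, r, h):
    # point (x, y, z) inside a z-aligned cylinder of radius r and height h
    return lambda x, y, z: (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2 and 0.0 <= z <= h

def block(x0, y0, z0, x1, y1, z1):
    return lambda x, y, z: x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1

def union(a, b):        return lambda x, y, z: a(x, y, z) or b(x, y, z)
def difference(a, b):   return lambda x, y, z: a(x, y, z) and not b(x, y, z)
def intersection(a, b): return lambda x, y, z: a(x, y, z) and b(x, y, z)

# "Two subtractions and one union", as in the example discussed above:
ring   = difference(cylinder(0, 0, 2.0, 1.0), cylinder(0, 0, 1.0, 1.0))        # cylinder with a hole
corner = difference(block(-4, -4, 0, -1, -1, 1), block(-4, -4, 0.5, -3, -3, 1)) # block with a cut-off corner
part   = union(ring, corner)

print(part(1.5, 0.0, 0.5))   # True: inside the wall of the ring
print(part(0.5, 0.0, 0.5))   # False: inside the drilled hole, outside the block

The same tree could of course be written down differently (e.g. with scaled copies of one primitive), which is exactly the non-uniqueness of CSG descriptions mentioned for Slide 10.68.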
10.16 Mixing Vectors and Raster Data
When we have photo-realistic representations of 3-D objects, we may need to mix data structures, e.g. vector data or three-dimensional data structures representing 3-D objects, and raster data coming from images. The example of city models has illustrated this issue. For the geometric data, a particular hierarchical structure is introduced, the so-called LoD/R-tree data structure, for Level of Detail and Rectangular Tree structure. The idea is that objects are approximated by boxes in 3D, generalized from rectangles in 2 dimensions. These boxes can overlap, and so we have the entire city at the root of a tree, represented by one box. Each district is a son of that root and is represented by boxes. Within each district we may have city blocks, within the city blocks we may have buildings, and one particular building may therefore be a leaf of this data structure.
We also have the problem of a level of detail for the photographic texture. We create an image pyramid by image processing, then store the pyramids and create links to the geometric elements in terms of level of detail, so that if we want only an overview of an object, we get very few pixels to process. If we take a vantage point to look at the city, we have a high resolution for the texture in the foreground and a low resolution in the background. So we precompute per vantage point a hierarchy of resolutions that may fall within the so-called view frustum. As we change our viewing direction by rotating our eyes, we have to call up a related element from a database. If we move, and thus change our position, we have to call up from the database different elements at high resolution and elements at low resolution.
Slide 10.72 illustrates how the vector data structure describes nothing but the geometry, whereas the raster data in Slide 10.73 describes the character of the object. We may also use a raster data structure for geometric detail, as shown in Slide 10.74. In that case we have an (x, y) pattern of pixels, and we associate with each pixel not a gray value but an elevation, representing therefore a geometry in the form of a raster, which we otherwise have typically used for images only.

10.17 Summary
We summarize the various ideas for data structures of spatial objects, be they in 2D or in 3D. Slide 10.76 addresses 3D.
Prüfungsfragen:
• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken!
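Referring back to the raster-as-geometry idea at the end of Section 10.16, here is a minimal Python sketch of a raster whose cells store elevations instead of gray values. The grid spacing, the array layout and the toy numbers are assumptions made for the illustration.

# Each cell of the raster stores an elevation z, which yields a 2.5-D surface.

import numpy as np

dx = dy = 1.0                                   # assumed ground spacing of the raster
z = np.array([[1.0, 1.2, 1.1],
              [1.3, 1.5, 1.4],
              [1.2, 1.4, 1.6]])                 # elevation per pixel

# Convert the raster into a cloud of 3-D points (x, y, z):
rows, cols = z.shape
points = [(j * dx, i * dy, z[i, j]) for i in range(rows) for j in range(cols)]

# A simple surface normal per interior cell from central differences:
def normal(i, j):
    dzdx = (z[i, j + 1] - z[i, j - 1]) / (2 * dx)
    dzdy = (z[i + 1, j] - z[i - 1, j]) / (2 * dy)
    n = np.array([-dzdx, -dzdy, 1.0])
    return n / np.linalg.norm(n)

print(points[4])        # the centre pixel as a 3-D point
print(normal(1, 1))     # its approximate surface normal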
Chapter 11 3-D Objects and Surfaces

11.1 Geometric and Radiometric 3-D Effects
We are reviewing various effects we can use to model and perceive the 3-dimensional properties of objects. These could be radiometric or geometric effects of reconstructing and representing objects. When we look at a photograph of a landscape as in Slide 11.3, we notice various depth cues. Slide 11.4 summarizes these and other depth cues; a total of eight different cues are described. For example, colors tend to become bluer as objects are farther away. Obviously, objects that are nearby will cover and hide objects that are farther away. Familiar objects, such as buildings, will appear smaller as the distance grows. Our own motion will make nearby things move faster. We have spatial viewing by stereoscopy. We have brightness that reduces as the distance grows. Focus for one distance will have to change for other distances. Texture of a nearby object will become simple shading of a far-away object.
Slide 11.5 shows that one often differentiates between so-called two-dimensional, two-and-a-half-dimensional and three-dimensional objects. When we deal with two-and-a-half-dimensional objects, we deal with one surface of that object, essentially a function z(x, y) that is single-valued. In contrast, a three-dimensional object may have multiple values of z for a given x and y. Slide 11.5 is a typical example of a two-and-a-half-dimensional object, Slide 11.7 of a three-D object.
Prüfungsfragen:
• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2½D- oder 3D-Modellen. Definieren Sie die Objektbeschreibung durch 2½D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied!

11.2 Measuring the Surface of An Object (Shape from X)
„Computer Vision“ is an expression that is particularly used when dealing with 3-D objects. Methods that determine the surface of an object are numerous. One generally denotes methods that create a model of one side of an object (a two-and-a-half-dimensional model) as shape-from-X. One typically includes the techniques which use images as the source of information; in Slide 11.9 we may have sources of shape information that are not images. Slide 11.10 highlights the one technique that is mostly used for small objects that can be placed inside a measuring device. This may or may not use images to support the shape reconstruction. A laser may scan a profile across the object, measuring the echo time and creating the profiles sequentially across the object, thereby building up the shape of the object. The object may rotate under a laser scanner, or the laser scanner may rotate around the object. In that case we obtain a complete three-D model of the object. Such devices are commercially available. For larger objects, airborne laser scanners exist, such as shown in Slide 11.11 and previously discussed in Chapter 2. A typical product of an airborne laser scanner is shown in Slide 11.12.
The next technique is so-called Shape-from-Shading. In this technique, an illuminated object's gray tones are used to estimate a slope of the surface at each pixel. Integration of the slopes to a continuous surface will lead to a model of the surface's shape. This technique is inherently unstable and under-constrained. There is not a unique slope associated with a pixel's brightness. The same gray value may be obtained from various illumination directions and therefore slopes.
In addition, the complication with this technique is that we must that the reflectance properties of the surface. We have knowledge in an industrial environment where parts of known surface properties are kept in a box and a robot needs to recognize the shape. In natural terrain, shading alone is an insufficient source of information to model the surface shape. Slide 11.14 suggests an example where a picture of a sculpture of Mozart is used to recreate the surface shape. With perfectly known surface properties and with a known light source, we can cope with the variables and constrain the problem sufficiently to find a solution. An analogy of Shape-from-Shading is Photometric Stereo, where multiple images are taken of a single surface from multiple vantage points that are known, but where the geometry of the individual images is identical, only the illumination is not. This can be used in microscopy as shown in the example of Slide 11.16. Shape-from-Focus is also usable in microscopes, but also in a natural environment with small objects. A Shape-from-Focus imaging system finds the portion of an object that is in focus, thereby producing a contour of the object. By changing the focal distance we obtain a moving contour and can reconstruct the object. Slide 11.18 illustrates a system that can do a shape reconstruction in real time using the changing focus. Slide 11.19 illustrates two real-time reconstructions by Shape-from-Focus. Slide 11.20 has additional examples. The method of Structured Light projects a pattern onto an object and makes one or more images of the surface with the pattern. Depending on the type of patterns we can from a single image reconstruct the shape, or we can use the pattern as a surface texture to make it easy for an algorithm to find overlapping image points in the stereo-method we will discuss in a moment. Slide 11.22 through Slide 11.25 illustrate the use of structured light. In case of Slide 11.22 and Slide 11.23 a stereo-pair is created and matching is made very simple. Slide 11.24 illustrates the shape that is being reconstructed. Slide 11.25 suggests that by using a smart pattern, we can reconstruct the shape from the gray-code that is being projected. Slide 11.27 illustrates a fairly new technique for mapping terrain using interferometric radar. A single radar pulse is being transmitted from an antenna in an aircraft or satellite and this is reflected off the surface of the Earth and is being received by the transmitting antenna and an auxiliary second antenna that is placed in the vicinity of the first one, say at the two wings of an airplane. The difference in arrival time of the echoes at the two antennas is indicative of the angle under which the pulse has traveled to the terrain and back. The method is inherently accurate to within the wavelength of the used radiation. This technique is available even for satellites, with two antennas on the space shuttle (NASA mission SRTM for Shuttle Radar Topography Mission, 1999), or is applicable to systems with a single antenna on a satellite, where the satellite repeats an orbit very close, to within a few hundred meters of the original orbit, and in the process produces a signal as if the two antennas had been carried along simultaneously. The most popular and most widely used technique of Shape-from-X is the stereo-method . Slide 11.29 suggests a non-traditional arrangement, where two cameras take one image each of a scene where the camera’s stereo-base b is the distance from one another. 
Two objects Pk and Pt are at different depths as seen from the stereo base, and from the two images we can determine a parallactic angle γ and a parallactic difference dγ, which allows us to determine the depth difference between the two points.
Obviously, a scene as shown in Slide 11.29 will produce a 2-D representation on a single image in which the depth between Pt and Pk is lost. However, given two images, we can determine the angle γ (and thus the distance to point Pk), and we can also determine the angle dγ (and thus obtain the position of point Pt at a depth different from Pk's).
Slide 11.30 illustrates two images of a building. The two images are illuminated in the same manner by the sunlight; the difference between the two images is strictly geometrical. We have lines in the left image and corresponding lines in the right image that are called "epi-polar lines". Those are intersections of a special plane in 3-D space with each of the two images. These planes are formed by the two projection centers and a point on the object. If we have a point on the line of the left image, we know that its corresponding matching point must be on the corresponding epi-polar line in the right image. Epi-polar lines help in reducing the search for match points in automated stereo.
Slide 11.31 is a stereo representation from an electron microscope. Structures are very small; pixels may have the size of a few nanometers in object space. We do not have a center-perspective camera model as the basis for this type of stereo. However, the electron-microscopic mode of imaging can be modeled, and we can reconstruct the surface by a method similar to classical camera stereo.
Slide 11.32 addresses a last technique of Shape-from-X, tomography. Slide ?? and Slide ?? illustrate, from medical imaging, a so-called computer-aided tomographic (CAT) scan of a human skull. Individual images represent a slice through the object. By stacking up a number of those images we obtain a replica of the entire original space. Automated methods exist that collect all the voxels that belong to a particular object and in the process determine the surface of that object. The result is shown in Slide 11.34.
Prüfungsfragen:
• Erstellen Sie bitte eine Liste aller Ihnen bekannten Verfahren, welche man als „Shape-from-X“ bezeichnet.
• Wozu dient das sogenannte „photometrische Stereo“? Und was ist die Grundidee, die diesem Verfahren dient?
• In der Vorlesung wurden Tiefenwahrnehmungshilfen („depth cues“) besprochen, die es dem menschlichen visuellen System gestatten, die bei der Projektion auf die Netzhaut verlorengegangene dritte Dimension einer betrachteten Szene zu rekonstruieren. Diese Aufgabe wird in der digitalen Bildverarbeitung von verschiedenen „shape from X“-Verfahren gelöst. Welche „depth cues“ stehen in unmittelbarem Zusammenhang mit einem entsprechenden „shape from X“-Verfahren, und für welche Methoden der natürlichen bzw. künstlichen Tiefenabschätzung kann kein solcher Zusammenhang hergestellt werden?

11.3 Surface Modeling
There is an entire field of study on how to optimally model a surface from the data primitives one may have obtained from stereo or other Shape-from-X techniques. We are dealing with point clouds, connecting the point clouds to triangles, building polygonal faces from the triangles, and then replacing the faces by continuous functions such as bi-cubic or quadric functions. Slide 11.36 illustrates a successfully constructed network of triangles, using as input a set of points created from stereo.
Slide 11.37 illustrates the triangles formed from all the photogrammetrically obtained points of Emperor Charles in the National Library in Vienna. Also shown is a rendering of that surface using photographic texture. calls to mind that these problems of creating a surface from measured points. triangulating points etc have been previously discussed in the Chapters 9 and 10. 214 CHAPTER 11. 3-D OBJECTS AND SURFACES 11.4 Representing 3-D Objects In representing 3-D objects we have to cope with 2 important subjects: • hidden edges and hidden surfaces • the interaction of light and material In dealing with hidden edges and surfaces, we essentially differentiate among two classes of procedures. The first is an image space method where we go through all the pixels of an image and find the associated object point that is closest to the image. This method is very susceptible to aliasing effects. The object space method searches through the object among all object elements and is checking what can be seen from the vantage point of the user. These techniques are less prone to suffer from aliasing. The issue of hidden lines or surfaces is illustrated in Slide 11.40 with a single-valued function y = f (x, z). We might represent this surface by drawing profiles from the left edge to the right edge of the 2-D surface. The resulting image in Slide 11.40 is not very easily interpreted. Slide 11.42 illustrates the effect of removing hidden lines. Hidden lines are being removed by going from profile to profile through the data set and plotting them into a 2-D form as shown in Slide 11.43. Each profile is compared with the background and we can find by a method of clipping which surface elements are hidden by previous profiles. This can be done in one dimension as shown in Slide 11.43 and then in a second dimension (Slide 11.44). When we look at Slide 11.44 we might see slight differences between two methods of hidden line removal in case (c) and case (d). Many tricks are being applied to speed up the computation of hidden lines and surfaces. One employs the use of neighborhoods or some geometric auxiliary transformations, some accelerations using bounding boxes around objects or finding surfaces that are facing away from the view position (back-face culling), a subdivision of the view frustum and the use of hierarchies. Slide 11.46 illustrates the usefulness of enclosing rectangles or bounding boxes. Four objects exist in a 3-D space and it is necessary to decide which ones cover the other ones up. Slide 11.47 illustrates that the bounding box approach, while helping many times, may also mislead one to suspecting overlaps when there are none. Prüfungsfragen: • Bei der Erstellung eines Bildes mittels recursive raytracing“ trifft der Primärstrahl für ein ” bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel zugeordnete Intensität, wenn 1. die Rekursionstiefe nicht beschränkt ist, 2. der Strahl nur genau einmal aufgeteilt wird, 3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt! 
Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden!
Antwort:
1. ohne Beschränkung:
I = 2.7 + 0.1 · 2 + 0.5 · (3 + 0.4 · 2 + 0.1 · 4)
  = 2.7 + 0.2 + 0.5 · (3 + 0.8 + 0.4)
  = 2.9 + 0.5 · 4.2
  = 2.9 + 2.1
  = 5
2. Rekursionstiefe beschränkt:
I = 2.7 + 0.1 · 2 + 0.5 · 3 = 2.7 + 0.2 + 1.5 = 4.4
3. Abbruch nach Gewichtung:
I = 2.7 + 0.5 · (3 + 0.4 · 2) = 2.7 + 0.5 · 3.8 = 2.7 + 1.9 = 4.6

11.5 The z-Buffer
Algorithm 29 z-buffer
set zBuffer to infinite
for all polygons plg that have to be drawn do
  for all scanlines scl of that polygon plg do
    for all pixels pxl of that scanline scl do
      if z-value of pixel pxl is nearer than zBuffer then
        set zBuffer to z-value of pixel pxl
        draw pixel pxl
      end if
    end for
  end for
end for

The most popular approach to hidden line and surface removal is the well-known z-buffer method (Algorithm 29). It was introduced in 1974; it transforms an object's surface facets into the image plane and keeps track, at each pixel, of the distance between the camera and the corresponding element on an object facet. Each pixel keeps the gray value that comes from the object point closest to the image plane. Another procedure is illustrated in Slide 11.50 with an oct-tree: the view reference point V as shown in that slide leads to a labeling of the oct-tree space and shows that element 7 will be seen most.
Prüfungsfragen:
• Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken A = p1p2, B = p3p4, deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt (A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist!
Hinweis: Zeichnen Sie p1p2 und p3p4 in die xz-Ebene des Gerätekoordinatensystems ein!
Antwort: siehe Abbildung 11.1.
[Abbildung 11.1: grafische Auswertung des z-Buffer-Algorithmus; sichtbar an den Pixelpositionen 0 bis 10 ist: −, −, B, B, A, A, B, B, B, −, −]

11.6 Ray-tracing
The most popular method to find hidden surfaces (but also used in other contexts) is the so-called ray-tracing method. Slide 11.52 illustrates the basic idea: we have a projection center, an image window and the object space. We cast a ray from the projection center through a pixel into the object space and check to see where it hits the objects. To accelerate the ray-tracing we subdivide the space, and instead of intersecting the ray with each actual object we do a search through the bounding boxes surrounding the objects. In this case we can dismiss many objects because they are not along the path of the ray that is cast for a particular pixel. Pseudocode can be seen in Algorithm 30.
Prüfungsfragen:
• Beschreiben Sie das „ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche Optimierungen können helfen, den Rechenaufwand zu verringern?
Antwort: Vom Projektionszentrum aus wird durch jedes Pixel der Bildebene ein Strahl in die Szene geschickt und mit allen Objekten geschnitten. Von allen getroffenen Objekten bestimmt jenes, dessen Schnittpunkt mit dem Strahl dem Projektionszentrum am nächsten liegt, den Farbwert des Pixels.
– Die Zahl der benötigten Schnittberechnungen kann durch Verwendung von hierarchischen bounding-Volumina stark reduziert werden.
– Das getroffene Objekt (bei recursive ray-tracing nur im ersten Schnitt) kann auch mit Hilfe des z-Buffer-Algorithmus ermittelt werden.
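The answer above names hierarchical bounding volumes as the main optimization. As a minimal sketch of the underlying box test (the common "slab" method — a standard choice, not something prescribed by the lecture), here is a Python fragment with invented ray and box values:

# A cheap ray/axis-aligned-box test: only rays that pass it need the expensive
# intersection with the actual object.

def ray_hits_aabb(origin, direction, box_min, box_max):
    """True if the ray origin + t*direction (t >= 0) intersects the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:                      # ray parallel to this slab
            if o < lo or o > hi:
                return False
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_near = max(t_near, min(t1, t2))
            t_far = min(t_far, max(t1, t2))
            if t_near > t_far:
                return False
    return True

print(ray_hits_aabb((0, 0, 0), (1, 1, 0), (2, 1, -1), (4, 3, 1)))   # True
print(ray_hits_aabb((0, 0, 0), (1, 0, 0), (2, 1, -1), (4, 3, 1)))   # False

In a hierarchy of such boxes, a failed test prunes all objects inside the box at once, which is where the large savings come from.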
Algorithm 30 Raytracing for Octrees
Raytracing-Algorithmus:
  Für jede Zeile des Bildes
    Für jedes Pixel der Zeile
      Bestimme Strahl vom Auge zum Pixel;
      Pixelfarbe = Raytrace(Strahl);
  Raytrace(Strahl):
    Für alle Objekte der Szene
      Wenn Strahl Objekt schneidet und Schnittpunkt ist bisher am nächsten
        notiere Schnitt;
    Wenn kein Schnitt
      dann Ergebnis := Hintergrundfarbe
      sonst Ergebnis := Raytrace(reflektierter Strahl) + Raytrace(gebrochener Strahl);
    Für alle Lichtquellen
      Für alle Objekte der Szene
        Wenn Strahl zur Lichtquelle Objekt schneidet
          Schleifenabbruch, nächste Lichtquelle
      Wenn kein Schnitt gefunden
        Ergebnis += lokale Beleuchtung
Octree-Implementierung:
  Aufbau:
    Lege Quader q um Szene
    Für alle Objekte o
      Einfügen(o, q)
  Einfügen(Objekt o, Quader q):
    Für alle acht Teilquader t von q
      Wenn o ganz in t passt
        Ggf. t erstellen
        Einfügen(o, t)
        return
    Ordne Objekt o Quader q zu
  Schnitt(Quader q, Strahl s):
    Wenn q leer
      return NULL
    Wenn Schnitttest(q, s)
      Für alle acht Teilquader t von q
        res += Schnitt(t, s)
      Für alle zugeordneten Objekte o
        res += Schnitttest(o, s)
    return nächster Schnitt(res)

11.7 Other Methods of Providing Depth Perception
Numerous methods exist to help us create the impression of depth in the rendering of a 3-D model. These may include coding by brightness or coding in color. Slide 11.55 illustrates depth encoding by means of the brightness of lines: the closer an object is to the viewer, the brighter it is. In Slide 11.56 we even add color to help obtain a depth perception. Of course the depth perception improves dramatically if we remove hidden edges, as shown in Slide 11.57; we can then take advantage of our knowledge that nearby objects cover up objects that are farther away. Slide 11.60 indicates that the transition to illumination methods for rendering 3-D objects is also relevant for depth perception. Slide 11.58 introduces the idea of halos to represent 3-D objects, and Slide 11.59 is an example: at first we see a wire-frame model of a human head, and then we see the same model after removing the hidden lines, but also interrupting some of the lines where they cross other lines. The little interruption is denoted as a halo.

Chapter 12 Interaction of Light and Objects
Radiation and the natural environment have a complex interaction. If we assume, as in Slide 12.2, that the sun illuminates the Earth, we have atmospheric scattering as the radiation approaches the surface.
We have atmospheric absorption that reduces the power of the light coming from the sun. Then we have reflection off the top surface, which can be picked up by a sensor and used in image formation. The light will go through an object and will be absorbed, but at the same time an object might emit radiation, for example in the infrared wavelengths. Finally the radiation will hit the ground and might again be absorbed, reflected or emitted. As the light returns from the Earth's surface to the sensor we again have atmospheric absorption and emission. In remote sensing many of those factors will be used to describe and analyze objects based on sensed images. In computer graphics we use a much simplified approach.

12.1 Illumination Models
Definition 31 Ambient light
In the ambient illumination model the light intensity I after reflection from an object's surface is given by the equation
I = Ia · ka
Ia is the intensity of ambient light, assumed to be constant for all objects. ka is a constant between 0 and 1, called the ambient-reflection coefficient. ka is a material property and must be defined for every object. Ambient light alone creates unnatural images, because every point on an object's surface is assigned the same intensity; shading is not possible with this kind of light. Ambient light is used mainly as an additional term in more complex illumination models, to illuminate parts of an object that are visible to the viewer but invisible to the light source. The resulting image then becomes more realistic.

The simplest case is illumination by ambient light (Definition 31). The existing light is multiplied with the properties of an object to produce the intensity of an object point in an image. Slide 12.4 illustrates this with the previously used indoor scene. Slide 12.5 goes one step further and introduces the diffuse Lambert reflection. There is a light source which illuminates the surface under an angle Θ from the surface normal. The reflected intensity I is the product of the amount of incident light, the surface property kd, and the cosine of the angle under which the light falls onto the surface.

Definition 32 Lambert model
The Lambert model describes the reflection of the light of a point light source on a matte surface like chalk or fabrics. Light falling on a matte surface is reflected diffusely, which means that the light is reflected uniformly in all directions. Because of the uniform reflection in all directions, the amount of light seen from any angle in front of the surface is the same. Since the point of view does not influence the amount of reflected light seen, the position of the light source has to. This relationship is described in the Lambertian law: assume that a surface facet is directly illuminated by light, so that the normal vector of the surface is parallel to the vector from the light source to the surface facet. If you now tilt the surface facet by an angle θ, the amount of light falling onto the facet is reduced by the factor cos θ. A tilted surface is illuminated by less light than a surface normal to the light direction, so it reflects less light. This is called the diffuse Lambertian reflection:
I = Ip · kd · cos θ
where I is the amount of reflected light, Ip is the intensity of the point light source, kd is the material's diffuse reflection coefficient, and θ is the angle between the surface normal and the light vector.
The angle cos Θ in reality can also be expressed as an in-product of two vectors, namely the vector from the light and the surface normal. Considering this diffuse Lambert reflection our original image becomes Slide 12.7. The next level of complexity is to add the two brightnesses together, the ambient and the Lambert illumination. A next sophistication gets introduced if we add an atmospheric attenuation of the light as a function of distance to the object shown in Slide 12.9. So far we have not talked about mirror reflection. For this we need to introduce a new vector. We have the light source L, the surface normal N , the mirror reflection vector R and the direction to a camera or viewer V . We have a mirror reflection component in the system that is illustrated in Slide 12.10 with a term W cosn α. α is the angle between the viewing direction and the direction of mirror reflection. W is a value that the user can choose to indicate how mirror-like the surface is. Phong introduced the model of this mirror reflection in 1975 and explained the effect of the power of n of cosn α. The larger the power is, the more focussed and smaller will the area of mirror reflection be. But not only does the power n define the type of mirror reflection, but also the parameter W as shown in Slide 12.12 where the same amount of mirror reflection produces different appearances by varying the value of the parameter W . W is describing the blending of the mirror reflection into the background whereas the value n is indicating how small or large the area is that is affected by the mirror reflection. Slide 12.13 introduces the idea of a light source that is not a point. In that case we introduce a point light source and a reflector, which will reflect light onto the scene. The reflector represendts the extended light source. Prüfungsfragen: • Was ist eine einfache Realisierung der Spiegelreflektion“ (engl.: specular reflection) bei ” der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den Namen eines Verfahrens nach seinem Erfinder. • In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parame- 12.2. REFLECTIONS FROM POLYGON FACETS 225 ter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter wahrgenommene Intensität I dieses Punktes! Hinweis: Der Einfachkeit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung (1 − x)k ≈ 1 − kx für kleine x verwendbar ist. 12.2 Reflections from Polygon Facets Gouraud introduced the idea of interpolated shading. Each pixel on a surface will have a brightness in an image that is interpolated using the three surrounding corners of the triangular facet. The computation is made along a scan line as shown in Slide 12.15 with auxiliary brightness values Ia and Ib . Note that the brightnesses are computed with a sophisticated illumination model at positions I1 , I2 and I3 of the triangle and then a simple interpolation scheme is used to obtain the brightness in Ip . Gouraud does not consider a specular reflection while Phong does. Gouraud just interpolated brightnesses (algorithm 31), Phong interpolates surface normals from the corners of a triangle (algorithm 32). Slide 12.16 explains. Slide 12.17 illustrates the appearance of a Gouraud illumination model. 
Note how smooth the illumination changes along the surface whereas the geometry of the object is not smoothly interpolated. Slide 12.18 adds specular reflection to Gouraud. Phong, as shown in Slide 12.19, is creating a smoother appearance of the surface because of its interpolation of the surface normal. Of course it includes specular reflection. In order to not only have smoothness in the surface illumination but also in the surface geometry, facets of the object must be replaced by curved surfaces. Slide 12.20 illustrates the idea: the model’s appearance is improved, also due to the specular reflection of the Phong model. Slide 12.21 finally is introducing additional light sources. Slide 12.22 summarizes the various types of reflection. We have the law of Snell, indicating that the angle of incidence equals the angle of reflection and these angles are measured with respect to the surface normal. A mirror or specular reflection is very directed and the incoming ray is reflected in the opposite output direction. The opposite of specular reflection is the ”diffuse“ reflection. If it is near perfect, it will radiate into all directions almost equally. The Lambert reflection is a perfect diffuse reflector as shown in on the right-hand side. Prüfungsfragen: • Gegeben sei die Rasterdarstellung eines Objektes in Abbildung B.58, wobei das Objekt nur durch seine drei Eckpunkte A, B und C dargestellt ist. Die Helligkeit der Eckpunkte ist IA = 100, IB = 50 und IC = 0. Berechne die Beleuchtungswerte nach dem GouraudVerfahren in zumindest fünf der zur Gänze innerhalb des Dreieckes zu liegenden kommenden Pixeln. • Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks, das zu einer beleuchteten polygonalen Szene gehört. 12.3 Shadows Typically, shadows are computed in two steps or phases. The computations for shadows are related to the computation of hidden surfaces, because areas in shadows are areas that are not seen from the illuminating sun or light source. Slide 12.24 explains the two types of transformation. We first have to transform a 3-D object into a fictitious viewing situation with the view point at the light source. That produces the visible surfaces in that view. A transform into the model coordinates produces shadow edges. We now have to merge the 3-D viewing and the auxiliary lines from 226 CHAPTER 12. 
INTERACTION OF LIGHT AND OBJECTS Algorithm 31 Gouraud shading Prozedur ScanLine (xa , Ia , xb , Ib , y) 1: grad = (Ib − Ia )/(xb − xa ) {Schrittweite berechnen} 2: if xb > xa then 3: xc = (int)xa + 1 {xc und xd auf Mittelpunkt von Pixel setzen} 4: xd = (int)xb 5: else 6: xc = (int)xb + 1 7: xd = (int)xa 8: end if 9: I = Ia + (xc − xa ) ∗ grad {Startwert für erstes Pixel berechnen} 10: while xc ≤ xd do 11: I auf Pixel (xc ,y) anwenden 12: xc = xc + 1 {einen Schritt weiter gehen} 13: I = I + grad 14: end while Function Triangle(x1 , y1 , I1 , x2 , y2 , I2 , x3 , y3 , I3 ) 1: Punkte aufsteigend nach der y-Koordinate sortieren 2: ∆xa = (x2 − x1 )/(y2 − y1 ) {Schrittweiten für linke Kante berechnen} 3: ∆Ia = (I2 − I1 )/(y2 − y1 ) 4: ∆xb = (x3 − x1 )/(y3 − y1 ) {Schrittweiten für rechte Kante berechnen} 5: ∆Ib = (I3 − I1 )/(y3 − y1 ) 6: y = (int)y1 + 1 {Startzeile berechnen} 7: yend = (int)(y2 + 0.5) {Endzeile für oberes Teildreieck berechnen} 8: xa = x1 + (y − y1 ) ∗ ∆xa {Startwerte berechnen} 9: xb = x1 + (y − y1 ) ∗ ∆xb 10: Ia = I1 + (y − y1 ) ∗ ∆Ia 11: Ib = I1 + (y − y1 ) ∗ ∆Ib 12: while y < yend do 13: eine Zeile mit ScanLine(xa , Ia , xb , Ib , y) berechnen 14: xa = xa + ∆xa {einen Schritt weiter gehen} 15: xb = xb + ∆xb 16: Ia = Ia + ∆Ia 17: Ib = Ib + ∆Ib 18: y =y+1 19: end while {oberes Teildreieck fertig} 20: ∆xa = (x3 − x2 )/(y3 − y2 ) {Schrittweiten für Kante berechnen} 21: ∆Ia = (I3 − I2 )/(y3 − y2 ) 22: yend = (int)(y3 + 0.5) {Endzeile für unteres Teildreieck berechnen} 23: xa = x2 + (y − y2 ) ∗ ∆xa {Startwert berechnen} 24: while y < yend do 25: eine Zeile mit ScanLine(xa , Ia , xb , Ib , y) berechnen 26: xa = xa + ∆xa {einen Schritt weiter gehen} 27: xb = xb + ∆xb 28: Ia = Ia + ∆Ia 29: Ib = Ib + ∆Ib 30: y =y+1 31: end while {unteres Teildreieck fertig} 12.3. SHADOWS 227 Algorithm 32 Phong - shading 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: for all polygons do compute the surface normal in the corners of the polygon. project the corners of the polygon into the plane for all scanlines, which are overlaped by the polygon do compute the linear interpolated surface normals on the left and right edge of the polygon for all pixels of the polygon on the scanline do compute the linear interpolated surface normals normalize the surface normals compute the illuminating modell and set the color of the pixel to the computed value end for end for end for Algorithm 33 Shadow map 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: make lightsource coordinate be center of projection render object using zbuffer assign zbuffer to shadowzbuffer make camera coordinate be center of projection render object using zbuffer for all pixels visible do Map coordinate from ’camera space’ into ’light space’ Project transformed coordinate to 2D (x’,y’ ) if transformed Z-coordinate > shadowzbuf f er[x’, y’] then shadow pixel {A Surface is nearer to the point than the lightsource} end if end for Algorithm 34 Implementation of Atheron-Weiler-Greeberg Algorithm 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: make lightpoint be center of projection determine visible parts of polygones split visible and invisible parts of partial lightened polygones transform to modelling database merge original database with lightened polygones {results a object splitted in lightened an unlightened polygones} make (any) eye point be center of projection for all polygons do {reder scene} if polygone is in shadow then set shading model to ambient model else set shading model to default model end if draw polygones end for 228 CHAPTER 12. 
INTERACTION OF LIGHT AND OBJECTS the shadow boundaries into a combined polygon data base. The method of computing hidden surfaces from the viewer’s perspective is repeated in Slide 12.25. Slide 12.26 illustrates the use of the z-buffer method for the computation of shadow boundaries (algorithm 33). L is the direction of the light, V is the position of the viewer. We first have to do a z-buffer from the light source, and then we do a z-buffer from the viewer’s perspective. The view without shadows and the view with them give a dramatically different impression of realism of the scene with two objects. Prüfungsfragen: • Erklären Sie den Vorgang der Schattenberechnung nach dem 2-Phasen-Verfahren mittels z-Buffer! Beschreiben Sie zwei Varianten sowie deren Vor- und Nachteile. 12.4 Physically Inspired Illumination Models There is a complex world of illumination computations that are concerned with the bi-directional reflectivity function BRDF. In addition we can use ray-tracing for illumination and a very particular method called radiosity. We will spend a few thoughts on each of those three subjects. A BRDF in Slide 12.28 describes the properties of a surface as a function of illumination. A 3-D shape indicates how the incoming light from a light source is being reflected from a particular surface. Many of the mathematical models used to describe those complex shapes bear their inventors’ names. 12.5 Regressive Ray-Tracing As discussed before, we have to cast a ray from the light source onto the object and find points in shadow or illuminated. Similarly, rays cast from the observer’s position will give us the hidden lines from the viewer’s reference point. Slide 12.30 illustrates again the geometry of ray-tracing to obtain complex patterns in an image from an object and from light cast from other objects onto that surface. Transparent object reflections may be obtained from the interface of the object with the air at the back, away from the viewer. 12.6 Radiosity A very interesting illumination concept that has been studied extensively during the last ten years is called radiosity. It is a method that derives from modeling the distribution of temperature in bodies in mechanical engineering (see Algorithm 35). We subdivide the surface of our 3-D space into small facets. We have a light source, illuminating all the facets, but the facets illuminate one another, and they become a form of secondary illumination source. Each surface facet has associated with it the differential surface area dA. We can set up an equation that relates the incoming light of the facets to all other facets. Very large systems of equations comes about. They can, however, be efficiently reduced in the number of unknowns, and therefore efficiently be solved. Let’s have a look at the few of the examples of these technique. In Slide 12.38, we see a radiosity used in the representation of a classroom, Slide 12.39 is an artificial set of cubes, Slide 12.39 illustrates one table at two levels of resolution. In the first case, the facets used for radiosity are fairly large, in the second the facets are made much smaller. We see how the realism in this illumination model increases. 12.6. 
Algorithm 35 Radiosity
load scene
divide surfaces into patches
for all patches do
  if patch is a light then
    patch.emission := amount of light
    patch.available emission := amount of light
  else
    patch.emission := 0
    patch.available emission := 0
  end if
end for {initialize patches}
repeat {render scene}
  for all patches i, starting at the patch with the highest available emission do
    place hemicube on top of patch i {needed to calculate form factors}
    for all patches j do
      calculate form factor between patch i and patch j {needed to calculate the amount of light}
    end for
    for all patches j do
      ∆R := amount of light from patch i to patch j {using the form factor and the properties of the patches}
      j.available emission := j.available emission + ∆R
      j.emission := j.emission + ∆R
    end for
    i.available emission := 0 {all available light has been distributed to the other patches}
  end for
until good enough

Similarly, we have radiosity in modeling a computer room in Slide 12.39. We have internal illumination, and in one case, on the lower right of Slide 12.40, we have illumination from the outside of the room. In a further slide we see a radiosity-based computation of an indoor scene, again at two levels of detail in the mesh sizes for the radiosity computation.

Chapter 13 Stereopsis

13.1 Binokulares Sehen
The 3-dimensional impression of our environment as received by our two eyes is called binocular vision. Slide 13.10 explains that the human perceives two separate images via the two eyes, merges the two images in the brain, and reconstructs a depth model of the perceived scene in the brain. The two images obtained by the two eyes differ slightly because of the two different vantage points. The stereo base for natural binocular vision is typically six-and-a-half centimeters, thus the distance between the eyes. Recall that natural depth perception is defined by many depth cues other than binocular vision; we talked about depth cues by color, by size, by motion, by objects covering up one another, etc. (see Chapter 11).
Slide 13.4 explains the binocular stereo effect geometrically. On the retina, two points, P and Q, will be imaged on top of one another in one eye, but will be imaged side by side, subtending a small angle dγ, in the other eye. We call γ the parallactic angle or parallax, and dγ a parallactic difference. It is the measure of disparity which is sensed and used in the brain for shape reconstruction. The angle γ itself gives us the absolute distance to a point P and is usually computed from the stereo base, i.e. the distance between the eyes. Note that our eyes are sensitive to within a parallactic angle of 15 seconds of arc (15"), and may be limited to perceiving a parallactic angle no larger than 7 minutes of arc (7').
Slide 13.5, Slide 13.6, and Slide 13.7 illustrate two cases of stereo images taken from space and one from microscopy. Note the difference between binocular viewing and stereo viewing, as discussed in a moment. What is of interest in a natural binocular viewing environment is the sensitivity of the eyes to depth. Slide 13.8 explains that the difference in depth between two points, d, can be obtained from our sensitivity to the parallactic angle, dγ. Since this is typically no smaller than 17 seconds of arc, the smallest perceivable depth difference is limited, as shown in Slide 13.8. At a distance of 25 cm we may be able to perceive depth differences as small as a few micrometers. At a meter it may be a tenth of a millimeter, but at ten meters distance it may already be about a meter. At a distance of about 900 meters, we may not see any depth at all from our binocular vision.
Prüfungsfragen:
• Gegeben sei eine Distanz yA = 3 Meter vom Auge eines scharfäugigen Betrachters mit typischem Augenabstand zu einem Objektpunkt A. Wie viel weiter darf sich nun ein zweiter Objektpunkt B vom Auge befinden, sodass der Betrachter den Tiefenunterschied zwischen den beiden Objektpunkten A und B gerade nicht mehr wahrnehmen kann? Es wird um die entsprechende Formel, das Einsetzen von Zahlenwerten und auch um die Auswertung der Formel gebeten.
• Auf der derzeit laufenden steirischen Landesausstellung „comm.gr2000az“ im Schloss Eggenberg in Graz ist ein Roboter installiert, der einen ihm von Besuchern zugeworfenen Ball fangen soll. Um den Greifer des Roboters zur richtigen Zeit an der richtigen Stelle schließen zu können, muss die Position des Balles während des Fluges möglichst genau bestimmt werden. Zu diesem Zweck sind zwei Kameras installiert, die das Spielfeld beobachten; eine vereinfachte Skizze der Anordnung ist in Abbildung B.63 dargestellt. Bestimmen Sie nun die Genauigkeit in x-, y- und z-Richtung, mit der die in Abbildung B.63 markierte Position des Balles im Raum ermittelt werden kann! Nehmen Sie der Einfachheit halber folgende Kameraparameter an:
– Brennweite: 10 Millimeter
– geometrische Auflösung des Sensorchips: 100 Pixel/Millimeter
Sie können auf die Anwendung von Methoden zur subpixelgenauen Bestimmung der Ballposition verzichten. Bei der Berechnung der Unsicherheit in x- und y-Richtung können Sie eine der beiden Kameras vernachlässigen, für die z-Richtung können Sie die Überlegungen zur Unschärfe der binokularen Tiefenwahrnehmung verwenden.

13.2 Stereoskopisches Sehen
We can trick our two eyes into thinking they see the natural environment when in fact they look at two images presented separately to the left and the right eye. Since those images will not be at an infinite distance, but perhaps at 25 cm, we are forced to focus our eyes at 25 cm, yet our brain must adopt an attitude as if it were looking at a much larger distance, where the eyes' optical axes are parallel. Many people have difficulties focussing at 25 cm and simultaneously obtaining a stereoscopic impression. To help, there is an auxiliary tool called a mirror stereoscope: two images are placed on a table, and an assembly of two mirrors and a lens presents each image separately to each eye, whereby the eye is permitted to focus at infinity and not at 25 cm.
Slide 13.12 lists alternative modes of stereo viewing. We mentioned the mirror stereoscope with separate optical axes.
A second approach is by anaglyphs, implemented in the form of glasses where one eye receives only the red and the other one only the green component of an image. A third approach is polarization, where the images presented to the eyes are polarized differently for the left and the right eye. A further approach is the presentation of images by means of shutter glasses, with the images shown by projection or on a monitor. All four approaches have been implemented on computer monitors. You can think of a mirror stereoscope looking at two images on a monitor by putting two optical systems in front of it and having the left half present one image and the right half the other. Anaglyphs are a classical case of viewing stereo on a monitor, by presenting the two images in the red and green channels and wearing glasses to perceive a stereo impression. The most popular way of presenting soft-copy images on a monitor is by polarization: one wears simple glasses that look at two polarized images on a monitor, or active glasses that are controlled from the monitor while it presents 120 images per second, 60 to one eye and 60 to the other, with polarization ensuring that the proper image hits the proper eye. This is called image flickering using polarization.

Slide 13.13 explains how stereoscopic viewing by means of two images increases the ability of the human to perceive depth far beyond the ability available from binocular vision. The reason is very simple: binocular vision is limited by the six-and-a-half cm distance between the two eyes, whereas stereoscopic vision can employ images taken from a much larger stereo base.

Definition 33 (total plastic): Let p = n · v be the total plastic, where n is the image magnification and v the eye-base magnification. The eye base d_A, typically 6.5 cm, can be enlarged to the stereo base d_K from which the images are taken; this implies v = d_K / d_A.

Take the example of aerial photography, where two images may be taken from an airplane with the perspective centers 600 meters apart. We obtain an increase in our stereo perception that is called the total plastic (see Definition 33). We look at the ratio between the stereo base from which the images are taken and the eye base, which gives us a factor v. In addition, we look at the images under a magnification n, and the total plastic increases our stereo ability by n · v, thus by a factor in the tens of thousands. As a result, even though the object may be, say, a thousand meters away, we may still have a depth acuity of three cm.

Exam questions:

• Using a numerical example of your choice, quantify the "secret" that allows stereo viewing by means of overlapping photographic images to achieve a substantially better depth perception than is possible with natural binocular vision.

• Name different technical methods of stereoscopically conveying a true (three-dimensional) spatial impression of a scene displayed by a computer.

13.3 Stereo Imaging

We need to create two natural images, either with one camera at two positions, taking the images sequentially, or with a camera pair, taking the images simultaneously. Simultaneous imaging is preferred when the object moves.
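To make the total plastic of Definition 33 concrete before turning to the imaging geometry, here is a small worked example. The base of 600 m is taken from the aerial case above; the magnification n = 4 is an assumed value, chosen so that the numbers match the "three cm at a thousand meters" statement.

    v = \frac{d_K}{d_A} = \frac{600\,\mathrm{m}}{0.065\,\mathrm{m}} \approx 9200,
    \qquad
    p = n \cdot v \approx 4 \cdot 9200 \approx 37000.

    % Using the same small-angle sketch as in Section 13.1, the depth acuity at
    % distance y improves by the factor p:
    |dy| \approx \frac{y^2\, d\gamma}{n\, d_K}
          \approx \frac{(1000\,\mathrm{m})^2 \cdot 7.3\cdot10^{-5}}{4 \cdot 600\,\mathrm{m}}
          \approx 0.03\,\mathrm{m}.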
Slide 13.15 illustrates the two camera positions looking at a two-dimensional scene, explaining again the concept of a stereo base b, of the convergence angle γ giving us the distance to the object point P_K, and of the parallactic difference angle dγ, which is a measure of the depth between two points P_K and P_T. Slide 13.16 repeats the same idea for the case of aerial photography, where an airplane takes one image at position O1 and a second image at position O2. The distance between O1 and O2 is the aerial stereo base b, the distance to the ground is the flying height H, and the ratio b/H is called the base-to-height ratio; it is a measure of the stereo acuity of an image pair.

Slide 13.17 again shows the case of two images taken from an overhead position. Note that the two images look identical to the casual observer. What makes the stereo process work are the minute geometric differences between the two images, which occur in the direction of flight. There are no geometric differences in the direction perpendicular to the flight direction.

Going back to Slide 13.16, we may appreciate the necessity of recreating in the computer the relative position and orientation of the two images in space. An airplane or satellite may make unintended motions, so that we do not have an accurate measure of the positions O1 and O2 and of the imaging directions at the two camera positions. A stereo process will therefore typically require that sets of points representing the same object on the ground be extracted from the overlapping images. These are called homologue points. In Slide 13.16 it is suggested that a rectangular pattern of six points has been observed in image 1 and that the same six points have been observed in image 2. What now needs to happen mathematically is that two bundles of rays are created from the image coordinates and from the knowledge of the perspective centers O1 and O2 in the camera system. Then the two bundles of rays need to be arranged such that the corresponding rays (homologue rays) intersect in the three-dimensional space of the object world. We call the reconstruction of a bundle of rays from image coordinates the inner orientation. We call the process by which we arrange the two images such that all corresponding rays intersect in object space the relative orientation. And we call the process by which we take the final geometric arrangement and make it fit into the world coordinate system by a three-dimensional conformal transformation the absolute orientation.

Exam questions:

• How are two images of the same scene acquired in stereo imaging? Describe typical applications of both methods.

13.4 Stereo-Visualization

The counterpart of using natural images to stereoscopically view the natural environment is stereo-visualization: creating artificial images that are presented to the eyes to obtain a three-dimensional impression of an artificial world. We visit Slide 13.19 to explain that we need to create two images, for the left and the right eye, of a geometric scene, represented in the slide by a cube and its point P. Slide 13.20 shows that we compute two images of each world point W, assuming that we have two cameras side by side, at a stereo base b, with their optical axes parallel. Recall that in computer graphics the optical axis is called the view plane normal and the lens center is the view point VP. Slide 13.21 is the ground view of the geometric arrangement.
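A minimal sketch of this stereo-visualization step, assuming two pinhole cameras with parallel viewing axes along z, separated by the base b along x, a focal length f, and a world point given in the coordinate frame halfway between the two view points. The function and parameter names are illustrative, not the lecture's notation.

    def project_stereo(point, b=0.065, f=0.05):
        """Project a 3D point (x, y, z) with z > 0 into left/right image coordinates."""
        x, y, z = point
        # left camera sits at (-b/2, 0, 0), right camera at (+b/2, 0, 0)
        x_left = f * (x + b / 2.0) / z
        x_right = f * (x - b / 2.0) / z
        y_image = f * y / z              # identical in both images: no y-parallax
        return (x_left, y_image), (x_right, y_image)

    if __name__ == "__main__":
        left, right = project_stereo((0.2, 0.1, 2.0))
        # the x-parallax encodes depth: x_left - x_right = f * b / z
        print(left, right, left[0] - right[0])

The x-parallax f · b / z is exactly the quantity the eyes interpret as depth, and there is no parallax in y, matching the remark above that the geometric differences occur only in the base direction.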
We have previously used Slide 13.22 to illustrate the result obtained by creating two images of a three-dimensional scene. In this particular case it is a wire-frame representation for the left and the right eye. If we present those two images about six and a half cm apart on a flat table, look vertically down and imagine we are looking at infinity (so that our eye axes are parallel), we will be able to merge the two images into a three-dimensional model of that object. However, we will notice that we do not have a focused image, because our eyes tend to focus at infinity when we force our eye axes to be parallel.

Computer-generated stereo images are the essence of virtual and augmented environments. Slide 13.23 illustrates how a person looks at artificial images and receives a three-dimensional impression, using motion sensors that feed the head's position and orientation into the computer, so that as the head is moved, a new image is projected to the eyes, and the motion of the head is consistent with the experience of the natural environment. In contrast, Slide 13.24 again illustrates augmented reality, where the monitors are semi-transparent and the human observer therefore not only sees the artificial, virtual impression of computed images, but has superimposed on them the natural environment, which is visible binocularly: augmented reality uses both binocular and stereo vision.

13.5 Non-Optical Stereo

Eyes are very forgiving, and the images we observe stereoscopically need not necessarily be taken by a camera and therefore need not be centrally perspective. Slide 13.26 explains how the NASA Space Shuttle has created radar images in sequential orbits. Those images overlap with one another and show the same terrain. Slide 13.27 illustrates a mountain range in Arizona imaged by radar. Note that the two images look more different than our previous optical images did; shadows are longer in one image than in the other. Yet a stereo impression can be obtained in the same way as we have obtained it with optical imagery. The quality of the stereo measurement will be lower because of the added complexity that the two images are less similar in gray tones. The basic idea of this type of stereo is repeated in Slide 13.28: we have two antennas illuminating the ground and receiving echoes as in a traditional radar image, and the overlap area can be presented to the eyes as if it consisted of two optical images. The basic idea is also explained in Slide 13.29. Note that in each radar image, point P is projected into position P' or P'' and we get a parallactic distance dp. The camera positions that would produce from a point P the same positions P' and P'' and the same parallax distance dp are the camera positions 1 and 2 shown in Slide 13.29.

Exam questions:

• Name an example and a concrete application of a non-optical sensor in stereo imaging.

13.6 Interactive Stereo-Measurements

If we want to make measurements using the stereo impression from two images, we need to add something to our visual impression: a measuring mark. Slide 13.31 explains the two stereo images and our eyes viewing the same point M in the two images, where it is presented as M1 and M2. If we add a measuring mark as shown in the slide, we will perceive the measuring mark (M) to float above or below the ground.
If we now move the measuring mark in the two images such that it superimposes the points M1 and M2, the measuring mark will coincide with the object point M. We can then measure the elevation difference between two points by tracking the motion that we have to apply to the measuring mark in image space. Slide 13.32 explains the object point M, the measuring mark (M) and their positions in image space at M1, M2. Slide 13.33 is an attempt at illustrating the position of the measuring mark above the ground, on the ground and below the ground. In this particular case, the stereo perception is anaglyphic.

13.7 Automated Stereo-Measurements

See Algorithm ??. The measuring mark for stereo measurements needs to be placed on a pair of homologue points. Knowing the location of the stereo measuring mark permits us to measure the coordinates of the 3D point in the world coordinate system. A systematic description of the terrain shape, or more generally of the shape of 3D objects, requires many surface measurements to be made by hand. This can be automated if the location of homologue points can be found without manual intervention.

Slide 13.35 and Slide 13.36 explain. Two images exist, forming a stereo pair, and a window is taken out of each image to indicate a homologue area. The task, as shown in Slide 13.37, is to automatically find the corresponding locations in such windows. For this purpose we define a master and a slave image. We take a window of the master image, move it over the slave image, and at each location we compute a value describing the similarity between the two image windows. At the maximum value of similarity we have found a point of correspondence. As a result we have a point 1' in image (') and a point 1" in image ("). These two points define two perspective rays from the perspective centers through the image planes into the world coordinate system, which intersect at a surface point 1. We need to verify that the surface point 1 makes sense: we will not accept that point if it is totally inconsistent with its neighborhood; we call this a gross error. We will accept the point if it is consistent with its neighborhood.

Slide 13.38 explains the process of matching with the master and slave image windows. Note that the window may be of size K × J and we are searching in the master window of size N × M, obtaining many measures of similarity. Slide 13.39 defines the individual pixels within the sub-image and is the basis for one particular measure of similarity shown in Slide 13.40. In it, a measure of similarity called normalized correlation is defined by a value R_N^2(m, n) at location (m, n). The values in this formula are the gray values W in the master and S in the slave image. A double summation occurs because of the two-dimensional nature of the windows of size M × N.

Slide 13.41 illustrates two additional image correlation measures. The normalized correlation produces a value R which typically assumes numbers between 0 and 1. Full similarity is expressed by the value 1; total dissimilarity results in the value 0. A non-normalized correlation will not have a range between 0 and 1, but will assume much larger ranges. However, whether the correlation is normalized or not, one will likely find the same extrema and therefore the same match points.
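A minimal sketch of this window matching by normalized correlation, assuming the master window W and the slave image S are available as numpy arrays; the names and the exhaustive search are illustrative, not the lecture's implementation.

    import numpy as np

    def normalized_correlation(W, S_window):
        """R^2_N for one placement: (sum W*S)^2 / (sum W^2 * sum S^2)."""
        num = np.sum(W * S_window) ** 2
        den = np.sum(W ** 2) * np.sum(S_window ** 2)
        return num / den if den > 0 else 0.0

    def match(W, S):
        """Slide the master window W over every position of the slave image S
        and return the position (m, n) with the highest similarity."""
        wr, wc = W.shape
        sr, sc = S.shape
        best, best_pos = -1.0, (0, 0)
        for m in range(sr - wr + 1):
            for n in range(sc - wc + 1):
                r2 = normalized_correlation(W, S[m:m + wr, n:n + wc])
                if r2 > best:
                    best, best_pos = r2, (m, n)
        return best_pos, best

The returned position marks the point of maximum similarity, i.e. the candidate homologue point, which is then checked against its neighborhood as described above.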
A much different measure of similarity is the sum of absolute differences in gray values. We essentially sum up the absolute differences in gray value between the master and slave windows at a particular location (m, n). The computation is much faster than the computation of a correlation, since we avoid squaring values; in addition, if the running sum becomes larger than a previously found value, we can stop the double summation early, since we have already found a smaller sum of absolute differences and therefore a more likely place of maximum similarity.

Slide 13.42 explains how the many computations of correlation values result in a window of such correlation values, in which we need to find the extremum, the highest correlation, as marked by a star in Slide 13.42. Problems occur if we have multiple extrema and do not know which one to choose. Slide 13.43 suggests that various techniques exist to accelerate the matching process. Slide 13.44 indicates how the existence of an image pyramid allows us to do a preliminary match with reduced versions of the two images, then limit the size of the search windows dramatically and thereby increase the speed of finding successful matches. We call this a hierarchical matching approach. Another trick is shown in Slide 13.45, where an input image is converted into a gradient image or an image of interesting features. Instead of matching two gray value images, we match two edge images. A whole theory exists on how to optimize the search for edges in images in preparation for a stereo matching approach. Slide 13.46 explains that a high-pass filter that suppresses noise and computes edges is preferable. Such a filter is the so-called LoG filter or Laplacian-of-Gaussian transformation of an image, with which we get two lines for each edge, since we are looking for zero crossings (in German: Nulldurchgänge). That subject is an extension of the topic of filtering.

Exam questions:

• Using the normalized correlation R_N^2(m, n), determine the image window within the region outlined in bold in Figure B.25 that best matches the mask M, which is also given. State your computed values and mark the region you found in Figure B.25.

  Answer:

  R_N^2(m,n) = \frac{\left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2}{\sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2 \cdot \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2}

  with the abbreviations

  c_{WS} := \left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2, \quad
  c_{WW} := \sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2, \quad
  c_{SS} := \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2.

  Position       c_WS   c_WW   c_SS   R_N^2(m,n)
  top left        25     6      5      0.833
  top right       25     6      6      0.694
  bottom left     16     6      6      0.444
  bottom right     4     6      6      0.111

  The best match is at the top left.

• What is the basic principle behind methods that can reconstruct, from a stereo image pair, the surface of a body visible in both images?
Chapter 14 Classification

14.1 Introduction

Concepts of classification can be used not just in image analysis and computer vision but also in many other fields where one has to make decisions. First we want to define the problem, then see some examples. We then review a heuristic approach called the minimum-distance classifier. We finally go through Bayes' theorem as the basis of statistical classification. We round out this chapter with a sketch of a simple implementation based on Bayes' theorem. Classification is a topic based to a considerable extent on the field of statistics, dealing with probabilities, errors and estimation. We will largely stay away from statistics here and only take a short look.

What is the definition of classification? We have object classes C_i, i = 1, ..., n, and we search for the class C_i to which a given set of observations belongs. The first question is which observations to make; the second is the classification itself, namely the decision to which class the observations belong.

14.2 Object Properties

Let us review object features. Objects have color, texture, height, whatever one can imagine. If we classify the types of land use in Austria, as suggested in Slide 14.5, a set of terrain surface properties will be needed, perhaps from satellite images and public records. Slide 14.6 enumerates the 7 properties of electromagnetic radiation one can sense remotely, say by camera, thermal images, radiometry, radar and interferometric sensors. As a sensor collects image data about a scene from a distance, up to 7 characteristics are accessible. The properties of the sensed signal may be used to "invert" it into a physical parameter of the object; examples are the object point's moisture or roughness, possibly its geometric shape. Slide ?? illustrates a camera image of a small segment of skin with a growth called a lesion that could be cancer. One can extract from the physically observed color image some geometric properties of the lesion, such as length, width, roughness of the edge, etc. Slide ?? is a fingerprint, Slide 14.9 a set of derived numbers describing the fingerprint. Each number is associated either with a pixel, giving a feature vector per pixel, or with a larger object such as the lesion or the fingerprint. The feature vector x is the input to a classification.

Exam questions:

• Which physical characteristics of the radiation emitted or reflected by a body are suitable for determining its surface properties (e.g., for the purpose of classification)?

14.3 Features, Patterns, and a Feature Space

Algorithm 36 Feature space

    FeatureSpace := CreateHyperCube(n-dimensional)       {create an n-dimensional hypercube}
    for all pixels in image do
        FeatureSpace[Pixel[Plane-1], Pixel[Plane-2], ..., Pixel[Plane-n]] += 1
                                   {increment the corresponding point in the feature space by 1}
    end for
    {This algorithm creates a feature space represented by an n-dimensional hypercube.}

If we have to do a color classification, then our features will be "color". In a color image we represent color via the red-green-blue (RGB) planes. Recall that such an image consists of an eight-bit gray value image for the R channel, another for the G channel and a last one for the B channel, representing red, green and blue.
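A minimal runnable counterpart to Algorithm 36, assuming the image is held as a numpy array of shape (rows, columns, n) with one plane per feature and integer 8-bit values; the names and the choice of numpy are assumptions for illustration.

    import numpy as np

    def build_feature_space(image, bins=256):
        """Accumulate an n-dimensional feature-space histogram from an image of
        shape (rows, cols, n) whose integer values lie in [0, bins)."""
        n = image.shape[2]
        feature_space = np.zeros((bins,) * n, dtype=np.int64)
        for pixel in image.reshape(-1, n):        # one feature vector per pixel
            feature_space[tuple(pixel)] += 1      # increment the corresponding cell
        return feature_space

For an RGB image this hypercube already has 256^3 cells; in practice one therefore often reduces the number of bins per channel.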
A further slide suggests color classification, but with 4 images or channels, for instance infrared (IR) in addition to RGB, or temperature, or whatever we can find as an object feature. We now build up a feature space. In the case of RGB we would have three dimensions. Slide 14.12 presents just two dimensions for simplicity, for example R and G. If we add more features (B, IR, temperature, ...) we end up with hyperspaces which are hard to visualize.

14.4 Principle of Decisions

Slide 14.14 illustrates what we would like to get from the classifier's decisions: each object, in this case each pixel, is to be assigned to a class, here denoted by O1, O2, O3, ... The simplest method of classification is the so-called minimum-distance classifier. Slide 14.19 presents a 2-dimensional feature space. Each entry in this 2D space is a vector x = (x1, x2)^T or (g1, g2)^T, with the observations x1, x2 or g1, g2 representing, for example, the amount of red (R) and green (G) as 8-bit digital numbers DN from an image. These observations describe one pixel each, and we may find that the value for R is 50 and for G is 90. This determines a unique entry in the feature space.

As we make observations of known objects we may define a so-called learning phase, in which we find feature pairs defining a specific class; R = 50, G = 90 might be one type of object. We now calculate the mean value of a distribution, which is nothing else than the expected value of a set of observations. The arithmetic mean in this case is obtained by summing all the values and dividing by their number. We connect the class means by a straight line and define a line halfway between the means, perpendicular to the connecting line. This boundary between the two classes is called the discriminating function. If we now make an observation of a new, unknown object (pixel), we simply determine the distances to the various means. In Slide 14.16 the new object belongs to class O3. This is the minimum-distance classifier.

Algorithm 37 Classification without rejection

    TYPE pattern = RECORD
        feature: ARRAY [1 .. NbOfFeatures] OF Integer;
        classIdentifier: Integer;
    END

    Classify-by-MinimumDistance (input: pattern)
    {sets the classIdentifier of input to the class represented by the nearest sample pattern}
    for i := 1 to NbOfSamples do
        Distance := 0                                         {initial value}
        {sum all differences between input and SamplePattern[i]:}
        for j := 1 to NbOfFeatures do
            Difference := input.feature[j] - SamplePattern[i].feature[j]
            Distance := Distance + |Difference|
        end for
        if i = 1 then minDistance := Distance end if          {initial value}
        {set the class:}
        if Distance <= minDistance then
            minDistance := Distance
            input.classIdentifier := SamplePattern[i].classIdentifier
        end if
    end for

    Classify-by-DiscriminationFunction (input: pattern)
    {sets the classIdentifier of input to the class with the maximum function result}
    for i := 1 to NbOfClasses do
        Sum := 0                                              {initial value}
        {sum all function results of the input features:}
        for j := 1 to NbOfFeatures do
            functionResult := DiscriminationFunction[i](input.feature[j])
                                                              {the actual function set}
            Sum := Sum + functionResult
        end for
        if i = 1 then maxSum := Sum end if                    {initial value}
        {set the class:}
        if Sum >= maxSum then
            maxSum := Sum
            input.classIdentifier := i
        end if
    end for
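A compact, vectorized sketch of the minimum-distance rule of Algorithm 37, assuming the class means have already been computed in the learning phase. Algorithm 37 sums absolute differences; the Euclidean distance used here is an equally common choice, and all values below are made up for illustration.

    import numpy as np

    def minimum_distance_classify(x, class_means):
        """Assign feature vector x to the class whose mean is closest."""
        distances = np.linalg.norm(class_means - x, axis=1)
        return int(np.argmin(distances))

    # example: two features (R, G), three class means learned beforehand
    means = np.array([[50.0, 90.0], [120.0, 40.0], [200.0, 180.0]])
    print(minimum_distance_classify(np.array([60.0, 85.0]), means))   # -> 0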
What could be a problem with the minimum-distance classifier? Suppose that in the learning phase one makes an error and for class O3 an "odd" observation is recorded. This will affect the expected value for the entire data set. One problem is then that we have not considered the "uncertainty" of the observations in defining the various classes. This "uncertainty" would be represented by the "variance" of our observations. If the observations are clustered together closely, their variance is small; if they are spread out widely, their variance is larger. Variance is not considered in a minimum-distance classifier. Figure x illustrates that every pixel gets classified and assigned to a class. There are no rejections, i.e. cases where the classifier is unable to make a decision and rejects a pixel/object/feature vector as belonging to none of the classes.

Exam questions:

• Given training pixels with the gray values listed in the accompanying Table ??, and a new pixel x_neu = (13, 7):
  1. Draw a two-dimensional feature space and plot the positions of the training pixels.
  2. Describe a simple computational procedure (algorithm) for deciding which "object class" this new pixel most probably belongs to.
  3. Carry out the numerical computation of this decision and thereby justify numerically the assignment of the new pixel to one of the object classes represented by the training pixels.

14.5 Bayes Theorem

Algorithm 38 Classification with rejection

    Pmin := 0.6                            {chosen threshold for what to classify}
    while there is a pixel to classify do
        pick and remove pixel from the to-do list
        Pmax := -1                         {reset for each pixel}
        x := f(pixel)                      {n-dimensional feature vector, represents information about pixel}
        for all existing classes Ci do
            with x, calculate the a-posteriori probability P(Ci | x) for pixel
            if P(Ci | x) > Pmax then
                Pmax := P(Ci | x)
                k := i                     {store the currently most probable class k for pixel}
            end if
        end for
        if Pmax > Pmin then
            add pixel to the corresponding class k        {classification}
        else
            leave pixel unclassified                      {rejection}
        end if
    end while

Bayes' theorem looks complicated, but is not. We define a probability that an observation x belongs to a class Ci. We call it an a-posteriori probability because it is the probability of a result, after the classification. This resulting probability is computed from 3 other probabilities. The first is the result of the learning phase, which is the probability that, given a class Ci, we make the observation x. Second, we have the a-priori knowledge of the expert, providing a probability that a class Ci may occur. The third probability is the so-called joint probability of the observation x and the class Ci.

This formula alone will not help us with the implementation of software, but the expression in Slide 14.18 serves to explain the relationships. A sketch of a possible implementation follows. First we make a very common assumption, also shown in Slide 14.18. This assumption is called the closed-world assumption over all the classes, stating that there is no unknown class and that an observation will belong to one of the n classes. This expresses itself in statistics by the sum of all a-posteriori probabilities being 1. For example, with colors: there is no pixel in the image for which we do not know a color. Bayes' theorem simplifies under this assumption, since the joint probability becomes a constant factor 1/a. The problem with all classifiers is the need to model expert knowledge and then to train the system. The hard thing is to find a correct computational model.
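For reference, the relationship described above can be written out explicitly. This is the standard form of Bayes' theorem rather than a formula copied from the slides:

    P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)},
    \qquad
    p(x) = \sum_{j=1}^{n} p(x \mid C_j)\, P(C_j).

Here p(x | C_i) comes from the learning phase, P(C_i) is the expert's a-priori probability, and the denominator p(x), the overall probability of the observation obtained by summing over all classes, contributes the constant factor 1/a mentioned above under the closed-world assumption.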
One simple implementation would thus be to calculate the variances of our observations in the learning phase as well. We compute not only the means, as we did before, but also the variance or standard deviation. We need to learn our pixels, our colors, our color triplets; we need to assign certain triplets to certain colors, and this gives us our means and our variances as in Slide 14.23. Note that the slide shows 2, not 3 dimensions. The mean value and the variance define a Gaussian function representing the so-called distribution of the observations. In Slide 14.23 the ellipse may, for instance, define the 1-sigma border: "sigma" or σ is the standard deviation, σ² is the variance. The probability is represented by a curve, or a surface in 2D, that is called a "Gaussian curve" or surface. This means that the probability that an observation of class O3 falls within the ellipse is about 66%. If the ellipse is drawn at 3σ (3 times the standard deviation), the probability goes up to 99%. By calculating the variance and the sigma border for each class Ci or Oi we produce n Gaussian functions.

In Slide ?? we have two dimensions, red and green. We make an observation which we want to classify. We do not calculate the minimum distance, but check into which ellipse the vector of the new observation comes to lie. To summarize, we have performed two steps: we calculate the mean and variance of each class in the learning phase and then "intersect" the unknown observation with the result of the learning phase.

A simple Bayes classifier requires no more than determining the Gaussian function discussed above. The Gaussian function in a single dimension for class j is

    d_j(x) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x - m_j)^2}{2\sigma_j^2}\right),

with x being the feature vector, σ_j the standard deviation, m_j the mean, and j the index associated with a specific class. In more than one dimension, m and x are replaced by vectors and σ becomes a matrix. This algorithm is summarized in Slide 14.23: m is the mean of each class, C is the variance. In a multi-dimensional context, C is a matrix of numbers, the so-called covariance matrix. It is computed using the coordinates of the mean m. The expression E{·} in Slide 14.23 denotes the expected value and can be estimated by

    C = \frac{1}{N} \sum_{k=1}^{N} x_k x_k^T - m\,m^T

or equivalently

    c_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{k,i} - m_i)(x_{k,j} - m_j), \qquad i, j = 1 \ldots M,

where M is the dimension of the feature space and N is the number of feature vectors or pixels per class in the learning phase. As shown in Slide 14.23, each class of objects gets defined by an ellipse.

Exam questions:

• In image classification one often tries to approximate the unknown probability density function of the N known feature vectors in m-dimensional space by a Gaussian normal distribution. For this, the m × m covariance matrix C of the N vectors is needed. Figure B.28 shows three feature vectors p1, p2 and p3 in two dimensions (i.e. N = 3 and m = 2). Compute the corresponding covariance matrix C.

  Answer: First compute the mean m:

  m = \frac{1}{3}\left[\begin{pmatrix}1\\-1\end{pmatrix} + \begin{pmatrix}3\\3\end{pmatrix} + \begin{pmatrix}2\\4\end{pmatrix}\right] = \begin{pmatrix}2\\2\end{pmatrix},

  then determine the (p_i - m)(p_i - m)^T:

  (p_1 - m)(p_1 - m)^T = \begin{pmatrix}-1\\-3\end{pmatrix}\begin{pmatrix}-1 & -3\end{pmatrix} = \begin{pmatrix}1 & 3\\3 & 9\end{pmatrix}, \quad
  (p_2 - m)(p_2 - m)^T = \begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\1 & 1\end{pmatrix}, \quad
  (p_3 - m)(p_3 - m)^T = \begin{pmatrix}0\\2\end{pmatrix}\begin{pmatrix}0 & 2\end{pmatrix} = \begin{pmatrix}0 & 0\\0 & 4\end{pmatrix}.

  The covariance matrix is

  C = \frac{1}{3}\sum_{i=1}^{3}(p_i - m)(p_i - m)^T = \frac{1}{3}\begin{pmatrix}2 & 4\\4 & 14\end{pmatrix}.

• Let p(x), x ∈ R², be the probability density function of the Gaussian normal distribution whose parameters were estimated from the three feature vectors p1, p2 and p3 of Exercise B.2.
  Furthermore, let two points x1 = (0, 3)^T and x2 = (3, 6)^T be given in the feature space. Which of the following two statements is correct (justify your answer)?
  1. p(x1) < p(x2)
  2. p(x1) > p(x2)
  Hint: plot the two points x1 and x2 in Figure B.28 and consider in which directions the eigenvectors of the covariance matrix C from Exercise B.2 point.

  Answer: p(x1) < p(x2), since x2 lies in the direction of the eigenvector of C with the largest eigenvalue (measured from the class center m), and the probability of x2 is therefore larger than that of x1.

14.6 Supervised Classification

The approach in which training/learning data exist is called supervised classification. Unsupervised classification is a method in which pixels (or objects) are entered into the feature space without knowing what they are. In that case a search is started to detect clusters in the data. The search comes up with aggregations of pixels/objects and simply defines each aggregate to be a class. In contrast to this approach, common classification starts out from known training pixels or objects. A real-life case is shown in Slide 14.22. A clustering algorithm may find 3 clusters here. In fact, Slide ?? is the actual segmentation of these training pixels into 6 object classes (compare with Slide 14.23). The computation in the learning or training phase, which leads to Slide ??, is the basis for receiving new pixels. If they fall within the agreed-upon range of a class, the pixel is assigned to that class. Otherwise it is not assigned to any class: it gets rejected.

14.7 Real Life Example

Slide 14.26 to Slide 14.31 illustrate a classification of the territory of Austria on behalf of a cell-phone project in which the surface cover was needed for wave propagation and signal strength assessment. It is suggested that the classification was unsupervised, thus without training pixels and simply looking for groups of similar pixels (clusters). A rather "noisy" result is obtained in Slide 14.28; Slide 14.29 presents the forest pixels, where many pixels get assigned to different classes although they are adjacent to one another. This is the result of not considering "neighborhoods". One can fix this by means of a filter that aggregates adjacent pixels into one class if this does not totally contradict the feature space. The surface cover and land use result for the city of Vienna is shown in Slide 14.31.

14.8 Outlook

In the specialization class on "Image Processing and Pattern Recognition" we will discuss this important and central topic in more detail:

• Multi-variable probabilities
• Neural network classification
• Dependencies between features
• Non-statistical classification (shape, chain codes)
• Transition to Artificial Intelligence (AI)
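Before leaving classification, here is a minimal numerical sketch of the simple Bayes (Gaussian) classifier of Section 14.5: estimate the mean and covariance of a class in the learning phase, then evaluate the multivariate normal density for new feature vectors. The training vectors are the three vectors from the exercise above; everything else (names, the use of numpy) is an assumption for illustration.

    import numpy as np

    def learn_class(samples):
        """Learning phase: mean vector and covariance matrix of one class.
        samples has shape (N, M), one feature vector per row."""
        m = samples.mean(axis=0)
        d = samples - m
        C = d.T @ d / len(samples)       # C = (1/N) * sum_k (x_k - m)(x_k - m)^T
        return m, C

    def gaussian_density(x, m, C):
        """Multivariate normal density for one class."""
        k = len(m)
        diff = x - m
        norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(C))
        return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

    p = np.array([[1.0, -1.0], [3.0, 3.0], [2.0, 4.0]])
    m, C = learn_class(p)                # m = (2, 2), C = (1/3) * [[2, 4], [4, 14]]
    print(gaussian_density(np.array([3.0, 6.0]), m, C) >
          gaussian_density(np.array([0.0, 3.0]), m, C))   # True, i.e. p(x1) < p(x2)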
Chapter 15 Resampling

We have previously discussed the idea of resampling under the heading of Transformation (Chapter 9). It was a side topic in that chapter, essentially an application. We will focus on the topic here, using many of the illustrations from previous chapters.

Exam questions:

• What is meant by (geometric) "resampling", and which options exist for computing the intensities of the pixels in the output image? Describe the different methods using a sketch and, where appropriate, a formula.

15.1 The Problem in Examples of Resampling

Slide 15.3 recalls an input image that is distorted, and illustrates, in connection with Slide 15.4, the rectification of the image: a geometric transformation from the input geometry to an output geometry. The basic idea is illustrated in Slide 15.5. On the left we have the input image geometry, representing a distorted image. On the right we have the output geometry, representing a corrected or rectified image. The suggestion is that we take a grid mesh of lines to cut up the input image and stretch each quadrilateral of the input image to fit into a perfect square of the output image. This casual illustration actually represents reasonably well what happens in geometric transformation and resampling in digital image processing.

Resampling is also applicable in a context where we have individual images taken at different times from different vantage points and we need to merge them into one continuous large image. We call this process mosaicing. The images might overlap, and the overlap is used to achieve a match between the images by finding homologue points. Those are the basis for a geometric transformation and resampling process to achieve the mosaic. Finally, resampling is also an issue in computer graphics when dealing with texture. We may have an input image showing a particular pattern, and as we geometrically transform or change the scale of that pattern, we have to resample the texture. The illustration shows so-called MIP-maps, pre-computed versions of a texture at multiple resolutions.

15.2 A Two-Step Process

Geometric transformation and resampling are typically performed in a two-step process. The first step is the establishment of a geometric relationship between the input and the output images, essentially a coordinate processing issue. We typically have a regular pattern of pixels in the input image, and conceptually we need to find a geometric location in the output image representing the center of each pixel of the input image. Vice versa, we may have a regular image matrix on the output side (the ground), and for each center of an output pixel we need to find the location in the input image from where to pick a gray value. Slide 15.9 explains; Slide 15.10 augments that explanation. We have an input image that is geometrically distorted. The object might be a stick figure, as suggested in Slide 15.10. The output or target image is a transformed stick figure. We have regular pixels in the target or output image that need to be assigned gray values as a function of the input image.

Slide 15.12 explains the idea of the two-step process. On the one hand we have step 1, a manipulation of coordinates, mapping the input (x, y) coordinates into output (x̂, ŷ) coordinates. On the other hand we have step 2, a search for a gray value for each output pixel, starting from the output location of a pixel and looking in the input image for that gray value.

15.2.1 Manipulation of Coordinates

We have correspondence points between image space and the target or output space.
These correspondence points serve to establish a geometric transformation that converts the input (x, y) coordinates of an arbitrary image location into an output coordinate (i, j) in the target space. This transformation has unknown transformation parameters which have to be computed in a separate process called the spatial transformation. We will discuss in a moment how this is done efficiently.

15.2.2 Gray Value Processing

Once this spatial transformation is known, we go through the output image, and for each pixel center (i, j) we find an input coordinate location (x, y), grab the gray value there, and place that value at the pixel location of the output or target image.

Algorithm 39 Calculation with a node file

    while there is another quadrangle quad_in in the input node file do
        if there is a corresponding quadrangle quad_out in the output node file then
            read the four mesh points of the quadrangle quad_in
            read the four mesh points of the quadrangle quad_out
            calculate the (eight) parameters params of the (bilinear) transformation
            save the parameters params
        else
            error          {no corresponding quadrangle quad_out for quadrangle quad_in}
        end if
    end while
    for all pixels p_out of the output image do
        get the quadrangle quad_out in which pixel p_out lies
        get the parameters params corresponding to the quadrangle quad_out
        calculate the input image position p_in of p_out with the parameters params
        calculate the gray value of pixel p_out according to the position of p_in
        assign that gray value to p_out
    end for

15.3 Geometric Processing Step

See Algorithm 39. We go back to the idea that we cut up the input image into irregular meshes, and each corner of the mesh pattern represents a corner of a regular mesh pattern in the output image. We also call these mesh points nodes; we obtain a node file in the input image that corresponds to the node file in the output image. Slide 15.15 suggests that the geometric transformation relating the irregular meshes of the input image to the rectangular meshes of the output image could be a polynomial transformation as previously discussed. More typically, we use a simple transformation that maps four input points onto four output points, as suggested in Slide 15.16; that is a bi-linear transformation with 8 coefficients.

The relationships between the mesh points of the input and output image are obtained as a function of control points (in German: Pass-Punkte). Suggested in Slide 15.16 and Slide 15.17 are control points at the locations marked by little stars. It is those stars that define the parameters of a complex transformation function. The transformation function is applied to the individual mesh points in the input and output images: for each mesh location in the output image, we compute the corresponding input mesh point. Slide 15.18 summarizes the result of these operations. Recall that we were given control points, which we use to compute the transformation function. With the transformation function we establish the image coordinates of mesh points in the input image that represent regularly spaced mesh points in the output image. With this process we have established the geometric relationship between input and output image using the ideas of transformations, resulting in a node file in the input and output images.
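A minimal sketch of this bi-linear transformation for one mesh, assuming four corresponding corner points are known; the 8 coefficients are obtained by solving an 8 × 8 linear system. The names, the example corner values and the use of numpy are assumptions, not the lecture's code.

    import numpy as np

    def bilinear_coefficients(src, dst):
        """Solve for the 8 coefficients of  x' = a0 + a1*x + a2*y + a3*x*y,
                                            y' = b0 + b1*x + b2*y + b3*x*y
        from four corresponding corner points src[i] -> dst[i]."""
        A, rhs = [], []
        for (x, y), (xp, yp) in zip(src, dst):
            A.append([1, x, y, x * y, 0, 0, 0, 0])
            A.append([0, 0, 0, 0, 1, x, y, x * y])
            rhs.extend([xp, yp])
        return np.linalg.solve(np.array(A, float), np.array(rhs, float))

    def apply_bilinear(params, x, y):
        a0, a1, a2, a3, b0, b1, b2, b3 = params
        return a0 + a1 * x + a2 * y + a3 * x * y, b0 + b1 * x + b2 * y + b3 * x * y

    # example: map the unit square onto a slightly skewed quadrilateral (made-up corners)
    src = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dst = [(0, 0), (1.0, 0.1), (1.1, 1.0), (0.1, 0.9)]
    params = bilinear_coefficients(src, dst)
    print(apply_bilinear(params, 0.5, 0.5))   # center of the mesh

In the resampling pipeline these parameters are computed once per mesh quadrangle (Algorithm 39) and then used to map every output pixel of that mesh back into the input image.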
Algorithm 40 Nearest neighbor

    read the float coordinates x0 and y0 of the input point
    x1 := round(x0)                         {result is an integer}
    y1 := round(y0)                         {result is an integer}
    return the gray value of the point (x1, y1)

15.4 Radiometric Computation Step

After the geometric relationships have been resolved, we go to an arbitrary output pixel and, using its position within a square mesh, compute the corresponding location in the input image via the bi-linear relationship within that mesh, as suggested in Slide 15.20. That location will in general be an arbitrary point (x, y) that is not at the center of any pixel. We can now select among various techniques to find a gray value for that location to be put into the output pixel. Suggested in Slide 15.20 are 3 different techniques. If we take the gray value of the pixel onto which the location (x, y) falls, we call this the nearest neighbor (see Algorithm 40). If we take the four pixels that are nearest to the location (x, y), we can compute a bi-linear interpolation (see Algorithm ??). If we use the 9 closest pixels, we can use a bi-cubic interpolation. We differentiate between nearest-neighbor, bi-linear and bi-cubic resampling in accordance with the technique for gray value assignment.

Slide 15.21 specifically illustrates the bi-linear interpolation: which gray value do we assign to the output pixel shown in Slide 15.21? We take the 4 gray values nearest to the location (x, y), namely g1, g2, g3, g4, and by a simple interpolation, using auxiliary values a and b, we obtain a gray value bi-linearly interpolated from the four gray values g1, g2, g3, g4.

Exam questions:

• Given an input image with the gray values shown in Figure B.8 (the input image has 5 rows and 7 columns): due to a geometric transformation of the image, certain pixels in the result image are to be assigned a gray value after the transformation, where the corresponding point in the input image has the row and column coordinates given in Table B.1. Compute (or determine graphically) the gray value for each of these result pixels if a bilinear gray value assignment is used.

15.5 Special Case: Rotating an Image by Pixel Shifts

We show in Slide 15.23 an oblique aerial image of an urban scene. We want to rotate that image by 45°. We achieve this by simply shifting rows and columns of pixels (see Algorithm ??). In a first step, we shift each column of the image, going from right to left and shifting the rows down by increasing amounts. In a second step, we take the rows of the resulting image and shift them horizontally. As a result, we obtain a rotated version of the original image.
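Putting the two steps together, here is a minimal sketch of the radiometric step with bilinear gray value assignment, assuming the inverse geometric mapping from output to input coordinates is already available as a function. The array handling with numpy and all names are illustrative.

    import numpy as np

    def bilinear_gray_value(image, x, y):
        """Bilinearly interpolate the gray value at the non-integer position (x, y),
        where x is the row and y the column coordinate."""
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        a, b = x - x0, y - y0                      # fractional offsets within the pixel cell
        g1 = image[x0, y0]
        g2 = image[x0, y0 + 1]
        g3 = image[x0 + 1, y0]
        g4 = image[x0 + 1, y0 + 1]
        return (1 - a) * ((1 - b) * g1 + b * g2) + a * ((1 - b) * g3 + b * g4)

    def resample(image, out_shape, inverse_map):
        """Fill an output image by looking up, for every output pixel, the input
        position delivered by inverse_map(i, j) and interpolating there."""
        out = np.zeros(out_shape)
        for i in range(out_shape[0]):
            for j in range(out_shape[1]):
                x, y = inverse_map(i, j)
                if 0 <= x < image.shape[0] - 1 and 0 <= y < image.shape[1] - 1:
                    out[i, j] = bilinear_gray_value(image, x, y)
        return out

Replacing bilinear_gray_value by a simple rounding of (x, y) gives the nearest-neighbor variant of Algorithm 40.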
Chapter 16 About Simulation in Virtual and Augmented Reality

16.1 Various Realisms

Recall that we have earlier defined various types of reality. We talked about virtual reality, which presents objects to the viewer that are modeled in a computer. Different from that is photographic reality, which we experience via an actual photograph of the natural environment. It differs again from the experience we have when we go out into the real world and experience physical reality. You may recall that we also talked about emotions and therefore about psychological reality, different from the physical one. Simulation is now an attempt at creating a virtual environment that provides essential aspects of the physical or psychological reality to a human being without the presence of the full physical reality.

16.2 Why simulation?

To save money when training pilots, bus drivers, ship captains, soldiers, etc. Simulation may also be used for disaster-preparedness training. Simulation is big business. How realistic does a simulation have to be? Sufficiently realistic to serve the training purpose. Therefore we do not need photorealism in simulation under all circumstances; we just need enough visual support to challenge the human in a training situation.

16.3 Geometry, Texture, Illumination

Simulation needs information about the geometry of a situation, the illumination and the surface properties. These are three factors, illustrated in Slide 16.8, Slide 16.9 and Slide 16.10. The geometry alone will not suffice if we need to recognize a particular scene; we will have difficulties with depth cues as a function of size. We have data of much reduced quality if we ignore texture. Texture provides a greatly enhanced sense of realism and helps us to estimate depth better. In a disaster-preparedness scenario, the knowledge of windows and doors may be crucial, and it may only be available through texture and not through geometry. Illumination is a third factor that creates shadows and light, again helping to better understand the context of a scene and to estimate distances and intervisibility.

16.4 Augmented Reality

We combine the real world and the computer-generated representation of a modeled world that does not need to exist in reality. A challenge is the calibration of such a system. We see the real world, and what is superimposed on it is shown on the two monitors. This needs to match geometrically and in scale the real environment that we see. Therefore we need to define a world coordinate system and communicate it to the computer. We also need sufficient speed, so that if we turn our head, the two stereo images computed for visual consumption are recomputed instantly as a function of the changed angle. We also need to be accurate in assessing any rotations or changes of position. Magnetic positioning is often too slow and too inaccurate to serve the purpose well. For that reason, an optical auxiliary system may be included in an augmented reality environment, so that the world is observed through a camera and any change in attitude or position of the viewer is tracked more accurately than the magnetic positioning could achieve. However, a camera-based optical tracking system may be slow, too slow to act in real time at a rate of about thirty positioning computations per second. Therefore the magnetic positioning may provide an approximate solution that is then refined by the optical tracking.

Slide 16.13 illustrates an application with a game played by two people seeing the same chess board. An outside observer watching the two players will see nothing of the game; it is the two chess players who see one another and the game board.

Exam questions:

• Describe the difference between "virtual reality" and "augmented reality". Which hardware is required in each case?
16.5 Virtual Environments

If we exclude the real world from being experienced, then we talk about a virtual environment or, more customarily, virtual reality. We immerse ourselves in the world of data. However, we still have our own position and direction of viewing. As we move or turn our head, we would like the virtual environment to respond so that we look at a new situation. Therefore, much as in augmented reality, we need to recompute the stereo impression of the data world very rapidly. However, virtual reality is simpler than augmented reality, because we do not have the accuracy requirement of superimposing the virtual over the real, as we have in augmented reality. In a virtual reality environment, we would like to interact with the computer using our hands, and as a result we need data garments that allow us to provide inputs to the computer, for example by motions of our hands and fingers.

Exam questions:

• Explain the operating principle of two tracking methods frequently used in augmented reality and discuss their advantages and disadvantages.

  Answer:

  Tracking    Advantages      Disadvantages
  magnetic    robust, fast    short range, inaccurate
  optical     accurate        demands on the environment, elaborate

Chapter 17 Motion

17.1 Image Sequence Analysis

A fixed sensor may observe a moving object, as suggested in Slide 17.3, where a series of images is taken of moving ice in the Arctic Ocean. There is not only a motion of the ice, there is also a change of the ice over time. Slide 17.4 presents a product obtained from an image sequence analysis, representing a vector diagram of ice flows in the Arctic Ocean. The source of the results was a satellite radar system of NASA, called Seasat, which flew in 1978. Such data are now also available from recent systems such as Canada's Radarsat, currently orbiting the globe.

17.2 Motion Blur

Slide 17.6 illustrates a blurred image that is the result of an exposure taken while an object moved. If the motion is known, then its effect can be removed and we can restore an image as if no motion had happened. The inverse occurs in Slide 17.7, where the object was stable but the camera moved during the exposure. The same applies: if we can model the motion of the camera, we will obtain a successful reconstruction of the object by removal of the motion blur of the camera. Slide 17.7 suggests that simple filtering will not remove that blur; we need to model the effect of the motion. The process itself is nevertheless called an anti-blur filter.

Exam questions:

• What is meant by "motion blur", and under which condition can this effect be removed from an image again?

  Answer: The image is smeared by the motion of the imaged object relative to the camera during the finite opening time of the shutter. Removing this effect requires that this motion is known exactly.

17.3 Detecting Change

Change may occur because of motion. Slide 17.9 explains the situation in which a group of people is imaged while a person is moving out of the field of view of the camera. An algorithm can be constructed that detects the change between each image and its predecessors and in the process allows one to map just the changes.
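A minimal sketch of such change detection by differencing two registered frames, together with the complementary operation of keeping what is constant over a sequence. The threshold and the use of a per-pixel median are assumptions for illustration, not the method shown on the slides.

    import numpy as np

    def change_mask(frame_prev, frame_curr, threshold=20):
        """Binary mask of pixels whose gray value changed by more than the
        threshold between two registered frames of equal shape."""
        diff = np.abs(frame_curr.astype(np.int32) - frame_prev.astype(np.int32))
        return diff > threshold

    def stable_background(frames):
        """Keep what is constant: the per-pixel median over a sequence of
        registered frames suppresses objects that move through the scene."""
        return np.median(np.stack(frames, axis=0), axis=0)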
The inverse idea is to find what is constant and to eliminate changes from a sequence of images. An example is to compute the texture of a building facade that is partly covered by trees.

17.4 Optical Flow

A rapid sequence of images may be obtained of a changing situation; an example is the observation of traffic. Optical flow is the analysis of the sequence of images and the assessment of the motion that is evident from the image stream. A typical representation of optical flow is by vectors representing the moving objects. Slide 17.12 explains.

Chapter 18 Man-Machine-Interfacing

Our university offers a separate class on Man-Machine Interaction or Human-Computer Interfaces (HCI) as part of the multi-media program and as part of the computer graphics program. This topic relates to elements of computer graphics and image analysis, since visual information is created and manipulated.

18.1 Visualization of Abstract Information

The use of color and shape is a widely applicable tool for conveying information. We have seen examples in the chapter on color: encoding terrain elevation or temperature in color, or marking contours of objects in color. A very central element in man-machine interaction is the use of the human visual sense to present non-visual information for communication and interaction. An example is shown in Slide 18.3, where a diagram is presented that has on one axis the calendar time and on the other axis a measure of the popularity of movies. The interface serves to find movies on a computer monitor by popularity and by age. Simultaneously, we can switch between various types of movies, like drama, mystery, comedy and so forth.

Slide 18.4 is a so-called table lens. This is a particular view of a spreadsheet (e.g. an Excel sheet) which shows the entire complexity of the sheet in the background and provides a magnifying glass that can be moved over the spreadsheet. Another idea is shown in Slide 18.5 with the so-called cone tree, representing a file structure. It is a tree which at its root has an entire directory; this is broken up into folders or subdirectories, which are then further broken up until each leaf is reached, representing an individual file. A similar idea, called information slices, is shown in Slide 18.6. We have a very large inventory of files, organized in directories and subdirectories. We can take subgroups of these subdirectories and magnify them until we can recognize each individual file.

18.2 Immersive Man-Machine Interactions

The subject of man-machine interaction is also involved in the immersion of the human in the world of data, as we previously discussed for virtual reality, which is sometimes denoted as immersive visualization. Of particular interest is input to the computer by means other than keyboard and mouse. This is, of course, increasingly by speech, but also by motions of the hands and fingers, or by the recognition of facial expressions. This represents a hot subject in man-machine interaction and ties in with computer graphics and image analysis.

Chapter 19 Pipelines

19.1 The Concept of an Image Analysis System

Various ideas exist in the literature about a system for image analysis.
The idea of a pipeline comes about if we consider that we have many components and algorithms in the repository of an image processing library. In order to set up an entire image analysis process, we plug the individual processing steps together, much like a plumber puts together the plumbing system of a building from standard components. In computer graphics and image processing we also call this plumbing the creation of a pipeline. As shown in Slide 19.3, an image analysis system always begins with image acquisition and sensing. We build up a system by going through preprocessing and segmentation to representation, recognition and the final use of the results of the image analysis system. All of this is built around knowledge.

A somewhat different view combines the role of image analysis with the role of computer graphics and separates the task into two half-worlds, one of reality and one of computer models. In the simplest case, we have the world and within it a scene, from which we obtain an image that goes into the computer. The image will be replaced by an image description, which then leads to a scene description, which ultimately ends up in a description of the world. We can close the loop from the description of the world, go back to the world, and make the transition from computer to reality by computer graphics. The idea of active vision is to go from the world to a description of the world, and to close the loop from an incomplete description of the world into a new, second loop through the selection of a scene, the selection of images and so forth, as shown in Slide 19.12. If, in analogy to the previous model, we add a central control element with expert knowledge, we arrive at a similar idea as shown before.

Exam questions:

• Sketch the "graphics pipeline" for rendering a digital three-dimensional scene using z-buffering and Gouraud shading.

19.2 Systems of Image Generation

Exam questions:

• What is meant by the term "active vision" in image analysis?

19.3 Revisiting Image Analysis versus Computer Graphics

Slide 19.18 suggests that the transition from an image to a model of a scene is the subject of image understanding or image processing. The inverse, the transition from a scene model to an image, is the subject of computer graphics. We do have a great overlap between image analysis and computer graphics when it concerns the real world. Image analysis will always address the real world, whereas computer graphics may deal with a virtual world that does not exist in reality. In cases where one goes from a model of a non-existing world to an image, we are not dealing with the inverse of image analysis.

Exam questions:

• What is the essential distinction between computer graphics and image analysis, and what is their relationship? The use of a graphical illustration in the answer is desired.
Algorithm 41 z-buffer pipeline
  for y = 0 to YMAX do
    for x = 0 to XMAX do
      WritePixel(x, y, backgroundcolor)
      Z[x, y] := 0
    end for
  end for
  for all Polygons polygon do
    for all pixels in the projection of the polygon do
      pz := GetZValue(polygon, x, y)
      if pz >= Z[x, y] then   {new point is in front}
        Z[x, y] := pz
        WritePixel(x, y, Color of polygon at (x, y))
      end if
    end for
  end for

Algorithm 42 Phong pipeline
  set value ai                           {ai is the ambient intensity}
  set value il                           {il is the intensity of the light source}
  diff := diffuse()                      {calculates the amount of light which falls in directly}
  reflect := reflection()                {calculates the amount of light which is reflected}
  result := ai + il * (diff + reflect)   {formula developed by Phong}

Slides 19.1 through 19.20

Chapter 20 Image Representation

The main goal of this chapter is to briefly describe some of the most common graphic file formats for image files, as well as how to determine which file format to use for certain applications. When an image is saved to a specific file format, one tells the application how to write the image's information to disk. The specific file format which is chosen depends on the graphics software application one is using (e.g., Illustrator, Freehand, Photoshop) and on how and where the image will be used (e.g., the Web or a print publication).

There are three different categories of file formats: bitmap, vector and metafiles. When an image is stored as a bitmap file, its information is stored as a pattern of pixels, i.e. tiny colored or black-and-white dots. When an image is stored as a vector file, its information is stored as mathematical data. The metafile format can store an image's information as pixels (i.e. bitmap), as mathematical data (i.e. vector), or both.

20.1 Definition of Terms

20.1.1 Transparency

Transparency is the degree of visibility of a pixel against a fixed background. A totally transparent pixel is invisible. Normal images are opaque, in the sense that no provision is made to allow the manipulation and display of multiple overlaid images. To allow image overlay, some mechanism must exist for the specification of transparency on a per-image, per-strip, per-tile, or per-pixel basis. In practice, transparency is usually controlled through the addition of information to each element of the pixel data.

The simplest way to allow image overlay is the addition of an overlay bit to each pixel value. Setting the overlay bit in an area of an image allows the rendering application or output device to selectively ignore those pixel values whose overlay bit is set. Another simple way is to reserve one unique color as the transparency color, e.g. the background color of a homogeneous background. As all images are usually rectangular, regardless of the contours of whatever has been drawn within the image, this kind of background transparency is useful for concealing image backgrounds and making it appear that the images are non-rectangular. This feature is widely used, e.g., for logos on Web pages. A more elaborate mechanism for specifying image overlays allows variations in transparency between the bottom and the overlaid images.
Instead of having a single bit of overlay information, each pixel value then carries more bits (usually eight). The eight transparency bits are sometimes called the alpha channel. The degree of pixel transparency for an 8-bit alpha channel ranges from 0 (the pixel is completely invisible or transparent) to 255 (the pixel is completely visible or opaque).

20.1.2 Compression

This is a new concept not previously discussed in this class, except in the context of encoding contours of objects. The amount of image data produced by all kinds of sensors, like digital cameras, remote sensing satellites, medical imaging devices and video cameras, increases steadily with the growing number of sensors and their resolution and color capabilities. Compression is therefore a big issue, especially for the transmission and storage of this large amount of image data.

We separate data compression into two classes, lossless and lossy compression. Lossless compression preserves all information present in the original data; the information is only stored in an optimized way. Examples of lossless compression are run-length encoding, where a run of subsequent pixels of the same color is replaced by the color value and the number of identical pixels that follow, and Huffman coding, which uses codewords of varying length instead of the usual fixed 8 or 24 bits; shorter codewords are assigned to symbols which occur more often, which usually reduces the total number of bits needed to code an image. Compression rates between 2:1 and at most about 5:1 can be achieved using lossless compression.

Lossy compression, on the other hand, removes invisible or only slightly visible information from the image, e.g. only a reduced set of colors is used or high spatial frequencies in the image are removed. The amount of compression achievable with lossy compression is superior to lossless compression schemes: a compression rate of 10:1 with no visible difference is feasible, and the quality of photographs is usually still sufficient after a 20:1 compression. However, the information content is changed by such an operation; therefore lossy-compressed images are not suitable for further image processing stages. We will see examples of JPEG-compressed images further on in this lecture. Algorithms 43 and 44 illustrate the principles.

Algorithm 43 Pipeline for lossless compression
  load image;
  // find redundancy and eliminate it:
  // count how often each pixel value appears
  // (needed for the variable-length coding)
  for i = 0 to number of image columns do
    for j = 0 to number of image rows do
      histogram[image[i, j]]++;
    end for
  end for
  huffman(histogram, image);
  // instead of Huffman, other procedures that produce variable-length
  // codes can be used, but Huffman leads to the best compression results
  save image;
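The run-length-encoding idea mentioned above can be sketched in a few lines; this is a minimal illustration of the principle, not the encoder of any particular file format:

    def run_length_encode(pixels):
        """Encode a sequence of pixel values as (value, run length) pairs."""
        runs = []
        for value in pixels:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1          # extend the current run
            else:
                runs.append([value, 1])   # start a new run
        return [tuple(run) for run in runs]

    def run_length_decode(runs):
        """Invert the encoding; no information is lost."""
        return [value for value, count in runs for _ in range(count)]

    row = [255, 255, 255, 255, 0, 0, 255, 255]
    encoded = run_length_encode(row)          # [(255, 4), (0, 2), (255, 2)]
    assert run_length_decode(encoded) == row  # lossless round trip

A run-length code pays off for images with large homogeneous areas; for noisy images with few repeated values it can even increase the data volume, which is one reason why practical formats combine it with a variable-length code such as Huffman coding.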
Algorithm 44 Pipeline for lossy compression
  load image;
  // find irrelevancy, like high frequencies, and eliminate it
  split image into n x n subimages;        // a common value for n is 8 or 16
  transform into the frequency domain;
  cut off high frequencies;
  // find redundancy and eliminate it:
  // count how often each pixel value appears
  // (needed for the variable-length coding)
  for i = 0 to number of image columns do
    for j = 0 to number of image rows do
      histogram[image[i, j]]++;
    end for
  end for
  huffman(histogram, image);
  // instead of Huffman, other procedures that produce variable-length
  // codes can be used, but Huffman leads to the best compression results
  save image;

20.1.3 Progressive Coding

Progressive image transmission is based on the fact that transmitting all image data may not be necessary under some circumstances. Imagine a situation in which an operator is searching an image database looking for a particular image. If the transmission is based on a raster scanning order, all the data must be transmitted to view the whole image, but often the highest possible image quality is not needed to find the image for which the operator is looking. Images do not have to be displayed at the highest available resolution, and a lower resolution may be sufficient to reject an image and to begin displaying another one. This approach is also commonly used to decrease the waiting time before the image starts appearing after transmission, and it is used for WWW image transmission. In progressive transmission, the images are represented in a pyramid structure, the higher pyramid levels (lower resolution) being transmitted first. The number of pixels representing a lower-resolution image is substantially smaller, and thus the user can decide from the lower-resolution versions whether further image refinement is needed.

20.1.4 Animation

A sequence of two or more images displayed in rapid succession so as to provide the illusion of continuous motion. Animations are typically played back at a rate of 12 to 15 frames per second.

20.1.5 Digital Watermarking

A digital watermark is a digital signal or pattern inserted into a digital image. Since this signal or pattern is present in each unaltered copy of the original image, the digital watermark may also serve as a digital signature for the copies. A given watermark may be unique to each copy (e.g., to identify the intended recipient), or be common to multiple copies (e.g., to identify the document source). In either case, watermarking a document involves the transformation of the original into another form.

Unlike encryption, digital watermarking leaves the original image (or file) basically intact and recognizable. In addition, digital watermarks, as signatures, may not be validated without special software. Further, decrypted documents are free of any residual effects of encryption, whereas digital watermarks are designed to persist through viewing, printing, or subsequent re-transmission or dissemination.

Two types of digital watermarks may be distinguished, depending on whether the watermark is visible or invisible to the casual viewer. Visible watermarks (see the slides) are used in much the same way as their bond paper ancestors; one might view digitally watermarked documents and images as digitally "stamped". Invisible watermarks (see the slides), on the other hand, are potentially useful as a means of identifying the source, author, creator, owner, distributor or authorized consumer of a document or image. For this purpose, the objective is to permanently and unalterably mark the image so that the credit or assignment is beyond dispute. In the event of illicit usage, the watermark would facilitate the claim of ownership or the receipt of copyright revenues.
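As an illustration of the invisible-watermark idea, the following sketch hides a bit pattern in the least significant bits of an 8-bit image. This least-significant-bit scheme is just one simple technique chosen for illustration; it is an assumption of this sketch and not the method shown on the lecture slides.

    import numpy as np

    def embed_lsb(image, mark):
        """Hide a binary mark in the least significant bit of an 8-bit image."""
        assert image.shape == mark.shape
        return (image & 0xFE) | (mark & 1)   # clear the LSB, then write the mark bit

    def extract_lsb(marked):
        """Recover the hidden bit plane."""
        return marked & 1

    rng = np.random.default_rng(1)
    cover = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
    mark = rng.integers(0, 2, size=(4, 4), dtype=np.uint8)

    marked = embed_lsb(cover, mark)
    assert np.array_equal(extract_lsb(marked), mark)                     # watermark is recoverable
    assert np.max(np.abs(marked.astype(int) - cover.astype(int))) <= 1   # change is visually negligible

Changing only the least significant bit alters each grey value by at most 1, which is imperceptible, yet the mark can be recovered exactly; robust watermarks for copyright protection use considerably more elaborate embedding, for example in the frequency domain.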
20.2 Common Image File Formats

Following are descriptions of some commonly used file formats.

20.2.1 BMP: Microsoft Windows Bitmap

The bitmap file format is used for bitmap graphics on the Windows platform only. Unlike other file formats, which store image data from top to bottom and pixels in red/green/blue order, the BMP format stores image data from bottom to top and pixels in blue/green/red order. This means that if memory is tight, BMP graphics will sometimes appear drawn from bottom to top. Compression of BMP files is usually not used, so they are typically very large.

20.2.2 GIF: Graphics Interchange Format

The Graphics Interchange Format was originally developed by CompuServe in 1987. It is one of the most popular file formats for Web graphics and for exchanging graphics files between computers. It is most commonly used for bitmap images composed of line drawings or blocks of a few distinct colors. The GIF format supports 8 bits of color information or less; therefore it is not suitable for photographs. In addition, the GIF89a file format supports transparency, allowing you to make one color in your image transparent. (Please note: the older CompuServe GIF87a format does not support transparency.) This feature makes GIF a particularly popular format for Web images.

When to use GIF

Use the GIF file format for images with only a few distinct colors, such as illustrations, cartoons, and images with blocks of color, such as icons, buttons, and horizontal rules. Converting an image to GIF can lose information: although the LZW compression used by GIF is itself lossless, the format supports at most 256 colors, so color information is removed when a full-color image is converted. When creating images for the Web, be aware that only 216 colors are shared between Macintosh and Windows monitors. These colors, called the "Web palette", should be used when creating GIFs for the Web, because colors that are not in this palette display differently on Macintosh and Windows monitors. The restriction to only 256 colors is the reason why GIF is not suitable for color photographs.

20.2.3 PICT: Picture File Format

The Picture file format is for use primarily on the Macintosh platform; it is the default format for Macintosh image files. The PICT format is most commonly used for bitmap images, but can be used for vector images as well. Avoid using PICT images for print publishing. The PICT format is "lossless", meaning it does not remove information from the original image during the file format conversion process. Because the PICT format supports only limited compression (on Macintoshes with QuickTime installed), PICT files are usually large. When saving an image as a PICT, add the extension ".pct" to the end of its file name. Use the PICT format for images used in video editing, animations, desktop computer presentations, and multimedia authoring.

20.2.4 PNG: Portable Network Graphics

The Portable Network Graphics format was developed to be the successor to the GIF file format. PNG is not yet widely supported by most Web browsers; Netscape versions 4.04 and later and Internet Explorer version 4.0b1 and later currently support this file format. However, PNG is expected to become a mainstream format for Web images and could replace GIF entirely. It is platform independent and should be used for single images only (not animations).
Compared with GIF, PNG offers greater color support, better compression, gamma correction for brightness control across platforms, better support for transparency (an alpha channel), and a better method for displaying progressive images.

20.2.5 RAS: Sun Raster File

The Sun Raster image file format is the native bitmap format of the Sun Microsystems UNIX platforms using the SunOS operating system. This format is capable of storing black-and-white, gray-scale, and color bitmapped data of any pixel depth. The use of color maps and a simple run-length data compression are supported. Typically, most images found on a SunOS system are Sun Raster images, and this format is supported by most UNIX imaging applications.

20.2.6 EPS: Encapsulated PostScript

The Encapsulated PostScript file format is a metafile format; it can be used for vector images or bitmap images. The EPS file format can be used on a variety of platforms, including Macintosh and Windows. When you place an EPS image into a document, you can scale it up or down without information loss. This format contains PostScript information and should be used when printing to a PostScript output device. The PostScript language, which was developed by Adobe, is the industry standard for desktop publishing software and hardware. EPS files can be graphics or images of whole pages that include text, font, graphic, and page layout information.

20.2.7 TIFF: Tag Interchange File Format

The Tag Interchange File Format is a tag-based format that was developed and is maintained by Aldus (now Adobe). TIFF, which is used for bitmap images, is compatible with a wide range of software applications and can be used across platforms such as Macintosh, Windows, and UNIX. The TIFF format is complex, so TIFF files are generally larger than GIF or JPEG files. TIFF supports lossless LZW (Lempel-Ziv-Welch) compression; however, compressed TIFFs take longer to open. When saving a file to the TIFF format, add the file extension ".tif" to the end of its file name.

20.2.8 JPEG: Joint Photographic Experts Group

Like GIF, the Joint Photographic Experts Group format is one of the most popular formats for Web graphics. It supports 24 bits of color information and is most commonly used for photographs and similar continuous-tone bitmap images. The JPEG file format stores all of the color information in an RGB image, then reduces the file size by compressing it, i.e. by saving only the color information that is essential to the image. Most imaging applications and plug-ins let you determine the amount of compression used when saving a graphic in JPEG format. Unlike GIF, JPEG does not support transparency.

When to use JPEG?

JPEG uses a "lossy" compression technique, which changes the original image by removing information during the conversion process. In theory, JPEG was designed especially for photographs, so that changes made to the original image during conversion to JPEG would not be visible to the human eye. Most imaging applications let you control the amount of lossy compression performed on an image, so you can trade off image quality against file size and vice versa. Be aware that the chance of degrading your image when converting it to JPEG increases proportionally with the amount of compression you use. JPEG is superior to GIF for storing full-color or grayscale images of "realistic" scenes, or images with continuous variation in color.
For example, use JPEG for scanned photographs and naturalistic artwork with highlights, shaded areas, and shadows. The more complex and subtly rendered the image is, the more likely it is that the image should be converted to JPEG. Do not use JPEG for illustrations, cartoons, lettering, or any images that have very sharp edges (e.g., a row of black pixels adjacent to a row of white pixels). Sharp edges in images tend to blur in JPEG unless you use only a small amount of compression when converting the image.

The JPEG data compression is illustrated with an original image shown in the slides. A JPEG compression scheme has an input parameter that indicates how many coefficients are carried along; this is expressed as a percentage. The slides show the result of keeping 75% of the coefficients, leading to a 15:1 compression of that particular image, and then go on to 50% and 20% of the coefficients. We can appreciate the effect of the compression by comparing an enlarged segment of the original image with a similarly enlarged segment of the decompressed JPEG image. Note how the decompressed result reveals that the image has been contaminated: objects "radiate out" under the effect of the forward transform, which cannot be fully undone by an inverse transform using a reduced set of coefficients. The compression effect and the resulting contamination of the image become larger as fewer and fewer coefficients of the transform are used, as shown in the slides. The effect of the compression can also be shown by computing a difference image of just the intensity component (the black-and-white component), as the slides illustrate. The basic principle of JPEG compression is illustrated in Algorithm 45; a small numerical sketch of the underlying block transform follows after the MPEG section below.

Algorithm 45 JPEG image compression
  divide the picture into blocks of 8x8 pixels
  for all blocks do
    transform the block by the DCT-II method
    for all values in the block do
      quantize the value depending on its position in the block   {high frequencies are less important}
    end for
    reorder the values in a zig-zag way   {the DC value of the block is replaced by the difference to the DC value of the previous block}
    perform a run-length encoding of the quantized values
    compress the resulting bytes with Huffman coding
  end for

Prüfungsfragen:
• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern?

20.3 Video File Formats: MPEG

The slides illustrate the basic idea of the MPEG-1 standard for the compression of movies. MPEG stands for Motion Picture Experts Group. Note that the MPEG approach takes key frames and compresses them individually; these are the image frames I. P frames are then predicted between the I frames, and B frames are in turn interpolated using the I and P frames. Fairly large compression rates of up to about 200:1 can be achieved. This leads to the ability of showing movies on laptop computers at this time. The requirements for the standard, as they are defined, include the ability to play backwards and forwards, to compress time, to support fast motions and rapid changes of scene, and to randomly access any part of the movie. The basic principle of MPEG compression is illustrated in Algorithm 46.

Algorithm 46 MPEG compression pipeline
  Open MPEG stream   {Encoder: not specified as part of the MPEG standard, subject to various implementation-dependent enhancements}
  Close MPEG stream
  Open MPEG stream   {Decoder}
  for all PictureGroups in MPEG stream do
    for all Pictures in PictureGroup do
      for all Slices in Picture do
        for all MacroBlocks in Slice do
          for all Blocks in MacroBlock do   {all I, P, B pictures}
            Variable Length Decoder   {Huffman with fixed DC tables}
            Inverse Quantizer
            Inverse ZigZag
            Inverse Discrete Cosine Transformation   {IDCT}
          end for
        end for
      end for
      if Picture != I then   {interpolated pictures P and B}
        average + 1/2 interpolation
        new Picture = IDCT Picture + interpolated Picture
      else
        new Picture is ready
      end if
      Dither new Picture for display
      display new Picture
    end for
  end for
  Close MPEG stream

Prüfungsfragen:
• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche Kompressionsraten können erzielt werden?
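The block-transform step that Algorithm 45 (and MPEG intra-frame coding) relies on can be sketched numerically. The following minimal example uses a made-up quantization matrix rather than the standard JPEG tables; it only shows where the loss actually happens, namely in the rounding of the transform coefficients.

    import numpy as np

    def dct_matrix(n=8):
        """Orthonormal DCT-II basis matrix (the transform used on 8x8 blocks)."""
        k = np.arange(n).reshape(-1, 1)
        m = np.arange(n).reshape(1, -1)
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)
        return c

    # a synthetic 8x8 block of grey values, shifted to be centred around zero
    rng = np.random.default_rng(0)
    block = rng.integers(0, 256, size=(8, 8)).astype(float) - 128.0

    C = dct_matrix()
    coeff = C @ block @ C.T                 # forward 2-D DCT of the block

    # crude quantisation: low frequencies kept fine, high frequencies coarse
    Q = 1.0 + 4.0 * (np.arange(8)[:, None] + np.arange(8)[None, :])
    quantised = np.round(coeff / Q)         # this rounding is where information is lost

    restored = C.T @ (quantised * Q) @ C    # dequantise and apply the inverse DCT
    print(np.abs(block - restored).max())   # small but non-zero: lossy compression

Coarser quantization (a larger Q) discards more of the high-frequency content and increases both the compression and the visible "radiating" artifacts discussed above.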
20.4 New Image File Formats: Scalable Vector Graphics (SVG)

A vector graphic differs from a raster graphic in that its content is described by mathematical statements. The statements instruct a computer's drawing engine what to display on screen, i.e. pixel information for a bitmap is not stored in the file and loaded into the display device as it is in the case of JPEG and GIF. Instead, shapes and lines, their position and direction, colours and gradients are drawn. Vector graphics files contain instructions for the rasterisation of the graphics as the statements arrive at the viewer's browser, "on the fly".

Vector graphics are resolution independent. That is, they can be enlarged as much as required with no loss of quality, as there is no raster-type image to enlarge and pixelate. A vector graphic will always display at the best quality that the output device is set to. When printing out a vector graphic from a Web page it will print at the printer's optimum resolution, i.e. without "jaggies".

Until recently only proprietary formats such as Macromedia Flash or Apple's QuickTime have allowed Web designers to create and animate vector graphics for the Web. That is going to change with the implementation of SVG (Scalable Vector Graphics). SVG is a standard, based on XML (Extensible Markup Language), which is currently undergoing development by the W3C consortium. An SVG file itself consists of text; that is, the drawing engine instructions within it are written in ordinary text and not in binary form. The file can therefore be edited in an application no more complicated than a plain text editor, unlike raster graphics, which have to be opened in image editing applications where pixel values are changed with the use of the program's tools. If the appearance of a vector graphic is required to change in the Web browser, then the text file is edited via:

• editing the graphic in an SVG-compliant drawing application (e.g. Adobe Illustrator 9),
• editing the text of which the file is comprised in a text editor, or
• the actions of the viewer in the Web browser: clicking the mouse triggers a script which changes the text in the vector file.

As the files consist of text, the images themselves can be dynamic. For instance, CGI and Perl can generate images and animation based on user choices made in the browser. SVG graphics can be used to dynamically (in real time) render database information, change their appearance, and respond to user input and subsequent database queries.
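The claim that an SVG file is ordinary, editable text can be illustrated with a small sketch. The file name and the two shapes are hypothetical; the point is only that changing the image amounts to changing a string, not pixel values.

    # A minimal SVG document is ordinary text; editing the file is string editing.
    svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
      <rect x="10" y="10" width="80" height="80" fill="steelblue"/>
      <circle cx="150" cy="50" r="40" fill="orange"/>
    </svg>
    """

    with open("example.svg", "w") as f:
        f.write(svg)

    # "Editing the text of which the file is comprised": a plain string replacement
    # changes the circle's colour; no pixel data is touched, and the result still
    # scales to any resolution without loss of quality.
    with open("example.svg") as f:
        edited = f.read().replace('fill="orange"', 'fill="crimson"')

    with open("example.svg", "w") as f:
        f.write(edited)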
As the SVG standard is based on XML, it is fully compatible with existing Web standards such as HTML (HyperText Markup Language), CSS (Cascading Style Sheets), DOM (Document Object Model), JavaScript and CGI (Common Gateway Interface). The SVG format supports 24-bit colour, ICC color profiles for colour management, pan, zoom, gradients, masking and other features. Type rendered as SVG will look smoother, and attributes such as kerning (spacing between characters), paths (paths along which type is run) and ligatures (where characters are joined together) are as controllable as in DTP and drawing applications. Positioning of SVG graphics in the Web browser window will be achieved with the use of CSS (Cascading Style Sheets), which accompany the HTML 4 standard.

Appendix A Algorithmen und Definitionen

Algorithmus 1: Affines Matching (siehe Abschnitt 0.6) Definition 2: Modellieren einer Panoramakamera (siehe Abschnitt 0.15) Definition 3: Berechnung der Datenmenge eines Bildes (siehe Abschnitt 1.2) Algorithmus 4: Bildvergrößerung (Raster vs. Vektor) (siehe Abschnitt 1.5) Definition 5: Berechnung der Nachbarschaftspixel (siehe Abschnitt 1.6) Definition 6: Berechnung des Zusammenhanges (siehe Abschnitt 1.6) Definition 7: Berechnung der Distanz zwischen zwei Pixeln (siehe Abschnitt 1.6) Algorithmus 8: Berechnung logischer Maskenoperationen (siehe Abschnitt 1.7) Algorithmus 9: Berechnung schneller Maskenoperationen (siehe Abschnitt 1.7) Definition 10: Modellierung einer perspektiven Kamera (siehe Abschnitt 2.2) Algorithmus 11: DDA einer Geraden (siehe Abschnitt 3.1) Algorithmus 12: Bresenham einer Geraden (siehe Abschnitt 3.1) Algorithmus 13: Füllen eines Polygons (siehe Abschnitt 3.2) Algorithmus 14: Zeichnen dicker Linien (siehe Abschnitt 3.3) Definition 15: Skelettberechnung via MAT (siehe Abschnitt 3.4) Definition 16: Translation (siehe Abschnitt 4.1) Definition 17: Reflektion (siehe Abschnitt 4.1) Definition 18: Komplement (siehe Abschnitt 4.1) Definition 19: Differenz (siehe Abschnitt 4.1) Algorithmus 20: Dilation (siehe Abschnitt 4.2) Definition 21: Erosion (siehe Abschnitt 4.2) Definition 22: Öffnen (siehe Abschnitt 4.3) Definition 23: Schließen (siehe Abschnitt 4.3) Definition 24: Filtern (siehe Abschnitt 4.4) Definition 25: Hit oder Miss (siehe Abschnitt 4.5)
ALGORITHMEN UND DEFINITIONEN Definition 26: Umriss (siehe Abschnitt 4.6) Definition 27: Regionenfüllung (siehe Abschnitt 4.6) Algorithmus 28: Herstellung von Halbtonbildern (siehe Abschnitt 5.1) Definition 29: Farbtransformation in CIE (siehe Abschnitt 5.3) Definition 30: Farbtransformation in CMY (siehe Abschnitt 5.6) Definition 31: Farbtransformation in CMYK (siehe Abschnitt 5.7) Algorithmus 32: HSV-HSI-HLS-RGB (siehe Abschnitt 5.8) Definition 33: YIK-RGB (siehe Abschnitt 5.9) Algorithmus 34: Umwandlung von Negativ- in Positivbild (siehe Abschnitt 5.14) Algorithmus 35: Bearbeitung eines Masked Negative (siehe Abschnitt 5.14) Algorithmus 36: Berechnung eines Ratiobildes (siehe Abschnitt 5.16) Definition 37: Umrechnung lp/mm in Pixelgröße (siehe Abschnitt 6.4) Algorithmus 38: Berechnung eines Histogrammes (siehe Abschnitt 6.6) Algorithmus 39: Äquidistanzberechnung (siehe Abschnitt 6.6) Definition 40: Spreizen des Histogrammes (siehe Abschnitt 6.6) Algorithmus 41: Örtliche Histogrammäqualisierung (siehe Abschnitt 6.6) Algorithmus 42: Differenzbild (siehe Abschnitt 6.6) Algorithmus 43: Schwellwertbildung (siehe Abschnitt 7) Definition 44: Kontrastspreitzung (siehe Abschnitt 7) Definition 45: Tiefpassfilter mit 3 × 3 Fenster (siehe Abschnitt 7.2) Algorithmus 46: Medianfilter (siehe Abschnitt 7.2) Algorithmus 47: Faltungsberechnung (siehe Abschnitt 7.3) Definition 48: USM Filter (siehe Abschnitt 7.4) Definition 49: Allgemeines 3 × 3 Gradientenfilter (siehe Abschnitt 7.5) Definition 50: Roberts-Filter (siehe Abschnitt 7.5) Definition 51: Prewitt-Filter (siehe Abschnitt 7.5) Definition 52: Sobel-Filter (siehe Abschnitt 7.5) Algorithmus 53: Berechnung eines gefilterten Bildes im Spektralbereich (siehe Abschnitt 7.6) Algorithmus 54: Ungewichtetes Antialiasing (siehe Abschnitt 7.9) Algorithmus 55: Gewichtetes Antialiasing (siehe Abschnitt 7.9) Algorithmus 56: Gupte-Sproull-Antialiasing (siehe Abschnitt 7.9) Definition 57: Statistische Texturberechnung (siehe Abschnitt 8.2) Definition 58: Berechnung eines spektralen Texturmasses (siehe Abschnitt 8.4) Algorithmus 59: Aufbringen einer Textur (siehe Abschnitt 8.5) Definition 60: Berechnung einer linearen Transformation in 2D (siehe Abschnitt 9.2) Definition 61: Konforme Transformation (siehe Abschnitt 9.3) Definition 62: Modellierung einer Drehung in 2D (siehe Abschnitt 9.4) 287 Definition 63: Aufbau einer 2D Drehmatrix bei gegebenen Koordinatenachsen (siehe Abschnitt 9.4) Definition 64: Rückdrehung in 2D (siehe Abschnitt 9.4) Definition 65: Aufeinanderfolgende Drehungen (siehe Abschnitt 9.4) Definition 66: Affine Transformation in 2D in homogenen Koordinaten (siehe Abschnitt 9.5) Definition 67: Affine Transformation in 2D in kartesischen Koordinaten (siehe Abschnitt 9.5) Definition 68: Allgemeine Transformation in 2D (siehe Abschnitt 9.6) Algorithmus 69: Berechnung unbekannter Transformationsparameter (siehe Abschnitt 9.6) Algorithmus 70: Cohen Sutherland (siehe Abschnitt 9.8) Definition 71: Aufbau einer homogenen Transformationsmatrix in 2D (siehe Abschnitt 9.9) Definition 72: 3D Drehung (siehe Abschnitt 9.10) Definition 73: 3D affine Transformation in homogenen Koordinaten (siehe Abschnitt 9.11) Definition 74: Bezier-Kurven in 2D (siehe Abschnitt 9.20) Algorithmus 75: Casteljau (siehe Abschnitt 9.21) Algorithmus 76: Berechnung einer Kettenkodierung (siehe Abschnitt 10.1) Algorithmus 77: Splitting (siehe Abschnitt 10.2) Definition 78: Parameterdarstellung einer Geraden für 2D Morphing (siehe Abschnitt 10.3) Algorithmus 79: Aufbau eines Quadtrees 
(siehe Abschnitt 10.5) Definition 80: Aufbau einer Wireframestruktur (siehe Abschnitt 10.8) Definition 81: Aufbau einer B-Rep-Struktur (siehe Abschnitt 10.12) Definition 82: Aufbau einer Cell“-Struktur (siehe Abschnitt 10.14) ” Algorithmus 83: Aufbau einer BSP-Struktur (siehe Abschnitt 10.14) Algorithmus 84: z-Buffering für eine Octree-Struktur (siehe Abschnitt 11.5) Algorithmus 85: Raytracing für eine Octree-Struktur (siehe Abschnitt 11.6) Definition 86: Ambient Beleuchtung (siehe Abschnitt 12.1) Definition 87: Lambert Modell (siehe Abschnitt 12.1) Algorithmus 88: Gouraud (siehe Abschnitt 12.2) Algorithmus 89: Phong (siehe Abschnitt 12.2) Algorithmus 90: Objektgenaue Schattenberechnung (siehe Abschnitt 12.3) Algorithmus 91: Bildgenaue Schattenberechnung (siehe Abschnitt 12.3) Algorithmus 92: Radiosity (siehe Abschnitt 12.6) Definition 93: Berechnung der Binokularen Tiefenschärfe (siehe Abschnitt 13.1) Definition 94: Berechnung der totalen Plastik (siehe Abschnitt 13.2) Algorithmus 95: Berechnung eines Stereomatches (siehe Abschnitt 13.7) Definition 96: LoG Filter als Vorbereitung auf Stereomatches (siehe Abschnitt 13.7) Algorithmus 97: Aufbau eines Merkmalsraums (siehe Abschnitt 14.3) Algorithmus 98: Pixelzuteilung zu einer Klasse ohne Rückweisung (siehe Abschnitt 14.4) 288 APPENDIX A. ALGORITHMEN UND DEFINITIONEN Algorithmus 99: Pixelzuteilung zu einer Klasse mit Rückweisung (siehe Abschnitt 14.4) Algorithmus 100: Zuteilung eines Merkmalsraumes mittels Trainingspixeln (siehe Abschnitt 14.6) Algorithmus 101: Berechnung einer Knotendatei (siehe Abschnitt 15.3) Algorithmus 102: Berechnung eines nächsten Nachbars (siehe Abschnitt 15.4) Algorithmus 103: Berechnung eines bilinear interpolierten Grauwerts (siehe Abschnitt 15.4) Algorithmus 104: Bilddrehung (siehe Abschnitt 15.5) Algorithmus 105: z-Buffer Pipeline (siehe Abschnitt 19.2) Algorithmus 106: Phong-Pipeline (siehe Abschnitt 19.2) Algorithmus 107: Kompressionspipeline (siehe Abschnitt 20.1.2) Algorithmus 108: JPEG Pipeline (siehe Abschnitt 20.2.8) Algorithmus 109: MPEG Pipeline (siehe Abschnitt 20.3) Appendix B Fragenübersicht B.1 Gruppe 1 • Es besteht in der Bildverarbeitung die Idee eines sogenannten Bildmodelles“. Was ist ” darunter zu verstehen, und welche Formel dient der Darstellung des Bildmodells? [#0001] (Frage I/8 14. April 2000) • Bei der Betrachtung von Pixeln bestehen Nachbarschaften“ von Pixeln. Zählen Sie alle ” Arten von Nachbarschaften auf, die in der Vorlesung behandelt wurden, und beschreiben Sie diese Nachbarschaften mittels je einer Skizze. [#0003] (Frage I/9 14. April 2000, Frage I/1 9. November 2001) • Beschreiben Sie in Worten die wesentliche Verbesserungsidee im Bresenham-Algorithmus gegenüber dem DDA-Algorithmus. [#0006] (Frage I/5 11. Mai 2001, Frage 7 20. November 2001) • Erläutern Sie die morphologische Erosion“ unter Verwendung einer Skizze und eines Forme” lausdruckes. [#0007] (Frage I/2 14. April 2000) • Gegeben sei der CIE Farbraum. Erstellen Sie eine Skizze dieses Farbraumes mit einer Beschreibung der Achsen und markieren Sie in diesem Raum zwei Punkte A, B. Welche Farbeigenschaften sind Punkten, welche auf der Strecke zwischen A und B liegen, zuzuordnen, und welche den Schnittpunkten der Geraden durch A, B mit dem Rand des CIE-Farbraumes? [#0012] (Frage I/3 14. April 2000) • Zu welchem Zweck würde man als Anwender ein sogenanntes Ratio-Bild“ herstellen? Ver” wenden Sie bitte in der Antwort die Hilfe einer Skizze zur Erläuterung eines Ratiobildes. [#0015] (Frage I/4 14. 
April 2000) • Welches Maß dient der Beschreibung der geometrischen Auflösung eines Bildes, und mit welchem Verfahren wird diese Auflösung geprüft und quantifiziert? Ich bitte Sie um eine Skizze. [#0017] (Frage I/10 14. April 2000) 289 290 APPENDIX B. FRAGENÜBERSICHT • Eines der populärsten Filter heißt Unsharp Masking“ (USM). Wie funktioniert es? Ich bitte ” um eine einfache formelmäßige Erläuterung. [#0021] (Frage I/11 14. April 2000) • In der Vorlesung wurde ein Baum“ für die Hierarchie diverser Projektionen in die Ebene ” dargestellt (Planar Projections). Skizzieren Sie bitte diesen Baum mit allen darin vorkommenden Projektionen. [#0026] (Frage I/12 14. April 2000) • Wozu dient das sogenannte photometrische Stereo“? Und was ist die Grundidee, die diesem ” Verfahren dient? [#0033] (Frage I/5 14. April 2000, Frage I/1 28. September 2001) • Was ist eine einfache Realisierung der Spiegelreflektion“ (engl.: specular reflection) bei ” der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den Namen eines Verfahrens nach seinem Erfinder. [#0034] (Frage I/6 14. April 2000, Frage I/6 28. September 2001, Frage I/6 1. Februar 2002) • Welche ist die wesentliche Abgrenzung zwischen Computergrafik und Bildanalyse, welches ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung erwünscht. [#0041] (Frage I/1 14. April 2000) • Was bedeuten die Begriffe geometrische“ bzw. radiometrische“ Auflösung eines Bildes? ” ” Versuchen Sie, Ihre Antwort durch eine Skizze zu verdeutlichen. [#0047] (Frage I/1 14. Dezember 2001) • Was versteht man unter Rasterkonversion“, und welche Probleme können dabei auftreten? ” [#0058] (Frage I/1 26. Mai 2000, Frage I/8 15. März 2002) • Erläutern Sie das morphologische Öffnen“ unter Verwendung einer Skizze und eines Formel” ausdruckes. [#0059] (Frage I/2 26. Mai 2000, Frage I/4 10. November 2000) • Erklären Sie das Problem, das bei der Verwendung von einem Pixel breiten“ Linien auftritt, ” wenn eine korrekte Intensitätswiedergabe gefordert ist. Welche Lösungsmöglichkeiten gibt es für dieses Problem? Bitte verdeutlichen Sie Ihre Antwort anhand einer Skizze! (Hinweis: betrachten Sie Linien unterschiedlicher Orientierung!) [#0060] (Frage I/3 26. Mai 2000) • Was versteht man unter dem dynamischen Bereich“ eines Mediums zur Wiedergabe bild” hafter Informationen, und im welchem Zusammenhang steht er mit der Qualität der Darstellung? Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen Bereiches! [#0061] (Frage I/5 30. Juni 2000, Frage 1 20. November 2001, Frage I/5 15. März 2002) • Können von einem RGB-Monitor alle vom menschlichen Auge wahrnehmbaren Farben dargestellt werden? Begründen Sie Ihre Antwort anhand einer Skizze! [#0062] (Frage I/4 26. Mai 2000, Frage I/5 10. November 2000, Frage I/2 9. November 2001, Frage 4 20. November 2001) B.1. GRUPPE 1 291 • Was ist ein Medianfilter, was sind seine Eigenschaften, und in welchen Situationen wird er eingesetzt? [#0063] (Frage I/5 26. Mai 2000, Frage I/7 10. November 2000, Frage I/11 30. März 2001, Frage I/5 28. September 2001, Frage 3 20. November 2001) • Erklären Sie die Bedeutung von homogenen Koordinaten für die Computergrafik! Welche Eigenschaften weisen homogene Koordinaten auf? [#0066] (Frage I/6 26. Mai 2000, Frage 1 15. Jänner 2002) • Was versteht man unter (geometrischem) Resampling“, und welche Möglichkeiten gibt es, ” die Intensitäten der Pixel im Ausgabebild zu berechnen? 
Beschreiben sie verschiedene Verfahren anhand einer Skizze und ggf. eines Formelausdrucks! [#0067] (Frage I/7 26. Mai 2000, Frage I/6 10. November 2000, Frage I/3 28. September 2001, Frage I/9 9. November 2001, Frage 6 20. November 2001, Frage 6 15. Jänner 2002) • Beschreiben Sie mindestens zwei Verfahren, bei denen allein durch Modulation der Oberflächenparameter (ohne Definition zusätzlicher geometrischer Details) eine realistischere Darstellung eines vom Computer gezeichneten Objekts möglich ist! [#0068] (Frage I/8 26. Mai 2000) • Ein dreidimensionaler Körper kann mit Hilfe von Zellen einheitlicher Größe (Würfeln), die in einem gleichmäßigen Gitter angeordnet sind, dargestellt werden. Beschreiben Sie Vor- und Nachteile dieser Repräsentationsform! Begründen Sie Ihre Antwort ggf. mit einer Skizze! [#0070] (Frage I/9 26. Mai 2000) • Erklären Sie (ohne Verwendung von Formeln) das Prinzip des Radiosity“-Verfahrens zur ” Herstellung realistischer Bilder mit dem Computer. Welche Art der Lichtinteraktion kann mit diesem Modell beschrieben werden, und welche kann nicht beschrieben werden? [#0073] (Frage I/10 26. Mai 2000) • In der Einführungsvorlesung wurde der Begriff Affine Matching“ verwendet. Wozu dient ” das Verfahren, welches dieser Begriff bezeichnet? [#0079] (Frage I/7 14. April 2000) • Skizzieren Sie die Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen ” Szene mittels z-buffering und Gouraud-shading! [#0082] (Frage I/10 30. Juni 2000, Frage I/9 10. November 2000) • Beschreiben Sie den Unterschied zwischen Virtual Reality“ und Augmented Reality“. ” ” Welche Hardware wird in beiden Fällen benötigt? [#0083] (Frage I/9 30. Juni 2000, Frage I/8 28. September 2001, Frage I/8 14. Dezember 2001, Frage I/2 15. März 2002) • Wie werden in der Stereo-Bildgebung zwei Bilder der selben Szene aufgenommen? Beschreiben Sie typische Anwendungsfälle beider Methoden! [#0084] (Frage I/8 30. Juni 2000, Frage I/8 10. November 2000) • Erklären Sie den Vorgang der Schattenberechnung nach dem 2-Phasen-Verfahren mittels z-Buffer! Beschreiben Sie zwei Varianten sowie deren Vor- und Nachteile. [#0086] (Frage I/7 30. Juni 2000) 292 APPENDIX B. FRAGENÜBERSICHT • Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2 12 D- oder 3DModellen. Definieren Sie die Objektbeschreibung durch 2 12 D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied! [#0087] (Frage I/6 30. Juni 2000, Frage I/6 9. November 2001, Frage I/6 14. Dezember 2001, Frage 5 15. Jänner 2002) • Welche Eigenschaften weist eine (sich regelmäßig wiederholende) Textur im Spektralraum auf? Welche Aussagen können über eine Textur anhand ihres Spektrums gemacht werden? [#0093] (Frage I/4 30. Juni 2000) • Erklären Sie, unter welchen Umständen Aliasing“ auftritt und was man dagegen unterneh” men kann! [#0094] (Frage I/3 30. Juni 2000) • Geben Sie die Umrechnungsvorschrift für einen RGB-Farbwert in das CMY-Modell und in das CMYK-Modell an und erklären Sie die Bedeutung der einzelnen Farbanteile! Wofür wird das CMYK-Modell verwendet? [#0095] (Frage I/2 30. Juni 2000, Frage 2 20. November 2001) • Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärmeoder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras? [#0097] (Frage I/1 30. Juni 2000) • Definieren Sie den Begriff Kante“. ” (Frage I/1 13. Oktober 2000) [#0105] • Erklären Sie anhand einer Skizze den zeitlichen Ablauf des Bildaufbaus auf einem Elektronenstrahlschirm! [#0109] (Frage I/2 13. 
Oktober 2000, Frage I/2 1. Februar 2002, Frage I/10 15. März 2002) • Erklären Sie, wie man mit Hilfe der Computertomografie ein dreidimensionales Volumenmodell vom Inneren des menschlichen Körpers gewinnt. [#0110] (Frage I/3 13. Oktober 2000) • Nennen Sie verschiedene Techniken, um dicke“ Linien (z.B. Geradenstücke oder Kreisbögen) ” zu zeichnen. [#0111] (Frage I/4 13. Oktober 2000, Frage I/1 10. November 2000, Frage I/10 9. November 2001) • Zum YIQ-Farbmodell: 1. Welche Bedeutung hat die Y -Komponente im YIQ-Farbmodell? 2. Wo wird das YIQ-Farbmodell eingesetzt? [#0112] (Frage I/5 13. Oktober 2000) • Skizzieren Sie die Form des Filterkerns eines Gaussschen Tiefpassfilters. Worauf muss man bei der Wahl der Filterparameter bzw. der Größe des Filterkerns achten? [#0115] (Frage I/6 13. Oktober 2000, Frage I/3 10. November 2000) • Nennen Sie drei Arten der Texturbeschreibung und führen Sie zu jeder ein Beispiel an. [#0116] (Frage I/7 13. Oktober 2000, Frage I/10 10. November 2000) B.1. GRUPPE 1 293 • Was versteht man unter einer Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese ” Art der Objektrepräsentation? [#0117] (Frage I/8 13. Oktober 2000, Frage I/2 10. November 2000, Frage 4 15. Jänner 2002) • Welche physikalischen Merkmale der von einem Körper ausgesandten oder reflektierten Strahlung eignen sich zur Ermittlung der Oberflächeneigenschaften (z.B. zwecks Klassifikation)? [#0118] (Frage I/9 13. Oktober 2000, Frage I/5 14. Dezember 2001) • Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks, das zu einer beleuchteten polygonalen Szene gehört. [#0119] (Frage I/10 13. Oktober 2000) • Was versteht man in der Sensorik unter Einzel- bzw. Mehrfachbildern? Nennen Sie einige Beispiele für Mehrfachbilder! [#0121] (Frage I/1 15. Dezember 2000, Frage I/5 9. November 2001, Frage I/3 14. Dezember 2001) • Skizzieren Sie drei verschiedene Verfahren zum Scannen von zweidimensionalen Vorlagen (z.B. Fotografien)! [#0122] (Frage I/2 15. Dezember 2000) • Beschreiben Sie das Prinzip der Bilderfassung mittels Radar! Welche Vor- und Nachteile bietet dieses Verfahren? [#0123] (Frage I/3 15. Dezember 2000) • Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter Trackingverfahren und erläutern Sie deren Vor- und Nachteile! [#0124] (Frage I/4 15. Dezember 2000, Frage I/4 1. Februar 2002) • Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von Kurven, und erläutern Sie anhand einer Skizze ein Approximationsverfahren Ihrer Wahl! [#0125] (Frage I/5 15. Dezember 2000, Frage 2 15. Jänner 2002) • Geben Sie die Transferfunktion H(u, v) im Frequenzbereich eines idealen Tiefpassfilters mit der cutoff“-Frequenz D0 an! Skizzieren Sie die Transferfunktion! [#0127] ” (Frage I/6 15. Dezember 2000, Frage I/7 14. Dezember 2001) • Erklären Sie, wie in der Visualisierung die Qualität eines vom Computer erzeugten Bildes durch den Einsatz von Texturen verbessert werden kann. Nennen Sie einige Oberflächeneigenschaften (insbesondere geometrische), die sich nicht zur Repräsentation mit Hilfe einer Textur eignen. [#0128] (Frage I/7 15. Dezember 2000) • Erklären Sie, warum bei der Entzerrung von digitalen Rasterbildern meist Resampling“ ” erforderlich ist. Nennen Sie zwei Verfahren zur Grauwertzuweisung für das Ausgabebild! [#0130] (Frage I/8 15. Dezember 2000) • Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch seine (polygonale) Oberfläche genutzt werden kann! [#0131] (Frage I/9 15. 
Dezember 2000, Frage I/2 28. September 2001) 294 APPENDIX B. FRAGENÜBERSICHT 128x128 256x256 512x512 Figure B.1: wiederholte Speicherung eines Bildes in verschieden Größen • Erklären Sie den Begriff Überwachen beim Klassifizieren“. Wann kann man dieses Verfahren ” einsetzen? [#0133] (Frage I/10 15. Dezember 2000) • Im praktischen Teil der Prüfung wird bei Aufgabe B.2 nach einer Transformationsmatrix (in zwei Dimensionen) gefragt, die sich aus einer Skalierung und einer Rotation um ein beliebiges Rotationszentrum zusammensetzt. Wie viele Freiheitsgrade hat eine solche Transformation? Begründen Sie Ihre Antwort! [#0167] (Frage I/1 2. Februar 2001) • Mit Hilfe von Radarwellen kann man von Flugzeugen und Satelliten aus digitale Bilder erzeugen, aus welchen ein topografisches Modell des Geländes (ein Höhenmodell) aus einer einzigen Bildaufnahme erstellt werden kann. Beschreiben Sie jene physikalischen Effekte der elektromagnetischen Strahlung, die für diese Zwecke genutzt werden! [#0169] (Frage I/2 2. Februar 2001) • In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht nur mehr aus einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo wird sie eingesetzt (nennen Sie mindestens ein Beispiel)? [#0170] (Frage I/6 2. Februar 2001, Frage I/1 1. Februar 2002) • In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken gezeigt. Benennen Sie die vier Darstellungstechniken! [#0175] (Frage I/3 2. Februar 2001) • In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen werden benötigt? Erläutern Sie Ihre Antwort anhand einer beliebigen Strecke aus Abbildung B.3! [#0177] (Frage I/5 2. Februar 2001) • Was ist eine 3D Textur“? ” (Frage I/9 2. Februar 2001, Frage I/4 28. September 2001) [#0178] B.1. GRUPPE 1 295 • Welche Rolle spielen die sogenannten Passpunkte“ (engl. Control Points) bei der Interpo” lation und bei der Approximation von Kurven? Erläutern Sie Ihre Antwort anhand einer Skizze! [#0179] (Frage I/7 2. Februar 2001) • Beschreiben Sie eine bilineare Transformation anhand ihrer Definitionsgleichung! [#0180] (Frage I/11 2. Februar 2001) • Zählen Sie Fälle auf, wo in der Bildanalyse die Fourier-Transformation verwendet wird! [#0184] (Frage I/8 2. Februar 2001) • Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern? [#0185] (Frage I/10 2. Februar 2001, Frage I/9 19. Oktober 2001) • Geben Sie zu jedem der Darstellungsverfahren aus Abbildung B.2 an, welche Informationen über das Objekt gespeichert werden müssen! [#0187] (Frage I/4 2. Februar 2001) • Erläutern Sie den Begriff Sensor-Modell“! ” (Frage I/1 30. März 2001, Frage I/7 19. Oktober 2001) [#0193] • Wie wird die geometrische Auflösung eines Filmscanners angegeben, und mit welchem Verfahren kann man sie ermitteln? [#0194] (Frage I/2 30. März 2001) • Was versteht man unter passiver Radiometrie“? ” (Frage I/3 30. März 2001, Frage I/9 1. Februar 2002) [#0195] • Gegeben sei ein Polygon durch die Liste seiner Eckpunkte. Wie kann das Polygon ausgefüllt (also mitsamt seinem Inneren) auf einem Rasterbildschirm dargestellt werden? Welche Probleme treten auf, wenn das Polygon sehr spitze“ Ecken hat (d.h. Innenwinkel nahe bei Null)? 
” [#0196] (Frage I/4 30. März 2001, Frage I/2 14. Dezember 2001) • Wie ist der Hit-or-Miss“-Operator A ~ B definiert? Erläutern Sie seine Funktionsweise zur ” Erkennung von Strukturen in Binärbildern! [#0199] (Frage I/5 30. März 2001) • Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild (pseudo color image)? Nennen Sie je einen typischen Anwendungsfall! [#0200] (Frage I/6 30. März 2001) • Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz? [#0202] (Frage I/7 30. März 2001, Frage I/10 19. Oktober 2001, Frage I/4 14. Dezember 2001) • Was versteht man unter prozeduralen Texturen“, wie werden sie erzeugt und welche Vorteile ” bringt ihr Einsatz? [#0206] (Frage I/8 30. März 2001) • Erklären Sie den Begriff spatial partitioning“ und nennen Sie drei räumliche Datenstruk” turen aus dieser Gruppe! [#0208] (Frage I/9 30. März 2001) 296 APPENDIX B. FRAGENÜBERSICHT • Erklären Sie die Begriffe feature“ (Merkmal), feature space“ (Merkmalsraum) und clus” ” ” ter“ im Zusammenhang mit Klassifikationsproblemen und verdeutlichen Sie Ihre Antwort anhand einer Skizze! [#0209] (Frage I/10 30. März 2001, Frage I/9 28. September 2001, Frage I/7 1. Februar 2002) • Im Folgenden sehen Sie drei 3 × 3 Transformationmatrizen, wobei jede der Matrizen einen bestimmten Transformationstyp für homogene Koordinaten in 2D beschreibt: a11 0 0 A = 0 a22 0 , a11 , a22 beliebig 0 0 1 b11 b12 0 B = −b12 b11 0 , b211 + b212 = 1 0 1 0 1 0 c13 C = 0 1 c23 , c13 , c23 beliebig 0 0 1 Um welche Transformationen handelt es sich bei A, B und C? [#0213] (Frage I/1 11. Mai 2001) • In der Computergrafik ist die Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche ein mehrstufiger Prozess (Abbildung B.4), an dem verschiedene Transformationen und Koordinatensysteme beteiligt sind. Benennen Sie die Koordinatensysteme A, B und C in Abbildung B.4! [#0215] (Frage I/1 26. Juni 2001) • Gegeben sei ein verrauschtes monochromes digitales Rasterbild. Gesucht sei ein Filter, das zur Bereinigung eines solchen Bildes geeignet ist, wobei folgende Anforderungen gestellt werden: – Kanten müssen erhalten bleiben und dürfen nicht verwischt“ werden. ” – Im Ausgabebild dürfen nur solche Grauwerte enthalten sein, die auch im Eingabebild vorkommen. Schlagen Sie einen Filtertyp vor, der dafür geeignet ist, und begründen Sie Ihre Antwort! [#0216] (Frage I/2 11. Mai 2001, Frage I/5 19. Oktober 2001) • In der Computergrafik kennt man die Begriffe Phong-shading“ und Phong-illumination“. ” ” Erklären Sie diese beiden Begriffe! [#0219] (Frage I/3 11. Mai 2001) • Bei der Erstellung realistischer Szenen werden in der Computergrafik u.a. die zwei Konzepte shading“ und shadow“ verwendet, um die Helligkeit der darzustellenden Bildpunkte zu ” ” ermitteln. Was ist der Unterschied zwischen diesen beiden Begriffen? [#0220] (Frage I/3 26. Juni 2001, Frage I/2 19. Oktober 2001, Frage I/4 15. März 2002) • Nennen Sie Anwendungen von Schallwellen in der digitalen Bildgebung! [#0225] (Frage I/4 11. Mai 2001) • Nennen Sie allgemeine Anforderungen an eine Datenstruktur zur Repräsentation dreidimensionaler Objekte! [#0230] (Frage I/7 11. Mai 2001) B.1. GRUPPE 1 297 • Beschreiben Sie das ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche ” Optimierungen können helfen, den Rechenaufwand zu verringern? [#0231] (Frage I/9 11. Mai 2001, Frage I/8 19. Oktober 2001, Frage 8 15. 
Jänner 2002) • Beschreiben Sie Anwendungen von Resampling“ und erläutern Sie den Prozess, seine Vari” anten und mögliche Fehlerquellen! [#0232] (Frage I/10 11. Mai 2001) • Nennen Sie verschiedene technische Verfahren der stereoskopischen Vermittlung eines echten“ ” (dreidimensionalen) Raumeindrucks einer vom Computer dargestellten Szene! [#0233] (Frage I/11 11. Mai 2001) • Erklären Sie den Unterschied zwischen supervised classification“ und unsupervised clas” ” sification“! Welche Rollen spielen diese Verfahren bei der automatischen Klassifikation der Bodennutzung anhand von Luftbildern? [#0234] (Frage I/8 11. Mai 2001) • Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche Kompressionsraten können erzielt werden? [#0235] (Frage I/6 11. Mai 2001, Frage I/9 14. Dezember 2001, Frage I/1 15. März 2002) • Was versteht man unter motion blur“, und unter welcher Voraussetzung kann dieser Effekt ” aus einem Bild wieder entfernt werden? [#0238] (Frage I/2 26. Juni 2001, Frage I/10 14. Dezember 2001) • Welchem Zweck dient ein sogenannter Objektscanner“? Nennen Sie drei verschiedene Ver” fahren, nach denen ein Objektscanner berührungslos arbeiten kann! [#0239] (Frage I/4 26. Juni 2001) • Erklären Sie anhand eines Beispiels den Vorgang des morphologischen Filterns! [#0240] (Frage I/6 26. Juni 2001) • Was versteht man unter der geometrischen Genauigkeit (geometric accuracy) eines digitalen Rasterbildes? [#0243] (Frage I/5 1. Februar 2002) • Beschreiben Sie anhand einer Skizze das Aussehen“ folgender Filtertypen im Frequenzbe” reich: 1. Tiefpassfilter 2. Hochpassfilter 3. Bandpassfilter [#0245] (Frage I/8 26. Juni 2001) • Welche statistischen Eigenschaften können zur Beschreibung von Textur herangezogen werden? Erläutern Sie die Bedeutung dieser Eigenschaften im Zusammenhang mit Texturbildern! [#0246] (Frage I/5 26. Juni 2001) 298 APPENDIX B. FRAGENÜBERSICHT • Wird eine reale Szene durch eine Kamera mit nichtidealer Optik aufgenommen, entsteht ein verzerrtes Bild. Erläutern Sie die zwei Stufen des Resampling, die erforderlich sind, um ein solches verzerrtes Bild zu rektifizieren! [#0249] (Frage I/10 26. Juni 2001) • In der Computergrafik gibt es zwei grundlegend verschiedene Verfahren, um ein möglichst (photo-)realistisches Bild einer dreidimensionalen Szene zu erstellen. Verfahren A kommt zum Einsatz, wenn Spiegelreflexion, Lichtbrechung und Punktlichtquellen simuliert werden sollen. Verfahren B ist besser geeignet, um diffuse Reflexion, gegenseitige Lichtabstrahlung und Flächenlichtquellen darzustellen und die Szene interaktiv zu durchwandern. Benennen Sie diese beiden Verfahren und erläutern Sie kurz deren jeweilige Grundidee! [#0253] (Frage I/7 26. Juni 2001) • Was versteht man unter einem LoD/R-Tree“? ” (Frage I/9 26. Juni 2001) [#0254] • Was versteht man unter immersiver Visualisierung“? ” (Frage I/11 26. Juni 2001) [#0256] • Beschreiben Sie die Farberzeugung beim klassischen Offsetdruck! Welches Farbmodell wird verwendet, und wie wird das Auftreten des Moiree-Effekts verhindert? [#0265] (Frage I/10 28. September 2001) • Nennen Sie ein Beispiel und eine konkrete Anwendung eines nicht-optischen Sensors in der Stereo-Bildgebung! [#0266] (Frage I/7 28. September 2001) • Was versteht man unter data garmets“ (Datenkleidung)? Nennen Sie mindestens zwei ” Geräte dieser Kategorie! [#0273] (Frage I/4 19. Oktober 2001) • Skizzieren Sie die Übertragungsfunktion eines idealen und eines Butterworth-Hochpassfilters und vergleichen Sie die Vor- und Nachteile beider Filtertypen! 
[#0274] (Frage I/1 19. Oktober 2001) • Was versteht man unter einer konformen Transformation“? ” (Frage I/6 19. Oktober 2001) [#0275] • Nach welchem Grundprinzip arbeiten Verfahren, die aus einem Stereobildpaar die Oberfläche eines in beiden Bildern sichtbaren Körpers rekonstruieren können? [#0276] (Frage I/3 19. Oktober 2001) • Beschreiben Sie mindestens zwei Verfahren oder Geräte, die in der Medizin zur Gewinnung digitaler Rasterbilder verwendet werden! [#0278] (Frage I/3 9. November 2001, Frage I/7 15. März 2002) • Was ist Morphologie“? ” (Frage I/7 9. November 2001) [#0279] • Was versteht man unter einem dreidimensionalen Farbraum (bzw. Farbmodell)? Nennen Sie mindestens drei Beispiele davon! [#0280] (Frage I/4 9. November 2001) B.1. GRUPPE 1 • Erläutern Sie die strukturelle Methode der Texturbeschreibung! 299 [#0281] (Frage I/8 9. November 2001) • Nennen Sie ein Verfahren zur Verbesserung verrauschter Bilder, und erläutern sie deren Auswirkungen auf die Qualität des Bildes! Bei welcher Art von Rauschen kann das von Ihnen genannte Verfahren eingesetzt werden? [#0296] (Frage I/3 1. Februar 2002) • Erläutern Sie die Octree-Datenstruktur und nennen Sie mindestens zwei verschiedene Anwendungen davon! [#0298] (Frage I/10 1. Februar 2002) • Erklären Sie den z-buffer-Algorithmus zur Ermittlung sichtbarer Flächen! [#0299] (Frage I/8 1. Februar 2002) • Beschreiben Sie die Arbeitsweise des Marr-Hildreth-Operators1 ! [#0311] (Frage I/9 15. März 2002) • Nennen Sie vier dreidimensionale Farbmodelle, benennen Sie die einzelnen Komponenten und skizzieren Sie die Geometrie des Farbmodells! [#0313] (Frage I/6 15. März 2002) • Versuchen Sie eine Definition des Histogramms eines digitalen Grauwertbildes! [#0314] (Frage I/3 15. März 2002) 1 Dieser Operator wurde in der Vorlesung zur Vorbearbeitung von Stereobilder besprochen und erstmals im Wintersemester 2001/02 namentlich genannt. 300 APPENDIX B. FRAGENÜBERSICHT (a) Verfahren 1 (b) Verfahren 2 (c) Verfahren 3 (d) Verfahren 4 Figure B.2: dreidimensionales Objekt mit verschiedenen Darstellungstechniken gezeigt B.1. GRUPPE 1 301 Figure B.3: Überführung einer Vektorgrafik in eine andere A Modellierungs− Transformation B Projektion C Figure B.4: Prozesskette der Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche 302 APPENDIX B. FRAGENÜBERSICHT Figure B.5: Pixelraster Figure B.6: binäres Rasterbild B.2 Gruppe 2 • Gegeben sei ein Druckverfahren, welches einen Graupunkt mittels eines Pixelrasters darstellt, wie dies in Abbildung B.5 dargestellt wird. Wieviele Grauwerte können mit diesem Raster dargestellt werden? Welcher Grauwert wird in Abbildung B.5 dargestellt? [#0011] (Frage II/13 14. Dezember 2001) • Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung dieses Bildes. Ich bitte Sie, einen sogenannten traditionellen“ Quadtree der Abbildung ” B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen. [#0029] (Frage II/14 14. April 2000) • Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen? [#0030] (Frage II/15 14. April 2000) • Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva bestehen Quader und Zylinder. 
Beschreiben Sie bitte einen CSG-Verfahrensablauf der Konstruktion des Objektes (ohne Lampe). [#0031] (Frage II/17 14. April 2000) B.2. GRUPPE 2 303 Figure B.7: Tisch • Quantifizieren Sie bitte an einem rechnerischen Beispiel Ihrer Wahl das Geheimnis“, welches ” es gestattet, in der Stereobetrachtung mittels überlappender photographischer Bilder eine wesentlich bessere Tiefenwahrnehmung zu erzielen, als dies bei natürlichem binokularem Sehen möglich ist. [#0037] (Frage II/13 14. April 2000) • Gegeben sei ein Inputbild mit den darin mitgeteilten Grauwerten (Abbildung B.8). Das Inputbild umfasst 5 Zeilen und 7 Spalten. Durch eine geometrische Transformation des Bildes gilt es nun, einigen bestimmten Pixeln im Ergebnisbild nach der Transformation einen Grauwert zuzuweisen, wobei der Entsprechungspunkt im Inputbild die in Tabelle B.1 angegebenen Zeilen- und Spaltenkoordinaten aufweist. Berechnen Sie (oder ermitteln Sie mit grafischen Mitteln) den Grauwert zu jedem der Ergebnispixel, wenn eine bilineare Grauwertzuweisung erfolgt. [#0039] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9 6 7 8 9 10 7 8 9 10 11 Figure B.8: Inputbild Zeile 2.5 2.5 4.75 Spalte 1.5 2.5 5.25 Table B.1: Entsprechungspunkte im Inputbild (Frage II/16 14. April 2000, Frage II/17 30. März 2001, Frage II/14 14. Dezember 2001) • Zeichnen Sie in Abbildung B.9 jene Pixel ein, die vom Bresenham-Algorithmus erzeugt werden, wenn die beiden markierten Pixel durch eine (angenäherte) Gerade verbunden werden. Geben Sie außerdem die Rechenschritte an, die zu den von Ihnen gewählten Pixeln führen. [#0057] 304 APPENDIX B. FRAGENÜBERSICHT 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 10 11 12 Figure B.9: Die Verbindung zweier Pixel soll angenähert werden Figure B.10: Objekt bestehend aus zwei Flächen (Frage II/11 26. Mai 2000) • Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die Reihenfolge, damit beide Flächen in die gleiche Richtung weisen“! [#0069] ” (Frage II/12 26. Mai 2000, Frage II/12 10. November 2000, Frage II/15 11. Mai 2001, Frage II/11 14. Dezember 2001) • Bei der Erstellung eines Bildes mittels recursive raytracing“ trifft der Primärstrahl für ein ” bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel zugeordnete Intensität, wenn 1. die Rekursionstiefe nicht beschränkt ist, 2. der Strahl nur genau einmal aufgeteilt wird, 3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt! Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden! [#0072] (Frage II/15 26. Mai 2000) B.2. GRUPPE 2 305 2,7 A 0,1 2 0,5 B 3 0,4 2 C 0,1 D 4 E Figure B.11: Aufteilung des Primärstrahls bei recursive raytracing“ ” y 7 6 5 4 B A 3 M 2 1 0 0 1 2 3 4 5 6 7 8 9 10 11 x Figure B.12: Lineare Transformation M eines Objekts A in ein Objekt B • In Abbildung B.12 ist ein Objekt A gezeigt, das durch eine lineare Transformation M in das Objekt B übergeführt wird. 
Geben Sie (für homogene Koordinaten) die 3 × 3-Matrix M an, die diese Transformation beschreibt (zwei verschiedene Lösungen)! [#0074] (Frage II/13 26. Mai 2000, Frage II/13 10. November 2000) • Definieren Sie den Sobel-Operator und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.13 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.13 eintragen. [#0075] (Frage II/14 26. Mai 2000) • Wenden Sie ein 3 × 3-Median-Filter auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.14 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.14 eintragen. [#0080] (Frage II/11 30. Juni 2000, Frage II/14 10. November 2000) • In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parameter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter wahrgenommene Intensität I dieses Punktes! Hinweis: Der Einfachkeit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung (1 − x)k ≈ 1 − kx für kleine x verwendbar ist. [#0085] (Frage II/12 30. Juni 2000, Frage II/15 15. Dezember 2000) 306 APPENDIX B. FRAGENÜBERSICHT 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 3 2 1 1 1 1 Sobel 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2 2 1 Figure B.13: Anwendung des Sobel-Operators auf ein Grauwertbild 0 0 5 0 0 0 0 0 0 0 0 5 0 0 4 0 0 0 0 0 1 5 0 0 1 2 4 0 0 0 5 2 4 5 5 5 0 0 1 3 5 5 5 5 5 0 1 3 5 5 5 2 5 5 0 2 5 5 3 5 5 5 5 Figure B.14: Anwendung eines Median-Filters auf ein Grauwertbild • Ermitteln Sie zu dem Grauwertbild aus Abbildung B.16 eine Bildpyramide, wobei jedem Pixel einer Ebene der Mittelwert der entsprechenden vier Pixel aus der übergeordneten (höher aufgelösten) Ebene zugewiesen wird! [#0088] (Frage II/13 30. Juni 2000, Frage II/11 10. November 2000, Frage II/12 15. Dezember 2000, Frage II/14 28. September 2001) • Geben Sie einen Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten ” für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0089] (Frage II/14 30. Juni 2000, Frage II/15 10. November 2000, Frage II/15 1. Februar 2002) • Erklären Sie die einzelnen Schritte des Clipping-Algorithmus nach Cohen-Sutherland anhand des Beispiels in Abbildung B.18. Die Zwischenergebnisse mit den half-space Codes sind darzustellen. Es ist jener Teil der Strecke AB zu bestimmen, der innerhalb des Rechtecks R liegt. Die dazu benötigten Zahlenwerte (auch die der Schnittpunkte) können Sie direkt aus Abbildung B.18 ablesen. [#0092] (Frage II/15 30. Juni 2000) • Gegeben seien die Transformationsmatrix 0 0 M = 1 −2 und zwei Punkte 3 p1 = −1 , 1 2 0 0 0 2 0 0 0 −5 0 0 8 2 p2 = 4 −1 in Objektkoordinaten. Führen Sie die beiden Punkte p1 und p2 mit Hilfe der Matrix M in die Punkte p01 bzw. p02 in (normalisierten) Bildschirmkoordinaten über (beachten Sie dabei die Umwandlungen zwischen dreidimensionalen und homogenen Koordinaten)! [#0099] B.2. 
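Question #0099 above hinges on the round trip between three-dimensional and homogeneous coordinates: append w = 1, multiply by the 4 × 4 matrix, then divide by the resulting w. The short Python/NumPy sketch below illustrates only that round trip; the matrix coefficients printed in the question are not legible in this transcription, so the matrix used here is a made-up example, while the two points are those given in the question.

import numpy as np

def to_screen(M, p):
    # 3D point -> homogeneous point -> transform -> perspective divide.
    ph = np.append(np.asarray(p, dtype=float), 1.0)   # (x, y, z, 1)
    q = M @ ph
    if q[3] == 0.0:
        raise ValueError("point maps to infinity (w = 0)")
    return q[:3] / q[3]

# Illustrative matrix only: a scaling combined with a translation and a
# projective last row, so that the division by w actually matters.
M = np.array([[2.0, 0.0, 0.0,  1.0],
              [0.0, 2.0, 0.0,  0.0],
              [0.0, 0.0, 2.0, -3.0],
              [0.0, 0.0, 1.0,  0.0]])

for p in [(3.0, -1.0, 1.0), (2.0, 4.0, -1.0)]:
    print(p, "->", to_screen(M, p))

The final division by w is exactly the step the question's hint about converting between three-dimensional and homogeneous coordinates refers to.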
GRUPPE 2 307 Lichtquelle Beobachter N L V p Figure B.15: Beleuchtetes Objekt mit spiegelnder Oberfläche nach dem Phong-Modell Parameter Formelzeichen Wert diffuser Reflexionskoeffizient Spiegelreflexionskoeffizient Spiegelreflexionsexponent Richtung zur Lichtquelle Richtung zum Beobachter Oberflächennormalvektor Intensität des ambienten Umgebungslichtes Intensität der Lichtquelle kd W (θ) = ks n L V N Ia Ip 0.2 0.5 3 (−0.6, 0.8)T (0.8, 0.6)T (0, 1)T 0 2 Table B.2: Parameter für das Phongsche Beleuchtungsmodell in Abbildung B.15 (Frage II/11 13. Oktober 2000) • Wenden Sie den Clipping-Algorithmus von Cohen-Sutherland (in zwei Dimensionen) auf die in Beispiel B.2 gefundenen Punkte p01 und p02 an, um den innerhalb des Quadrats Q = {(0, 0)T , (0, 1)T , (1, 1)T , (1, 0)T } liegenden Teil der Verbindungsstrecke zwischen p01 und p02 zu finden! Sie können das Ergebnis direkt in Abbildung B.19 eintragen und Schnittberechnungen grafisch lösen. [#0100] (Frage II/12 13. Oktober 2000) • Das Quadrat Q in normalisierten Bildschirmkoordinaten aus Beispiel B.2 wird in ein Rechteck R mit den Abmessungen 10 × 8 in Bildschirmkoordinaten transformiert. Zeichnen Sie die Verbindung der zwei Punkte p01 und p02 in Abbildung B.20 ein und bestimmen Sie grafisch jene Pixel, die der Bresenham-Algorithmus wählen würde, um die Verbindung diskret zu approximieren! [#0102] (Frage II/13 13. Oktober 2000) • Zu dem digitalen Rasterbild in Abbildung B.21 soll das Gradientenbild gefunden werden. Geben Sie einen dazu geeigneten Operator an und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Rechtecks an. Sie können das Ergebnis direkt in Abbildung B.21 eintragen. Führen Sie außerdem für eines der Pixel den Rechengang vor. [#0103] (Frage II/14 13. Oktober 2000) 308 APPENDIX B. FRAGENÜBERSICHT 3 8 9 9 2 7 6 8 0 3 6 9 0 1 2 7 Figure B.16: Grauwertbild als höchstauflösende Ebene einer Bildpyramide 3 4 2 1 Figure B.17: Polygon für BSP-Darstellung • Nehmen Sie an, der Gradientenoperator in Aufgabe B.2 hätte das Ergebnis in Abbildung B.22 ermittelt. Zeichnen Sie das Histogramm dieses Gradientenbildes und finden Sie einen geeigneten Schwellwert, um Kantenpixel“ zu identifizieren. Markieren Sie in Abbildung ” B.22 rechts alle jene Pixel (Kantenpixel), die mit diesem Schwellwert gefunden werden. [#0104] (Frage II/15 13. Oktober 2000) • Beschreiben Sie mit Hilfe morphologischer Operationen ein Verfahren zur Bestimmung des Randes eines Region. Wenden Sie dieses Verfahren auf die in Abbildung B.23 eingezeichnete Region an und geben Sie das von Ihnen verwendete 3 × 3-Formelement an. In Abbildung B.23 ist Platz für das Endergebnis sowie für Zwischenergebnisse. [#0106] (Frage II/16 13. Oktober 2000) • In Abbildung B.24 sind zwei Binärbilder A und B gezeigt, wobei schwarze Pixel logisch 1“ ” und weiße Pixel logisch 0“ entsprechen. Führen sie die Boolschen Operationen ” 1. A and B, 2. A xor B, 3. A minus B aus und tragen Sie die Ergebnisse in Abbildung B.24 ein! [#0132] (Frage II/11 15. Dezember 2000, Frage II/15 15. März 2002) • Gegeben sei ein Farbwert CRGB = (0.8, 0.5, 0.1)T im RGB-Farbmodell. 1. Welche Spektralfarbe entspricht am ehesten dem durch CRGB definierten Farbton? 2. Finden Sie die entsprechende Repräsentation von CRGB im CMY- und im CMYKFarbmodell! [#0134] (Frage II/13 15. Dezember 2000, Frage II/14 19. Oktober 2001) B.2. 
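For question #0134 above, the conversion chain RGB → CMY → CMYK can be written in a few lines. The Python sketch below (not taken from the lecture notes) uses the common textbook rule of subtracting the complement and then pulling the shared grey component into the black channel; if the lecture uses a normalised variant of the K extraction, the last step has to be adapted accordingly.

def rgb_to_cmy(rgb):
    # Subtractive complement: C = 1 - R, M = 1 - G, Y = 1 - B.
    r, g, b = rgb
    return (1.0 - r, 1.0 - g, 1.0 - b)

def cmy_to_cmyk(cmy):
    # Simple undercolour removal: the common grey share of C, M, Y becomes K.
    c, m, y = cmy
    k = min(c, m, y)
    return (c - k, m - k, y - k, k)

c_rgb = (0.8, 0.5, 0.1)
c_cmy = rgb_to_cmy(c_rgb)       # approximately (0.2, 0.5, 0.9)
c_cmyk = cmy_to_cmyk(c_cmy)     # approximately (0.0, 0.3, 0.7, 0.2)

With the dominant red and the very small blue component, the colour lies roughly in the orange range of the spectrum, which fits the first part of the question.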
GRUPPE 2 309 y 11 B 10 9 8 7 6 5 4 3 2 1 0 A R 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 x Figure B.18: Anwendung des Clipping-Algorithmus von Cohen-Sutherland y 3 2 −2 −1 1 Q 0 1 2 3 x −1 −2 Figure B.19: Clipping nach Cohen-Sutherland 2 • Bestimmen Sie mit Hilfe der normalisierten Korrelation RN (m, n) jenen Bildausschnitt innerhalb des fett umrandeten Bereichs in Abbildung B.25, der mit der ebenfalls angegebenen Maske M am besten übereinstimmt. Geben Sie Ihre Rechenergebnisse an und markieren Sie den gefundenen Bereich in Abbildung B.25! [#0135] (Frage II/14 15. Dezember 2000) • In Abbildung B.26 sehen Sie vier Punkte P1 , P2 , P3 und P4 , die als Kontrollpunkte für eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 13 , also x( 13 ), und erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26 eintragen, eine skizzenhafte Darstellung ist ausreichend. Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird. [#0164] (Frage II/13 2. Februar 2001, Frage II/12 9. November 2001, Frage II/15 14. Dezember 2001, Frage II/14 15. März 2002) • Berechnen Sie jene Transformationsmatrix M, die eine Rotation um 45◦ im Gegenuhrzeiger√ sinn um den Punkt R = (3, 2)T und zugleich eine Skalierung mit dem Faktor 2 bewirkt 310 APPENDIX B. FRAGENÜBERSICHT R Figure B.20: Verbindung zweier Punkte nach Bresenham 0 0 0 1 2 2 3 0 1 2 3 3 3 3 1 2 3 7 7 6 3 1 2 7 8 9 8 4 2 2 8 8 8 9 5 Figure B.21: Anwendung eines Gradientenoperators (wie in Abbildung B.27 veranschaulicht). Geben Sie M für homogene Koordinaten in zwei Dimensionen an (also eine 3 × 3-Matrix), sodass ein Punkt p gemäß p0 = Mp in den Punkt p0 übergeführt wird. Hinweis: Sie ersparen sich viel Rechen- und Schreibarbeit, wenn Sie das Assoziativgesetz für die Matrixmultiplikation geeignet anwenden. [#0166] (Frage II/15 2. Februar 2001) • In der Bildklassifikation wird oft versucht, die unbekannte Wahrscheinlichkeitsdichtefunktion der N bekannten Merkmalsvektoren im m-dimensionalen Raum durch eine Gausssche Normalverteilung zu approximieren. Hierfür wird die m×m-Kovarianzmatrix C der N Vektoren benötigt. Abbildung B.28 zeigt drei Merkmalsvektoren p1 , p2 und p3 in zwei Dimensionen (also N = 3 und m = 2). Berechnen Sie die dazugehörige Kovarianzmatrix C! [#0173] (Frage II/17 2. Februar 2001) • Skizzieren Sie das Histogramm des digitalen Grauwertbildes aus Abbildung B.29, und kommentieren Sie Ihre Skizze! [#0176] (Frage II/12 2. Februar 2001, Frage II/13 19. Oktober 2001, Frage II/12 14. Dezember 2001) B.2. GRUPPE 2 311 0 1 2 2 0 0 0 2 3 5 7 6 4 2 1 4 8 7 7 7 4 0 6 8 3 2 6 3 0 8 8 1 0 5 0 Figure B.22: Auffinden der Kantenpixel Figure B.23: Rand einer Region • Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass 1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals unverändert lässt, 2. in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals vollständig unterdrückt! [#0182] (Frage II/14 2. Februar 2001, Frage II/15 19. Oktober 2001) • Wenden Sie auf das Binärbild in Abbildung B.31 links die morphologische Operation Öff” nen“ mit dem angegebenen Formelement an! Welcher für das morphologische Öffnen typische Effekt tritt auch in diesem Beispiel auf? Weiße Pixel gelten als logisch 0“, graue Pixel als logisch 1“. 
Sie können das Ergebnis ” ” rechts in Abbildung B.31 eintragen. [#0186] (Frage II/16 2. Februar 2001) • Gegeben sei ein Farbwert CRGB = (0.8, 0.4, 0.2)T im RGB-Farbmodell. Schätzen Sie grafisch die Lage des Farbwertes CHSV in Abbildung B.32 (also die Entsprechung von CRGB im HSV0 Modell). Skizzieren Sie ebenso die Lage eines Farbwertes CHSV , der den gleichen Farbton und die gleiche Helligkeit aufweist wie CHSV , jedoch nur die halbe Farbsättigung! [#0201] (Frage II/13 11. Mai 2001, Frage II/13 1. Februar 2002) 312 APPENDIX B. FRAGENÜBERSICHT A and B xor minus Figure B.24: Boolsche Operationen auf Binärbildern 0 1 1 1 2 2 2 0 0 1 0 1 1 2 1 1 1 0 0 1 2 0 1 1 2 2 1 2 1 1 1 2 2 2 1 0 1 0 0 M 0 1 0 0 1 1 0 Figure B.25: Ermittlung der normalisierten Korrelation • Abbildung B.33 zeigt einen Graukeil, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen, die Breite beträgt 50 Pixel. Zeichnen Sie das Histogramm dieses Bildes und achten Sie dabei auf die korrekten Zahlenwerte! Der schwarze Rand in Abbildung B.33 dient nur zur Verdeutlichung des Umrisses und gehört nicht zum Bild selbst. [#0203] (Frage II/12 30. März 2001) • Wenden Sie auf den fett umrandeten Bereich in Abbildung B.34 den Roberts-Operator zur Kantendetektion an! Sie können das Ergebnis direkt in Abbildung B.34 eintragen. [#0204] (Frage II/14 30. März 2001) • Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie einen Schritt des Algorithmus im Detail anhand Ihrer Zeichnung! Wählen Sie den Schwellwert so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung B.35 einzeichnen. [#0207] (Frage II/13 30. März 2001) • Gegeben seien eine 4 × 4-Matrix 8 0 M= 0 0 0 8 0 0 8 −24 8 8 0 24 1 1 sowie vier Punkte p1 p2 p3 p4 = = = = (3, 0, 1)T (2, 0, 7)T (4, 0, 5)T (1, 0, 3)T im dreidimensionalen Raum. Die Matrix M fasst alle Transformationen zusammen, die zur Überführung eines Punktes p in Weltkoordinaten in den entsprechenden Punkt p0 = M · p B.2. GRUPPE 2 313 Figure B.26: Konstruktion eines Kurvenpunktes auf einer Bezier-Kurve nach Casteljau y 5 y 5 4 4 3 3 2 1 0 2 R R 1 0 0 1 2 3 4 5 6 x 0 1 2 3 4 5 6 x Figure B.27: allgemeine Rotation mit Skalierung in Gerätekoordinaten erforderlich sind (siehe auch Abbildung B.36, die Bildschirmebene und daher die y-Achse stehen normal auf die Zeichenebene). Durch Anwendung der Transformationsmatrix M werden die Punkte p1 und p2 auf die Punkte p01 p02 = (4, 8, 12)T = (6, 8, 3)T in Gerätekoordinaten abgebildet. Berechnen Sie in gleicher Weise p03 und p04 ! [#0210] (Frage II/15 30. März 2001) • Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken A = p1 p2 , B = p3 p4 , deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt (A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist! Hinweis: Zeichnen Sie p1 p2 und p3 p4 in die xz-Ebene des Gerätekoordinatensystems ein! [#0211] (Frage II/16 30. März 2001) • In Abbildung B.37 ist einen Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f (x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil 1. 
ein lineares Tiefpassfilter F1 , 314 APPENDIX B. FRAGENÜBERSICHT y 6 5 4 3 2 1 0 −1 −2 x −1 0 1 2 3 4 5 6 Figure B.28: drei Merkmalsvektoren im zweidimensionalen Raum Figure B.29: digitales Grauwertbild (Histogramm gesucht) 2. ein lineares Hochpassfilter F2 mit 3×3-Filterkernen Ihrer Wahl an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.37 oder als Funktionen f1 (x) und f2 (x) an! Zeichnen Sie außerdem die von Ihnen verwendeten Filterkerne. Randpixel müssen nicht gesondert berücksichtigt werden. [#0214] (Frage II/12 11. Mai 2001, Frage II/11 9. November 2001) • In Abbildung B.38(a) ist ein digitales Grauwertbild gezeigt, in dem mittels normalisierter Kreuzkorrelation das Strukturelement aus Abbildung B.38(b) gesucht werden soll. Markieren Sie in Abbildung B.38(a) die Position, an der der Wert der normalisierten Kreuzkorrelation maximal ist! Die Aufgabe ist grafisch zu lösen, es sind keine Berechnungen erforderlich. [#0223] (Frage II/14 11. Mai 2001) • Wenden Sie die medial axis“ Transformation von Bloom auf das Objekt in Abbildung B.39 ” links an! Sie können das Ergebnis direkt in Abbildung B.39 rechts eintragen. [#0226] (Frage II/16 11. Mai 2001) B.2. GRUPPE 2 315 (a) Tiefpass (b) Hochpass Figure B.30: leere Filtermasken Formelement Figure B.31: morphologisches Öffnen • Gegeben seien eine 3 × 3-Transformationsmatrix 3 4 2 M = −4 3 1 0 0 1 sowie drei Punkte a = b = c = (2, 0)T , (0, 1)T , (0, 0)T im zweidimensionalen Raum. Die Matrix M beschreibt in homogenen Koordinaten eine konforme Transformation, wobei ein Punkt p gemäß p0 = Mp in einen Punkt p0 übergeführt wird. Die Punkte a, b und c bilden ein rechtwinkeliges Dreieck, d.h. die Strecken ac und bc stehen normal aufeinander. 1. Berechnen Sie a0 , b0 und c0 durch Anwendung der durch M beschriebenen Transformation auf die Punkte a, b und c! 2. Da M eine konforme Transformation beschreibt, müssen auch die Punkte a0 , b0 und c0 ein rechtwinkeliges Dreieck bilden. Zeigen Sie, dass dies hier tatsächlich der Fall ist! (Hinweis: es genügt zu zeigen, dass die Strecken a0 c0 und b0 c0 normal aufeinander stehen.) [#0229] 316 APPENDIX B. FRAGENÜBERSICHT grün gelb cyan rot weiß blau magenta 50 Figure B.32: eine Ebene im HSV-Farbmodell 256 Figure B.33: Graukeil (Frage II/17 11. Mai 2001) • Geben Sie je eine 3 × 3-Filtermaske zur Detektion 1. horizontaler 2. vertikaler Kanten in einem digitalen Rasterbild an! [#0247] (Frage II/12 26. Juni 2001, Frage II/11 15. März 2002) • In Abbildung B.40 ist einen Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f (x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil die in Aufgabe B.2 gefragten Filterkerne an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.40 oder als Funktionen f1 (x) und f2 (x) an! Randpixel müssen nicht gesondert berücksichtigt werden. [#0248] (Frage II/14 26. Juni 2001) • Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.41 links an. Verwenden Sie das angebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W ! Sie können das Ergebnis direkt in Abbildung B.41 rechts eintragen. [#0255] (Frage II/16 26. Juni 2001, Frage II/14 1. Februar 2002) • Gegeben seien eine Kugel mit Mittelpunkt mS , ein Punkt pS auf der Kugeloberfläche und eine Lichtquelle an der Position pL mit der Intensität IL . 
Die Intensität soll physikalisch korrekt mit dem Quadrat der Entfernung abnehmen. Die Oberfläche der Kugel ist durch das B.2. GRUPPE 2 317 9 9 8 8 6 7 6 6 7 8 9 8 7 2 3 1 6 8 7 8 3 2 0 1 8 7 8 2 3 1 1 2 7 6 7 1 0 2 3 1 7 6 8 2 2 1 2 0 Figure B.34: Roberts-Operator Lambert’sche Beleuchtungsmodell beschrieben, der diffuse Reflexionskoeffizient ist kd . Die Szene wird von einer synthetischen Kamera an der Position pC betrachtet. Berechnen Sie die dem Punkt pS zugeordnete Intensität IS unter Verwendung der Angaben aus Tabelle B.3! Hinweis: der Punkt pS ist von der Kameraposition pC aus sichtbar, diese Bedingung muss nicht überprüft werden. [#0257] Parameter Formelzeichen Wert Kugelmittelpunkt Oberflächenpunkt Position der Lichtquelle Intensität der Lichtquelle diffuser Reflexionskoeffizient Position der Kamera mS pS pL IL kd pC (−2, 1, −4)T (−4, 5, −8)T (2, 7, −11)T 343 1 (−e2 , 13.7603, −4π)T Table B.3: Geometrie und Beleuchtungsparameter der Szene (Frage II/13 26. Juni 2001) • In Abbildung B.42(a) ist eine diskret approximierte Linie eingezeichnet. Erzeugen Sie daraus auf zwei verschiedene Arten eine drei Pixel dicke“ Linie und beschreiben Sie die von Ihnen ” verwendeten Algorithmen! Sie können die Ergebnisse direkt in die Abbildungen B.42(b) und B.42(c) einzeichnen. [#0258] (Frage II/15 26. Juni 2001, Frage II/11 28. September 2001, Frage 10 20. November 2001) • Geben Sie eine 4 × 4-Matrix für homogene Koordinaten in drei Dimensionen an, die eine perspektivische Projektion mit dem Projektionszentrum p0 = (2, 3, −1)T beschreibt! Hinweis: das Projektionszentrum wird in homogenen Koordinaten auf den Punkt (0, 0, 0, 0)T abgebildet. [#0260] (Frage II/17 26. Juni 2001) • Gegeben seien eine Kugel S (durch Mittelpunkt M und Radius r), ein Punkt pS auf der Kugeloberfläche und ein Dreieck T (durch die drei Eckpunkte p1 , p2 und p3 ). Berechnen Sie unter Verwendung der Angaben aus Tabelle B.4 1. den Oberflächennormalvektor nS der Kugel im Punkt pS , 2. den Oberflächennormalvektor nT des Dreiecks! Eine Normierung der Normalvektoren auf Einheitslänge ist nicht erforderlich. (Frage II/13 28. September 2001) [#0262] 318 APPENDIX B. FRAGENÜBERSICHT Figure B.35: zweidimensionale Polygonrepräsentation • Zeichnen Sie in Abbildung B.43 die zweidimensionale Figur ein, die durch den dort angeführten Kettencode definiert ist. Beginnen Sie bei dem mit ד markierten Pixel. Um welche ” Art von Kettencode handelt es sich hier (bzgl. der verwendeten Nachbarschaftsbeziehungen)? [#0268] (Frage II/15 28. September 2001) • Ein Laserdrucker hat eine Auflösung von 600dpi. Wie viele Linienpaare pro Millimeter sind mit diesem Gerät einwandfrei darstellbar (es genügen die Formel und eine grobe Abschätzung)? [#0269] (Frage II/12 28. September 2001, Frage II/15 9. November 2001, Frage 8 20. November 2001) • Gegeben seien ein Punkt pO = (3, −2, −1)T in Objektkoordinaten sowie die Matrizen 4 0 0 −3 1 0 0 0 0 2 0 4 0 1 0 0 M= 0 0 3 6 , P = 0 0 0 1 , 0 0 0 1 0 0 1 0 wobei M die Modellierungs- und P die Projektionsmatrix beschreiben. Berechnen Sie 1. den Punkt pW = M · pO in Weltkoordinaten, 2. den Punkt pS = P · pW in Bildschirmkoordinaten, 3. die Matrix M0 = P · M! B.2. GRUPPE 2 319 z 8 7 6 5 4 3 2 1 0 x −1 −2 −1 0 1 2 3 4 5 6 7 Figure B.36: Objekt und Kamera im Weltkoordinatensystem 0 256 x 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 Figure B.37: Graukeil Hinweis zu 3: die Multplikation mit P entspricht hier lediglich einer Zeilenvertauschung. [#0272] (Frage II/12 19. 
Oktober 2001) • In Abbildung B.44 sind vier Punkte A, B, C und D eingezeichnet. Transformieren Sie diese Punkte nach der Vorschrift x0 y0 = 2x − 3y + xy + 4 = 4x + y − 2xy + 2 und zeichnen Sie Ihr Ergebnis (A0 , B 0 , C 0 und D0 ) direkt in Abbildung B.44 rechts ein! Um welche Art von Transformation handelt es sich hier? [#0277] (Frage II/11 19. Oktober 2001) • Abbildung B.45 zeigt ein digitales Rasterbild, das als Textur verwendet wird. Durch die große Entfernung von der virtuellen Kamera erscheint die Fläche im Verhältnis 1:3 verkleinert, wobei aus Gründen der Effizienz der einfache Sub-Sampling-Algorithmus für die Verkleinerung verwendet wird. Zeichnen Sie in Abbildung B.45 rechts das Bild ein, wie es am Ausgabegerät 320 APPENDIX B. FRAGENÜBERSICHT (a) (b) Figure B.38: Anwendung der normalisierten Kreuzkorrelation Figure B.39: Anwendung der medial axis Transformation erscheint, und markieren Sie links die verwendeten Pixel. Welchen Effekt können Sie hier beobachten, und warum tritt er auf? [#0284] (Frage II/13 9. November 2001) • Abbildung B.46 zeigt drei digitale Grauwertbilder und deren Histogramme. Geben Sie für jedes der Bilder B.46(a), B.46(c) und B.46(e) an, welches das dazugehörige Histogramm ist (B.46(b), B.46(d) oder B.46(f)), und begründen Sie Ihre jeweilige Antwort! [#0285] (Frage II/14 9. November 2001) • Zeichnen Sie in Abbildung B.47 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass on“-Pixel durch einen ” dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0289] (Frage II/11 1. Februar 2002) • Zeichnen Sie in Abbildung B.48 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass on“-Pixel durch einen ” dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0294] (Frage II/13 1. Februar 2002) • Wenden Sie auf das Binärbild in Abbildung B.49 links die morphologische Operation Schließen“ ” mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf? B.2. GRUPPE 2 321 0 256 x 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 Figure B.40: Graukeil X Figure B.41: Anwendung des Hit-or-Miss-Operators auf ein Binärbild Weiße Pixel gelten als logisch 0“, graue Pixel als logisch 1“. Sie können das Ergebnis ” ” rechts in Abbildung B.49 eintragen. [#0297] (Frage II/12 1. Februar 2002) • Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.50 links an. Verwenden Sie das angebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W ! Sie können das Ergebnis direkt in Abbildung B.50 rechts eintragen. [#0301] (Frage II/12 1. Februar 2002) • Wenden Sie auf das Binärbild in Abbildung B.51 links die morphologische Operation Schließen“ ” mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf? Weiße Pixel gelten als logisch 0“, graue Pixel als logisch 1“. Sie können das Ergebnis ” ” rechts in Abbildung B.51 eintragen. [#0303] (Frage II/14 1. Februar 2002) • Geben Sie einen Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten ” 322 APPENDIX B. 
FRAGENÜBERSICHT (a) dünne“ Linie ” (b) dicke“ Linie (Variante 1) ” (c) dicke“ Linie (Variante 2) ” Figure B.42: Erstellen dicker Linien Parameter Formelzeichen Wert Kugelmittelpunkt Kugelradius Punkt auf Kugeloberfläche MS r pS (−2, 1, −4)T 6 (−4, 5, −8)T Dreieckseckpunkt Dreieckseckpunkt Dreieckseckpunkt p1 p2 p3 (2, 1, 3)T (3, 5, 3)T (5, 2, 3)T Table B.4: Geometrie der Objekte für das Polygon aus Abbildung B.52 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0305] (Frage II/15 1. Februar 2002) • Gegeben seien das Farbbildnegativ in Abbildung B.53 sowie die durch die Kreise markierten Farbwerte A, B und C laut folgender Tabelle: Farbe A B C Farbwert (RGB) (0.6, 0.4, 0.3)T (0.3, 0.2, 0.1)T (0.5, 0.3, 0.1)T Berechnen Sie die Farbwerte A0 , B 0 und C 0 , die das entsprechende Positivbild an den gleichen markierten Stellen wie in Abbildung B.53 aufweist! [#0312] (Frage II/12 15. März 2002) • In Abbildung B.54 sollen eine überwachte Klassifikation ( supervised classification“) anhand ” gegebener Trainingsdaten durchgeführt und auf ebenfalls gegebene neue Daten angewandt werden. Der Merkmalsraum ( feature space“) ist eindimensional, d.h. es ist nur ein skalares ” Merkmal ( feature“) zu berücksichtigen. Die Werte dieses Merkmals sind in Abbildung ” B.54(a) für ein 3 × 3 Pixel großes digitales Grauwertbild eingetragen, Abbildung B.54(b) zeigt die dazugehörigen Zuordnungen zu den Klassen A und B. Die Klassifikation soll unter der Annahme einer Normalverteilung (Gauss’sche Wahrscheinlichkeitsdichte) der Daten erfolgen. Bestimmen Sie die Klassenzuordnung der Pixel in Abbildung B.54(c) (tragen Sie Ihr Ergebnis in Abbildung B.54(d) ein) und geben Sie ebenso Ihre Zwischenergebnisse an! Hinweis: die Standardabweichung σ beider Klassen ist gleich und muss nicht berechnet werden. [#0315] B.2. GRUPPE 2 323 Figure B.43: Definition eines zweidimensionalen Objekts durch die Kettencode-Sequenz 221000110077666434544345“ ” y 10 y 10 9 9 8 8 7 6 7 6 5 5 4 3 2 4 D 3 2 C 1 A B 0 0 1 2 3 4 5 6 7 8 9 10 x 1 0 0 1 2 3 4 5 6 7 8 9 10 x Figure B.44: Transformation von vier Punkten (Frage II/13 15. März 2002) 324 APPENDIX B. FRAGENÜBERSICHT Figure B.45: Sub-Sampling B.2. GRUPPE 2 325 (a) Vancouver (b) Histogramm 1 (c) Kluane (d) Histogramm 2 (e) Steiermark (f) Histogramm 3 Figure B.46: drei digitale Grauwertbilder und ihre Histogramme 326 APPENDIX B. FRAGENÜBERSICHT 0 1 2 3 4 5 6 7 8 9 2 1 0 Figure B.47: Halbtonverfahren 9 8 7 6 5 4 3 Figure B.48: Halbtonverfahren Formelement Figure B.49: morphologisches Schließen X Figure B.50: Anwendung des Hit-or-Miss-Operators auf ein Binärbild B.2. GRUPPE 2 327 Formelement Figure B.51: Halbtonverfahren 2 1 3 4 Figure B.52: Polygon für BSP-Darstellung A C B Figure B.53: Farbbildnegativ 2 1 5 A A A 9 8 12 A A B 3 8 8 6 14 B B B 7 11 (a) Trainingsdaten (b) Klassifikation (c) neue Daten Figure B.54: überwachte Klassifikation (d) Ergebnis 328 APPENDIX B. FRAGENÜBERSICHT A Figure B.55: Rechteck mit Störobjekten Figure B.56: Pixelanordnung B.3 Gruppe 3 • Abbildung B.55 zeigt ein rechteckiges Objekt und dazu einige kleinere Störobjekte. Erläutern Sie bitte ein Verfahren des morphologischen Filterns, welches die Störobjekte eliminiert. Verwenden Sie bitte dazu Formelausdrücke und zeigen Sie mit grafischen Skizzen den Verfahrensablauf. Stellen Sie auch das Ergebnisbild dar. [#0008] (Frage III/20 14. April 2000) • Gegeben sei die in Abbildung B.56 dargestellte Pixelanordnung. 
Beschreiben Sie grafisch, mittels Formel oder in Worten einen Algorithmus zur Bestimmung des Schwerpunktes dieser Pixelanordnung. [#0010] (Frage III/18 14. April 2000) • Gegeben sei Abbildung B.57 mit den angebenen linienhaften weißen Störungen. Welche Methode der Korrektur schlagen Sie vor, um diese Störungen zu entfernen? Ich bitte um B.3. GRUPPE 3 329 Figure B.57: Bild mit Störungen die Darstellung der Methode und die Begründung, warum diese Methode die Störungen entfernen wird. [#0018] (Frage III/19 14. April 2000) • Gegeben sei die Rasterdarstellung eines Objektes in Abbildung B.58, wobei das Objekt nur durch seine drei Eckpunkte A, B und C dargestellt ist. Die Helligkeit der Eckpunkte ist IA = 100, IB = 50 und IC = 0. Berechne die Beleuchtungswerte nach dem GouraudVerfahren in zumindest fünf der zur Gänze innerhalb des Dreieckes zu liegenden kommenden Pixeln. [#0035] (Frage III/21 14. April 2000) • Gegeben sei das Grauwertbild in Abbildung B.59. Bestimmen Sie das Histogramm dieses Bildes! Mit Hilfe des Histogramms soll ein Schwellwert gesucht werden, der geeignet ist, das Bild in Hintergrund (kleiner Wert, dunkel) und Vordergrund (großer Wert, hell) zu segmentieren. Geben Sie den Schwellwert an sowie das Ergebnis der Segmentierung in Form eines Binärbildes (mit 0 für den Hintergrund und 1 für den Vordergrund)! [#0064] (Frage III/16 26. Mai 2000, Frage 9 20. November 2001) • Die Transformationsmatrix M aus Abbildung B.60 ist aus einer Translation T und einer Skalierung S zusammengesetzt, also M = T · S (ein Punkt p wird gemäß q = M · p in den Punkt q übergeführt). Bestimmen Sie T, S und M−1 (die Inverse von M)! [#0065] (Frage III/17 26. Mai 2000, Frage III/16 10. November 2000, Frage III/19 28. September 2001) • In der Vorlesung wurden Tiefenwahrnehmungshilfen ( depth cues“) besprochen, die es dem ” menschlichen visuellen System gestatten, die bei der Projektion auf die Netzhaut verlorengegangene dritte Dimension einer betrachteten Szene zu rekonstruieren. Diese Aufgabe wird in der digitalen Bildverarbeitung von verschiedenen shape from X“-Verfahren gelöst. Welche ” depth cues“ stehen in unmittelbarem Zusammenhang mit einem entsprechenden shape ” ” from X“-Verfahren, und für welche Methoden der natürlichen bzw. künstlichen Tiefenabschätzung kann kein solcher Zusammenhang hergestellt werden? [#0071] (Frage III/18 26. Mai 2000) 330 APPENDIX B. FRAGENÜBERSICHT B A C Figure B.58: Rasterdarstellung eines Objekts 1 5 6 6 3 6 7 4 6 6 5 1 2 1 2 0 Figure B.59: Grauwertbild • In Abbildung B.61 ist ein digitales Rasterbild gezeigt, das durch eine überlagerte Störung in der Mitte heller ist als am Rand. Geben Sie ein Verfahren an, das diese Störung entfernt! [#0076] (Frage III/19 26. Mai 2000) • Abbildung B.62 zeigt ein eingescanntes Farbfilmnegativ. Welche Schritte sind notwendig, um daraus mittels digitaler Bildverarbeitung ein korrektes Positivbild zu erhalten? Berücksichtigen Sie dabei, dass die optische Dichte des Filmes auch an unbelichteten Stellen größer als Null ist. Geben Sie die mathematische Beziehung zwischen den Pixelwerten des Negativund des Positivbildes an! [#0077] (Frage III/20 26. Mai 2000) • Auf der derzeit laufenden steirischen Landesausstellung comm.gr2000az“ im Schloss Eggen” berg in Graz ist ein Roboter installiert, der einen ihm von Besuchern zugeworfenen Ball fangen soll. Um den Greifer des Roboters zur richtigen Zeit an der richtigen Stelle schließen zu können, muss die Position des Balles während des Fluges möglichst genau bestimmt werden. 
Zu diesem Zweck sind zwei Kameras installiert, die das Spielfeld beobachten, eine vereinfachte Skizze der Anordnung ist in Abbildung B.63 dargestellt. Bestimmen Sie nun die Genauigkeit in x-, y- und z-Richtung, mit der die in Abbildung B.63 markierte Position des Balles im Raum ermittelt werden kann! Nehmen Sie der Einfachkeit halber folgende Kameraparameter an: – Brennweite: 10 Millimeter – geometrische Auflösung des Sensorchips: 100 Pixel/Millimeter B.3. GRUPPE 3 331 1 0 M= 0 0 0 2.5 0 0 0 4 0 −3 2 0 0 1 Figure B.60: Transformationsmatrix Figure B.61: Digitales Rasterbild mit zum Rand hin abfallender Intensität Sie können auf die Anwendung von Methoden zur subpixelgenauen Bestimmung der Ballposition verzichten. Bei der Berechnung der Unsicherheit in x- und y-Richtung können Sie eine der beiden Kameras vernachlässigen, für die z-Richtung können Sie die Überlegungen zur Unschärfe der binokularen Tiefenwahrnehmung verwenden. [#0078] (Frage III/17 30. Juni 2000) • Ein Koordinatensystem K1 wird durch Rotation in ein anderes Koordinatensystem K2 übergeführt, sodass ein Punkt mit den Koordinaten p in K1 in den Punkt q = M p in K2 transformiert wird. In Tabelle B.5 sind vier Entsprechungspunkte zwischen den beiden Koordinatensystemen gegeben. Bestimmen Sie die 3 × 3-Matrix2 M ! Hinweis: Beachten Sie, dass (da es sich um eine Rotation handelt) ||a|| = ||b|| = ||c|| = 1 und weiters a · b = a · c = b · c = 0, wobei ·“ das Skalarprodukt bezeichnet. [#0081] ” Punkt in K1 T (0, 0, 0) a = (a1 , a2 , a3 )T b = (b1 , b2 , b3 )T c = (c1 , c2 , c3 )T Punkt in K2 (0, 0, 0)T (1, 0, 0)T (0, 1, 0)T (0, 0, 1)T Table B.5: Entsprechungspunkte zwischen den zwei Koordinatensystemen K1 und K2 (Frage III/16 30. Juni 2000) • Geben Sie für homogene Koordinaten eine 3 × 3-Matrix M mit möglichst vielen Freiheitsgraden an, die geeignet ist, die Punkte p eines starren Körpers (z.B. eines Holzblocks) gemäß q = M p zu transformieren (sog. rigid body transformation“)! ” 2 Homogene Koordinaten bringen hier keinen Vorteil, da keine Translation vorliegt. 332 APPENDIX B. FRAGENÜBERSICHT Figure B.62: Farbfilmnegativ Hinweis: In der Fragestellung sind einfache geometrische Zusammenhänge verschlüsselt“ ” enthalten. Wären sie hingegen explizit formuliert, wäre die Antwort eigentlich Material der Gruppe I“. [#0090] ” (Frage III/18 30. Juni 2000) • Dem digitalen Rasterbild in Abbildung B.64 ist eine regelmäßige Störung überlagert (kohärentes Rauschen). Beschreiben Sie ein Verfahren, das diese Störung entfernt! [#0091] (Frage III/19 30. Juni 2000) • Auf das in Abbildung B.65 links oben gezeigte Binärbild soll die morphologische Operation Erosion“ angewandt werden. Zeigen Sie, wie die Dualität zwischen Erosion und Dila” tion genutzt werden kann, um eine Erosion auf eine Dilation zurückzuführen. (In anderen Worten: statt der Erosion sollen andere morphologische Operationen eingesetzt werden, die in geeigneter Reihenfolge nacheinander ausgeführt das gleiche Ergebnis liefern wie eine Erosion.) Tragen Sie Ihr Ergebnis (und Ihre Zwischenergebnisse) in Abbildung B.65 ein und benennen Sie die mit den Zahlen 1, 2 und 3 gekennzeichneten Operationen! Das zu verwendende Formelement ist ebenfalls in Abbildung B.65 dargestellt. Hinweis: Beachten Sie, dass das gezeigte Binärbild nur einen kleinen Ausschnitt aus der Definitionsmenge Z2 zeigt! [#0096] (Frage III/20 30. Juni 2000, Frage III/17 10. November 2000, Frage III/17 14. 
Dezember 2001) • Die Dualität von Erosion und Dilation betreffend Komplementarität und Reflexion lässt sich durch die Gleichung (A B)c = Ac ⊕ B̂ formulieren. Warum ist in dieser Gleichung die Reflexion (B̂) von Bedeutung? [#0107] (Frage III/18 13. Oktober 2000) • Das in Abbildung B.66 gezeigte Foto ist kontrastarm und wirkt daher etwas flau“. ” 1. Geben Sie ein Verfahren an, das den Kontrast des Bildes verbessert. 2. Welche Möglichkeiten gibt es noch, die vom Menschen empfundene Qualität des Bildes zu verbessern? Wird durch diese Methoden auch der Informationsgehalt des Bildes vergrößert? Begründen Sie Ihre Antwort. [#0108] (Frage III/20 13. Oktober 2000, Frage III/19 10. November 2000) 333 Kamera 2 B.3. GRUPPE 3 Roboter 2m x Kamera 1 z 4m s d hn ba urf W aktuelle Ballposition 2m alle B es y z Figure B.63: Vereinfachter Aufbau des bällefangenden Roboters auf der Landesausstellung comm.gr2000az • Wie äußern sich für das menschliche Auge 1. eine zu geringe geometrische Auflösung 2. eine zu geringe Grauwerteauflösung eines digitalen Rasterbildes? [#0113] (Frage III/19 13. Oktober 2000, Frage III/20 10. November 2000, Frage III/20 28. September 2001) • Welche Aussagen kann man über die Summen der Maskenkomponenten eines ( vernünfti” gen“) Tief- bzw. Hochpassfilters treffen? Begründen Sie Ihre Antwort. [#0114] (Frage III/17 13. Oktober 2000, Frage III/18 10. November 2000) • Ein Farbwert CRGB = (R, G, B)T im RGB-Farbmodell wird in den entsprechenden Wert CYIQ = (Y, I, Q)T im YIQ-Farbmodell gemäß folgender Vorschrift umgerechnet: 0.299 0.587 0.114 CYIQ = 0.596 −0.275 −0.321 · CRGB 0.212 −0.528 0.311 Welcher biologische Sachverhalt wird durch die erste Zeile dieser Matrix ausgedrückt? (Hinweis: Überlegen Sie, wo das YIQ-Farbmodell eingesetzt wird und welche Bedeutung in diesem Zusammenhang die Y-Komponente hat.) [#0120] 334 APPENDIX B. FRAGENÜBERSICHT Figure B.64: Bild mit überlagertem kohärentem Rauschen (Frage III/19 14. Dezember 2001) • Um den Effekt des morphologischen Öffnens (A ◦ B) zu verstärken, kann man3 die zugrundeliegenden Operationen (Erosion und Dilation) wiederholt ausführen. Welches der folgenden beiden Verfahren führt zum gewünschten Ergebnis: 1. Es wird zuerst die Erosion n-mal ausgeführt und anschließend n-mal die Dilation, also (((A B) . . . B) ⊕B) . . . ⊕ B | {z }| {z } n−mal n−mal ⊕ 2. Es wird die Erosion ausgeführt und anschließend die Dilation, und der Vorgang wird n-mal wiederholt, also (((A B) ⊕ B) . . . B) ⊕ B | {z } n−mal abwechselnd /⊕ Begründen Sie Ihre Antwort und erklären Sie, warum das andere Verfahren versagt! [#0126] (Frage III/16 15. Dezember 2000, Frage III/20 11. Mai 2001, Frage III/16 14. Dezember 2001, Frage III/17 15. März 2002) • In Aufgabe B.1 wurde nach geometrischen Oberflächeneigenschaften gefragt, die sich nicht zur Visualisierung mittels Textur eignen. Nehmen Sie an, man würde für die Darstellung solcher Eigenschaften eine Textur unsachgemäß einsetzen. Welche Artefakte sind für solche Fälle typisch? [#0129] (Frage III/17 15. Dezember 2000) • In Abbildung B.67 sehen Sie die aus der Vorlesung bekannte Skizze zur Auswirkung des morphologischen Öffnens auf ein Objekt (Abbildung B.67(a) wird durch Öffnen mit dem gezeigten Strukturelement in Abbildung B.67(b) übergeführt). Wie kommen die Rundungen in Abbildung B.67(b) zustande, und wie könnte man deren Auftreten verhindern? [#0149] 3 abgesehen von einer Vergrößerung des Maskenelements B B.3. GRUPPE 3 335 1 2 3 Formelement Figure B.65: Alternative Berechnung der morphologischen Erosion (Frage III/18 15. 
Dezember 2000, Frage III/23 30. März 2001) • In Abbildung B.68 sind ein Geradenstück g zwischen den Punkten A und B sowie zwei weitere Punkte C und D gezeigt. Berechnen Sie den Abstand (kürzeste Euklidische Distanz) zwischen g und den Punkten C bzw. D. [#0150] (Frage III/19 15. Dezember 2000) • In Abbildung B.69 sind ein digitales Rasterbild sowie die Resultate der Anwendung von drei verschiedenen Filteroperationen gezeigt. Finden Sie die Operationen, die auf Abbildung B.69(a) angewandt zu den Abbildungen B.69(b), B.69(c) bzw. B.69(d) geführt haben, und beschreiben Sie jene Eigenschaften der Ergebnisbilder, an denen Sie die Filter erkannt haben. [#0151] (Frage III/20 15. Dezember 2000, Frage III/19 19. Oktober 2001) • Es besteht eine Analogie zwischen der Anwendung eines Filters und der Rekonstruktion einer diskretisierten Bildfunktion. Erklären Sie diese Behauptung! [#0158] (Frage 4 16. Jänner 2001, Frage III/18 14. Dezember 2001) • In der Vorlesung wurden zwei Verfahren zur Ermittlung der acht Parameter einer bilinearen Transformation in zwei Dimensionen erläutert: 1. exakte Ermittlung des Parametervektors u, wenn genau vier Input/Output-Punktpaare gegeben sind 2. approximierte Ermittlung des Parametervektors u, wenn mehr als vier Input/OutputPunktpaare gegeben sind ( Least squares method“) ” Die Methode der kleinsten Quadrate kann jedoch auch dann angewandt werden, wenn genau vier Input/Output-Punktpaare gegeben sind. Zeigen Sie, dass man in diesem Fall das gleiche Ergebnis erhält wie beim ersten Verfahren. Welche geometrische Bedeutung hat diese Feststellung? 336 APPENDIX B. FRAGENÜBERSICHT Figure B.66: Foto mit geringem Kontrast (a) (b) Strukturelement Figure B.67: Morphologisches Öffnen Hinweis: Bedenken Sie, warum die Methode der kleinsten Quadrate diesen Namen hat. [#0163] (Frage III/23 2. Februar 2001) • In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben! [#0165] (Frage III/19 2. Februar 2001, Frage III/18 19. Oktober 2001) • In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge des vorhergehenden Bildes. Leiten Sie eine möglichst gute obere Schranke für den gesamten Speicherbedarf einer solchen Repräsentation her, wobei – das erste (größte) Bild aus N × N Pixeln besteht, – alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden, – eine mögliche Komprimierung nicht berücksichtigt werden soll! Hinweis: Benutzen Sie die Gleichung P∞ i=0 qi = 1 1−q für q ∈ R, 0 < q < 1. (Frage III/18 2. Februar 2001, Frage III/20 1. Februar 2002) [#0171] B.3. GRUPPE 3 337 9 B 8 7 g 6 5 4 C 3 2 A D 1 3 4 5 6 7 8 9 10 11 12 13 Figure B.68: Abstandsberechnung • Gegeben seien eine Ebene ε und ein beliebiger (zusammenhängender) Polyeder P im dreidimensionalen Raum. Wie kann man einfach feststellen, ob die Ebene den Polyeder schneidet (also P ∩ ε 6= {})? [#0172] (Frage III/21 2. Februar 2001) • Es sei p(x), x ∈ R2 die Wahrscheinlichkeitsdichtefunktion gemäß Gaussscher Normalverteilung, deren Parameter aufgrund der drei Merkmalsvektoren p1 , p2 und p3 aus Aufgabe B.2 geschätzt wurden. Weiters seien zwei Punkte x1 = (0, 3)T und x2 = (3, 6)T im Merkmalsraum gegeben. Welche der folgenden beiden Aussagen ist richtig (begründen Sie Ihre Antwort): 1. p(x1 ) < p(x2 ) 2. 
p(x1 ) > p(x2 ) Hinweis: Zeichnen Sie die beiden Punkte x1 und x2 in Abbildung B.28 ein und überlegen Sie sich, in welche Richtung die Eigenvektoren der Kovarianzmatrix C aus Aufgabe B.2 weisen. [#0174] (Frage III/22 2. Februar 2001) • Das digitalen Rasterbild aus Abbildung B.71 soll segmentiert werden, wobei die beiden Gebäude den Vordergrund und der Himmel den Hintergrund bilden. Da sich die Histogramme von Vorder- und Hintergrund stark überlappen, kann eine einfache Grauwertsegmentierung hier nicht erfolgreich sein. Welche anderen Bildeigenschaften kann man verwenden, um dennoch Vorder- und Hintergrund in Abbildung B.71 unterscheiden zu können? [#0181] (Frage III/20 2. Februar 2001, Frage III/20 14. Dezember 2001) • In der Vorlesung wurde darauf hingewiesen, dass die Matrixmultiplikation im Allgemeinen nicht kommutativ ist, d.h. für zwei Transformationsmatrizen M1 und M2 gilt M1 ·M2 6= M2 · M1 . Betrachtet man hingegen im zweidimensionalen Fall zwei 2 × 2-Rotationsmatrizen R1 und R2 , so gilt sehr wohl R1 ·R2 = R2 ·R1 . Geben Sie eine geometrische oder mathematische Begründung für diesen Sachverhalt an! Hinweis: Beachten Sie, dass das Rotationszentrum im Koordinatenursprung liegt! [#0192] (Frage III/18 30. März 2001, Frage III/16 9. November 2001) • Nehmen Sie an, Sie müssten auf ein Binärbild die morhpologischen Operationen Erosion“ ” bzw. Dilation“ anwenden, haben aber nur ein herkömmliches Bildbearbeitungspaket zur ” 338 APPENDIX B. FRAGENÜBERSICHT (a) Originalbild (b) Filter 1 (c) Filter 2 (d) Filter 3 Figure B.69: verschiedene Filteroperationen Verfügung, das diese Operationen nicht direkt unterstützt. Zeigen Sie, wie die Erosion bzw. Dilation durch eine Faltung mit anschließender Schwellwertbildung umschrieben werden kann! Hinweis: die gesuchte Faltungsoperation ist am ehesten mit einem Tiefpassfilter zu vergleichen. [#0197] (Frage III/19 30. März 2001) • Gegeben sei ein zweidimensionales Objekt, dessen Schwerpunkt im Koordinatenursprung liegt. Es sollen nun gleichzeitig“ eine Translation T und eine Skalierung S angewandt ” werden, wobei 1 0 tx s 0 0 T = 0 1 ty , S = 0 s 0 . 0 0 1 0 0 1 Nach der Tranformation soll das Objekt gemäß S vergrößert erscheinen, und der Schwerpunkt soll gemäß T verschoben worden sein. Gesucht ist nun eine Matrix M, die einen Punkt p des Objekts gemäß obiger Vorschrift in einen Punkt p0 = M · p des transformierten Objekts überführt. Welche ist die richtige Lösung: B.3. GRUPPE 3 339 Figure B.70: Zylinder mit koaxialer Bohrung 1. M = T · S 2. M = S · T Begründen Sie Ihre Antwort und geben Sie M an! [#0198] (Frage III/22 30. März 2001) • In Abbildung B.72 sehen Sie ein perspektivisch verzerrtes schachbrettartiges Muster. Erklären Sie, wie die Artefakte am oberen Bildrand zustandekommen, und beschreiben Sie eine Möglichkeit, deren Auftreten zu verhindern! [#0205] (Frage III/21 30. März 2001, Frage III/20 19. Oktober 2001) • Warum ist die Summe der Maskenelemente bei einem reinen Hochpassfilter immer gleich null und bei einem reinen Tiefpassfilter immer gleich eins? [#0212] (Frage III/20 30. März 2001) • In Aufgabe B.1 wurde nach den Begriffen Phong-shading“ und Phong-illumination“ ” ” gefragt. Beschreiben Sie eine Situation, in der beide Konzepte sinnvoll zum Einsatz kommen! [#0218] (Frage III/19 11. Mai 2001) • Wendet man in einem digitalen (RGB-)Farbbild auf jeden der drei Farbkanäle einen MedianFilter an, erhält man ein Ergebnis, das vom visuellen Eindruck ähnlich einem Mediangefilterten Grauwertbild ist. 
Welche Eigenschaft des Median-Filters geht bei einer solchen Anwendung auf Farbbilder jedoch verloren? Begründen Sie Ihre Antwort! [#0221] (Frage III/18 26. Juni 2001) • Wenden Sie wie bei Frage B.2 ein 3 × 3-Medianfilter F3 auf den Graukeil in Abbildung B.37 an und begründen Sie Ihre Antwort! [#0222] (Frage III/18 11. Mai 2001) • 1. Kommentieren Sie die Wirkung des hohen Rauschanteils von Abbildung B.38(a) (aus Aufgabe B.2) auf die normalisierte Kreuzkorrelation! 340 APPENDIX B. FRAGENÜBERSICHT Figure B.71: Segmentierung eines Grauwertbildes Figure B.72: Artefakte bei einem schachbrettartigen Muster 2. Welches Ergebnis würde man bei Anwendung der normalisierten Kreuzkorrelation mit dem selben Strukturelement (Abbildung B.38(b)) auf das rotierte Bild in Abbildung B.73 erhalten? Begründen Sie Ihre Antwort! [#0224] (Frage III/21 11. Mai 2001) • Welche Farbe liegt in der Mitte“, wenn man im RGB-Farbraum zwischen den Farben gelb ” und blau linear interpoliert? Welcher Farbraum wäre für eine solche Interpolation besser geeignet, und welche Farbe läge in diesem Farbraum zwischen gelb und blau? [#0227] (Frage III/23 11. Mai 2001) • Abbildung B.74(a) zeigt das Schloss in Budmerice (Slowakei), in dem alljährlich ein Studentenseminar4 und die Spring Conference on Computer Graphics stattfinden. Durch einen 4 Für interessierte Studenten aus der Vertiefungsrichtung Computergrafik besteht die Möglichkeit, kostenlos an diesem Seminar teilzunehmen und dort das Seminar/Projekt oder die Diplomarbeit zu präsentieren. B.3. GRUPPE 3 341 Figure B.73: Anwendung der normalisierten Kreuzkorrelation auf ein gedrehtes Bild automatischen Prozess wurde daraus Abbildung B.74(b) erzeugt, wobei einige Details (z.B. die Wolken am Himmel) deutlich verstärkt wurden. Nennen Sie eine Operation, die hier zur Anwendung gekommen sein könnte, und kommentieren Sie deren Arbeitsweise! [#0228] (a) Originalbild (b) verbesserte Version Figure B.74: automatische Kontrastverbesserung (Frage III/22 11. Mai 2001) • In Frage B.1 wurde festgestellt, dass die Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche durch eine Kette von Transformationen beschrieben werden kann. Erläutern Sie mathematisch, wie dieser Vorgang durch Verwendung des Assoziativgesetzes für die Matrixmultiplikation optimiert werden kann! [#0237] (Frage III/19 26. Juni 2001) • In der Vorlesung wurden die Operationen Schwellwert“ und Median“, anzuwenden auf ” ” digitale Rasterbilder, besprochen. Welcher Zusammenhang besteht zwischen diesen beiden Operationen im Kontext der Filterung? [#0244] 342 APPENDIX B. FRAGENÜBERSICHT 1 1 1 1 3 7 7 7 7 1 1 1 1 3 7 7 7 7 1 1 1 1 3 7 7 7 7 1 1 1 1 3 7 7 7 7 Figure B.75: unscharfe Kante in einem digitalen Grauwertbild (Frage III/20 26. Juni 2001) • Um einem Punkt p auf der Oberfläche eines dreidimensionalen Objekts die korrekte Helligkeit zuweisen zu können, benötigen alle realistischen Beleuchtungsmodelle den Oberflächennormalvektor n an diesem Punkt p. Wird nun das Objekt einer geometrischen Transformation unterzogen, sodass der Punkt p in den Punkt p0 = Mp übergeführt wird5 , ändert sich auch der Normalvektor, und zwar gemäß n0 = (M−1 )T n. Geben Sie eine mathematische Begründung für diese Behauptung! Hinweis: die durch p und n definierten Tangentialebenen vor bzw. nach der Transformation sind in Matrixschreibweise durch die Gleichungen nT x = nT p bzw. n0T x0 = n0T p0 gegeben. [#0250] (Frage III/21 26. 
Juni 2001) • In Abbildung B.75 sehen Sie einen vergößerten Ausschnitt aus einem digitalen Grauwertbild, der eine unscharfe Kante darstellt. Beschreiben Sie, wie diese Kante aussieht, wenn 1. ein lineares Tiefpassfilter 2. ein Medianfilter mit Maskengröße 3 × 3 mehrfach hintereinander auf das Bild angewendet wird. Begründen Sie Ihre Antwort! [#0251] (Frage III/22 26. Juni 2001, Frage III/19 9. November 2001, Frage III/16 1. Februar 2002) • Im Vierfarbdruck sei ein Farbwert durch 70% cyan, 20% magenta, 50% gelb und 30% schwarz gegeben. Rechnen Sie den Farbwert in das RGB-Farbmodell um und beschreiben Sie den Farbton in Worten! [#0252] (Frage III/23 26. Juni 2001) • Im Vierfarbdruck sei ein Farbwert durch 70% cyan, 0% magenta, 50% gelb und 30% schwarz gegeben. Rechnen Sie den Farbwert in das RGB-Farbmodell um und beschreiben Sie den Farbton in Worten! [#0261] (Frage III/20 9. November 2001) • Skizzieren Sie das Histogramm eines 1. dunklen, 2. hellen, 3. kontrastarmen, 4. kontrastreichen 5 Dieses konkrete Beispiel ist in kartesischen Koordinaten leichter zu lösen als in homogenen Koordinaten. Wir betrachten daher nur 3 × 3-Matrizen (ohne Translationsanteil). B.3. GRUPPE 3 343 monochromen digitalen Rasterbildes! [#0263] (Frage III/16 28. September 2001) • Bei vielen Algorithmen in der Computergrafik ist eine Unterscheidung zwischen der Vorder” und Rückseite“ eines Dreiecks notwendig (z.B. BSP-Baum, back face culling etc.). Wie kann der Oberflächennormalvektor eines Dreiecks genutzt werden, um diese Unterscheidung mathematisch zu formulieren (d.h. mit welcher Methode kann man für einen gegebenen Punkt p feststellen, auf welcher Seite eines ebenfalls gegebenen Dreiecks T er sich befindet)? Geben Sie außerdem an, ob der Vektor nT aus Aufgabe 2 unter dieser Definition in den der Vorder- oder Rückseite des Dreiecks zugewandten Halbraum weist. Begründen Sie Ihre Antwort! [#0264] (Frage III/17 28. September 2001) • Erläutern Sie, wie ein monochromes digitales Rasterbild, das ein Schwarzweißfilm-Negativ repräsentiert, durch Manipulation seines Histogramms in das entsprechende Positivbild umgewandelt werden kann! [#0267] (Frage III/18 28. September 2001) • In Abbildung B.76 sind die Histogramme von zwei verschiedenen digitalen Grauwertbildern A und B gezeigt. Nehmen Sie an, es würde nun auf beide Bilder die Operation His” togrammäqualisierung“ angewandt werden, sodass die neuen Bilder A0 bzw. B 0 daraus entstehen. 1. Skizzieren Sie die Histogramme von A0 und B 0 . 2. Kommentieren Sie die Auswirkung der Histogrammäqualisierung bei den Bildern A und B bzgl. Helligkeit und Kontrast! Begründen Sie Ihre Antworten! (a) Histogramm von Bild A [#0270] (b) Histogramm von Bild B Figure B.76: Histogramme von zwei verschiedenen Bildern (Frage III/17 19. Oktober 2001) • Bei der perspektivischen Transformation werden entfernte Objekte zwar verkleinert abgebildet, Geraden bleiben jedoch auch in der Projektion als Geraden erhalten. Geben Sie eine mathematische Begründung dieser Eigenschaft anhand der Projektionsmatrix 1 0 0 0 0 1 0 0 M= 0 0 0 1 , 0 0 1 0 die einen Punkt p gemäß p0 = Mp in den Punkt p0 überführt! Hinweis: die x- und z-Koordinate einer Geraden stehen über die Gleichung x = kz + d 344 APPENDIX B. FRAGENÜBERSICHT zueinander in Beziehung (Sonderfälle können vernachlässigt werden). Zeigen Sie, dass nach der Transformation x0 = k 0 z 0 + d0 gilt, und verfahren Sie analog für y. [#0271] (Frage III/16 19. 
Oktober 2001) • In Abbildung B.77 ist ein Torus mit strukturierter Oberfläche gezeigt, wobei sich die Lichtquelle einmal links (Abbildung B.77(a)) und einmal rechts (Abbildung B.77(b)) vom Objekt befindet. Zur Verdeutlichung sind in den Abbildungen B.77(c) und B.77(d) vergrößerte Ausschnitte dargestellt. Welche Technik wurde zur Visualisierung der Oberflächenstruktur eingesetzt, und was sind die typischen Eigenschaften, anhand derer man das Verfahren hier erkennen kann? [#0282] (a) Beleuchtung von links (b) Beleuchtung von rechts (c) Detail aus Abbildung B.77(a) (d) Detail aus Abbildung B.77(b) Figure B.77: Torus mit Oberflächenstruktur (Frage III/18 9. November 2001) B.3. GRUPPE 3 345 • Die morphologische Dilation A ⊕ B kann als A⊕B = [ Bx x∈A geschrieben werden, also als Mengenvereinigung des an jedes Pixel x ∈ A verschobenen Maskenelements B. Zeigen Sie unter Verwendung dieser Definition die Kommutativität der Dilation, also A ⊕ B = B ⊕ A! Hinweis: Schreiben Sie A ⊕ B = A ⊕ (B ⊕ E), wobei E das 1 × 1 Pixel große Einheits” maskenelement“ ist, das das Objekt bei der Dilation unverändert lässt. [#0283] (Frage III/17 9. November 2001) • Welche der folgenden Transformationen sind in homogenen Koordinaten durch eine Matrixmultiplikation (x0 = M · x) darstellbar? Begründen Sie Ihre Antwort! – Translation – perspektivische Projektion – Rotation – bilineare Transformation – Scherung – Skalierung – bikubische Transformation [#0290] (Frage III/18 15. März 2002) • Gegeben sei die Matrix 2 −2 3 M = 2 2 −4 , 0 0 1 mit deren Hilfe ein Punkt p im zweidimensionalen Raum in homogenen Koordinaten in einen Punkt p̃0 = M · p̃ übergeführt wird. Diese Operation lässt sich in kartesischen Koordinaten alternativ als p0 = s · R(ϕ) · p + t anschreiben, wobei s der Skalierungsfaktor, R(ϕ) die Rotationsmatrix (Drehwinkel ϕ) und t der Translationsvektor sind. Ermitteln Sie s, ϕ und t! [#0292] (Frage III/19 1. Februar 2002) • Das Auge des kanadischen Bergschafes in Abbildung B.78(a) ist in den Abbildungen B.78(b) bis B.78(d) vergößert dargestellt6 . Zur Interpolation wurden das nearest neighbor Verfahren, bilineare und bikubische Interpolation verwendet. Ordnen Sie diese Interpolationsverfahren den drei Bildern B.78(b) bis B.78(d) zu und begründen Sie Ihre Antwort! [#0293] (Frage III/16 1. Februar 2002) 6 Der Ausschnitt wurde zur Verdeutlichung der Ergebnisse einer Kontraststreckung unterzogen. 346 APPENDIX B. FRAGENÜBERSICHT • Nehmen Sie an, Sie seien Manager der Firma Rasen&Mäher und sollen für eine Werbekampagne Angebote von Druckereien für ein einfärbiges grünes Plakat einholen. Die Druckerei (1) 1 bietet das Plakat in der Farbe CCMYK an, die Druckerei 2 legt ein Angebot für ein Plakat (2) der Farbe CCMYK , wobei (1) = (0.6, 0.1, 0.7, 0.0)T , (2) = CCMYK CCMYK (0.2, 0.0, 0.3, 0.3)T . Welcher Druckerei würden Sie den Auftrag erteilen, wenn 1. möglichst geringe Herstellungskosten 2. ein möglichst intensiver Farbton das Auswahlkriterium ist? Begründen Sie Ihre Antwort! [#0295] (Frage III/19 1. Februar 2002) • Nehmen Sie an, Sie seien Manager der Firma Rasen&Mäher und sollen für eine Werbekampagne Angebote von Druckereien für ein einfärbiges grünes Plakat einholen. Die Druckerei (1) 1 bietet das Plakat in der Farbe CCMYK an, die Druckerei 2 legt ein Angebot für ein Plakat (2) der Farbe CCMYK , wobei (1) = (0.5, 0.0, 0.6, 0.1)T , (2) = (0.5, 0.3, 0.6, 0.0)T . CCMYK CCMYK Welcher Druckerei würden Sie den Auftrag erteilen, wenn 1. möglichst geringe Herstellungskosten 2. 
• The eye of the Canadian bighorn sheep in Figure B.79(a) is shown magnified in Figures B.79(b) to B.79(d) (see footnote 7). The nearest-neighbor method, bilinear interpolation and bicubic interpolation were used. Assign these interpolation methods to the three images B.79(b) to B.79(d) and justify your answer! [#0304] (Frage III/18 1. Februar 2002)

Footnote 7: The cut-out was contrast-stretched to make the results clearer.

• The BSP tree shown in Figure B.80 describes a two-dimensional polygon. The separating planes (or lines, since we consider the two-dimensional case) in each node are given by equations of the form ax + by = c, where the outside is characterized by the inequality ax + by > c and the inside by ax + by < c. Furthermore (as shown in Figure B.80), the "outside" paths lead to the left and the "inside" paths to the right. In a suitably labelled coordinate system, draw the polygon described by this BSP tree and indicate which edge belongs to which equation! [#0307] (Frage III/20 15. März 2002)

• Explain the terms "cutoff frequency" and ideal vs. non-ideal filter in the context of digital raster images! How are these concepts related to the appearance of a filter's output image? (A small frequency-domain filtering sketch is given after this group of questions.) [#0309] (Frage III/19 15. März 2002)

• In Figure B.81 the well-known Stanford bunny has been rendered with three different illumination models. Which illumination models are used in Figures B.81(a), B.81(b) and B.81(c)? By which properties of the images did you recognize them? [#0310] (Frage III/16 15. März 2002)

Figure B.78: Magnification of an image cut-out using different interpolation methods — (a) original image, (b) method 1, (c) method 2, (d) method 3.

Figure B.79: Magnification of an image cut-out using different interpolation methods — (a) original image, (b) method 1, (c) method 2, (d) method 3.
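To accompany the cutoff-frequency question [#0309] above, here is a minimal frequency-domain sketch in Python/NumPy. The function name, the parameterization of the cutoff and the choice of Butterworth order are assumptions made for illustration, not notation from the script. It shows the connection the question asks about: an ideal filter cuts the spectrum off sharply at the cutoff frequency and therefore tends to produce ringing in the output image, while a non-ideal filter (here a Butterworth filter) rolls off smoothly and mainly blurs.

import numpy as np

def frequency_lowpass(img, cutoff, butterworth_order=None):
    # Transform to the frequency domain and centre the spectrum.
    F = np.fft.fftshift(np.fft.fft2(img))
    rows, cols = img.shape
    y, x = np.ogrid[:rows, :cols]
    d = np.hypot(y - rows / 2, x - cols / 2)   # distance of each sample from the spectrum centre
    d0 = cutoff * min(rows, cols) / 2          # cutoff radius in frequency samples
    if butterworth_order is None:
        H = (d <= d0).astype(float)            # ideal filter: hard cut at d0 -> ringing
    else:
        H = 1.0 / (1.0 + (d / d0) ** (2 * butterworth_order))  # smooth Butterworth roll-off
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))

# Example: a synthetic test image (bright square on dark background).
img = np.zeros((128, 128))
img[32:96, 32:96] = 1.0
ideal_result = frequency_lowpass(img, cutoff=0.2)
butterworth_result = frequency_lowpass(img, cutoff=0.2, butterworth_order=2)

Comparing the two outputs makes the difference visible: the ideal variant shows oscillations (ringing) parallel to the edges of the square, the Butterworth variant merely smooths them.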
Figure B.80: BSP tree.

Figure B.81: Rendering of a 3D model using different illumination models — (a) model 1, (b) model 2, (c) model 3.