COMPUTER VISION
COMPUTER VISION, MEI/1
University of Beira Interior, Department of Informatics
Hugo Pedro Proença, hugomcp@di.ubi.pt, 2014/2015

Class Time & Etc
- Timetable (Monday to Friday, 8:00 to 19:00): Theoretical class in room 6.17; Practical class in room 6.19
- Office: 4.11; Bloco 6
- Office Hour: Wednesdays, 14:00 to 16:00
- Course URL: http://www.di.ubi.pt/~hugomcp/visaoComp
  - Marks, Announcements, Exercises, Data sets

Evaluation Criteria
- Attendance (A)
  - Passing the course requires a minimum attendance of 80% of the classes.
- Practical Work (P)
  - The practical assignments contribute 12 values to the final grade.
  - Passing the course requires a minimum grade of 6 values (6/20) in the practical assignments.
  - P1 submission (5 values): 20 March 2015, 23:59, via email.
  - P2 submission (5 values): 24 April 2015, 23:59, via email.
- Presentations (A)
  - Practical classes are partly devoted to student presentations of selected scientific papers (minimum: 2 papers per student) (2 values).
- Written Test (F)
  - Test (F1): Tuesday, 26 May 2015, 11:00-13:00, Room 6.17
- Teaching/Learning Grade (C)
  - C = P1*5/20 + P2*5/20 + A + F*8/20
- Admission to Exam
  - Students who obtain a minimum grade of 6 values in the Teaching/Learning component are admitted to the exam.
- Exams
  - The practical work grade is always counted toward the final grade.

Course Projects
- Part I: Head Detection
- Part II: Object Tracking
- 3.5 + 3.5 values = 7

Requirements
- To succeed in this course, students are strongly advised to have the following skills and knowledge:
- Programming experience in a structured language:
  - Functions
  - Parameters
  - Iterative Blocks
  - Conditional Blocks
- MATLAB will be mainly used during the practical classes.
- Elementary notions about:
  - Linear Algebra
  - Probability and Statistics
  - Geometry
  - Artificial Intelligence

Course Summary
- Introduction
  - Computer Vision (CV). What is it?
  - Goals of CV: Why are they so hard?
  - Applications of CV
  - Biological Perspective
- Vision Cameras and Images
  - Optics
  - Digital Images
  - Sampling
  - Calibration
- Filtering
  - Convolution, Correlation
  - Spatial and Frequency Domains
  - Fourier Transform
- Noise Sources
  - Gaussian Noise
  - Impulse Noise
  - Median Filter
  - Gaussian Filter
- Image Representation
  - Features
  - Image Derivatives
  - First Derivatives: Edges
  - Sobel Detector
  - Canny Detector
- Shape Detection
  - Hough Transform
  - Probabilistic Models
- Data Descriptors
  - Color
  - Texture
  - Shape
  - Motion
- Optical Flow
- Clustering
- Stereo Vision
- Classifiers
- Detection, Segmentation and Recognition

Books, Textbooks
- Main
  - David A. Forsyth and Jean Ponce; Computer Vision: A Modern Approach, Prentice-Hall, 2002.
  - Dana Ballard and Chris Brown; Computer Vision, Online.
  - J. R. Parker; Algorithms for Image Processing and Computer Vision, Wiley, 1995.
- Complementary
  - Torras, C.; Computer Vision, Theory and Industrial Applications, New York, Springer, 1992.
  - Davies, E.R.; Machine Vision: Theory, Algorithms, Practicalities, Third Edition, Morgan Kaufmann, 2005.
  - List of on-line books: http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm#online

Links (cont.)
- Compendium about Computer Vision: http://homepages.inf.ed.ac.uk/rbf/CVonline/
- Dictionary of Computer Vision and Image Processing: http://homepages.inf.ed.ac.uk/rbf/CVDICT/
- Vision research groups: http://www.cs.cmu.edu/~cil/v-groups.html
- C / C++ library for image/video processing: http://cimg.sourceforge.net/
- MATLAB help: http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.shtml

Computer Vision? What is It?
- Trucco and Verri: “Computing properties of the 3-D world from one or more digital images”.
- Stockman and Shapiro: “To make useful decisions about real physical objects and scenes based on sensed images”.
- Ballard and Brown: “The construction of explicit, meaningful descriptions of physical objects from images”.
- Forsyth and Ponce: “Extracting descriptions of the world from pictures or sequences of pictures”.
- English Dictionary: “The use of digital computer techniques to extract, characterize, and interpret information in visual images of a three-dimensional world”.
- Wikipedia: “Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from images”.
- It can be considered a full Artificial Intelligence problem.
- As such, it is possible to regard it “simply” as a “signal-to-symbol” converter.
  - This is in contrast to “Computer Graphics”, which can be regarded as a “symbol-to-signal” converter.
  - (Diagram: Perception, Computer Vision, Symbol(s))
- It intersects a broad range of disciplines:
  - Optics
  - Robotics
  - Image Processing
  - Pattern Recognition

Optics
- Is related to the biological sensing process of vision (the way light is handled by the human brain).
- Describes the behavior of light and its interaction with matter.
- Three main types of light can be identified, using the visible wavelength as reference: infrared, visible and ultraviolet.
- However, since light is a form of radiation, similar phenomena occur in x-rays, microwaves, radio waves or any other type of radiation (interaction between charged particles).

Image Processing
- A way of processing signals that takes two-dimensional signals as input.
- Transformation of the original data in order to make further interpretation phases easier.
  - Geometric transforms (scale, rotation, translation, affine and projective transforms)
  - Color or intensity adjustment
  - Data / region reconstruction
  - Data registration
  - Detection
  - Segmentation
  - Recognition

Pattern Recognition
- It is often considered the core of the vision system.
- Pattern Recognition aims at classifying / labelling the input data.
- There are, typically, 3 variants:
  - Statistical
  - Structural
  - Neural

Robotics
- Domain of knowledge that involves the planning and development of physical automata (robots).
- It intersects the areas of electronic engineering, mechanics, computer science and cybernetics.
- Even though a precise definition is hard to find, a robot is a machine that:
  - Has sensory abilities.
  - Has the ability to act on the environment and change its state.

Computer Vision: Applications
- Biometric Recognition
  - Iris, face, gait, ...
- Autonomous Driving, Navigation
  - “Google car”
  - Stanford University
- Medical Diagnosis
- Robotic Production / Inspection Systems
  - NASA’s autonomous walker
  - Subsea 7 inspector
- Automatic Character Recognition (OCR)
  - NOKIA multi-scanner
- Surveillance / Security Systems
  - Current state
- Defense Systems (Ballistics)
  - China Dongfeng 21D with in-flight autonomous updates

Computer Vision: Research
- One of the most active domains of knowledge in the Computer Science area.
- It is in its earliest development stage, as there are no autonomous, generic systems with vision abilities close to those of human beings.
- We know that this type of vision problem can be solved, as humans have been doing it for thousands of years. However:
  - How to represent knowledge?
  - What inference mechanisms should be created?
  - What is intelligence?
- International Conferences
  - ICCV: International Conference on Computer Vision
  - CVPR: Conference on Computer Vision and Pattern Recognition
  - ICPR: International Conference on Pattern Recognition
- International Journals
  - Elsevier Image and Vision Computing
  - Elsevier Computer Vision and Image Understanding
  - IEEE Transactions on Image Processing
- Hundreds of Research Groups
  - Academic: MIT, Stanford, Cambridge, UCLA, ...
  - Commercial: Microsoft, IBM, Honda, Sarnoff, Panasonic, ...
  - http://www.cs.cmu.edu/~cil/v-groups.html

Computer Vision: Cohesive Perspective
- (Diagram: 3D World, Data Acquisition, Data Processing, Semantic Information)

Computer Vision: Typical Stages
- Pre-Processing
- Detection
- Segmentation
- Normalization
- Encoding (the object is represented as a binary code)
- Matching (the binary code is compared against another binary code)
- Classification: “It’s a Deer!”

Vision: Illusions
- What is the relationship between the colors of regions “A” and “B”?
- What kinds of motion does this figure have?
- Classical M.C. Escher illusions

Vision: Why is it so Hard?
- Most vision problems are ill-posed (badly formed), as opposed to well-posed problems.
- Hadamard defined well-posed mathematical models as those with the following properties:
  - There is a solution;
  - The solution is unique;
  - The solution depends continuously on the data.
- Even without considering other factors, the following make most vision problems ill-posed:
  - The representation of the 3D world by 2D data
  - The variation / noise associated with data acquisition
- As such, errors are expected (and should simply be minimized).
- The process of representing 3D data in two dimensions brings many ambiguities to the represented data.
- Ambiguities:
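
One such ambiguity, the loss of depth, can be sketched numerically. The block below is only an illustration under stated assumptions: it assumes an ideal pinhole camera at the origin looking along the Z axis, and the function name `project`, the focal length and the sample points are hypothetical choices that do not come from the slides. It shows two different 3D points that land on the same image position, so recovering the 3D point from the 2D data has no unique solution (the second of Hadamard's properties fails).

```python
from fractions import Fraction


def project(point, focal_length=1):
    """Project a 3D point (X, Y, Z) with integer coordinates onto the image plane.

    Assumed model (not from the slides): ideal pinhole camera,
    x = f * X / Z and y = f * Y / Z. Fractions keep the comparison exact.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (Fraction(focal_length * x, z), Fraction(focal_length * y, z))


# Hypothetical sample points: a near point and a point three times farther
# away along the same viewing ray.
near_point = (1, 2, 4)
far_point = (3, 6, 12)

# Both points project to the same 2D position, so depth cannot be
# recovered from this single image.
same_pixel = project(near_point) == project(far_point)
print(project(near_point), project(far_point), same_pixel)
```

Every point along the same viewing ray behaves like `far_point` here, which is one reason why a single image cannot separate a small, close object from a large, distant one without additional information.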