Patent Images
Patent Images: A Glass-encased Tool

Mihai Lupu, Allan Hanbury, Florina Piroi (Vienna University of Technology)
Tobias Schleser (m2n consulting and development GmbH)
Roland Mörzinger, René Schuster (Joanneum Research)

Motivation
• Many patents contain images. On a dataset of 16 million patents:
  – 28% of patents have at least one image
  – Average #images/patent: 9.4
• Often images contain important information about the innovation
• Very important in a variety of engineering fields (mechanics, electronics, chemistry)
• Current patent search tools use only text

Motivation 2
• Almost all image retrieval work has focussed on images containing colour/texture
• Useless in the patent domain

Outline
• Related work
• IMPEx
• Patent Image Processing
  – Figure Segmentation
  – Specific processing (Flowcharts)
• Text and Images
• The Integrated System (demo)
• Conclusion

Image Similarity in Patents
• PATSEEK
  – Extract lines
  – Graph-based similarity using a “softened” variant of the Hausdorff distance
  – Huet et al. 2001
• PatMedia
  – Informatics and Telematics Institute, Greece
  – Adaptive Hierarchical Density Histogram
    • Focus on geometry
    • Sidiropoulos et al., 2011
  – http://mklab-services.iti.gr/patmedia/

Image Mining for Patent Exploration
• Make information in patent images accessible
• Automatic interlinking of patent text and drawing parts by sub-part segmentation and label identification
• Interactive user-guided search with manual feedback provided by the patent expert
• Variety of technical drawings: flow charts, block diagrams, time charts and graph plots
• Prototype integration into m2n Knowledge Discovery Suite based on MAREC dataset extract and patent PDFs
FIT-IT project IMPEx, Austrian Research Promotion Agency (FFG), No. 825846

Figure Extraction
• Separation of multiple figures on one page

Variability

Types of Patent Image
[Figure: taxonomy of patent image types, with nodes Figure, Photo, Diagram, BlockDiagram, State, Flowchart, Circuit, TechnicalDrawing, PlaneView, BottomView, ElevationalView, Graph, SideView, TopView, Waveform, Response, TimeChart, SectionalView]
From: S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, and I. Kompatsiaris. Towards content-based patent image retrieval: A framework perspective. World Patent Information, 32(2):94-106, 2010.

CLEF-IP 2011 Patent Image Classes
• Chemical Structure
• Abstract Drawing
• Flow Chart
• Mathematical Formula
• Program Code
• Graph
• Gene Sequence
• Character
• Table

Flowchart Analysis
• Identify
  – Number of nodes
  – Types of nodes
  – Text in nodes
  – Edges
  – Types of edges
• Specific to the domain
  – Node annotations

Text and Images

The Integrated System
• Patent PDFs without metadata for claims, technical devices, measurements, images, …
• Pages of PDFs converted to bitmap images
• Filtering of pages with images vs. other page types
• Optical character recognition, reference detection (Fig., Tab., …)
• Segmentation of images into individual figures for linking figure labels with image content
  – M. Lupu, R. Mörzinger, T. Schleser et al: "Patent Images - a Glass encased Tool"; 12th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW 2012)
• Classification of figure type, e.g. abstract drawing, graph, gene sequence, table, maths, program listing, flow chart, …
  – R. Mörzinger et al: "Classifying Patent Images"; Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2011)
• Flow chart analysis with recognition of node types, edges and annotations for semantic processing
  – Current work for CLEF-IP: Information Retrieval in the Intellectual Property Domain, Flowchart recognition task 2012
• Demo
  – References as Finding Objects
  – Semantic Patent Viewer

Conclusions
• Patent images need special treatment
  – From other images
  – Between themselves
• First step: integrate text & images through references
• Ultimate goal: full semantic search on all types of images
• Evaluation efforts at CLEF-IP (in two weeks in Rome)
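
The reference-detection step named in the slides (Fig., Tab., …), which the conclusions identify as the first step towards integrating text and images, could be sketched as below. This is a minimal illustration, not the IMPEx implementation: the regular expression, the function names `detect_references` and `index_by_target`, and the sample sentence are all assumptions made for this example, and real OCR output would need more tolerant matching.

```python
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not taken from the slides): matches
# mentions such as "Fig. 3", "FIG 2A", "Figure 12", "Tab. 1" or "Table 2".
REFERENCE_PATTERN = re.compile(
    r"\b(?P<kind>fig(?:ure)?s?|tab(?:le)?s?)\.?\s*(?P<label>\d+[a-z]?)\b",
    re.IGNORECASE,
)


def detect_references(text):
    """Return (kind, label, start_offset) for each figure/table mention."""
    references = []
    for match in REFERENCE_PATTERN.finditer(text):
        is_figure = match.group("kind").lower().startswith("fig")
        kind = "figure" if is_figure else "table"
        # Normalise labels to upper case so "2a" and "2A" point to the same target.
        references.append((kind, match.group("label").upper(), match.start()))
    return references


def index_by_target(text):
    """Group mention offsets by the figure or table they refer to."""
    index = defaultdict(list)
    for kind, label, offset in detect_references(text):
        index[(kind, label)].append(offset)
    return dict(index)


# Made-up sample text standing in for the OCR output of a patent page.
sample = (
    "As shown in Fig. 3, the valve (see also FIG 2A) opens. "
    "Tab. 1 lists values; Figure 3 repeats."
)
for reference in detect_references(sample):
    print(reference)
```

Grouping mentions by target, as `index_by_target` does, gives one entry per figure or table, which is the shape needed to link each segmented figure label back to the text passages that mention it.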