Patent Images
A Glass-encased Tool
Mihai Lupu, Allan Hanbury, Florina Piroi (Vienna University of Technology)
Tobias Schleser (m2n consulting and development GmbH)
Roland Mörzinger, René Schuster (Joanneum Research)
Motivation
• Many patents contain images. On a dataset of
16 million patents:
– 28% of patents have at least one image
– Average #images/patent: 9.4
• Often images contain important information
about the innovation
• Very important in a variety of engineering
fields (mechanics, electronics, chemistry)
• Current patent search tools use only text
Motivation 2
• Almost all image retrieval work has focussed
on images containing colour/texture
• Useless in the patent domain
Outline
• Related work
• IMPEx
• Patent Image Processing
– Figure Segmentation
– Specific processing (Flowcharts)
• Text and Images
• The Integrated System (demo)
• Conclusion
Image Similarity in Patents
• PATSEEK
– Extract lines
– Graph-based similarity using a “softened” variant of
the Hausdorff distance
– Huet et al. 2001
• PatMedia
– Informatics and Telematics Institute, Greece
– Adaptive Hierarchical Density Histogram
• Focus on geometry
• Sidiropoulos et al., 2011
http://mklab-services.iti.gr/patmedia/
PatMedia
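To illustrate what a "softened" Hausdorff distance can look like, here is a minimal sketch. The slides do not say which variant is meant, so this example assumes the modified Hausdorff distance, which averages nearest-neighbour distances instead of taking their maximum. It works on plain 2D point sets rather than on the line graphs mentioned above, and all function names are illustrative.

```python
import math


def directed_mean_distance(a, b):
    """Mean, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Softened variant (assumed here): max of the two directed mean distances."""
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))


def classic_hausdorff(a, b):
    """Classic Hausdorff distance: worst-case nearest-neighbour distance."""
    d_ab = max(min(math.dist(p, q) for q in b) for p in a)
    d_ba = max(min(math.dist(p, q) for q in a) for p in b)
    return max(d_ab, d_ba)


# Two point sets that differ only by a single stray point (e.g. scan noise).
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
square_with_outlier = square + [(10, 10)]

classic = classic_hausdorff(square, square_with_outlier)
softened = modified_hausdorff(square, square_with_outlier)
```

The classic distance is dominated by the single outlier, while the averaged variant grows only in proportion to the share of points that disagree, which is why softened variants are attractive for noisy scanned drawings.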
Image Mining for Patent Exploration
• Make information in patent images accessible
• Automatic interlinking of patent text and drawing parts by
sub-part segmentation and label identification
• Interactive user-guided search with manual feedback
provided by the patent expert
• Variety of technical drawings: flow charts, block diagrams,
time charts and graph plots
• Prototype integration into m2n Knowledge Discovery Suite
based on MAREC dataset extract and patent pdfs
FIT-IT project IMPEx
Austrian Research Promotion Agency (FFG), No. 825846
Figure Extraction
• Separation of multiple figures on one page
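One simple way to approach this separation is a projection profile: scan the page row by row and cut wherever enough consecutive rows contain no ink. The sketch below is only an illustration of that idea under stated assumptions, not the IMPEx method. It assumes a binarised page given as a list of rows of 0/1 values, and the function name and the `min_gap` parameter are hypothetical.

```python
def split_into_bands(bitmap, min_gap=2):
    """Return (first_row, last_row) pairs for each ink-bearing band.

    bitmap: list of rows, each a list of 0 (background) or 1 (ink).
    Two bands are split when at least min_gap blank rows lie between them.
    """
    bands = []
    start = None      # first row of the band currently being built
    last_ink = None   # most recent row that contained ink
    for y, row in enumerate(bitmap):
        if any(row):
            if start is None:
                start = y
            elif y - last_ink > min_gap:
                # Enough blank rows since the last ink row: close the band.
                bands.append((start, last_ink))
                start = y
            last_ink = y
    if start is not None:
        bands.append((start, last_ink))
    return bands


# Toy page: one figure in rows 1-4 (with a small internal gap), another in rows 8-9.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

bands = split_into_bands(page)
```

A real page would also need the same cut applied along columns, and figures that are laid out side by side or interleaved with labels generally call for connected-component analysis rather than straight cuts.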
Variability
Types of Patent Image
Figure
Photo
Diagram
BlockDiagram
State
Flowchart
Circuit
TechnicalDrawing
PlaneView
BottomView
ElevationalView
Graph
SideView
TopView
Waveform
Response
TimeChart
SectionalView
From: S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, and I. Kompatsiaris. Towards content-based patent image retrieval: A framework perspective. World Patent Information, 32(2):94-106, 2010.
CLEF-IP 2011 Patent Image Classes
• Chemical Structure
• Abstract Drawing
• Flow Chart
• Mathematical Formula
• Program Code
• Graph
• Gene Sequence
• Character
• Table
Flowchart Analysis
• Identify
– Number of nodes
– Types of nodes
– Text in nodes
– Edges
– Types of edges
• Specific to the domain
– Node annotations
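The items to identify map naturally onto a small graph data structure. The following is a minimal sketch of such a representation, not the format used in the project or in the CLEF-IP task. The class names, field names and type labels ("process", "decision", "directed") are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    node_type: str        # illustrative labels, e.g. "process", "decision"
    text: str = ""        # text recognised inside the node
    annotation: str = ""  # patent-specific label next to the node, e.g. "S101"


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str = "directed"  # illustrative label


@dataclass
class Flowchart:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def summary(self):
        """Count nodes and edges overall and per type."""
        return {
            "node_count": len(self.nodes),
            "node_types": dict(Counter(n.node_type for n in self.nodes)),
            "edge_count": len(self.edges),
            "edge_types": dict(Counter(e.edge_type for e in self.edges)),
        }


# Hypothetical three-node flowchart with a loop back from the decision node.
chart = Flowchart(
    nodes=[
        Node("n1", "terminator", "Start"),
        Node("n2", "process", "Capture image", "S101"),
        Node("n3", "decision", "Image valid?", "S102"),
    ],
    edges=[Edge("n1", "n2"), Edge("n2", "n3"), Edge("n3", "n2")],
)

stats = chart.summary()
```

Keeping the annotation separate from the node text reflects the domain-specific point above: in patent flowcharts the reference label beside a node is what links it back to the description.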
Text and Images
The Integrated System
Patent PDFs without
metadata for claims,
technical devices,
measurements, images,
….
The Integrated System
Pages of PDFs
converted to bitmap
images
The Integrated System
Filtering: pages with
images vs. other page
types
The Integrated System
Optical character
recognition, reference
detection (Fig., Tab., …)
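Reference detection on the OCR output can be approximated with a regular expression. The sketch below is an assumption about how such a step might look, not the system's actual implementation. The pattern, the function name and the sample sentence are illustrative, and only English abbreviations for figures and tables are covered.

```python
import re

# Matches "Fig. 3", "FIG. 2A", "Figure 4", "Figs. 5", "Tab. 1", "Table 2", ...
REFERENCE_PATTERN = re.compile(
    r"\b(?P<kind>Fig(?:ures?|s)?|Tab(?:les?)?)\.?\s*(?P<label>\d+[A-Za-z]?)\b",
    re.IGNORECASE,
)


def find_references(text):
    """Return (kind, label) pairs for figure and table references in text."""
    refs = []
    for match in REFERENCE_PATTERN.finditer(text):
        is_figure = match.group("kind").lower().startswith("fig")
        kind = "figure" if is_figure else "table"
        refs.append((kind, match.group("label").upper()))
    return refs


sample = "As shown in FIG. 2A, the valve is open; see also Fig. 3 and Tab. 1."
references = find_references(sample)
```

Real OCR output tends to contain misread characters (for example "F1G."), so a production pattern would likely need to tolerate such confusions.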
The Integrated System
Segmentation of images into
individual figures for linking
figure labels with image content
M. Lupu, R. Mörzinger, T. Schleser et al.:
"Patent Images - a Glass encased Tool";
12th International Conference on Knowledge
Management and Knowledge Technologies
(i-KNOW 2012)
The Integrated System
Classification of figure type
e.g. abstract drawing, graph, gene
sequence, table, maths, program
listing, flow chart, …
R. Mörzinger et al.:
"Classifying Patent Images";
Conference on Multilingual and
Multimodal Information Access
Evaluation (CLEF 2011)
The Integrated System
Flow chart analysis with
recognition of node types,
edges and annotations for
semantic processing
Current work for
CLEF-IP: Information Retrieval
in the Intellectual Property
Domain, Flowchart recognition
task 2012
The Integrated System
• Demo
– References as Finding Objects
– Semantic Patent Viewer
Conclusions
• Patent images need special treatment
– Different from other kinds of images
– Different among themselves (by figure type)
• First step: integrate text & images through
references
• Ultimate goal: full semantic search on all types
of images
• Evaluation efforts at CLEF-IP (in two weeks in
Rome)