ATAPY Software in Brief

“Working with ATAPY has been a pleasure. We have been impressed with the high level of concern for
producing the best possible text of the works and the accuracy of the results.”
Virginia Laursen
The Royal Danish Library
Copenhagen, Denmark
“Russia’s history is a history of great thinkers. Look at the chess players for example. ATAPY employs
highly educated engineers gathered together in an efficient company. The result is quality tailor-made
software at a competitive price.”
Rob Camerlink
EasyData B.V.
Apeldoorn, the Netherlands
“ATAPY reached 99.992% accuracy in the German-Russian Dictionary, and 99.997% quality in the
Spanish-Russian Dictionary project. They also corrected many mistakes in the source dictionary text,
including typographical misprints and even mistakes in special marks that are almost impossible to
detect without special programming tools and profound knowledge of linguistics.”
Anna Zhavoronkova
ABBYY Software House
Moscow, Russia
ATAPY Software in Brief
ATAPY Software is a software development company with offices
in Russia and Germany specializing in OCR/document imaging and
data capture solutions. The company was established in 2001 with
active support from its main partner ABBYY Software House.
The major specialization of the company is outsourced software
development services in OCR/document imaging and data capture
fields; the company also possesses its own unique know-how in
the area of document imaging and applied OCR solutions.
ATAPY Software specializes in development of software solutions
of the following types:
- Scanning front-end applications
- Document imaging solutions - graphical OCR enhancement filters, page skew/orientation correction
  tools, image zoning tools
- Document management applications, pre- and post-OCR routines to solve a variety of tasks: intuitive
  manual data verification, document classification, grouping, routing, validation, conversion, EDMS
  integration, etc.
- PDF tools and solutions
- Complete data input solutions on the basis of contemporary OCR SDKs
- Applied industry-specific solutions based on OCR technology (ANPR/LPR solutions for traffic control
  systems, media clipping systems for PR agencies, etc.)
- Professional services based on products/technologies by ABBYY Software House in the areas of data
  capture, OCR/ICR, computer linguistics, search/information retrieval engines
The main part of the company's own know-how lies in the
following fields:
- Document imaging: a number of pre-OCR quality enhancement filters, and a layout analysis tool for
  segmentation of non-standard pages
- ANPR/LPR: a car license plate recognition SDK capable of reading Russian, German, Swedish and Dutch
  number plates
The company's first projects were implemented for ABBYY Software House and
involved customizing their enterprise-class products to end-user requirements.
Today, ATAPY has a considerable track record of projects for customers
worldwide, which includes solutions based on ABBYY products and other OCR
platforms and toolkits. Among our customers are:
- ABBYY Russia: runs a development team and outsources occasional PS projects
- ABBYY Europe and ABBYY USA: run technical support teams and outsource occasional PS projects
- Notable Solutions Inc. (NSi, developer of AutoStore EDMS system): runs a dedicated team of software
  engineers and testers
- EasyData, Lucom, PrePress Systeme: run small software development teams at ATAPY
- Springer Verlag: for this publishing house ATAPY is converting a large scientific encyclopedia to
  electronic format (an ongoing project since 2003)
- Five other smaller clients currently run software development and media service projects at ATAPY
ATAPY Software also provides a range of digitization and data entry services (scanning, OCR, character
repair, key-from-image (KFI), data format conversion, etc.). The company employs experienced software engineers,
linguists, and multilingual operators.
The track record of the company includes more than 150 completed projects for clients from the US,
many European countries, Russia, and the Middle East. Directly or through our strategic partnerships we
have had the privilege of supplying our software development experience to such companies and
organizations as Oce, RICOH, Fujitsu, Toshiba, Hewlett-Packard, Captiva, Apple Computers, the
Government of the Netherlands, and the Meta-E consortium of 17 European and American universities
funded by the European Commission.
ATAPY Software has been a Microsoft Certified Partner since 2007. ATAPY has two
Microsoft Partner competencies: the Software Development competency and the
Data Platform competency.
From European car license plates to Dutch penitentiary system surveys, from American healthcare
forms to Turkish printed media, our software works around the world and around the clock so that our
clients can forget about paper entry issues and concentrate on their core businesses.
Sergey Borovoy
CEO
©2011 ATAPY Software. All rights reserved.
All trademarks used are the property of their respective owners.
ATAPY Software
630090, Engineernaya Street, 4a, 522
Novosibirsk, Russia
Tel. +7 383 33 56 56 9 Fax +7 383 33 56 56 1
www.atapy.com office@atapy.com
Virtual Image Processing
for AUTOSTORE
NSi AUTOSTORE transforms paper- and manual-based data capture processes into easy and efficient
electronic workflows. Companies and organizations that use AUTOSTORE streamline their business
processes, improve productivity, reduce operating costs and enforce compliance with laws and
regulations. AUTOSTORE enables users to classify documents, extract just the information that's
needed, and send it wherever necessary, all with a few easy clicks.
AUTOSTORE provides a smart business automation solution for a range of information-intensive
industries, including:
- Healthcare
- Financial services
- Retail
- Legal
- Utilities
- Manufacturing
- Transportation and logistics
- Professional services
- Local and state government
- Non-profit
Image optimizing component for the powerful document capture system by Notable Solutions, Inc. (NSi)
marked the starting point in partnership between ATAPY and NSi
NSi’s AUTOSTORE application is a leading product for document capture, processing and distribution.
Adopted by many top manufacturers of on-ramp devices, such as MFPs and digital copiers, AUTOSTORE’s
framework is becoming the standard of server-based document capture for devices by HP, RICOH,
Kyocera, Xerox, Canon and many other internationally renowned companies.
ATAPY has enhanced AUTOSTORE with a new image treatment tool named Virtual Image Processing (VIP).
In terms of AUTOSTORE, VIP is an integrated process component placed between the “capture” and
“route” sources in a capture workflow. Such workflows are created visually in the AUTOSTORE Process
Designer (an Administrative tool) using drag-and-drop technique. VIP provides for optimizing document
image quality for subsequent recognition via several configurable filters and their combinations. The
number of filters is growing with every new version of the AUTOSTORE application.
[Figure: The capture workflow creation process (components shown: AutoCapture, Virtual Image
Processing, ABBYY FormReader ICR, Microsoft Exchange)]
All original image modifications are reflected “on the fly” in the preview
window of the component configuration interface. Once the desired
image quality is achieved, the filter combination profile can be saved and
used in other AutoStore workflows containing VIP. This eliminates the
need to configure filters every time a new capture-process-route
sequence is designed. This is extremely valuable for companies with large
workflows of template-based documents.
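A saved filter profile can be pictured as an ordered list of filter names with their parameters. The sketch below is only an illustration in Python and does not reproduce AutoStore's actual profile format or API: the JSON layout and the names `FilterStep`, `profile_to_json` and `profile_from_json` are assumptions made for this example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FilterStep:
    """One filter in a profile (hypothetical structure), e.g. name="deskew"."""
    name: str
    params: dict = field(default_factory=dict)


def profile_to_json(steps):
    """Serialize an ordered filter chain so it can be stored and reused."""
    return json.dumps([asdict(step) for step in steps], indent=2)


def profile_from_json(text):
    """Rebuild the filter chain, preserving the order of the steps."""
    return [FilterStep(**item) for item in json.loads(text)]
```

A profile built once, for example `[FilterStep("deskew"), FilterStep("despeckle")]`, could then be serialized and loaded into any other workflow that needs the same treatment.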
VIP implements a large number of different image filters, out of which the most basic are:
- Color Extraction filter - keeps the selected color, while all other colors are dropped. Users can
  also select a color to replace the dropped ones in the resulting image
- Deskew filter - corrects the skewed angle of the image
- Despeckle filter - eliminates small speckles and garbage on the image
- Color Dropping filter - drops the selected color and replaces it with the substitute color
- Thresholding filter - converts an image to black-and-white. All pixels with the brightness level less
  than the specified level become white, while others become black
- Adaptive Thresholding filter - allows improving recognition results for source images with irregular
  background using a sophisticated binarization algorithm
[Figures: VIP filter profile configuration dialog; Color dropping configuration dialog]
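The two color filters are mirror images of each other, which a short sketch can make concrete. This is not VIP's implementation: it assumes an image is a list of rows of `(r, g, b)` tuples, and the `tolerance` parameter and function names are hypothetical additions for illustration.

```python
def _close(a, b, tolerance):
    """True if two RGB colors differ by at most `tolerance` in every channel."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def drop_color(image, target, substitute=(255, 255, 255), tolerance=0):
    """Color Dropping: replace pixels matching `target` with `substitute`."""
    return [[substitute if _close(px, target, tolerance) else px for px in row]
            for row in image]


def extract_color(image, target, substitute=(255, 255, 255), tolerance=0):
    """Color Extraction: keep pixels matching `target`, replace all others."""
    return [[px if _close(px, target, tolerance) else substitute for px in row]
            for row in image]
```

Dropping a form's red guide lines before OCR, for instance, would be `drop_color(page, (255, 0, 0), tolerance=10)`, where the tolerance absorbs small scanner color variations.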
Notable Solutions, Inc. (www.nsius.com) was founded in 1995 to provide complete
technology solutions in the fields of Software Development, System Engineering,
Technology Training and Business Consulting. Since that time NSi has evolved
towards software and hardware design, development, network integration, and support of document
management systems. NSi prides itself on a commitment to quality and a reputation for excellence. It
is now a leading provider of content capture software with products in use by Canon, HP, Kodak,
Kyocera, RICOH, Sharp, Xerox and others.
AutoStore is a registered trademark of Notable Solutions, Inc.
All the other trademarks are the property of their respective owners.
Notable Solutions, Inc. (NSi)
9715 Key West Ave., Suite 200
Rockville, MD 20852, USA
Tel.: +1 240 683-8400
Fax: + 1 240 683 8420
www.nsius.com
sales@nsius.com
ABBYY FineReader for Fujitsu ScanSnap!™
“The software supplied is also very good. ABBYY FineReader recognizes text, tables and images in
documents and can import to Word or Excel for editing. It will automatically rotate scanned pages to
their correct orientation and leave out any blank pages.”
Tim Smith, “Computeractive”, UK
Fujitsu (www.fujitsu.com) is
a multinational computer
hardware and IT services
company based in Tokyo,
Japan. The company specializes in semiconductors, air
conditioners, computers
(supercomputers, personal
computers, servers), telecommunications, and services. Fujitsu employs around
400,000 people and has
~500 subsidiary companies.
Under a contract with ABBYY Europe GmbH, ATAPY Software has completed the ABBYY FineReader for
ScanSnap!™ software package to be bundled with Fujitsu scanners
ScanSnap!™ by Fujitsu is a family of high-speed desktop office scanners built around a one-step
approach to document conversion. Normally, one push of a button on the scanner's faceplate is enough
to see the document image on the screen. And if any scanning parameters need to be modified, ScanSnap
Monitor software enables users to do that in just a few mouse clicks.
But the image is not always the answer - no scanning software suite is complete without a good OCR
application. For Fujitsu, the choice was obvious: ABBYY FineReader. ABBYY's European office, which
negotiated the deal, transferred the development to ATAPY Software, ABBYY's technology partner. The
application implemented by ATAPY comprised four components. The Scan2Word, Scan2Excel, and Scan2PDF
modules converted scanned images to the corresponding file formats at the push of the button on the
scanner. The intentional simplicity of the solution didn’t mean a lack of flexibility, as the fourth
component - the Exporter Settings module - provided access to a variety of parameters: turning
preservation of page and line breaks on and off, retaining text color for Word documents, replacing
uncertain words with images and reducing picture resolution for PDF, and practically the entire range
of FineReader settings and options.
The package offered a wide choice of recognition and interface languages. A user could select from 7 languages of the program interface:
English, German, French, Russian, Spanish, Portuguese, or Italian. Also
there were tools for adding new interface languages. And, thanks to the
built-in capabilities of ABBYY FineReader, the number of recognition
languages was so large (177) that they had to be categorized into 5
groups for manageability.
ABBYY FineReader for ScanSnap!™ combines the simplicity of the
brilliant Fujitsu one-step scanning approach and the power of ABBYY
FineReader technology. ATAPY's expertise enabled these components to
work in synergy for the best performance and user satisfaction.
ABBYY and ABBYY FormReader are registered trademarks of ABBYY Software House.
All the other trademarks are the property of their respective owners.
ABBYY Europe Software House
80687 Munich, Germany
Elsenheimerstrasse 49
Tel. +49 89 511 159 0
Fax +49 89 511 159 59
www.abbyyeu.com info@abbyyeu.com
FineReader-based OCR module
for Captiva InputAccel
Captiva Software Corporation, a standard setter in
enterprise input solutions, chose ABBYY Software
House OCR technologies for leveraging in its flagship
information input solution.
Captiva InputAccel is used
by hundreds of companies around the world,
helping them to collect and integrate external
information in their systems. InputAccel works around
the globe and the clock to transform the deluge of
external data into usable, business-ready content, no
matter what its format or point of origin is.
Needless to say, accuracy in processing data streams flowing in and out 24/7 is vital to any
paper-intensive company’s success. To ensure high data capture accuracy, Captiva Software Corporation
entrusted creation of an OCR module for InputAccel to ATAPY, ABBYY's software development partner
experienced in ABBYY FineReader-based solutions for companies of all sizes.
A thorough study of InputAccel by specialists from ABBYY resulted in complete project documentation,
which was passed to ATAPY for implementation. In addition, joint project management by ABBYY and
ATAPY made it possible to improve the module's usability and productivity in the course of
development, without additional work investment or stretched project timelines.
Feature highlights
The OCR module integrates into InputAccel workflows as configurable instances with user-defined
settings. Settings are provided by FineReader Engine and cover pre-OCR optimization, recognition
options, and output document formatting. Each instance can be inserted into the workflow to handle a
task, be it processing a single page or digitizing large volumes of incoming documentation.
Captiva Software Corp. manufactures software products for document processing
and data capture from paper and electronic documents and provides related
services. In 2005 Captiva was acquired by EMC Software Group, a division of EMC
Corporation (www.emc.com). This acquisition represented a natural extension to
the EMC Documentum enterprise content management platform and added
existing integrated technology to the EMC software portfolio.
ABBYY and ABBYY FineReader are registered trademarks of ABBYY Software House.
All the other trademarks are the property of their respective owners.
EasyData and ATAPY Software
for Seamless Data Capture
Through innovative use of ABBYY FineReader products and
long-term cooperation with ATAPY Software, EasyData B.V.
is becoming the leading supplier of customized OCR/ICR
solutions in the Netherlands
EasySeparate is the solution for today’s demand for converting and combining the output of an MFP.
With EasySeparate it is possible to simplify the data input process by configuring a profile for mass
input of each particular type of form. The configurable presets include:
- Combining or separating documents based on a barcode or text string/regular expression
- Blank page removal, a number of image processing options
- Indexing, adding time and date stamp for extra traceability, collecting metadata in XML ticket
- Setting a pre-defined output format, including file type and file name
EasySeparate supports integration with ABBYY FormReader, a powerful data capture product by ABBYY
Software House that automatically reads, converts, and exports unstructured documents, such as
invoices, to any external system. EasySeparate provides a variety of customization options by means
of its Scripting module.
As a subcontractor to EasyData B. V., ATAPY has
applied its experience in a number of projects
for the Dutch IT industry. ATAPY solutions are
now part of several efficient software products
which streamline the data capture routines for
end users in a number of vertical markets.
ATAPY develops a valuable add-on to
EasyData's Flagship Product EasySeparate
ATAPY Software contributed to the development of EasySeparate by implementing Visioneer OneTouch®
Link, a solution that integrates the product with the industry-leading Visioneer and XEROX scanners
supporting the OneTouch® technology. ATAPY's Visioneer-certified OneTouch® Link works with the device
driver, allowing users to enjoy the benefits of quick and straightforward scanning combined with
EasySeparate's outstanding data capture capabilities. When preparing to scan multiple documents of a
similar type or structure, a user selects one of his/her pre-configured EasySeparate profiles in the
scanner interface settings and specifies EasySeparate as the destination. The rest of the work, apart
from pushing or clicking “Scan”, is done by EasySeparate, including intelligent barcode- and
text-based document flow separation, sorting, blank page removal, OCR, document indexing, metadata
processing, grouping, format conversion, and export.
The solution provides a powerful add-on to EasySeparate, making
specific and familiar EasySeparate processing choices immediately
available to users.
"With EasySeparate we've been striving to bring a new degree of
ease and transparency into the document management process of
small to middle organizations, allowing them to increase operating
efficiency and cut costs. This new integration takes us one step
further in this direction; I'm sure it'll be demanded by our
customers,” says Robert Camerlink, the EasyData B.V. CEO.
Participation in Development of the Scan2IT Component for Océ
Among EasyData's customers is Oce, one of the world leaders in hardware and software for document
processing. Cooperation started with a relatively simple “Oce Document Interpreter” application which
detected separator pages within the incoming image stream and split the stream into multi-page
documents. The stability, short development time, and low cost of this initial application led Oce to
order additional features, such as barcode search and logging, empty page detection, and many others.
Another application, titled FineRead, packs the power of ABBYY OCR technologies into a silent “image
gobbler” that runs as an NT service and monitors selected folders for new images. Incoming images get
recognized and exported according to a sophisticated set of instructions composed by the user. Merged
together, these two applications formed the basis for Scan2IT (“Scan to Intelligent Text”), a complex
image processing system now offered by Oce offices around the world.
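The separator-page logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Document Interpreter's code: `is_separator` and `is_blank` are hypothetical caller-supplied predicates standing in for the real barcode, image, or empty-page detection.

```python
def split_on_separators(pages, is_separator, is_blank=lambda page: False):
    """Group a flat page stream into multi-page documents.

    A separator page closes the current document and is itself discarded;
    blank pages are skipped. Empty documents are never emitted.
    """
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        elif not is_blank(page):
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

Keeping the detection functions outside the splitter means the same loop works whether a separator is recognized by a barcode, a text string, or a fixed page image.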
About Visioneer:
Visioneer (www.visioneer.com) is a world-class developer of intelligent imaging solutions that provide
a faster and easier way to capture documents and photographs and integrate them with popular
Windows and document imaging applications.
About Oce:
Oce NV (www.oce.com) is a Netherlands-based company that manufactures and sells production
printing and copying hardware and related software. Oce N.V. has been a listed company since 1958
and is the holding company for the international Oce Group. This group has operating companies in
25 industrialized countries.
EasyData B.V. (www.easydata.nl) is a Netherlands-based company specializing in data
capture, document management solutions, and the associated consulting services.
Serving the needs of the Dutch and Belgian SMB markets, EasyData’s innovative
applications make the user experience more compelling by changing the way paper-intensive organizations manage their document flows.
ATAPY Software (www.atapy.com) is a provider of on-demand software solutions in
the fields of OCR/ICR, document imaging, and data capture. In addition to its main
activity, ATAPY has been taking part in various archive digitization and knowledge
preservation endeavors through offering a range of media services (scanning, data
capture, key-from-image, mark-up, verification, etc.) to libraries and data archives all
over the world.
EasyData and EasySeparate are registered trademarks of EasyData B.V.
ABBYY and ABBYY FormReader are registered trademarks of ABBYY Software House.
All the other trademarks used are the property of their respective owners.
EasyData B.V.
Koninginnelaan, 16
7315 BS Apeldoorn, the Netherlands
Tel. +31 55 53 44 886
www.easydata.nl
sales@easydata.nl
sense your media
PRNet is a Media Monitoring
and Analysis company
serving over 300 corporate
clients in Turkey.
The company acts as a
strategic partner for communication specialists and
executives, who aim to develop corporate reputation
and who need to assess the
results of their communication strategies. PRNet
provides access to their
online database where customers can search among
more than 25 thousand clips
and 80 million results stored
since 2000, survey 4,500
pages of newspapers and
magazines, view videos of
74 TV channels recorded on
a 24/7 basis, and access
more than 1,000 Internet
portals.
According to ISO 500 research, 7 of the top 10 companies of Turkey, and 84 of the top 100, prefer
PRNet for serving their media-monitoring and industrial information needs.
ABBYY, the ABBYY logo and ABBYY FineReader are registered trademarks of ABBYY Software House. PRNet
and the PRNet logo are registered trademarks of PRNet.
A Networked Media
Clipping System
For more than a century, daily, systematic analysis of printed media
has been an important tool for successful businesses worldwide.
Media clipping companies, using tools suited to the last century,
provided the analysis business demanded. All those years, the
rustling of pages and jingle of scissors were the constant audio
background of media clipping companies' operations.
Arrival of the digital age healed the callused hands of operators.
Fewer and fewer scissors were used as companies switched to
scanning printed material. Paper no longer left the scanner room,
and reading was done from computer monitors.
But overall processing of newspapers and magazines still required
too much human input to automate, so the amount of labor spent
by media clipping companies remained largely the same. Early 90s
OCR programs worked for letters and faxes, but turned out to be
useless when confronted with the complex layout and font variety
of newspapers.
In 1997, a Turkish media research company named PRNet approached ABBYY Software House, the
manufacturer of FineReader OCR products, with the request to design a system to streamline the media
clipping process. Dalian 1.0 went into operation in 1998, delivering subscribers a service previously
unheard of. As early as nine in the morning, subscribers could log on to PRNet's web site, click on
their own customized albums, and view a new page with clippings from that very day's morning
newspapers. Only clippings containing this subscriber's keywords went to his/her albums. Content was
delivered as text and pictures in HTML format, allowing the subscriber to copy & paste it into other
software for distribution or editing. Keywords were highlighted. All major Turkish publications were
covered (50 titles). The clippings were preserved in an MS SQL Server database for long-term storage
and future reference.
All this was achieved with an average staff presence of 14 operators - a fantastic efficiency
compared to less sophisticated systems. The workload was largely shifted to unattended computers: OCR
PCs had to be rack-mounted 10 units tall to fit into a single room, with one hot-switchable monitor
for control.
When the new version of FineReader OCR came
out, PRNet invited ABBYY to migrate Dalian to
this new platform. Pursuant to new corporate
outsourcing policies, ABBYY transferred the
project to ATAPY Software, an IT development
company specializing in custom OCR tools.
Besides migration, PRNet asked ATAPY to add
web-based administration, system statistics and
reports, a web client for extended media search,
improved output for clippings, and many other
features and enhancements.
The new Dalian 2.0 went into operation in 2003, providing media insights to about 80 clients,
including the Turkish offices of Alcatel, Compaq, Toyota, Unilever, Vestel, CNN, Reebok, and Siemens,
as well as such local giants as members of the Koç Group and the leading banks of Turkey.
The dramatic improvement in recognition rate,
the possibility to employ home-based operators
working through web interfaces, and other
serious advancements in system functionality
and manageability place Dalian 2.0 in the top
rank of modern media clipping software
solutions.
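The keyword-driven delivery described for Dalian (only clippings containing a subscriber's keywords reach that subscriber's albums, with the keywords highlighted in HTML) can be illustrated with a small Python sketch. It is not Dalian's implementation: the function names, the `<b>` highlighting markup, and the whole-word, case-insensitive matching rule are assumptions made for this example.

```python
import re


def highlight(text, keywords):
    """Wrap each whole-word keyword occurrence in <b> tags, ignoring case."""
    if not keywords:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<b>\1</b>", text)


def route_clippings(clippings, subscriptions):
    """Return {subscriber: [highlighted clipping, ...]} for matching clippings."""
    albums = {name: [] for name in subscriptions}
    for text in clippings:
        for name, keywords in subscriptions.items():
            marked = highlight(text, keywords)
            if marked != text:  # at least one keyword was found
                albums[name].append(marked)
    return albums
```

Here `subscriptions` maps a subscriber name to a keyword list, so one recognized clipping can land in several albums, each with its own highlighting.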
PRNet
Spring Giz Plaza B Blok 17/18, Maslak 80670
Istanbul, Turkey
Tel. +90 212 328 18 09 Fax +90 212 328 18 07
www.prnet.com.tr info@prnet.com.tr
EasyData and ATAPY Software:
Continued Partnership since 2001
EasyData B.V. is a Netherlands-based company specializing in data capture, document management
solutions and the associated consulting services. Serving the demands of the Dutch and Belgian SMB
market, EasyData makes the user experience more compelling by changing the way paper-intensive
organizations manage their document flows. EasyData's flagship data capture product, EasySeparate,
helps reduce the cycle times of mission-critical business transactions by providing quick and
user-friendly document capture.
www.easydata.nl
Merging together the exceptional power of ABBYY OCR
technology and the experience of ATAPY engineers,
EasyData provides Dutch companies with made-to-order
yet affordable document management software
applications
Among EasyData's customers are such institutions as Amsterdam
International Airport Schiphol, the largest Dutch hospital AMC,
Dutch Institute of War documentation NIOD, University of
Nijmegen, Technical University of Eindhoven, University of
Wageningen and others. All those, and many more, have
benefited from EasyData's commitment to innovation and
excellence. Some of the most notable projects completed with
ATAPY participation are described below.
Data Capture Solutions for
Logistics Industry
A cost-effective document management solution
for Van Gend & Loos
Van Gend & Loos, one of the largest Dutch
logistics and cargo companies, turned to
EasyData for a solution to automate its
document processing through OCR
technology. The goal was to extract certain textual and graphical
information from an incoming stream of printed forms. Having
considered the prohibitive cost of purchasing and operating the
full-scale form-processing systems, EasyData and ATAPY offered
and implemented a custom solution which works only with the
documents received by Van Gend & Loos, but does it better and
for a fraction of the cost as compared to any ready-to-use product.
EasyData and ATAPY help to deliver frozen food
to multinational customers
Frigolanda is a transport and logistics company specializing in transporting refrigerated cargo, with
offices in Belgium, the Netherlands, Germany, and India.
In 2003, EasyData negotiated a contract with Frigolanda under which ATAPY designed a custom export
automation module for ABBYY FormReader 6.0. Commercial order forms were fed to a scanner, and images
were automatically read. Then ATAPY's export module received recognized data from FormReader and
converted it into custom-format files for further analysis and processing.
In 2004, the customer returned to EasyData with a request for further development of the program.
The input documents were two pages, double-sided; sometimes, due to scanning mistakes (double feeds,
face-down feeds), the data got mixed up. The new version of the program watched the page
order and corrected it when possible, or warned the user of a non-recoverable situation.
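A simplified Python sketch of such a page-order check follows. It rests on assumptions the text does not confirm: each document is taken to be one sheet with page 1 on the front and page 2 on the back, and a hypothetical upstream classifier has already labeled every scanned side as "page1", "page2" or unrecognized.

```python
class UnrecoverableOrder(Exception):
    """Raised when the scanned sides cannot be put back in order."""


def fix_sheet(sides):
    """Return the two sides of one sheet ordered as (page 1, page 2).

    `sides` is a pair of (label, image) tuples in scan order, where label
    is "page1", "page2" or None when the page type was not recognized.
    """
    labels = [label for label, _ in sides]
    if labels == ["page1", "page2"]:
        return tuple(sides)                # already in order
    if labels == ["page2", "page1"]:
        return (sides[1], sides[0])        # face-down feed: swap the sides
    # Duplicated, missing or unrecognized pages (e.g. after a double feed)
    raise UnrecoverableOrder(f"cannot restore page order from {labels}")
```

The caller would catch `UnrecoverableOrder` and warn the operator instead of exporting mixed-up data.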
Both parts of the project were implemented quickly and to the full satisfaction of the customer. The
program is currently running at the customer's site, supporting an important part of its business
process.
Customizing ABBYY FineReader for processing construction
shipping forms in Belgium
The Fernand Georges company (www.georges.be), a construction and industrial equipment dealer, was
looking for a means to automate its document flow processing. The task was to detect specific spots
on shipping forms, then read the data at these spots and export it for further processing. In
addition, the forms had to be re-grouped into multi-page TIFF files, each file representing one form
with its attachments. EasyData and ATAPY proposed ABBYY FineReader 6.0 Engine as the OCR basis. This
choice was additionally justified by the fact that the forms were printed on a matrix printer, for
which FineReader provides a special recognition mode to increase OCR quality. The solution was
successfully implemented and is now in heavy-duty production use at Fernand Georges.
About Van Gend & Loos:
Van Gend & Loos was a Dutch distribution company. It was established in 1809 by the Antwerp-based
innkeeper and carriage driver J.B. van Gend. It was sold to Deutsche Post in 1999. The three daughter
companies of Deutsche Post (Danzas, DHL Worldwide Express and Van Gend & Loos) were merged
to form DHL in 2003, ending the almost 200-year history of Van Gend & Loos.
About FRIGOLANDA:
FRIGOLANDA (www.frigolanda.com) is a cold logistics group with a storage capacity of more than
75,000 pallet places - 65,000 deep frozen and 10,000 chilled - in Europe. The company possesses
offices in the Netherlands, Germany, Belgium and Poland, and a high quality distribution and
transport fleet of 65 lorries. FRIGOLANDA drivers deliver to some 1,000 addresses throughout the
Benelux, Germany and a growing number of addresses in France, Italy, Great Britain, Switzerland,
Austria and Scandinavia every day.
EasyData and EasySeparate are registered trademarks of EasyData B.V.
ABBYY, ABBYY FormReader and ABBYY FineReader Engine are registered trademarks of ABBYY Software House.
All the other trademarks used are the property of their respective owners.
EasyData B.V.
Koninginnelaan, 16
7315 BS Apeldoorn, the Netherlands
Tel. +31 55 53 44 886
www.easydata.nl
sales@easydata.nl
Automated Document Processing Tool
for PPS PrePress Systeme GmbH
PPS PrePress Systeme GmbH is a digital paper solutions and media service provider headquartered in Germany.
The company specializes in converting paper archives into digital form, allowing full-text information retrieval
over decades of newspaper issues and other digitized data. PPS delivers high quality images and recognized text
of newspaper pages, which it receives on paper or on microfilm. For knowledge retrieval, PPS offers flexible
and powerful tools from interface projects.
PPS contacted ATAPY Software for creation of a custom document processing tool. The goal was to automate four main tasks:
- accept and route images for subsequent OCR
- use ABBYY FineReader Scripting Edition to recognize the images
- export recognition results to a variety of document formats
- save resulting documents in user-defined output directories
The tool designed by ATAPY detected scanned documents in the user-defined input directories, sent them to working directories, and submitted them to ABBYY FineReader for recognition. Recognized documents were stored in the output directories, while problematic images went to special "error directories". An important feature of the application was its ability to export each recognized document into several formats, so a single processing pass produced multiple output documents.
Export formats: CSV, DOC, RTF, HTM, HTML, DBF, PDF, XLS, TXT
High flexibility and configurability were the key points of the solution designed and implemented by ATAPY. For each output format, the application allowed a user to set individual parameters, such as page size for RTF/DOC, picture resolution for PDF, codepage for HTML, etc. State-of-the-art OCR technology by ABBYY Software House, combined with ATAPY's engineering expertise, made this application fast, highly usable, and cost-effective.
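The routing described above can be illustrated with a minimal Python sketch. It is not the ATAPY tool and does not call ABBYY FineReader: the `recognize` callable and the per-format `exporters` are hypothetical stand-ins supplied by the caller, and the directory layout is an assumption made for the example.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict

# Hypothetical types: a recognizer turns an image file into text, and an
# exporter turns that text into the bytes of one output format.
Recognizer = Callable[[Path], str]
Exporter = Callable[[str], bytes]


def process_input_dir(input_dir: Path, work_dir: Path, error_dir: Path,
                      output_dirs: Dict[str, Path], recognize: Recognizer,
                      exporters: Dict[str, Exporter]) -> Dict[str, int]:
    """Route each image: input dir -> working dir -> OCR -> one file per format."""
    for directory in (work_dir, error_dir, *output_dirs.values()):
        directory.mkdir(parents=True, exist_ok=True)
    stats = {"ok": 0, "failed": 0}
    # Take a snapshot of the input directory before moving anything.
    for image in sorted(p for p in input_dir.iterdir() if p.is_file()):
        working = work_dir / image.name
        shutil.move(str(image), str(working))
        try:
            text = recognize(working)
            # One recognition pass feeds every configured export format.
            for fmt, export in exporters.items():
                target = output_dirs[fmt] / f"{working.stem}.{fmt}"
                target.write_bytes(export(text))
            working.unlink()
            stats["ok"] += 1
        except Exception:
            # Problematic images are set aside instead of stopping the run.
            shutil.move(str(working), str(error_dir / working.name))
            stats["failed"] += 1
    return stats
```

Keeping one output directory per format mirrors the "user-defined output directories" mentioned above. A production tool would also need per-format settings and cleanup of partial exports, which this sketch leaves out.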
PPS PrePress Systeme GmbH (www.prepress-systeme.de) has provided the publishing industry with state-of-the-art software solutions since 1992, and has offered newspaper archive digitization services since 1999. PPS PrePress Systeme GmbH offers a line of innovative search solutions, including an enterprise-class intelligent search system «inter:gator» and a semantic search engine «PPS Finder».
ABBYY, ABBYY FineReader and ABBYY FineReader
Scripting Edition are registered trademarks of ABBYY Software House.
PPS PrePress Systeme GmbH
Hohemarkstrasse 20
D-61440 Oberursel, Germany
Tel. +49 6171 7085725
www.prepress-systeme.de pps@prepress-systeme.de
Auto-Import Station for DM Dokumenten
Management GmbH
DM Dokumenten Management GmbH is a Germany-based provider of end-to-end document management solutions.
One of the company's clients is Wüstenrot-Gruppe - an Austrian company providing financial and real-estate services. Its two main subsidiaries are the Wüstenrot construction and the Wüstenrot insurance companies.
For this client, DM developed and installed a mass document capture system. This system was based on ABBYY FormReader 6.0 Enterprise Edition - a distributed data capture product comprising a number of computer “stations” performing different tasks. Some stations are responsible for scanning, others for OCR, verification, data validation, system administration, and release.
As its input front-end, the system used industrial scanners that weren't directly compatible with the ABBYY FormReader 6.0 Enterprise Edition Scanning Station. Therefore, the images were captured using the software bundled with the scanners and stored as multi-page TIFF files. After that, they needed to be imported automatically.
Although the ABBYY FormReader 6.0 Enterprise Edition Scanning Station has an “import-from-folder” feature, its functionality did not satisfy the system's requirements.
In search of a solution, DM contacted ATAPY Software. Thanks to its profound knowledge of ABBYY products, ATAPY was able to write a new component for the system, named Auto-Import Station (AIS). AIS replaced the Scanning Station and, as far as the rest of the system was concerned, imitated its behavior.
The solution was successfully installed at Wüstenrot and demonstrated impressive performance.
DM Dokumenten Management GmbH (www.dokumenten-management.de) has been developing efficient solutions for document management and revision-safe archiving for over 15 years. DM Dokumenten Management GmbH designed a completely new product generation, lobo dms, which meets the latest requirements of the document management industry. DM customers include Aventis AG, Deutsche Post, BMW AG, Deutsche Börse, and Linde AG.
ABBYY and ABBYY FineReader are registered trademarks of ABBYY Software House.
DM Dokumenten Management GmbH
Dornierstrasse 4 82178 Puchheim, Germany
Tel. +49 89 800 613 0
Fax +49 89 800 613 99
www.dokumenten-management.de
info@dokumenten-management.de
ATAPY installs ABBYY FormReader at Inmarko, Inc.
Novosibirsk-based Inmarko is the largest ice cream manufacturer in Russia, with an annual production volume of nearly 31,000 tonnes (2003). Today Inmarko, Inc. employs several thousand people and sells ice cream across the entire country, from street kiosks to international retail chains like Metro and Auchan.
Inmarko had developed a complicated procedure for processing product requests from retailers. Request forms were distributed to supply agents, who filled in the numbers coming from each retail point and submitted them to the Data Entry Department. Employees of the department manually entered the forms into the “1C:Enterprise”™ ERP system, from which purchase orders, delivery truck routes, and other documents were automatically generated and passed to the Logistics Department.
This procedure had one bottleneck. The form-keying operators could finish their work only late at night, which delayed and obstructed further business processes. The reason was the tremendous volume of information, as each form contained more than 1,000 fields filled with handprinted text. Another serious problem was entry mistakes. Single typos were irritating, but much worse were situations when operators skipped or duplicated table columns, or even entire forms. As a result of such mistakes, some retailers received twice the quantity of each ice cream variety that they had actually ordered, while others received nothing.
In search of a solution, Inmarko's IT Department discovered FormReader by ABBYY Software House, a product for automatic input of data from printed and handprinted forms. As FormReader is a “box” product, no costly integration was needed; most of the installation and tune-up work was done in-house, with minimum intervention from ATAPY Software, the local ABBYY dealer. FormReader required no changes in application form design and no special staff training, so the costs of printing, distribution and collection of the forms remained the same. FormReader's hardware requirements were just as realistic: a regular flatbed scanner and a common office computer.
Once FormReader went into operation, work that previously required three typists now required just one operator, and even with that reduction, the form entry process was completed much earlier. Input productivity increased sixfold, and the company's entire distribution logistics improved considerably. Especially important was the fact that the number of mistakes decreased very significantly, with the most “disastrous” mistakes going away completely. This successful experience has moved Inmarko to use FormReader for capturing other types of corporate documents. This new challenge has required no additional investment at all, as FormReader can be configured to process up to 99 document types in one batch, automatically telling one document type from another.
Now the same operator, using the same hardware and software, processes Inmarko's documents of different
types. A similar system was installed at Inmarko's plant in another city, and more installations are underway.
Besides Inmarko, the efficiency and stability of ABBYY FormReader are acknowledged by hundreds of users around the world, including the Federal Tax Service of Russia (Personal Income Statements, Taxpayer Identification Number applications, and other tax forms), the Russian Ministry of Education (examination papers in the centralized all-Russian students' testing program), the Russian State Pension Fund (insurance application forms and premium reports), the Ministry of Rural Development of Malaysia (agricultural statistics reports), Adidas (retailer order sheets and questionnaires), Philip Morris (sweepstakes entry forms), Finansbank, Turkey (credit card application forms), Target Media/UK (checkbox questionnaires), and Allianz Poistovna, Poland (vehicle insurance applications).
FormReader contains an Application Program Interface for interaction with other applications. This makes it possible to use FormReader not only as a standalone application, but also to create entire production lines for mass input of documents. The three largest lines are installed at the Moscow Tax Inspectorate. In each of them, 1 computer is responsible for scanning, up to 10 computers provide automatic OCR, and another 10 computers allow operators to proofread recognition results and correct mistakes. Each line can process up to 3,500 pages of handprinted forms per hour.
[Figure: production line diagram showing filled forms, scanning stations, recognition stations, verification stations, and the main database]
ASYS Softwareentwicklung GmbH (Germany) integrated ABBYY FormReader into SMARTscan, its workflow automation solution for pharmacies. With ABBYY FormReader, all the work associated with input and processing of handprinted medicine prescriptions is reduced to several clicks and can practically be done by a pharmacy salesperson while talking to the client.
ATAPY Software, a strategic partner of ABBYY Software House, specializes in programming tools and add-ons for
ABBYY products, including FormReader. ATAPY employs ABBYY-trained experts in the cutting-edge FlexiForm
technology for processing document types traditionally considered “non-automatable”, such as phone bills,
invoices, job applicant resumes, library cards and many more. ATAPY can also integrate FormReader into any
Enterprise Document Management System for the benefit of prospective customers and partners.
Inmarko, Inc. (www.inmarko.ru/en) is the number one company in the Russian ice cream market by output and sales volume. It has a domestic market share of over 16%. Established in 1993, Inmarko has its own factories and cold storage facilities in Omsk, Tula, and Novosibirsk. Today it employs a staff of over 5,000.
ABBYY and ABBYY FormReader are registered trademarks of ABBYY Software House.
Inmarko, Inc.
630050 Elitnoe
Novosibirsk region, Russia
Tel. +7 3832 599 799
Fax +7 3832 48 13 12
www.inmarko.ru inm@rko.ru
EasyData and ATAPY: Document
Imaging Tools
Since 2001, EasyData B.V. (www.easydata.nl) has been partnering with ATAPY Software to provide companies in Western Europe with high-quality data input solutions based on the OCR/ICR technology of their mutual partner ABBYY Software House. As a subcontractor to EasyData, ATAPY has completed more than 40 projects, each one solving a specific task not answered by off-the-shelf data capture products available on the market.
“Russia's history is a history
of great thinkers. Look at
the chess players for
example. ATAPY employs
highly educated engineers
gathered together in an
efficient company. The result is quality tailor-made
software at a competitive
price.
This partnership has always
been a source of innovative
solutions for us and our
customers. ATAPY has also
taken part in the development of our flagship
product EasySeparate by
implementing the Visioneer
OneTouch® Link scanner integration component.”
Robert Camerlink
CEO
EasyData B.V.
Document Processing Solutions
for Education Industry
A forms processing solution for MEMIC
When MEMIC, the Center for Data and Information Management of the University of Maastricht, faced a huge number of questionnaire images that had to be picked up page by page from different catalogues, merged into multi-page TIFF documents and indexed, it relied on EasyData to deliver the right tool for the job. And just as before, EasyData knew the right people for it. The application developed by ATAPY has now been operating flawlessly at MEMIC for several years and has already processed tens of thousands of forms, meeting the customer's expectations.
EasyData and ATAPY streamline the document flow for
Wageningen University
ATAPY's experience with barcode recognition came in useful when EasyData received a request from one of its clients for a program that generates PDF files from a flow of scanned document pages.
Wageningen University used a variety of scanners to provide convenient access to academic materials. ATAPY designed a program that launched the scanning process through its own interface, stored the images in a specific folder, and combined one-page images into multi-page PDF files. The program employed ABBYY FineReader 6.0 Engine to find barcodes and used each barcoded page as the cover page of a PDF. Then it appended the subsequent images as pages to the PDF until it encountered the next barcoded page. The program provided an efficient GUI for selecting a scanner, specifying the output resolution, setting other important parameters, and controlling the process.
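The barcode-driven splitting described above can be sketched in a few lines of Python. This is only an illustration of the grouping rule: `read_barcode` is a hypothetical callable standing in for the barcode search (the program itself used ABBYY FineReader 6.0 Engine for that), and the handling of pages that arrive before the first barcode is an assumption.

```python
from typing import Callable, Iterable, List, Optional


def group_pages_by_barcode(pages: Iterable[str],
                           read_barcode: Callable[[str], Optional[str]]
                           ) -> List[dict]:
    """Split a stream of page images into documents at each barcoded page."""
    documents: List[dict] = []
    current: Optional[dict] = None
    for page in pages:
        barcode = read_barcode(page)
        if barcode is not None:
            # A barcoded page starts a new document and becomes its cover page.
            current = {"barcode": barcode, "pages": [page]}
            documents.append(current)
        elif current is not None:
            # Pages without a barcode are appended to the document in progress.
            current["pages"].append(page)
        # Pages seen before the first barcode have no document to join.
        # Skipping them is an assumption of this sketch.
    return documents
```

Each resulting group would then be written out as one multi-page PDF, with the barcode value available for naming or indexing the file.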
The competency of ATAPY engineers and the power of ABBYY
FineReader resulted in a reliable and stable application for converting
large volumes of incoming material.
EasyData and ATAPY Deliver an Image Cleaning
Solution for Legal Industry
"Raad voor Rechtsbijstand" (www.rvr.org), a Dutch legal agency, faced
an unexpected problem while attempting to convert a large archive of
various legal records to digital format.
Many documents were printed on intensively colored paper. The same background that made the paper documents look nice and distinct turned up on black-and-white scans as heavy jitter, inflating file sizes and making the scans difficult to read and impossible to OCR.
The agency sought professional advice from EasyData B.V. Having analyzed the issue, EasyData concluded that no ready-made solution, such as the despeckling facilities of modern OCR packages, was able to produce acceptable results, as the jitter was significantly heavier than what those tools were capable of removing. EasyData outsourced the problem to ATAPY Software for more thorough research. ATAPY engineers designed and implemented a custom algorithm that approached the task more intelligently, taking into account not only the linear characteristics of each dot cluster but also its context (characteristics of the neighboring clusters). The result was a tool which produced nearly clean images that took up to 10 times less disk space:
[Figure: sample scan before cleaning (OCR rate = 1.9%) and after cleaning (OCR rate = 98.9%)]
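The idea of judging a dot cluster by its neighbors as well as its own size can be illustrated with a small Python sketch. This is not ATAPY's algorithm, whose details are not given here. It is a simplified stand-in in which a small cluster survives only if it lies close to a large one (as the dot of an "i" does), and the `min_size` and `radius` thresholds are arbitrary assumptions.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # non-empty rectangular grid: 1 = black, 0 = white
Box = Tuple[int, int, int, int]  # top, left, bottom, right


def find_clusters(img: Image) -> List[List[Tuple[int, int]]]:
    """Return 8-connected clusters of black pixels as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, cluster = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            clusters.append(cluster)
    return clusters


def bbox(cluster: List[Tuple[int, int]]) -> Box:
    rows = [p[0] for p in cluster]
    cols = [p[1] for p in cluster]
    return min(rows), min(cols), max(rows), max(cols)


def bbox_gap(a: Box, b: Box) -> int:
    """Chebyshev distance between two bounding boxes (0 if they overlap)."""
    dy = max(0, a[0] - b[2], b[0] - a[2])
    dx = max(0, a[1] - b[3], b[1] - a[3])
    return max(dy, dx)


def clean(img: Image, min_size: int = 4, radius: int = 2) -> Image:
    """Drop small clusters unless a large cluster lies within `radius` pixels."""
    clusters = find_clusters(img)
    boxes = [bbox(c) for c in clusters]
    large = [b for c, b in zip(clusters, boxes) if len(c) >= min_size]
    out = [[0] * len(row) for row in img]
    for cluster, box in zip(clusters, boxes):
        keep = len(cluster) >= min_size or any(
            bbox_gap(box, big) <= radius for big in large)
        if keep:
            for y, x in cluster:
                out[y][x] = 1
    return out
```

A plain size-based despeckle filter would delete every small cluster, including punctuation and diacritics. Looking at neighboring clusters is what lets a filter tell background noise from small parts of real characters.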
EasyData and ATAPY contribute to the IT program
of the Dutch Government
The Dutch Ministry of Justice carried out a program to assess living conditions at correctional institutions in the Netherlands. For this purpose, it distributed and collected large volumes of multi-page questionnaires, reading them automatically with ABBYY FormReader. As the proprietary format of the statistics-and-reporting system did not allow direct export, the Ministry turned to ABBYY's most experienced integrator in the country for a solution.
EasyData, together with ATAPY, designed a custom export module accessible from the ABBYY FormReader toolbar. The module saved recognition results in a special intermediate file format that could be imported into the target system.
About Wageningen University:
Wageningen University and Research Centre (www.wur.nl) is a research and higher education
institution which trains specialists (BSc, MSc and PhD) in life sciences.
About the Dutch Ministry of Justice:
The Dutch Ministry of Justice (english.justitie.nl) sees its mission as maintaining order in Dutch society, while ensuring that justice, safety and unity come first. The Ministry employs almost 30,000 civil servants; its main office is located in The Hague.
ATAPY Software Participates in
Development of International
Computer Dictionaries
“ATAPY reached 99.992%
text accuracy in the German-Russian Dictionary (1
mistake per 8,760 symbols),
and 99.997% quality for the
Spanish-Russian Dictionary
project (1 mistake per
31,500 symbols). They also
corrected many mistakes in
the source dictionary text,
including typographical
misprints and even mistakes
in special marks that are
almost impossible to detect
without special programming tools and profound
knowledge of linguistics.”
Anna Zhavoronkova
Project Manager,
ABBYY Software House
Electronic dictionaries and translation systems are an area of great
practical importance in the ever-globalizing world. ABBYY Software
House, a world leader in OCR/ICR and linguistic technologies, develops
and sells Lingvo electronic dictionaries. For many years Lingvo has been
known as the best English-Russian dictionary on the market. In version
8.0, ABBYY planned to add 3 more languages; to introduce them to
Lingvo, it needed to digitize the world's latest best-of-breed dictionaries
reflecting the modern state of the new languages to be supported.
The ABBYY Lingvo 8.0 product line includes ABBYY Lingvo 8.0 Multilingual Edition, ABBYY Lingvo 8.0 for Pocket PC, as well as an updated and expanded version of ABBYY Lingvo English-Russian Edition.
ABBYY Lingvo 8.0 Multilingual Edition supports eight translation directions: English-Russian, German-Russian, French-Russian, Italian-Russian, Russian-English, Russian-German, Russian-French, and Russian-Italian. This Edition of ABBYY Lingvo includes more than 40 dictionaries containing more than 2,400,000 entries.
ABBYY turned to ATAPY Software, its outsourcing partner in Novosibirsk, for digital conversion of two dictionaries from the list picked out by the Linguistics Department. The 3-volume, 1750-page Leping German-Russian Dictionary and the 830-page Narumov Spanish-Russian Dictionary were to be recognized and proofread for subsequent automatic conversion into the ABBYY Lingvo database.
The highest possible text recognition accuracy was obviously a must. A single mistake could break the words' alphabetical order and tear a word away from its paradigm. If the number of such mistakes exceeded even a very modest threshold, the dictionary would become unsearchable.
Adequate interpretation of special dictionary marks was no less vital for the project. They were used as field delimiters in the automatic database conversion process and had to be recognized 100% accurately. Special marks appeared either as text characteristics (bold/italics), or as special symbols (brackets, asterisks), or as a combination of the two (e.g., italics + brackets indicated a dictionary comment). Omitting a single bracket or missing italicization would break the article's structure. This is why the project required both intelligent programming and highly qualified manual effort - a true challenge for any contractor in the media service area.
The dictionaries were scanned and automatically recognized with ABBYY FineReader, specially tuned for processing this material. Then a team of qualified operators proofread and cross-checked the results using the Double verification technique to ensure recognition accuracy. Double verification made it possible to detect certain unexpected cases, such as typos in the source dictionary text, which were corrected according to ABBYY's guidelines. In its effort to automate the proofreading work to the maximum possible extent, ATAPY developed and customized a number of in-house utilities. One of them was Glyphica, a tool for quick input of characters that cannot be found on the keyboard. For the Leping Dictionary, ATAPY developed a custom converter with built-in spellchecking and punctuation checking utilities, which made it possible to weed out mistakes missed during the previous stages and finally convert the material into the Lingvo vocabulary database.
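The cross-checking step can be pictured with a short Python sketch. It assumes that "Double verification" means two operators proofread the same text independently and only their disagreements go to review; the brochure does not spell the procedure out, and the sample lines in the usage note are invented.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def disagreements(version_a: List[str], version_b: List[str]
                  ) -> List[Tuple[List[str], List[str]]]:
    """Return the line spans where two independent proofreading passes differ."""
    matcher = SequenceMatcher(None, version_a, version_b, autojunk=False)
    return [(version_a[i1:i2], version_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

For example, comparing `["Haus n", "Hof m", "Hut m"]` with `["Haus n", "Hof n", "Hut m"]` flags only the middle line, so a reviewer has to settle one entry instead of rereading all three.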
ABBYY Software House (www.abbyy.com) is based in Moscow, Russia. The
company was founded in 1989. Today ABBYY has over 880 employees worldwide,
including offices in Russia, USA, Ukraine, UK, Germany, Taiwan, Japan and
Cyprus. ABBYY develops software products in the fields of artificial intelligence, document recognition,
data capture and applied linguistics. ABBYY is most notable for their optical character recognition
package ABBYY FineReader.
AutoStore is a registered trademark of Notable Solutions, Inc.
ABBYY Software House
P.O. Box #20, Moscow,
Russia, 127273
Tel. +7 (495) 783 3700
Fax +7 095 783 2663
www.abbyy.com office@abbyy.com
Meeting the Challenge of Time
ATAPY Software participates in the development of ABBYY
FineReader XIX - an OCR system for reading old European books
“I've got FineReader 7.0 installed here on my computer.
The Frakturschrift recognition is very good. Even
though old text recognition
is not a large and growing
market, I am sure all the
service bureaus here in
Germany will be ordering 1
or 2 copies and have it run
7x24”
Johannes Stöpetie
CEO,
ABBYY Europe GmbH
Meta-E (http://meta-e.uibk.ac.at) is a collaborative initiative undertaken by a consortium of 14 universities from 7 European countries and the US, co-funded by the European Union. The project is focused on providing a technological basis for digitization and web-publishing of valuable old printed sources spanning several centuries of European history. For this purpose, an OCR system was required that could recognize historical texts from the period 1800-1938, including those printed in Frakturschrift (an old-style black-letter typeface prevalent at that time). At that point no omnifont Frakturschrift systems were available: all OCR products had to be trained on each individual book before processing it.
Meta-E coordinators started looking for a high quality OCR package to be augmented according to their requirements. ABBYY FineReader was chosen due to its unrivalled recognition accuracy, support for 176 modern languages, and user-friendliness. ABBYY Software House, the international manufacturer of the FineReader product line, took up the project as a direct contractor to carry out the development of the omnifont part (introducing the Frakturschrift graphics to FineReader). The linguistic part of the project was subcontracted to ATAPY Software, ABBYY's long-term partner in OCR and computer linguistics development.
Based on FineReader 7.0
ATAPY's role in the Meta-E project was constructing Old Language Models (LMs) for 5 European languages: English, French, German, Italian, and Spanish. An LM is a computer database that describes the vocabulary of a language. FineReader uses LMs during recognition for building OCR hypotheses and spellchecking. LMs are not just full lists of words in all possible grammatical forms: such a database would be enormous in size and hardly manageable. FineReader LMs store only the stem of each word and describe the grammar as a set of inflection rules (paradigms). Each stem is assigned a list of paradigms; applying them to the stem produces all possible forms of the word. ATAPY was to study a large number of authentic dictionaries and original old European texts dating back to the targeted time span, review the word stock, add the words that had been phased out of the languages, and correct the paradigm assignments to synchronize the LMs with the actual grammatical practice of that time.
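The stem-plus-paradigm idea can be shown with a toy Python sketch. It does not reflect the actual FineReader LM format, which is not described here: a paradigm is simplified to a list of endings, and the stems, paradigm names, and modern English examples are invented for illustration.

```python
from typing import Dict, List, Set

# A paradigm is modeled as a list of endings appended to a stem.
# Real inflection rules can be richer (for example, stem changes).
PARADIGMS: Dict[str, List[str]] = {
    "noun_s": ["", "s"],
    "verb_regular": ["", "s", "ed", "ing"],
}

# Each stem is assigned the names of the paradigms that apply to it.
STEMS: Dict[str, List[str]] = {
    "book": ["noun_s", "verb_regular"],
    "walk": ["verb_regular"],
}


def expand(stem: str, paradigm_names: List[str]) -> Set[str]:
    """Apply every assigned paradigm to a stem and collect the word forms."""
    return {stem + ending
            for name in paradigm_names
            for ending in PARADIGMS[name]}


def build_vocabulary(stems: Dict[str, List[str]]) -> Set[str]:
    """Generate all word forms for all stems."""
    vocabulary: Set[str] = set()
    for stem, names in stems.items():
        vocabulary |= expand(stem, names)
    return vocabulary
```

Two paradigms and two stems are enough to generate eight distinct word forms here. Correcting a paradigm assignment for one stem fixes all of its forms at once, which is why the work described above concentrated on stems and paradigms instead of flat word lists.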
To complete this task, ATAPY's linguists carefully selected 10 dictionaries reflecting the state of the 5 languages, published between 1808 and 1930. ATAPY also thoroughly analyzed 105 authentic books of that period, comprising more than 50 MB of text. The next step was to build the FineReader LMs. ATAPY's linguists manually compared the information from authentic dictionaries and texts - about 500,000 entries in total - to the existing FineReader vocabularies. This work turned up a total of 458,767 words, of which 61% remained unchanged and 36% were added to the vocabularies from the analyzed sources. About 3% of the words had their paradigms corrected towards the XVIII-early XX century grammar rules. To carry out this correction, the linguists had to add 159 historic grammar paradigms that were missing in the contemporary models.
Finally, the LMs were compiled and tested on the control text corpus. They showed 98.91% vocabulary coverage for Old English, 99.16% for Old French, 96.58% for Old German, 98.58% for Old Italian, and 98.79% for Old Spanish.
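A coverage figure of this kind can be computed as in the minimal Python sketch below. It assumes that coverage means the share of running words in the control corpus that are found in the model's vocabulary; the exact definition and tokenization used in the project are not stated here.

```python
from typing import Iterable, Set


def vocabulary_coverage(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Percentage of corpus tokens that appear in the vocabulary."""
    total = known = 0
    for token in tokens:
        total += 1
        # Case is ignored; this is an assumption of the sketch.
        if token.lower() in vocabulary:
            known += 1
    return 100.0 * known / total if total else 0.0
```

With a four-word corpus of which three words are in the vocabulary, the function returns 75.0.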
To illustrate the above, let's look at a few samples. A regular FineReader package, or any other contemporary OCR system, will make a lot of mistakes here. For example, “Alterthumskunde” may become “Allerlhumskunde” in the first fragment; in the second fragment, “UEBERSICHT” (“Übersicht” in modern German) gets recognized as two words, “UEBER SICHT”, etc. These mistakes occur because of two factors. The first is the low printing quality, and nothing can be done about it. The second is the old spelling used in those incorrectly recognized words. All existing OCR systems are targeted at modern texts and therefore only know modern spelling.
Once the five LMs were merged into the FineReader shell, ABBYY was able to offer a special version of FineReader which knows the spelling specifics of old European languages. This version has a much lower chance of making mistakes in places similar to those shown above. In effect, users will be able to OCR old texts with higher quality, saving much of the time which previously had to be spent on error correction.
The special version of ABBYY FineReader, officially released by ABBYY under the name ABBYY FineReader XIX (http://www.frakturschrift.com), became a powerful tool assisting the Meta-E consortium in its large-scale digitization work. The product is the industry's first box OCR product to recognize Renaissance and Late Medieval sources, a product specially targeted at European libraries and public organizations engaged in preservation and publishing of cultural assets, and at service bureaus helping them fulfill this mission.
ABBYY Europe GmbH is a European department of ABBYY Software House based
in Munich, Germany. ABBYY Software House is the manufacturer of software
products in the fields of artificial intelligence, document recognition and applied
linguistics. One of the most notable products by ABBYY Software House is the optical character
recognition package ABBYY FineReader.
ABBYY, ABBYY FineReader and FineReader XIX are registered trademarks of ABBYY Software House.
ABBYY Europe Software House
80687 Munich, Germany
Elsenheimerstrasse 49
Tel. +49 89 511 159 0
Fax +49 89 511 159 59
www.abbyyeu.com info@abbyyeu.com
International media analysis company serves its clients
using a suite of PDF tools by ATAPY Software
Presse+, a large media monitoring and research company (500 clients among 800
top French businesses and administrations, over 2,000,000 users daily) dealt with
incoming printed media sources, audio and video materials, and digital media to
produce synthetic summary documents. To secure its annual growth of 25% and to
be able to process over 7,000 media sources 24/7, Presse+ had to constantly adopt innovative technologies and
perfect its business procedures. This explains why the company had been heavily investing in the electronic
support of its production chain.
The summary documents were delivered in PDF format. Accuracy, efficiency, and speed of PDF generation were
crucial for the company's success. However, the software packages available on the market failed to meet the
particular needs of Presse+. This is why Presse+ turned to ATAPY Software in search of a solution. ATAPY’s
experience with OCR and image processing allowed Presse+ to fill the gaps left by off-the-shelf products. ATAPY
developed a suite of customized, reliable, and fast tools built around ABBYY FineReader OCR technology.
The highlights included:
- batch processing for quick and convenient conversion
- integration with the existing technology
- reduced software licensing and maintenance costs; minimal training required
- non-stop processing of input images in multiple formats containing graphics and text, in many languages
- export of recognized data to PDF with user-defined keywords highlighted
- recognition of “image only” PDF as a set of images
- decreasing the sizes of the PDF files by compressing the illustrations stored inside
The result was an effective customized solution that eliminated the need for expensive generic products. Through an innovative approach, ATAPY engineers built a robust and scalable system which met the high quality standards of Presse+.
Presse+ is a leading French provider of Media Monitoring (Press, Broadcast, Internet, News Wires, etc.)
and International Press Analysis Services (covering major European, US, and Asian publications). In
2005 the company was acquired by TNS Media Intelligence (www.tns-mi.com), a company of the TNS
Media Group. The acquisition became a considerable boost to TNS’s news monitoring service in
France.
ABBYY and ABBYY FineReader are registered trademarks of ABBYY Software House.
