Challenges and Experiences in the Mass Digitization
of Manuscripts and Rare Books
at the Bavarian State Library*
Markus Brantl¹ and Irmhild Schäfer²
¹ Director of the Munich Digitization Center / Digital Library, Bavarian State Library
Ludwigstr. 16, 80539 Munich, Germany, markus.brantl@bsb-muenchen.de
² Director of the Institute of Book and Manuscript Conservation, Bavarian State Library
Ludwigstr. 16, 80539 Munich, Germany, irmhild.schaefer@bsb-muenchen.de
Keywords: Mass digitization, cultural heritage, manuscripts, rare books,
incunabula, preventive conservation, book materiality, work flows, automated
book scanner, manually-operated book scanner, scanning throughput, digital
long-term preservation, digital collections, workflow, digital asset management,
mobile Web, public-private partnership.
Abstract: The Bavarian State Library is one of the largest European universal
libraries. The library's unique collection profile is characterized by extremely
precious manuscripts, rare printed books and comprehensive special collections
from thousands of years of cultural heritage. The internationally renowned
manuscript, incunabula and rare book collections encompass more than 200,000
titles up to the end of the 16th century. As an international research library, it offers
services to scholars and students worldwide. In this respect, the World Wide Web
is the natural medium for the Bavarian State Library, since it allows 24/7 access
to information for everybody around the world. It is therefore a primary strategic
objective of the library to digitize its unique holdings as soon as possible and to
make them accessible free of charge on the WWW.
This article provides an overview of the technical and organisational
challenges and experiences encountered in the mass digitization of manuscripts
and rare books at the Bavarian State Library, Munich, Germany. Section 1
introduces the Bavarian State Library, its Institute of Book and Manuscript
Conservation and the Munich Digitization Center. Section 2 discusses the
digitization strategies and mass digitization projects. Section 3 is concerned with
the digitization process for manuscripts and rare books using the Central Digital
Asset Management System. This is followed in Section 4 by a case study of Projects
VD16/1 and VD16/2, based on the bibliography of books printed in German-speaking
countries in the 16th century. Finally, Section 5 offers conclusions, with
the vision and plans for future digitization activities at the Bavarian State Library.

* This article is a modified version of M. Brantl and I. Schäfer, "Massendigitalisierung von
Handschriften und Alten Drucken: Herausforderungen und Erfahrungen an der Bayerischen
Staatsbibliothek", in Eikonopoiia. Symposium on Digital Imaging of Ancient Textual Heritage: Technological
Challenges and Solutions, 28-29 October, 2010, Helsinki, Finland, Helsinki 2010, 175-197.
Introduction
Though inconceivable until the widespread use of the internet in the mid-1990s,
it is now technologically possible to scan the collections of entire libraries,
archives, and museums and to provide them in digital form on the internet. In
September 2010, the website of the Bavarian State Library (BSB) already offered
400,000 copyright-free medieval manuscripts, rare books, and books from the
6th to the 20th century (about 267 terabytes of data in 537 million files as of Dec.
2010). Digitally providing this internationally significant collection for worldwide
access is one of the main tasks of the Munich Digitization Center (MDZ), in
collaboration with the Institute of Book and Manuscript Conservation (IBR),
both departments of the Bavarian State Library. The IBR ensures safe handling
of the items, and the MDZ employs and develops innovative technology toward
creating the "digital library" (Waters 1998).
The Bavarian State Library, digitization and preventive conservation
As a treasure trove of cultural heritage, a multimedia information service provider
for scholarly research, and an innovative force in the field of digital services,
the Bavarian State Library is one of the primary national and international
resources for researchers, students and all those seeking information. In 1558,
Duke Albrecht V of Bavaria purchased two important private libraries, which
constitute the foundation of the Bavarian State Library. Today, the central state
library and repository library of the Free State of Bavaria owns close to 10
million volumes and 55,000 current periodicals in hardcopy or electronic format.
With its unique collections of 93,000 manuscripts, 20,000 incunabula from
1450-1500, and 900,000 rare books from 1500-1850, the institution is one of
the world's leading libraries. Together with the State Library in Berlin and the
German National Library in Frankfurt and Leipzig, the Bavarian State Library
constitutes Germany's virtual national library (Griebel and Ceynowa 2008).
The Munich Digitization Center (MDZ) was established in 1997, in
the early stages of internet use in Germany, as one of two national centers
of excellence. The MDZ develops, evaluates and commissions new scanning
technologies and production processes in order to create a "digital library". In
close collaboration with universities and other research institutions, the MDZ
has successfully developed and implemented more than two hundred innovative
projects, ranging from the retrodigitization (i.e., the creation of digital copies of
handwritten or printed copyright-free documents) of full texts or images, as
well as audio and three-dimensional (3-D) objects, to electronic
publishing, virtual portals and the digital long-term preservation of digital
objects. The MDZ has also developed new processes and methods for the creation,
management, storage, preservation and delivery of digital objects. Furthermore,
the MDZ functions as a service provider, including support for
digitization and development work for the open-source community in the field of
digital libraries.
The Scanning Center of the MDZ has gained significant experience and
expertise through its extensive scanning of diverse collections of objects spanning
over fourteen centuries. Digitization of valuable objects is always carried out
in close collaboration with the Institute of Book and Manuscript Conservation
(IBR) as an important partner. The IBR was established in 1963 for the purpose
of preserving the unique cultural heritage of the Bavarian State Library. In
addition to conservation and preventive conservation projects, the IBR trains
conservators in Bachelor's and Master's degree programs in cooperation with
the Technische Universität München (TUM). Since the early days of digitization
at the Bavarian State Library, the IBR and the MDZ have worked closely together
to incorporate all necessary conservation requirements into the digitization
processes. In particular, this collaboration includes the following contributions
of the IBR:
- Advice on the purchase of scanners available on the market
- Advice on the development of new scanning devices in collaboration
with manufacturers
- Assessing the condition of each item to be scanned as part of the project
preparation phase
- Training the scan operators in the careful handling of the objects
- Providing functional devices to support the scanning process (e.g., the so-called
"Munich finger", see Figure 5)
- Hands-on support for the scanning of sensitive and high-value books.
Digitization strategy and mass digitization projects
One of the traditional functions of the Bavarian State Library is to maintain
and provide public access to the written cultural heritage that has been collected
over the course of centuries. Thus, the strategic goal in the internet age is to
provide quick, independent and ubiquitous access, free of charge, to the entire
collection of copyright-free library holdings. In order to achieve this,
about 1.2 million objects have to be scanned and uploaded onto the MDZ web
servers. This ambitious project is currently funded by the German Research Foundation
(DFG), the Free State of Bavaria, and the European Union, as well as by a
public-private partnership with Google (since 2007) and by cooperation with book publishers.
Funding resources for this digitization project are allocated to support
the digitization of objects from three time periods: the 6th-16th century, the 17th-19th
century, and the 20th-21st century. Specifically, the three respective funding
collaborations are as follows:
- German Research Foundation (DFG), Free State of Bavaria, and
European Union: Manuscripts, rare books and special collections
from the 6th to the 16th century
- Public-private partnership with Google Book Search: Items from about
the 17th to the 19th century
- DFG and publishers: Special programs for the 20th and the 21st centuries
Until 2005, the selection of digitization projects was exclusively focused
on the Bavarian State Library's profile as an international research center, as well
as on its archiving function as the Bavarian central state library. Initially, "boutique
digitization" projects predominated, i.e. the digitization of "masterpieces" as
well as of smaller collections (Milne 2008), such as:
- The DFG Special Subject Collections of the Bavarian State Library, in
particular on History, Eastern Europe and Musicology
- Central arts and humanities resources on the portal on the history
and culture of Bavaria, "Bayerische Landesbibliothek Online" (BLO, Bavarian Regional Library Online)
- Manuscripts and rare books, due to the Bavarian State Library's status as
a research library of international importance.
As a consequence, digital collection development initially focused
on the digitization of individually selected collections. With the completion of
the learning and experimentation phase of German (retro-)digitization at the
Bavarian State Library in 2005, the technical and organizational prerequisites
(see below) were ramped up for the second phase: mass digitization. The
definition of this term varies considerably, e.g. with respect to the number
of books or pages being digitized. In Germany, the term is used if
significantly more than one million pages are scanned within a limited period, usually
24 months. For comparison: the public-private partnership between the Bavarian
State Library and Google encompasses about 5,000 books, with a total of about
1.9 million pages, digitized every week.
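The German working definition is in effect a rate (pages per period), which makes projects of different lengths comparable. The following Python sketch is a back-of-the-envelope aid, not an MDZ tool; the function and constant names are hypothetical, and it treats exactly one million pages in 24 months as the reference rate, although the text says "significantly more".

```python
# Sketch: express output as pages per week and compare it with the rate
# implied by the informal German threshold for "mass digitization"
# (about one million pages in 24 months, as stated in the text).

WEEKS_PER_MONTH = 52 / 12  # average number of weeks per month


def pages_per_week(pages: float, months: float) -> float:
    """Average weekly page output of a project lasting `months` months."""
    return pages / (months * WEEKS_PER_MONTH)


# Reference rate: 1,000,000 pages in 24 months (roughly 9,600 pages per week).
THRESHOLD_RATE = pages_per_week(1_000_000, 24)


def rate_ratio(weekly_pages: float) -> float:
    """How many times the reference rate a given weekly output represents."""
    return weekly_pages / THRESHOLD_RATE


print(f"reference rate: {THRESHOLD_RATE:,.0f} pages/week")
# Google partnership figures from the text: 5,000 books, 1.9 million pages per week.
print(f"Google partnership: {rate_ratio(1_900_000):.0f}x the reference rate")
print(f"average pages per book: {1_900_000 / 5_000:.0f}")
# VD16/2 (described below): 1.7 million pages in 18 months.
print(f"VD16/2: {pages_per_week(1_700_000, 18):,.0f} pages/week")
```

By this measure, the Google partnership's weekly output is roughly 200 times the reference rate, and its figures imply an average of about 380 pages per book.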
In early 2006, the MDZ Scanning Center started the first mass digitization
project in Germany, the so-called VD16/1, covering the period from 1500 to 1517.
With manually-operated book scanners, 4,300 titles with a total of 600,000 pages
were scanned. These files were linked with simple structure data (digital tables of
contents) and made available via the internet. With funding from the DFG,
several further mass digitization projects have subsequently been carried out, including:
- Rare books from 1518 to 1600 (project VD16/2). Within 18 months,
starting in mid-2007, 12,000 titles with a total of 1.7 million pages
were digitized using the world's first robotic scanning system for 16th-century
books. This project was partially funded through a development
partnership with the manufacturer, Treventus (Vienna).
- Incunabula from 1450 to 1500. In this 48-month project, about 9,700
incunabula will be digitized. The estimated total is around 2 million
pages. From November 2008 to December 2010, 4,100 titles with 1.1
million pages have already been digitized.
- 20th-century literature on historical topics: a cooperative project with
publishing houses, which have assigned the publishing rights to the Bavarian
State Library for digitization and free download in Portable Document Format
(PDF). About 4,700 titles with 1.4 million pages
will be digitized in 24 months.
The Google project will entail digitizing about 1.2 million books from
the 17th into the 19th century, with a total of about 300 million pages. Google
conducts the digital production at its own expense, while the Bavarian State
Library receives copies of all digital data ("Digital Library Copy"), which are
made available to the public free of charge on the MDZ servers (Ceynowa 2007).
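The project figures above are easier to compare once normalized per title and per month. A minimal Python sketch, using only the numbers stated in the text; the `Project` class is hypothetical, and the incunabula, 20th-century and Google values are planned or estimated totals rather than results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Project:
    name: str
    titles: int
    pages: int
    months: Optional[float] = None  # None where the text gives no duration

    @property
    def pages_per_title(self) -> float:
        return self.pages / self.titles

    @property
    def pages_per_month(self) -> Optional[float]:
        return None if self.months is None else self.pages / self.months


# Figures as stated in the text (planned totals marked as such).
PROJECTS = [
    Project("VD16/1 (1500-1517)", 4_300, 600_000),
    Project("VD16/2 (1518-1600)", 12_000, 1_700_000, 18),
    Project("Incunabula (planned)", 9_700, 2_000_000, 48),
    Project("20th-century literature (planned)", 4_700, 1_400_000, 24),
    Project("Google partnership (planned)", 1_200_000, 300_000_000),
]

for p in PROJECTS:
    monthly = "n/a" if p.pages_per_month is None else f"{p.pages_per_month:,.0f}"
    print(f"{p.name:<36}{p.pages_per_title:>6.0f} pages/title   {monthly} pages/month")
```

VD16/2, for instance, works out to roughly 94,000 pages per month at about 140 pages per title, while the Google books average 250 pages each.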
The digitization workflow and ZEND
The term "digitization" normally covers the following major process steps:
1) Preparation
2) Scanning, i.e. image capture with different scanning devices
3) Indexing or production of metadata (enrichment), including administrative, bibliographical, technical and structural (incl. full-text)
metadata
4) Storage and long-term preservation
5) Access, to support the material-specific presentation on the internet,
nowadays on multiple devices such as smartphones, tablets and desktops.
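Because these steps always run in the same order for every title, they map naturally onto an ordered state model of the kind a workflow system tracks per object. The following Python sketch is purely illustrative: `Step` and `TitleStatus` are hypothetical names, not ZEND code, and real workflows contain many sub-steps per stage.

```python
from enum import IntEnum


class Step(IntEnum):
    """The five major digitization steps, in the order listed above."""
    PREPARATION = 1
    SCANNING = 2   # image capture
    INDEXING = 3   # metadata: administrative, bibliographical, technical, structural
    STORAGE = 4    # storage and long-term preservation
    ACCESS = 5     # presentation on the internet


class TitleStatus:
    """Tracks one title through the steps (hypothetical helper, not ZEND code)."""

    def __init__(self, zend_id: str):
        self.zend_id = zend_id
        self.step = Step.PREPARATION
        self.history = [Step.PREPARATION]

    def complete_step(self) -> Step:
        """Mark the current step as done and move on to the next one."""
        if self.step is Step.ACCESS:
            raise ValueError(f"{self.zend_id}: ACCESS is the final step")
        self.step = Step(self.step + 1)
        self.history.append(self.step)
        return self.step


title = TitleStatus("bsb00001119")
while title.step is not Step.ACCESS:
    title.complete_step()
print([s.name for s in title.history])
```

Using an ordered enumeration makes it impossible to skip a stage silently: each title can only advance one step at a time.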
The individual processes always depend on the structure and composition
of the original object and the intended presentation mode on the internet (e.g.,
browsing, search, 3D animation) (Brantl 2008). The rule of thumb is: the
older an object is, or the more metadata it needs (for example, information
for search and retrieval), the more complex and costly the entire production
process will be. To standardize the production processes and ensure a consistent
production of digital data, the Zentrale Erfassungs- und Nachweisdatenbank
(ZEND, Central Digital Asset Management System), a software tool with
different document and workflow management components, was developed
by the MDZ in 2003. ZEND controls the entire production process from
preparation up to the automated transfer of the digital master files to archival
storage. The introduction of ZEND significantly reduced the time required
for the working process. ZEND also established the basis for all current mass
digitization projects. Today, ZEND processes around 1.5 million images of
book pages every week. Due to the high degree of automation, the production
process excluding scanning takes only about 30 to 60 minutes per title, depending
on the metadata creation. ZEND, which is continuously being improved,
is the backbone of the entire (retro-)digital production at the Bavarian State
Library and includes, for example, the following features:
- Processing of all types of documents and their presentation
- A standardized electronic workflow for the different types of documents
- The delivery of digital images and full-text data from an unlimited
number of service providers
- Significant time and cost savings due to automatic processing.
In detail, ZEND is an electronic publishing system for (retro-)digitization
based on two open-source software modules, namely a LAMP system (Linux,
Apache, MySQL and PHP/Perl) and the Extensible Markup Language
(XML) publishing framework Cocoon with the Lucene/Solr search platform. It
combines different components of document, web content and workflow
management into a comprehensive production environment. In
daily production, ZEND has proven very successful, as the over 500 million processed
files (including those of the Google project) demonstrate. With its open, modular, flexible
and scalable architecture, ZEND is equally suitable for the demands of scholarly
retrodigitization and for image-oriented, automated mass digitization.
The development of ZEND started after the previously mentioned learning
and experimentation phase in German digitization from 1997 to 2003. During
this phase, a significant number of recurrent and complex manual steps were
required, such as creating checklists for the books with the actual page counts
(logical page counts) and the number of image files of a work (physical page
counts). ZEND offers, inter alia, the following main features, which are
embedded as integral components in the workflow:
- A web-based user and administrative interface which can be accessed
from any Internet-connected computer
- Definition of different user roles
- Generation of an object-specific digitization order (end user or library
staff) with plausibility check
- Automatic Uniform Resource Name (URN) generation with a Uniform
Resource Locator (URL) to the central link-resolving server of the
German National Library
- Open Archives Initiative (OAI) data provider
- Import of all necessary production data, such as bibliographic data via
the Z39.50 interface from the Online Public Access Catalog (OPAC)
- Support (import, export, generation) of all current XML document
formats, especially Text Encoding Initiative (TEI) and Metadata
Encoding and Transmission Standard (METS), including the so-called
DFG-Viewer
- Cooperative interfaces, e.g. for indexing of digital objects
- Possibility to separate public (open) and internal access, e.g. a
digital reading room
- Connection to an optical character recognition (OCR) server for automatic
conversion of digital images to text (currently only for Roman script; the
use of OCR for Fraktur fonts in old prints is planned)
- Search and retrieval of full text
- On-the-fly PDF output of entire e-books via the Web
- Customizable Really Simple Syndication (RSS) feeds, e.g., for the daily
production of digitized items.
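The text does not specify what ZEND's plausibility check tests. One plausible check, sketched here in Python under stated assumptions, compares the number of image files (physical page count) with the count expected from the logical pagination. The `Order` class, the `plausibility_problems` function and the rule of adding blank leaves, inserts and four cover views (outside and inside of both covers) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rule: "authentic" scanning adds the outside and inside of
# both covers to the book's pages (4 extra images).
COVER_VIEWS = 4


@dataclass
class Order:
    zend_id: str
    logical_pages: int     # pages according to the book's own pagination
    unnumbered_pages: int  # blank leaves, inserted sheets, etc.

    def expected_images(self) -> int:
        """Expected number of image files (the physical page count)."""
        return self.logical_pages + self.unnumbered_pages + COVER_VIEWS


def plausibility_problems(order: Order, image_count: int) -> list[str]:
    """Return a list of problems; an empty list means the order looks plausible."""
    problems = []
    if order.logical_pages <= 0:
        problems.append(f"{order.zend_id}: logical page count must be positive")
    if order.unnumbered_pages < 0:
        problems.append(f"{order.zend_id}: unnumbered page count cannot be negative")
    expected = order.expected_images()
    if image_count != expected:
        problems.append(
            f"{order.zend_id}: {image_count} image files, expected {expected}"
        )
    return problems


order = Order("bsb00001119", logical_pages=200, unnumbered_pages=6)
print(plausibility_problems(order, 210))  # 200 + 6 + 4: no problems
print(plausibility_problems(order, 209))  # one image missing
```

Returning a list of problems rather than a single boolean lets an operator see every inconsistency at once.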
As integral components of the overall production workflow, these features are
presented in the following step-by-step overview of the ZEND workflow.
Figure 1. Modules of ZEND (order, XML editor, collection management, search/browse, PDF download, administration, subject portals, ZEND workflow modules, image conversion, data management, interfaces, monitoring, database, repository, maintenance, archival storage (TSM)).
ZEND order and unique identifier
Based on the electronically generated digital order form (filled in by a customer, project staff, etc.), ZEND creates and prints an order slip. With
the help of the imprinted barcode, the slip provides all necessary information about the entire process, up to the quality control of the digital images at the end, while also tracking the individual process steps. The digital order form already defines the relevant scan parameters (e.g., a resolution of 400 pixels per inch and 24-bit color for manuscripts and rare books). With the creation of the order form, a unique identifier is produced, the so-called "ZEND ID" (e.g., bsb00001119). The running count of the image files is appended to the identifier with an underscore (e.g., bsb00001119_00001.tif). The ID is also part of the URN, which is anchored in the catalogue systems by a resolving URL (e.g., http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb00001119). The identifiers are administered centrally by ZEND, which ensures that each unique identifier is assigned only once per object.
Figure 2. ZEND order slip with barcode.
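The identifier scheme described above (an ID, image file names derived from it, and a URN with a resolver URL) can be sketched in a few lines of Python. This is an illustration only: `IdRegistry` and the helper functions are hypothetical, the digit widths (8 for the ID, 5 for the running count) and the URN prefix are read off the examples in the text, and a real system would persist the registry in a database rather than in memory.

```python
URN_PREFIX = "urn:nbn:de:bvb:12-"  # prefix as in the example in the text
RESOLVER = "http://nbn-resolving.de/urn/resolver.pl?urn="


class IdRegistry:
    """In-memory sketch: each object gets exactly one ZEND-style ID."""

    def __init__(self, start: int = 1):
        self._next = start
        self._by_object = {}  # object key (e.g. a shelfmark) -> ID

    def assign(self, object_key: str) -> str:
        """Return the object's ID, creating it only on the first request."""
        if object_key not in self._by_object:
            self._by_object[object_key] = f"bsb{self._next:08d}"
            self._next += 1
        return self._by_object[object_key]


def image_file_name(zend_id: str, running_count: int) -> str:
    """Image file name: ID, underscore, 5-digit running count."""
    return f"{zend_id}_{running_count:05d}.tif"


def urn(zend_id: str) -> str:
    return URN_PREFIX + zend_id


def resolver_url(zend_id: str) -> str:
    return RESOLVER + urn(zend_id)


registry = IdRegistry(start=1119)
book = registry.assign("example-object-A")  # hypothetical object key
print(book, image_file_name(book, 1))
print(resolver_url(book))
```

Keying the registry by object means a repeated request for the same object returns the existing ID instead of minting a second one, which mirrors the "assigned only once per object" rule.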
Workflow and preventive conservation requirements
The preventive conservation requirements for scanning, as defined by the IBR,
have an impact on the process. Thus, the condition survey of manuscripts and
rare books before digitization is an indispensable component of any digitization
project at the Bavarian State Library. The survey is carried out by the IBR. The
goal is to identify objects that require intervention, thus avoiding the risk of
increasing existing damage or losing historical components. The suitable
scanning device is determined based on this survey. Additionally, object-specific
instructions are given to the scan operator, including the permitted book opening
angle and additional tools for supporting the document in its position when
required, as well as information about particularly sensitive parts of the book
and risks.
Figure 3. Examples of rare book materials.
Detailed conservation requirements may vary from institution to
institution. In general, the main risks in scanning are the mechanical stress on
the book, light exposure, and the indoor environment (relative humidity
and temperature). The scanning process must provide "a good opening angle"
for the book (as defined by the IBR) to ensure that the complex spine structure
with endbands, sewing, spine lining, and covering material (e.g., leather) will not
be damaged. It turns out that only very few rare books allow an opening angle
of 180°. To preserve the historic significance of the objects and to avoid high
conservation costs for repairing damaged covers, scanners have to adapt to the
physical state of historic books and offer a "good opening angle" for scanning.
Figure 4. Book opening angles of 120° and 180°: potential cover damage.
For preventive conservation reasons, the MDZ does not use the regular
glass plates of conventional manual book scanners. These scanners have holders
for a book opening angle of 180° so that the glass plate can apply even pressure
to the opened book. Not using the glass plates creates two significant
disadvantages in the scanning process:
- The scanning throughput is reduced by 50 percent, as recto and verso
pages can only be scanned in two separate processes, using custom book
cradles with opening angles of about 120° or less.
- Without the contact pressure of a glass plate on the book, the depth of
focus of the images varies.
However, these disadvantages can be overcome by the use of other tools,
such as the "Münchner Finger" (Munich finger) or laser-assisted control to
visually check the focus depth. In addition, the MDZ and the IBR are working with
manufacturers on the optimization of manual scanning systems that allow
scanning panels to work efficiently even without a glass plate (see below).
Figure 5. The "Munich finger," created by the IBR for the smooth stabilization of
pages during the scanning process.
Figure 6. Testing of focus depth with laser during the book setup on the cradle.
Another well-known potential source of damage is light. Correct
color reproduction requires precise light exposure. Achieving accurate color
values is a key objective of the scan philosophy of the Bavarian State Library.
The image quality must be as high as possible in order to ensure that the digital
image provides maximum reusability and full benefit for the users, and that
rescanning is avoided. Since illumination initiates chemical changes in organic
materials (e.g., colorants) that accumulate with repeated exposure, the intensity and
duration of the light exposure associated with scanning need to be kept as low as
possible. Today, ultraviolet and infrared filters are a basic requirement. In recent
years, cold light-emitting diode (LED) lighting has replaced conventional fluorescent
tubes in scanning systems. In addition to these lighting techniques,
electronic flash can be used directly or indirectly, with a Pyrex dome as a filter
and burst protection. In the MDZ Scanning Center, three lighting
techniques are currently used after approval by the IBR: electronic flash (digital cameras
and ScanRobot®), fluorescent lamps with indirect or synchronized lighting, and
LEDs.
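The argument for "do it once, do it right" rests on light damage being cumulative. A common simplification in conservation practice, not stated in the source, treats the light dose of continuous lighting as illuminance multiplied by time (lux-hours) and assumes that doses from separate sessions add up. A minimal Python sketch under that assumption, with purely hypothetical values rather than MDZ measurements:

```python
SECONDS_PER_HOUR = 3600


def light_dose_lux_hours(illuminance_lux: float, seconds_per_exposure: float,
                         exposures: int = 1) -> float:
    """Cumulative light dose in lux-hours.

    Assumes dose = illuminance x time and that doses from separate
    exposures simply add up (the usual reciprocity simplification).
    """
    return illuminance_lux * seconds_per_exposure * exposures / SECONDS_PER_HOUR


# Hypothetical values, not MDZ measurements: 1,000 lux on a page for 36 s.
once = light_dose_lux_hours(1000, 36)
rescanned = light_dose_lux_hours(1000, 36, exposures=2)
print(once, rescanned)
```

Under this model, a single rescan doubles the page's cumulative dose, which is why avoiding rescans counts as a conservation measure and not just an efficiency gain.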
Figure 7. The different light sources at the MDZ: electronic flash, fluorescent tubes, LEDs.
The evaluation and assessment of these crucial elements are carried
out by the IBR within its framework of preventive conservation. Thus, the
conservation activities in the preparation process include the following steps:
- Condition survey
- Selection of the appropriate scanning device, control of room climate
- Training of scan operators in the careful handling of the books
- Hands-on support for the scanning of valuable objects.
Figure 8 shows the involvement of conservators in the digitization process
at the Scanning Center: the scanning of an illuminated Renaissance
manuscript with the genealogy of the famous Augsburg merchant family of
Fugger. The heavy (approximately 50 lb), large-format manuscript
retains its original leather binding ornamented with delicate enamel medallions.
To ensure careful scanning, two conservators provide hands-on support for the
scan operator, who only operates the computer in this case.
Figure 8. Example of hands-on support by conservators during the scanning of Fugger's "Ehrenbuch".
Scanning process
The scanning of sensitive, precious objects (manuscripts, rare books, special
collections, etc.) is carried out exclusively in the Scanning Center of the MDZ.
The Scanning Center was created during a reorganization in 2005-2007, which
transformed the former analogue photographic reproduction unit into the digital
scanning center of the future. With special funds from the Free State of Bavaria,
extensive scanning hardware and software were purchased. Technology has
changed, but the main task of the in-house photographic reproduction unit has
remained unchanged: to digitize the most valuable, precious and sensitive books.
The objective is to scan a highly valuable object only once and so well ("do it once,
do it right") that the digital version is widely reusable for different purposes in
terms of cross-media publishing, including internet, catalog, facsimile or poster
printing. This of course requires a correspondingly strong infrastructure and
workflows for the long-term preservation of the digital master files and their
metadata, as discussed below. In order to meet all the preventive conservation
requirements and intrinsic material requirements of a universal library like the
Bavarian State Library, with such heterogeneous objects (e.g., small wooden
sticks or long silk scrolls from the Far East, historical maps of about 1.80 square
meters), the Scanning Center today is equipped with 21 different scanning
devices, including book scanners and special equipment scanners (among them four
automatic book scanners, a thermography scanner for watermarks, and a 3D
scanner for book covers), that are able to scan format sizes up to Deutsches
Institut für Normung (DIN) standard A0 (e.g., historical maps) with high
optical resolution.
Book or overhead scanners consist of a fixed set-up of a digital camera
with a book cradle (i.e. the distance between camera and cradle, and thus the
resolution, is fixed). The entire digital reproduction unit of a book scanner is optimized
to provide constant illumination, resolution and depth of focus during the
digitization of each object. Such a system is complemented by specially adapted
software for the reproduction of bound documents. This increases productivity
substantially in comparison to digital studio or consumer cameras. Book scanners
are most frequently used at the MDZ. However, with the recent development of
high-quality CCD chips, the MDZ also uses a high-end 200-megapixel digital
camera for special cases. For the implementation of all preventive conservation
requirements, a pool of different book cradles is already in use, and every day
new challenges arise. Here are some examples of book cradles with different
opening angles. Sometimes the cradles are scanner-specific; sometimes they are
more versatile for use with a range of scanners:
The conventional book holder illustrated in Figure 9, part of the
standard package for all book scanners, can only be used for books that allow a
(damage-free) 180° opening angle. This set-up can be used with less than 30
percent of the Bavarian State Library's book collections. As previously
mentioned, the use of other cradles cuts the scanning throughput in half.
Figure 9. Conventional book holder with a 180-degree opening angle.
Other examples of retrofits include those illustrated in Figures 10 to 15:
Figure 10. Angle bracket for an opening angle from 90 up to 140 degrees.
Figure 11. Traverse angle bracket for a 110-degree angle.
Often, these cradles alone are not enough to support the books sufficiently,
and tailored constructions must be made, as in the following example:
Figure 12. Variable foam cradle covered with acid-free board, e.g., 130-degree angle.
The following examples of newer developments, all of which are used at the MDZ,
provide book cradles which can be adapted very well to the physical condition of books.
Figure 13. Camera table "Model Graz", from 90 up to 140-degree angle, with a vacuum arm
for single page positioning of illuminated manuscripts.
Figure 14. Mobile cradle "The Graz Traveller" with a 110-degree angle.
Figure 15. ScanRobot® book cradle with opening angles from 60 up to 140 degrees.
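The IBR survey yields a permitted opening angle for each object, and each cradle shown above covers a known angle or range. Matching the two could be sketched in Python as follows. The `Cradle` type and `suitable_cradles` function are hypothetical; the angle ranges are taken from the figure captions; the variable foam cradle (Figure 12) is omitted because no range is given. Real device selection also depends on the scanner, the book format and the conservators' judgment.

```python
from typing import NamedTuple


class Cradle(NamedTuple):
    name: str
    min_angle: int  # degrees
    max_angle: int  # degrees (equal to min_angle for fixed cradles)


# Angle ranges from the captions of Figures 9-15.
CRADLES = [
    Cradle("conventional book holder (Fig. 9)", 180, 180),
    Cradle("angle bracket (Fig. 10)", 90, 140),
    Cradle("traverse angle bracket (Fig. 11)", 110, 110),
    Cradle("camera table 'Model Graz' (Fig. 13)", 90, 140),
    Cradle("mobile cradle 'The Graz Traveller' (Fig. 14)", 110, 110),
    Cradle("ScanRobot cradle (Fig. 15)", 60, 140),
]


def suitable_cradles(permitted_angle: int) -> list:
    """Cradles that can hold the book at or below the permitted opening angle,
    each paired with the widest setting it allows without exceeding that angle."""
    result = []
    for cradle in CRADLES:
        if cradle.min_angle <= permitted_angle:
            result.append((cradle.name, min(cradle.max_angle, permitted_angle)))
    return result


for name, angle in suitable_cradles(100):
    print(f"{name}: set to {angle} degrees")
```

A book that may only be opened to 100 degrees thus rules out the conventional holder and both fixed 110-degree cradles, leaving the adjustable ones.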
Images of manuscripts, rare books and special collections are usually
produced using the above-named book cradles with the following parameters:
- Optical resolution of 400 dpi to 600 dpi, relative to the original size of
the object (1:1; the exact value depends on the size of the book scanner)
- Digital master storage format: Tagged Image File Format (TIFF)
(version 6.0), uncompressed. This results in color scan file sizes of
20 to 800 megabytes per image, depending on the format and size of the
book and on the image resolution
- Color depth of 24 bits (TIFF format)
- Color management and media-neutral image production: the entire
production is based on the use of a color management system (RGB,
with LAB as the transformation color space), i.e. scanners and monitors are
color calibrated. The main objective is thus the media-independent
production of the images. The International Color Consortium (ICC)
standard profiles are always embedded in the images. Typically,
the images are saved without any further editing, in order to preserve
high image quality in the digital archive (without any loss due to image
enhancement). In order to ensure high-quality color management and to
avoid color interferences (e.g. caused by the aging of some light sources),
all scanners and monitors are well maintained.
- Targets (Figure 16) for visual control of color, grayscale, sharpness
and scale are scanned once per object and stored with the digital
master of the object.
- Authentic digital scanning, which means digitization that is true to
the original: The scanning of the work includes the outside of the
front cover through to the outside of the back cover, including all
blank pages, insert sheets, etc. In addition, each page is digitized with
a narrow surrounding rim, to show that nothing was added or omitted.
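The 20 to 800 MB range for uncompressed masters follows directly from page size, resolution and color depth: pixels = (width in inches x dpi) x (height in inches x dpi), and each pixel takes 3 bytes at 24 bits. A minimal Python sketch of that arithmetic; the function name is hypothetical, the DIN A4 and A0 page sizes are illustrative choices, and TIFF header overhead, embedded ICC profiles and the rim around each page are ignored.

```python
MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm: float, height_mm: float, dpi: int,
                            bits_per_pixel: int = 24) -> int:
    """Approximate size of the raw pixel data of an uncompressed image.

    TIFF header and tag overhead, embedded ICC profiles and the narrow rim
    around each page are ignored.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Illustrative page formats (DIN A4 and DIN A0, in millimetres).
for label, w, h, dpi in [("A4", 210, 297, 400), ("A4", 210, 297, 600),
                         ("A0", 841, 1189, 400)]:
    size_mb = uncompressed_size_bytes(w, h, dpi) / 1e6
    print(f"{label} at {dpi} dpi: {size_mb:,.0f} MB")
```

An A4 page at 400 dpi comes to about 46 MB, and a DIN A0 map at 400 dpi to a little under 750 MB, which is consistent with the range stated above.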
Figure 16. Color, grayscale and scale targets.
An additional process step in scanning is the set-up of the document on
the appropriate scanning device before, during and after the process. Depending
on the value, size, and condition of a document, this can take from a few
minutes up to an hour. The following steps describe the use of a manual book
scanner to scan a book allowing an opening angle of less than 180 degrees:
1) Position the book on the book cradle of the scanner.
2) Scan color, grayscale, sharpness and scale targets.
3) Constantly readjust the depth of focus, because the thickness of
the book changes as the pages are turned. This is unavoidable when working
without a glass plate: only constant readjustment of the book cradle height
compensates for the changing focus.
4) If required by the scanning device and the opening angle of the book,
scan recto and verso pages separately; they are linked afterwards by the
software (a critical, error-prone process, because pagination is often
lacking).
5) Scan the book covers (outside and inside).
6) Perform "what you see is what you get" (WYSIWYG) quality control
on the monitor throughout the scan.
7) Remove the book from the cradle after finishing the work.
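Step 4 leaves two separate image sequences that must be merged back into reading order. The sketch below is a minimal illustration, not the MDZ software: it assumes each pass yields an ordered list of image names, that the object begins with a recto, and that the recto list may therefore be at most one image longer than the verso list. The function name `interleave_pages` and the file names are hypothetical.

```python
def interleave_pages(rectos: list[str], versos: list[str]) -> list[str]:
    """Merge separately scanned recto and verso images into reading order.

    Assumes the object starts with a recto, so the recto list may be at
    most one image longer than the verso list.
    """
    if not 0 <= len(rectos) - len(versos) <= 1:
        # A mismatch usually means a skipped or doubled page in one pass;
        # flag it instead of silently producing a wrong sequence.
        raise ValueError(
            f"page count mismatch: {len(rectos)} rectos, {len(versos)} versos"
        )
    merged: list[str] = []
    for i, recto in enumerate(rectos):
        merged.append(recto)          # front of leaf i
        if i < len(versos):
            merged.append(versos[i])  # back of the same leaf
    return merged


print(interleave_pages(["f1r.tif", "f2r.tif"], ["f1v.tif", "f2v.tif"]))
```

Raising an error on a count mismatch, rather than guessing, reflects the problem noted in step 4: without pagination, a skipped or doubled page is easy to miss, so the merge step is a natural place to stop and flag it.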
The total time required to perform these steps is referred to as set-up time.
The creation of the digital image itself is the scan or exposure time. This
exposure (integration) time is always very short relative to the set-up time.
The set-up time for a sensitive, old book increases exponentially.
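The dominance of set-up time over exposure time can be made concrete with a simple model. This is a hypothetical sketch: it splits the time per object into a fixed part (positioning, targets, covers, removal) and a per-page part (page turning, refocusing, quality control, exposure). The parameter values in the example are illustrative assumptions, not MDZ measurements; only the 6-hour working day comes from this article.

```python
def pages_per_day(pages_per_object: int,
                  fixed_setup_min: float,
                  handling_s_per_page: float,
                  exposure_s_per_page: float,
                  working_hours: float = 6.0) -> tuple[float, float]:
    """Return (pages per day, share of working time spent on exposure).

    Hypothetical model: time per object = fixed set-up time plus, for every
    page, handling time (turning, refocusing, QC) and exposure time.
    """
    per_page_s = handling_s_per_page + exposure_s_per_page
    minutes_per_object = fixed_setup_min + pages_per_object * per_page_s / 60
    objects_per_day = working_hours * 60 / minutes_per_object
    exposure_min = pages_per_object * exposure_s_per_page / 60
    return objects_per_day * pages_per_object, exposure_min / minutes_per_object


# Illustrative values only: a 300-page book, 30 min fixed set-up,
# 40 s handling and 2 s exposure per page.
pages, exposure_share = pages_per_day(300, 30, 40, 2)
print(f"{pages:.0f} pages/day, exposure is {exposure_share:.1%} of the time")
```

With these assumed values the model gives 450 pages per day, with exposure accounting for only about 4% of the working time. Doubling the handling time per page would cut throughput almost in half, while halving the exposure time would barely change it.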
Book scanner throughput
In this context, a few words about the throughput of book scanners.
Throughput can be calculated in many different ways. In the case of
manuscripts and rare books there is always a gap between the values presented
in the glossy brochures of scanner manufacturers and the throughput that is
actually achievable. There are many reasons for this, including preventive
conservation requirements and real daily working times. Unfortunately, there
are no standardized metrics in this area: when does the time measurement
start, and when does it end? A statement declaring a throughput of one object
per hour does not really provide a verifiable metric. The MDZ measurement of
scanning throughput has therefore been a process of evaluation over time, and
includes the following relevant factors:
- Technical development of the hardware
- Number of scanners
- Scan parameters
- Workflow
- Formats, materiality of the objects, and preventive conservation
requirements
- Staff expertise.
The following processes are involved in determining the throughput at
MDZ:
1) Preprocessing: transport of the books to the Scanning Center, creation of
a checklist, selection of a suitable book scanner, etc.
2) Scan operation: creation of a scan job, positioning of the book, scanning,
target scanning, storage on the PC, data operations, etc.
3) Postprocessing: quality control, scanning and rescanning if required
by feedback, delivery to the Web, return transport, etc.
4) Long-term preservation: automated data transfer to the archival system,
quality control, deletion of the digital master files on the production
servers after successful long-term preservation, etc.
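Because the answer depends on where the clock starts and stops, one way to keep figures comparable is to log timestamps per process and always state which span a throughput figure covers. A minimal sketch under that assumption: the stage names mirror the four processes above, while the log structure, the function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Stage names follow the four processes listed above; the log format is hypothetical.
STAGES = ("preprocessing", "scan", "postprocessing", "preservation")


def pages_per_hour(pages: int,
                   log: dict[str, tuple[datetime, datetime]],
                   first: str = "scan",
                   last: str = "scan") -> float:
    """Throughput over the stages from `first` to `last` (inclusive)."""
    i, j = STAGES.index(first), STAGES.index(last)
    if i > j:
        raise ValueError(f"{first!r} comes after {last!r}")
    total = timedelta()
    for stage in STAGES[i:j + 1]:
        start, end = log[stage]
        total += end - start
    return pages / (total.total_seconds() / 3600)


t = datetime(2011, 3, 1, 8, 0)  # illustrative timestamps for one 300-page object
log = {
    "preprocessing": (t, t + timedelta(minutes=30)),
    "scan": (t + timedelta(minutes=30), t + timedelta(hours=2, minutes=30)),
    "postprocessing": (t + timedelta(hours=2, minutes=30), t + timedelta(hours=3, minutes=30)),
    "preservation": (t + timedelta(hours=3, minutes=30), t + timedelta(hours=4)),
}
print(pages_per_hour(300, log))                                   # scan only: 150.0
print(pages_per_hour(300, log, "preprocessing", "preservation"))  # end to end: 75.0
```

The same object yields 150 pages per hour if only the scan operation is counted, but 75 pages per hour end to end: exactly the kind of discrepancy between brochure figures and real throughput described above.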
We achieve the following average throughput rates, based on a real working
time of 6 hours per day:
- Manuscripts/rare books and difficult, sensitive objects: up to ca. 200
pages/day
- Manuscripts/rare books with a manually operated book scanner: ca. 380
pages/day
- Rare books with the ScanRobot®: ca. 1,000 pages/day.
This takes into account:
- Our strict conservation requirements (more than 70% of manuscripts
and rare books cannot be opened to 180 degrees)
- The technical state of our scanning equipment (including 8 older
devices from 2005).
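Converted to hourly rates over the stated 6-hour working day, the three daily averages correspond roughly to the following (a simple conversion of the figures above, nothing more; the first value is an upper bound, since the daily figure is "up to" 200 pages):

```latex
\begin{aligned}
\text{sensitive objects:} &\quad \frac{200\ \text{pages}}{6\ \text{h}} \approx 33\ \text{pages/h}\\
\text{manual scanner:}    &\quad \frac{380\ \text{pages}}{6\ \text{h}} \approx 63\ \text{pages/h}\\
\text{ScanRobot:}         &\quad \frac{1000\ \text{pages}}{6\ \text{h}} \approx 167\ \text{pages/h}\\
\text{ratio (robot/manual):} &\quad \frac{1000}{380} \approx 2.6
\end{aligned}
```

The ScanRobot® thus handles about 2.6 times as many pages per working hour as a manually operated scanner; this compares daily averages, not per-scan speeds.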
To improve the throughput for fragile objects, the MDZ and IBR are
working together with several manufacturers to optimize book scanning
systems, including the book cradles. The valuable holdings of the Bavarian
State Library, with their heterogeneous material properties, provide a more
than suitable test repository for these developments. Recent examples of such
partnerships are:
- The adaptation of an automatic book scanner, the ScanRobot®, for the
scanning of 16th century rare books, in partnership with the Treventus
company in Vienna
- The book2net Cobra scanner developed by the Microbox company, Bad
Nauheim (Germany). This scanner will increase the throughput in manual
scanning of manuscripts and rare books.
Figure 17. The Cobra scanner, product of a development partnership.
Storage, indexing and digital long-term preservation
During the digitization process it is essential to save the generated digital
master files and their derivatives, such as images, metadata and full text, in
an appropriate storage solution as the primary storage. After the completion
of the digitization process, the files are organized and managed in two copies
in two separate archive systems, with the secondary and tertiary data storage
serving as long-term preservation.
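Keeping two archive copies is only useful if both are verified to be identical to the master before anything is deleted (the throughput processes listed earlier include deleting master files on the production servers after successful long-term preservation). The sketch below shows one common way to do this, by comparing cryptographic checksums. The use of SHA-256, the function names and the two-copy threshold are assumptions for illustration; the article does not describe how the MDZ verifies its copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large TIFF masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_delete(master: Path, archive_copies: list[Path], min_copies: int = 2) -> bool:
    """True only if at least `min_copies` archive copies exist and match the master."""
    expected = sha256_of(master)
    verified = [p for p in archive_copies if p.is_file() and sha256_of(p) == expected]
    return len(verified) >= min_copies
```

Requiring a matching checksum for each copy, rather than merely checking that the files exist, guards against silent corruption during transfer.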
The ZEND workflow proceeds as follows: after the final quality assurance
by the scan operators, the digital master files are stored on a redundant array
of independent disks (RAID) system (primary storage). They are then collected
overnight by the ZEND-Collector and transferred to one of the many
MDZ production servers (currently over 80). An automated process is used to
register the files, and a semi-automated process to generate the metadata for
the digital objects:
1) The digital master files are separated according to their ZEND-ID,
and all images of an object are stored in their own directory.
2) The master TIFF files are then converted for standard Web presentation
into JPEG (Joint Photographic Experts Group) format in two
different resolutions.
3) Next, simple structural metadata for the object is generated in TEI XML
(P5) to describe the logical and physical structure of the object.
With this description, the book can later be browsed and edited online
with the ZEND Table-of-Contents Editor.
4) Where possible, a full text is generated by an automated OCR process.
Today this is only implemented for titles printed in Latin script. German
Gothic or blackletter scripts are problematic, in that there are no
cost-effective OCR programs for the creation of full text from these
typefaces, which were in use from the Middle Ages up to 1943. However, as
soon as a cost-effective OCR solution is available, the objects can be
retrieved from the archival system and the images reprocessed with
Gothic OCR software to create full text from these documents.
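Step 1 amounts to grouping image files by object identifier. A minimal sketch, assuming a hypothetical file naming convention of the form `<ZEND-ID>_<five-digit sequence>.tif`; the real ZEND naming scheme is not described in the article, and the regular expression and function names are illustrative only.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical naming convention "<ZEND-ID>_<5-digit sequence>.tif(f)";
# the real ZEND file names are not given in the article.
MASTER_NAME = re.compile(r"^(?P<zend_id>[A-Za-z0-9]+)_(?P<seq>\d{5})\.tiff?$")


def group_by_object(filenames: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Group master files by object ID, in page order; return (groups, rejected)."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    rejected: list[str] = []
    for name in filenames:
        match = MASTER_NAME.match(name)
        if match is None:
            rejected.append(name)  # unexpected file: leave for manual inspection
            continue
        groups[match["zend_id"]].append((int(match["seq"]), name))
    ordered = {zid: [name for _, name in sorted(items)] for zid, items in groups.items()}
    return ordered, rejected


def object_path(root: str, zend_id: str, name: str) -> PurePosixPath:
    """Target path: one directory per object below the given root."""
    return PurePosixPath(root) / zend_id / name
```

Files that do not match the expected pattern are returned separately rather than dropped, so that an operator can inspect them; sorting numerically by sequence number keeps the page order even if files arrive out of order.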
As a last step, the digital master files for images and texts are transferred
in their long-term stable file formats (TIFF and XML) to long-term archival
systems in two setups: to a Network Attached Storage (NAS) system, which
currently has 165 terabytes of net capacity, and to the Tape Library of the
Leibniz Supercomputing Center in Munich-Garching, which runs under
IBM Tivoli Storage Manager software. The files are also stored as a further copy
in another location. With the progress of mass digitization, we expect
the following increases in the long-term archive:
Figure 18. MDZ data storage and long-term preservation: status and forecast, 2004-2013.
Today, with the continuous drop in prices for storage hardware, the
large amount of data is not a challenge; but, as we have learned, there is a new
one: data management. New problems include the read and write speed
of the hardware and the required network infrastructure. The BSB is
continuously connected to the Leibniz Supercomputing Center (LRZ) by a
1-gigabit Ethernet connection, which is expandable up to 10 gigabit. It should
be noted that the retrieval of the digital master files (TIFF) from the archive
system has only just been implemented within the ZEND workflow. In 2010, the
MDZ started a project together with the Bavarian Library Network and the LRZ
to implement the Rosetta Digital Preservation System from Ex Libris, which
should support the management of an OAIS-compliant digital archive.
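The scale of the bandwidth problem can be estimated with a back-of-the-envelope calculation. This sketch uses two figures from the text (165 terabytes of net NAS capacity and a 1-gigabit link expandable to 10 gigabit) and assumes, optimistically, that the full line rate is sustained; the function name and the `efficiency` parameter are illustrative assumptions.

```python
def transfer_days(terabytes: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` (decimal TB) over a link of `link_gbit_s`.

    `efficiency` in (0, 1] is a hypothetical factor for protocol overhead and
    disk speed; 1.0 means the raw line rate is sustained the whole time.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 86_400


# 165 TB (the NAS net capacity) over the current 1-Gbit and an upgraded 10-Gbit link
for gbit in (1, 10):
    print(f"{gbit:>2} Gbit/s: {transfer_days(165, gbit):.1f} days")
```

Even at the raw line rate, moving a volume the size of the NAS would occupy the current link for about 15 days (about 1.5 days at 10 gigabit), and any efficiency below 1.0 lengthens this proportionally. This is why read/write speed and network capacity, rather than storage prices, become the limiting factors.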
Presentation and access
After the final quality assurance in the ZEND system, the object is
published on the internet immediately. A message stating that the object is
ready on the Web and available under its persistent internet address is sent to
the local library system's online public access catalog (OPAC) as well as to
other reference systems (union catalog, Europeana, etc.). Internet access is
thus possible in various ways, including via the Digital Collections of the
BSB/MDZ, catalog systems, search engines, etc. In addition, a free download of
the complete object as a PDF is always available, which enables the data
transfer to a user's PC and facilitates offline work. Furthermore, the
bibliographic metadata are exported for re-use in other information
environments, for example via an OAI interface for Europeana or the World
Digital Library.
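An OAI interface of this kind typically implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): a harvester such as an aggregator sends HTTP requests with a `verb` parameter and pages through the results using a `resumptionToken`. The sketch below builds such requests and extracts record identifiers from a response. The base URL is a placeholder, and this is a generic illustration of the protocol, not the BSB's actual interface or endpoint.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it may only be combined with the verb itself.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting, e.g. "2010-01-01"
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return record identifiers and the resumptionToken (None on the last page)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Placeholder endpoint: not the BSB's real OAI base URL.
print(list_records_url("https://example.org/oai", from_date="2010-01-01"))
```

A harvester repeats the request with each returned token until the token comes back empty, which is how large catalogs are transferred in manageable pages.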
Figure 19. Presentation of a manuscript in Europeana.
In addition to providing data through various 2D e-book viewers,
since October 2009 the MDZ has offered the possibility of reading manuscripts
and rare books online with a 3D viewer. The 3D viewer allows the user
to see the entire book, including its exterior binding, as a 3D model. Recently,
the provision of data for mobile devices was added, including the presentation
of 52 high-value manuscripts in iPad and iPhone applications.
Figure 20. iPad/iPhone application "Famous books: Treasures of the Bavarian State Library".
Figure 21. 3D version of an illuminated Renaissance manuscript.
Project examples of mass digitization of 16th century books
The Bavarian State Library, as part of Germany's virtual national library,
coordinates the so-called VD16, a bibliography of books printed in
German-speaking countries in the 16th century. In 2006, the Bavarian State Library
started the mass digitization of its unique holdings from the 16th century as part
of the project "VD16/1 digital", funded by the DFG. For this project, based
on the state of scanning technology at the time, the digitization of selected
German prints from 1500 to 1517 was implemented with manually operated,
conventional book scanners equipped with additional book cradles. Due to
the strict preventive conservation guidelines, 70 percent of the books had a
maximum allowable opening angle of 90 degrees. Therefore, all recto and verso
pages had to be scanned separately in two passes, using different cradles in
order to respect the individual integrity of the books. At the end of the scanning
process, the left and right pages were assembled into the complete e-book by a
software program.
Figure 22. Manually operated book scanner with angle bracket (1 scan = 1 image).
This production method significantly reduced throughput and increased costs.
Another serious problem was the particularly high error rate: most 16th century
books have no pagination. This created additional complexity for the scan
operators, even though the scanning software helped them sort the pages into
sequential order. All these factors limited the throughput dramatically: in the
24-month project duration, only approximately 4,300 books with approximately
600,000 pages could be digitized at three (manual) scanning workstations.
Based on the VD16/1 experience, the search for alternatives to manual
scanning equipment started, with the goal of finding high-quality, non-damaging
scanning technologies with high throughput. The result was a world first:
the use of automated book scanners, or scanning robots, for the scanning of 16th
century rare books. In 2006, the Vienna-based company Treventus developed
a completely new and innovative scanning robot system, the ScanRobot®. In
2007 this system was further modified as part of a development partnership
with the IBR and the MDZ, and adapted to the specific requirements of rare books,
with special consideration of high preventive conservation requirements.
Scanning with the ScanRobot® is carried out as follows: when the scan starts, the
scanning unit with a glass prism is gently lowered into the opened book, which
rests on a flexible book cradle adjustable from 60 to 140 degrees. A prism with a
slit in the middle is mounted on the top of the scanning unit. While the scanning
unit moves slowly upward, both the left and right pages are attached to the prism
by a gentle air flow from the prism slit, and scanned simultaneously in strips with
short flash intervals. When the scanning unit reaches the end of the pages, a gentle
air blast turns the page to the left, and the process starts again with the next two
pages. Every turned page increases the book's thickness on the left part of the
book cradle, and as a consequence the book moves in small steps from right
to left until the scanning is finished. In this process, the ScanRobot® carries out
the scanning and the turning of the pages in a single operation. The steady
air flow between the paper and the device ensures almost completely contact-free
scanning. As part of the development cooperation with Treventus, the
ScanRobot® is continuously revised and updated for the specific characteristics
of rare books, always meeting the high conservation requirements. This has led to
very gentle mechanical handling by the ScanRobot®. The robot is very sensitive
to the material and technical individuality of rare books, and immediately stops
the scanning process if necessary to prevent damage.
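The cradle's 60 to 140 degree range interacts with the conservation limits discussed earlier (most items may not be opened to 180 degrees). A hypothetical pre-check for selecting a suitable scanner during preprocessing could compare the two. Only the cradle range comes from the text; the function and its rule are illustrative assumptions, not Treventus or MDZ software.

```python
from typing import Optional

# Cradle range as stated in the text; everything else here is hypothetical.
CRADLE_MIN_DEG = 60
CRADLE_MAX_DEG = 140


def allowed_cradle_range(max_permitted_deg: float) -> Optional[tuple[float, float]]:
    """Range of cradle settings compatible with a book's permitted opening angle.

    Returns None when the book may not be opened even to the cradle's
    minimum, i.e. it needs a different scanning set-up.
    """
    if max_permitted_deg < CRADLE_MIN_DEG:
        return None
    return (CRADLE_MIN_DEG, min(max_permitted_deg, CRADLE_MAX_DEG))


print(allowed_cradle_range(90))   # a book limited to 90 degrees
print(allowed_cradle_range(45))   # too restricted for this cradle
```

A book whose permitted opening angle is below the cradle minimum would be routed to another scanning set-up; how the MDZ actually makes this decision is not described in the article.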
Figure 23. ScanRobot with book cradle (1 scan = 2 images).
By using this innovative and careful scanning technology, in which two pages
are digitally recorded with only one scan, about 9,000 works with over 1 million
pages were scanned within 18 months (Brantl, Ceynowa, Fabian et al. 2009).
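Set against the VD16/1 figures, a simple monthly average illustrates the gain. This uses only the numbers given above. Because the article does not state how many ScanRobot® units were used, it compares whole projects, not individual devices, and "over 1 million" is treated as a lower bound:

```latex
\begin{aligned}
\text{VD16/1 (3 manual workstations):} &\quad \frac{600{,}000\ \text{pages}}{24\ \text{months}} = 25{,}000\ \text{pages/month}\\
\text{ScanRobot project:}              &\quad \frac{1{,}000{,}000\ \text{pages}}{18\ \text{months}} \approx 55{,}600\ \text{pages/month}\\
\text{ratio:}                          &\quad \frac{55{,}600}{25{,}000} \approx 2.2
\end{aligned}
```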
Conclusion and outlook
In digitization, scanning is only one step in the long and complex process that
ultimately leads to the presentation of books on the internet. When scanning
manuscripts, rare books, and special collections, strict preventive conservation
requirements and high throughput seem at first to be contradictory. Through
their close cooperation over the years, the Bavarian State Library's Munich
Digitization Center and the Institute of Book and Manuscript Conservation
have collected extensive experience and know-how in implementing
digitization projects dedicated especially to valuable documents, across
a wide variety of document formats. Furthermore, the MDZ and IBR have
gained significant experience in mass digitization, which in turn has enabled new
developments together with scanner manufacturers. This cooperation has
allowed the optimization of existing scanning systems, as well as the creation
of new systems, while meeting strict conservation requirements. Beyond this,
and especially in combination with new technological approaches, the new systems
are leading to lower production costs and will reduce costs even further
in the future. Therefore, our vision,
that the entire written heritage that the Bavarian State Library has collected
and preserved over the centuries will be provided via the internet to all interested
parties for free,
has a solid basis for becoming reality in a reasonable time.
References
Brantl, M., "Mass Digitization and Long-term Preservation. Processes and
Production at Munich Digitization Center", http://www.digitale-sammlungen.de/mdz/content/service/docs/2008-06_Brantl_en.pdf
Brantl, M., Ceynowa, K., Fabian, C., Meßmer, G., and Schafer, I.,
"Massendigitalisierung deutscher Drucke des 16. Jahrhunderts. Ein
Erfahrungsbericht der Bayerischen Staatsbibliothek", Zeitschrift für
Bibliothekswesen und Bibliographie 56 (2009), 327-338.
Ceynowa, K., "Mass Digitization for Research and Study: the Digitization
Strategy of the Bavarian State Library", IFLA Journal 35.1 (2009),
17-24, http://comminfo.rutgers.edu/~tefko/Courses/e553/Readings/Ceynowa%20IFLA%20J%202009.pdf.
Griebel, R., and Ceynowa, K. (eds.), 450 Jahre Bayerische Staatsbibliothek, Berlin-New York 2008.
Milne, R., "A Move from 'Boutique' to Mass Digitization: the Google
Library Project at Oxford", in R.A. Earnshaw and J. Vince (eds.), Digital
Convergence: Libraries of the Future, London 2008, 3-10.
Schafer, I., "Restaurieren für die Wissenschaft - Das Institut für Buch- und
Handschriftenrestaurierung", in R. Griebel and K. Ceynowa (eds.), 450
Jahre Bayerische Staatsbibliothek, Berlin-New York 2008, 225-240.
Waters, D.J., "What Are Digital Libraries?", CLIR Issues 4 (1998),
http://www.clir.org/pubs/issues/issues04.html#dlf.