Challenges and Experiences in the Mass Digitization of Manuscripts and Rare Books at the Bavarian State Library*

Markus Brantl (1) and Irmhild Schäfer (2)

(1) Director of the Munich Digitization Center / Digital Library, Bavarian State Library, Ludwigstr. 16, 80539 Munich, Germany, markus.brantl@bsb-muenchen.de
(2) Director of the Institute of Book and Manuscript Conservation, Bavarian State Library, Ludwigstr. 16, 80539 Munich, Germany, irmhild.schaefer@bsb-muenchen.de

Keywords: Mass digitization, cultural heritage, manuscripts, rare books, incunabula, preventive conservation, books materiality, work flows, automated book scanner, manually-operated book scanner, scanning throughput, digital long-term preservation, digital collections, workflow, digital asset management, mobile Web, public-private partnership.

Abstract: The Bavarian State Library is one of the largest European universal libraries. The library's unique collection profile is characterized by extremely precious manuscripts, rare printed books and comprehensive special collections from thousands of years of cultural heritage. The internationally renowned manuscript, incunabula and rare book collections encompass more than 200,000 titles up to the end of the 16th century. As an international research library it offers services to scholars and students worldwide. In this respect the World Wide Web is the natural medium for the Bavarian State Library, since it allows 24/7 access to information for everybody around the world. It is therefore a primary strategic objective of the library to digitize its unique holdings as soon as possible and to make them accessible free of charge on the WWW. This article provides an overview of the technical and organisational challenges and experiences encountered in the mass digitization of manuscripts and rare books at the Bavarian State Library, Munich, Germany. Section 1 introduces the Bavarian State Library and its Institute of Book and Manuscript Conservation and Munich Digitization Center. Section 2 discusses the digitization strategies and mass digitization projects. Section 3 is concerned with the digitization process of manuscripts and rare books with the Central Digital Asset Management System. This is followed by a case study of the bibliography of books printed in German-speaking countries in the 16th century, Projects VD16/1 and VD16/2, in Section 4. Finally, Section 5 offers conclusions, with the vision and plans for future digitization activities at the Bavarian State Library.

* This article is a modified version of M. Brantl and I. Schäfer, "Massendigitalisierung von Handschriften und Alten Drucken, Herausforderungen und Erfahrungen an der Bayerischen Staatsbibliothek", in Eikonopoiia. Symposium on Digital Imaging of Ancient Textual Heritage: Technological Challenges and Solutions, 28-29 October, 2010, Helsinki, Finland, Helsinki 2010, 175-197.

Introduction

Though inconceivable until the widespread use of the internet in the mid-1990s, it is now technologically possible to scan the collections of entire libraries, archives, and museums and to provide them in digital form on the internet. In September 2010, the website of the Bavarian State Library (BSB) already offered 400,000 copyright-free medieval manuscripts, rare books, and books from the 6th to the 20th century (about 267 terabytes of data in 537 million files as of Dec. 2010). Digitally providing this internationally significant collection for worldwide access is one of the main tasks of the Munich Digitization Center (MDZ), in collaboration with the Institute of Book and Manuscript Conservation (IBR), both departments of the Bavarian State Library.
The IBR ensures safe handling of the items, and the MDZ employs and develops innovative technology toward creating the "digital library" (Waters 1998).

The Bavarian State Library, digitization and preventive conservation

As a treasure trove of cultural heritage, a multimedia information service provider for scholarly research, and an innovative force in the field of digital services, the Bavarian State Library is one of the primary national and international resources for researchers, students and all those seeking information. In 1558, Duke Albrecht V of Bavaria purchased two important private libraries, which constitute the foundation of the Bavarian State Library. Today, the central state library and repository library of the Free State of Bavaria owns close to 10 million volumes and 55,000 current periodicals in hardcopy or electronic format. With its unique collections of 93,000 manuscripts, 20,000 incunabula from 1450-1500, and 900,000 rare books from 1500-1850, the institution is one of the world's leading libraries. Together with the State Library in Berlin and the German National Library in Frankfurt and Leipzig, the Bavarian State Library constitutes Germany's virtual national library (Griebel and Ceynowa 2008).

The Munich Digitization Center (MDZ) was established in 1997, in the early stages of internet use in Germany, as one of two national centers of excellence. The MDZ develops, evaluates and commissions new scanning technologies and production processes in order to create a "digital library". In close collaboration with universities and other research institutions, the MDZ has successfully developed and implemented more than two hundred innovative projects: From the retrodigitization (i.e.
the creation of digital copies of handwritten or printed copyright-free documents) of full texts or images, as well as audio and 3-dimensional (3-D) objects, to the implementation of electronic publishing and virtual portals, as well as the digital long-term preservation of digital objects. The MDZ has also developed new processes and methods for the creation, management, storage, preservation and delivery of digital objects. Furthermore, the MDZ functions as a service provider, including providing support for digitization and development work for the open-source community in the digital library field. The Scanning Center of the MDZ has gained significant experience and expertise through its extensive scanning of diverse collections of objects spanning over fourteen centuries.

Digitization of valuable objects is always carried out in close collaboration with the Institute of Book and Manuscript Conservation (IBR) as an important partner. The IBR was established in 1963 for the purpose of preserving the unique cultural heritage of the Bavarian State Library. In addition to conservation and preventive conservation projects, the IBR trains conservators in Bachelor's and Master's degree programs in cooperation with the Technische Universität München (TUM). Since the early days of digitization at the Bavarian State Library, the IBR and the MDZ have worked closely together to incorporate all necessary conservation requirements into the digitization processes. In particular, this collaboration includes the following contributions of the IBR:
- Advice on the purchase of scanners available on the market
- Advice on the development of new scanning devices in collaboration with manufacturers
- Assessing the condition of each item to be scanned as part of the project
preparation phase
- Training the scan operators in careful handling of the objects
- Providing functional devices to support the scanning process (e.g., the so-called "Munich finger", see Figure 5)
- Hands-on support for the scanning of sensitive and high-value books.

Digitization strategy and mass digitization projects

One of the traditional functions of the Bavarian State Library is to maintain and provide public access to the written cultural heritage that has been collected over the course of centuries. Thus, the strategic goal in the internet age is to provide quick, independent and ubiquitous access, free of charge, to the entire collection of copyright-free library holdings. In order to achieve this, about 1.2 million objects have to be scanned and uploaded onto the MDZ web servers. This ambitious project is currently funded by the DFG, the Free State of Bavaria, and the European Union, as well as, since 2007, a public-private partnership with Google and book publishers. Funding resources for this digitization project are allocated to support the digitization of objects from three time periods: the 6th-16th century, the 17th-19th century, and the 20th-21st century. Specifically, the three respective funding collaborations are as follows:
- German Research Foundation (DFG), Free State of Bavaria, and European Union: Manuscripts, rare books and special collections from the 6th to the 16th century
- Public-private partnership with Google Book Search: Items from about the 17th to the 19th century
- DFG and publishers: Special programs for the 20th and the 21st centuries

Until 2005, the selection of digitization projects was exclusively focused on the Bavarian State Library's profile as an international research center, as well as its archiving function as the Bavarian central state library. Initially, "boutique digitization" projects predominated, i.e.
the digitization of "masterpieces" as well as of smaller collections (Milne 2008), such as:
- The DFG Special Subject Collections of the Bavarian State Library, in particular on History, Eastern Europe and Musicology
- Resources for the central arts and humanities on the portal on the history and culture of Bavaria, "Bayerische Landesbibliothek Online" (BLO, Bavarian Regional Library Online)
- Manuscripts and rare books, due to the Bavarian State Library's status as a research library of international importance.

As a consequence, the digital collection development was initially focused on the digitization of individually selected collections. With the completion of the learning and experimentation phase of German (retro-)digitization at the Bavarian State Library in 2005, the technical and organizational prerequisites (see below) were ramped up for the second phase: mass digitization. The definition of this term varies considerably, e.g. with respect to the number of books or pages that are being digitized. In Germany, the term is used if significantly more than one million pages are scanned in a limited period, usually 24 months. For comparison: the public-private partnership between the Bavarian State Library and Google encompasses about 5,000 books with a total of about 1.9 million pages digitized every week.

In early 2006, the MDZ Scanning Center started the first mass digitization project in Germany, the so-called VD 16/1, covering the period from 1500 to 1517. With manually-operated book scanners, 4,300 titles with a total of 600,000 pages were scanned. These files were linked with simple structure data (digital tables of contents) and made available via the internet. Through funding from the DFG, several mass digitization projects have subsequently been carried out, including:
- Rare books from 1518 to 1600 (project VD 16/2).
Within 18 months, starting in mid-2007, 12,000 titles with a total of 1.7 million pages were digitized using the world's first robotic scanning system for 16th century books. This project was partially funded through a development partnership with the manufacturer, Treventus (Vienna).
- Incunabula from 1450 to 1500. In this 48-month project, about 9,700 incunabula will be digitized. The estimated total is around 2 million pages. From November 2008 to December 2010, 4,100 titles with 1.1 million pages have already been digitized.
- 20th century literature on historical topics: a cooperative project with publishing houses, who have assigned the publishing rights to the Bavarian State Library for digitization and internet portable document format (PDF) download free of charge. About 4,700 titles with 1.4 million pages will be digitized in 24 months.

The Google project will entail digitizing about 1.2 million books from the 17th up into the 19th century with a total of about 300 million pages. Google conducts the digital production at its own expense, while the Bavarian State Library receives copies of all digital data ("Digital Library Copy"), which are made available to the public for free on the MDZ servers (Ceynowa 2007).

The digitization workflow and ZEND

Under the term "digitization" the following major process steps are normally summarized:
1) Preparation
2) Scanning, i.e. image capture with different scanning devices
3) Indexing or production of metadata (enrichment), including administrative, bibliographical, technical and structural (incl. full-text) metadata
4) Storage and long-term preservation
5) Access, to support the material-specific presentation on the internet; nowadays ready for multiple devices, like smartphones, tablets and desktops.
The individual processes always depend on the structure and composition of the original object and the intended presentation mode on the internet (e.g., browsing, search, 3D animation) (Brantl 2008). The rule of thumb is: the older an object is or the more metadata it needs (for example, information for search and retrieval), the more complex and costly the entire production process will be.

To standardize the production processes and ensure a consistent production of digital data, the Zentrale Erfassungs- und Nachweisdatenbank (ZEND, Central Digital Asset Management System), a software tool with different document and workflow management components, was developed by the MDZ in 2003. ZEND controls the entire production process from the preparation up to the automated transfer of the digital master files to archival storage. The introduction of ZEND significantly reduced the time required for the working process. ZEND also established the basis for all current mass digitization projects. Today, ZEND processes around 1.5 million images of book pages every week. Due to the high degree of automation, the production process without scanning takes only about 30 to 60 minutes per title, depending on the metadata creation. ZEND, which is continuously being improved, is the backbone of the entire (retro-)digital production at the Bavarian State Library, and includes for example the following features:
- Processing of all types of documents and their presentation
- A standardized electronic workflow for the different types of documents
- The delivery of digital images and full text data from an unlimited number of service providers
- Significant time and cost savings due to automatic processing.
In detail, ZEND is an electronic publishing system for (retro-)digitization based on two open-source software modules, namely a LAMP system (Linux, Apache, MySQL and PHP/Perl) and the Extensible Markup Language (XML) publishing framework Cocoon with the Lucene/Solr search platform. It combines different components of document, web content and workflow management, and creates a comprehensive production environment. In daily production, ZEND is very successful, as the over 500 million processed files prove (including in the Google project). With its open, modular, flexible and scalable architecture, ZEND is equally suitable for the demands of scientific retrodigitization and of image-oriented, automated mass digitization.

The development of ZEND started after the previously mentioned learning and experimentation phase in German digitization from 1997 to 2003. During this phase, a significant number of recurrent and complex manual steps were required, such as creating checklists for the books with actual page counts (logical page counts) and the number of image files of a work (physical page counts). ZEND offers, inter alia, the following main features, which are embedded as integral components in the workflow:
- A web-based user and administrative interface which can be accessed from any Internet-connected computer
- Definition of different user roles
- Generation of an object-specific digitization order (end user or library staff) with plausibility check
- Automatic Uniform Resource Name (URN) generation with a Uniform Resource Locator (URL) to the central link-resolving server of the German National Library
- Open Archives Initiative (OAI) data provider
- Import of all necessary production data, such as bibliographic data via Z39.50 interface from the Online Public Access Catalog (OPAC)
- Support (import,
export, generation) of all current XML document formats, especially Text Encoding Initiative (TEI) and Metadata Encoding and Transmission Standard (METS), including the so-called DFG-Viewer
- Cooperative interfaces, e.g. for indexing of digital objects
- Possibility to separate between public (open) and internal access, e.g. a digital reading room
- Connecting an optical character recognition (OCR) server for automatic conversion of digital images to text (currently only for Roman script; a subsequent use of OCR for Fraktur fonts in old prints is planned)
- Search and retrieval of full text
- PDF on-the-fly output of entire e-books via the Web
- Customizable Really Simple Syndication (RSS) feeds, e.g., the daily production of digitized items.

These features are embedded as integral components of the overall production workflow. They are presented in the following step-by-step overview of the ZEND workflow.

Figure 1. Modules of ZEND. (Legible module labels include: Order, XML Editor, Collection Management, Search/Browse, PDF Download, Administration, Subject Portals, Workflow Module, Image Conversion, Data Management, Interfaces, Monitoring, Database, Repository Maintenance, Archival Storage (TSM).)

ZEND order and unique identifier

Based on the electronically generated digital order form (by a customer, project staff, etc.), ZEND creates and prints an order slip. With the help of the imprinted barcode, it provides all necessary information about the entire process, up to the quality control of the digital images at the end, while also tracking the individual process steps. The digital order form already defines the relevant scan parameters (e.g., the resolution for the digitization, i.e.
400 pixels per inch and 24-bit color for manuscripts and rare books). With the creation of the order form, a unique identifier is produced, the so-called "ZEND ID" (e.g., bsb00001119). The running count is appended to the identifier with an underscore (e.g., bsb00001119_00001.tif). The ID is also part of the URN, which is anchored by a resolving URL (e.g., http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb00001119) in the catalogue systems. The central administration of the identifiers is managed by ZEND, which ensures that each unique identifier is assigned only once per object.

Figure 2. ZEND order slip with barcode.

Workflow and preventive conservation requirements

The preventive conservation requirements for scanning, as defined by the IBR, have an impact on the process. Thus, the condition survey of manuscripts and rare books before digitization is an indispensable component of any digitization project at the Bavarian State Library. The survey is carried out by the IBR. The goal is to identify objects that require intervention, thus avoiding the risk of increasing existing damage or the loss of historical components. The suitable scanning device is determined based on this survey. Additionally, object-specific instructions are given to the scan operator, including the permitted book opening angle and additional tools for supporting the document in its position when required, as well as information about particularly sensitive parts of the book and risks.

Figure 3. Examples of rare book materials.

Detailed conservation requirements may vary from institution to institution. In general, the main risks in scanning are the mechanical stress on the book, the light exposure, and the indoor environment with relative humidity and temperature.
The scanning process must include "a good opening angle" for the book (as defined by the IBR) to ensure that the complex spine structure with endbands, sewing, spine lining, and covering material (e.g., leather) will not be damaged. It turns out that only very few rare books allow an opening angle of 180°. To preserve the historic significance of the objects and to avoid high conservation costs for repairing damaged covers, scanners have to adapt to the physical state of historic books and offer a "good opening angle" for scanning.

Figure 4. Book opening angle of 120° and 180°: potential cover damage.

For preventive conservation reasons, the MDZ does not use the regular glass plates of conventional manual book scanners. These scanners have holders for a book opening angle of 180° in order to apply plane pressure on the opened book by the glass plate. The inability to use the glass plates creates two significant disadvantages in the scanning process:
- The scanning throughput is reduced by 50 percent, as recto and verso pages can only be scanned in two separate processes, using custom book cradles with opening angles of about 120° or less.
- Without the contact pressure of a glass plate on the book, the depth of focus of the images varies.

However, these disadvantages can be overcome by the use of other tools, such as the "Münchner Finger" (Munich finger) or laser-assisted control to visually check focus depth. In addition, the MDZ and the IBR are working with manufacturers on the optimization of manual scanning systems that allow scanning panels to work efficiently even without a glass plate (see below).

Figure 5. The "Munich finger," created by the IBR for the smooth stabilization of pages during the scanning process.

Figure 6. Testing of focus depth with laser during the book setup on the cradle.

Another well-known potential source of damage is light. Correct
color reproduction requires precise light exposure. Achieving accurate color values is a key objective of the scan philosophy of the Bavarian State Library. The image quality must be as high as possible in order to ensure that the digital image provides maximum reusability and full benefit for the users and that rescanning is avoided. Since illumination initiates chemical changes of organic materials (e.g. colors) that accumulate with repeated exposure, the intensity and duration of the light levels associated with scanning need to be kept as low as possible. Today, ultraviolet and infrared filters are a basic requirement. In recent years, cold light-emitting diode (LED) lighting in scanning systems has replaced conventional fluorescent tubes. In addition to these lighting techniques, electronic flash can be used directly or indirectly, with a Pyrex dome as a filter and burst protection. In the MDZ Scanning Center, three lighting techniques are currently used after approval by the IBR: electronic flash (digital cameras and ScanRobot®), fluorescent lamps with indirect or synchronized lighting, and LEDs.

Figure 7. The different light sources at the MDZ: electronic flash, fluorescent tubes, LEDs.

The evaluation and assessment of these crucial elements are carried out by the IBR within their framework of preventive conservation. Thus, the conservation activities in the preparation process include the following steps:
- Condition survey
- Selection of the appropriate scanning device, control of room climate
- Training of scan operators in careful handling of the books
- Hands-on support for the scanning of valuable objects.

Figure 8 shows the involvement of conservators in the digitization process at the Scanning Center: the scanning of an illuminated Renaissance manuscript with the genealogy of the famous Augsburg merchant family of Fugger.
The heavy-weight (approximately 50 lb), large-format manuscript retains its original leather binding ornamented with delicate enamel medallions. To ensure careful scanning, two conservators provide hands-on support for the scan operator, who only operates the computer in this case.

Figure 8. Example of hands-on support of conservators with scanning of Fugger's "Ehrenbuch".

Scanning process

The scanning of sensitive, precious objects (manuscripts, rare books, special collections, etc.) is carried out exclusively in the Scanning Center of the MDZ. The Scanning Center was created during a reorganization in 2005-2007, which transformed the former analogue photographic reproduction unit into the digital scanning center of the future. With special funds from the Free State of Bavaria, extensive scanning hardware and software were purchased. Technology has changed, but the main task of the in-house photographic reproduction unit has remained unchanged: to digitize the most valuable, precious and sensitive books. The objective is to scan a highly valuable object only once and so well ("do it once, do it right") that the digital version is widely reusable for different purposes in terms of cross-media publishing, including internet, catalog, facsimile or poster printing. This of course requires a correspondingly strong infrastructure and workflows for the long-term preservation of the digital master files and their metadata, as discussed below.
In order to meet all the preventive conservation requirements and intrinsic material requirements of a universal library like the Bavarian State Library, with such heterogeneous objects (e.g., small wooden sticks or long silk scrolls from the Far East, historical maps of about 1.80 square meters), the Scanning Center today is equipped with 21 different scanning devices, including book scanners and special equipment scanners (including four automatic book scanners, a thermography scanner for watermarks, and a 3D scanner for book covers), that are able to scan format sizes up to Deutsches Institut für Normung (DIN) standard A0 (e.g., historical maps) with high optical resolution.

Book or overhead scanners consist of a fixed set-up of a digital camera with a book cradle (i.e. the distance/resolution between camera and cradle is fixed). The entire digital reproduction unit of a book scanner is optimized to provide constant illumination, resolution and depth of focus during the digitization of each object. Such a system is complemented by specially adapted software for the reproduction of bound documents. This increases productivity substantially in comparison to digital studio or consumer cameras. Book scanners are most frequently used at the MDZ. However, with the recent development of high-quality CCD chips, the MDZ also uses a high-end 200 megapixel digital camera for special cases.

For the implementation of all preventive conservation requirements, a pool of different book cradles is already in use, and every day new challenges arise. Here are some examples of book cradles with different opening angles. Sometimes the cradles are scanner-specific, sometimes they are more versatile for use with a range of scanners: The conventional book holder illustrated in Figure 9, part of the standard package for all book scanners, can only scan books that allow a (damage-free) 180° opening angle.
This set-up can be used with less than 30 percent of the Bavarian State Library's book collections. As previously mentioned, the use of other cradles cuts the scanning throughput in half. Other examples of retrofits include those illustrated in Figures 10 to 15:

Figure 9. Conventional book holder with a 180-degree opening angle.

Figure 10. Angle bracket for an opening angle from 90 up to 140 degrees.

Figure 11. Traverse angle bracket for a 110-degree angle.

Often, these cradles alone are not enough to support the books sufficiently, and tailored constructions must be made, as seen in the following picture:

Figure 12. Variable foam cradle covered with acid board, e.g., 130-degree angle.

The following examples of newer developments, all of which are used in the MDZ, provide book cradles which can be adapted very well to the physical conditions of books.

Figure 13. Camera table "Model Graz", from 90 up to 140-degree angle, with a vacuum arm for single page positioning of illuminated manuscripts.

Figure 14. Mobile cradle "The Graz Traveller" with a 110-degree angle.

Figure 15. ScanRobot® book cradle with opening angles from 60 up to 140 degrees.

Images of manuscripts, rare books and special collections are usually produced using the above-named book cradles with the following parameters:
- Optical resolution of 400 dpi to 600 dpi, relative to the original size of the object (1:1; due to the different sizes of the book scanners)
- Digital master storage format: Tagged Image File Format (TIFF) (version 6.0), uncompressed.
This results in color scan file sizes of 20 to 800 megabytes per image, depending on the format, the size of the book, and the image resolution
- Color depth of 24-bit (due to the TIFF format)
- Color management and media-neutral image production: the entire production is based on the use of a color management system (in RGB and LAB as transformation color space), i.e. scanners and monitors are color calibrated. Therefore, the main objective is the media-independent production of the images. The International Color Consortium (ICC) standard profiles are always stored together within the images. Typically, the images are saved without any further editing, in order to preserve high image quality in the digital archive (without any loss due to image enhancement). In order to ensure high-quality color management and to avoid color interferences (e.g. caused by the aging of some light sources), all scanners and monitors are well maintained.
- Targets (Figure 16) for visual control of color, grayscale, depth of sharpness and scale are scanned once per object and stored with the digital master of the object.
- Authentic digital scanning, which means digitizing that is true to the original: The scanning of the work includes the outside of the front cover through to the outside of the back cover, including all blank pages, insert sheets, etc. In addition, each page is digitized with a narrow surrounding rim, to show that nothing was added or omitted.

Figure 16. Color, greyscale and scale targets.

An additional process step in scanning is the set-up of the document on the appropriate scanning device - before, during and after the process. Depending on the value, size, and condition of a document, this can take from a few minutes up to an hour.
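
The scan parameters listed above (400 to 600 dpi at 1:1, 24-bit color, uncompressed TIFF) make the quoted range of file sizes easy to sanity-check. The following is a minimal sketch, not part of the MDZ workflow: the function name is hypothetical, the object sizes are assumed DIN paper formats used as stand-ins for real objects, and TIFF header overhead, the embedded ICC profile and the scanned rim around each page are ignored.

```python
CM_PER_INCH = 2.54


def uncompressed_size_mb(width_cm, height_cm, dpi, bits_per_pixel=24):
    """Approximate size of an uncompressed raster image in megabytes (10**6 bytes).

    Hypothetical helper for illustration only; it ignores file headers,
    metadata and embedded color profiles.
    """
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    total_bytes = width_px * height_px * bits_per_pixel / 8
    return total_bytes / 1_000_000


# Assumed example formats (DIN sizes in cm), scanned at 400 dpi in 24-bit color
for name, (w, h) in {"A5": (14.8, 21.0), "A4": (21.0, 29.7), "A0": (84.1, 118.9)}.items():
    print(f"{name}: {uncompressed_size_mb(w, h, 400):.1f} MB")
```

Under these assumptions a small page comes out at a few tens of megabytes and an A0 sheet at several hundred, which is in line with the range of 20 to 800 megabytes per image stated in the parameter list.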
The following steps describe the use of a manual book scanner to scan a book allowing an opening angle under 180 degrees:
1) Position the book on the book cradle of the scanner.
2) Scan color, grayscale, sharpness and scale targets.
3) Constantly re-adjust the depth of focus, due to the varying thickness of the book as the pages are turned. This is inevitable when working without a glass plate. Only the constant readjustment of the book cradle height will compensate for the changing focus.
4) Separately scan recto and verso pages, which are linked afterwards by the software (a critical, error-prone process because pagination is often lacking), if required due to the scanning device and opening angle of the book.
5) Scan the book covers (outside and inside).
6) Perform "what you see is what you get" (WYSIWYG) quality control throughout the scan using the monitor.
7) Remove the book from the cradle after finishing the work.

The total time required to perform these steps is referred to as set-up time. The creation of the digital image is the scan or exposure time. The integration time is always very short relative to the set-up time. The set-up time for a sensitive, old book increases exponentially.

Book scanner throughput

In this context, a few words about the throughput of book scanners are in order. The calculation of throughput can be made in many different ways. In the case of manuscripts and rare books there is always a gap between the calculated values presented in glossy brochures of scanner manufacturers and the real achievable throughput. There are many reasons for this, including the preventive conservation requirements and the real daily working times. Unfortunately, there are no standardized metrics in this area: when does the time measurement start, and when does it end? A statement declaring a throughput of one object per hour does not really provide verifiable metrics.
The MDZ measurement of scanning throughput has therefore been a process of evaluation over time, and includes the following relevant factors:
- Technical development of the hardware
- Number of scanners
- Scan parameters
- Workflow
- Formats, materiality of the objects, and preventive conservation requirements
- Staff expertise

The following processes are involved in determining the throughput at the MDZ:
1) Preprocessing: transport of the books to the Scanning Center, creating a checklist, selection of a suitable book scanner, etc.
2) Scan operation: creating a scan job, positioning of the book, scanning, target scanning, storage on the PC, data operations, etc.
3) Postprocessing: quality control, scanning and rescanning if required by feedback, delivery to the Web, return transport, etc.
4) Long-term preservation: automated data transfer to the archival system, quality control, deletion of the digital master files on the production servers after successful long-term preservation, etc.

We achieve the following average throughput rates, based on a real working time of 6 hours per day:
- Manuscripts/rare books and difficult, sensitive objects: up to ca. 200 pages/day
- Manuscripts/rare books with a manually operated book scanner: ca. 380 pages/day
- Rare books with the ScanRobot®: ca. 1,000 pages/day

This takes into account:
- Our strict conservation requirements (more than 70% of manuscripts and rare books cannot be opened to 180°)
- The technical state of our scanning equipment (including 8 older devices from 2005)

To improve the throughput for fragile objects, the MDZ and IBR are working together with several manufacturers to optimize the book scanning systems, including the book cradles. The valuable holdings of the Bavarian State Library, with their heterogeneous material properties, provide a more than suitable test repository for these developments.
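The average rates quoted above can be turned into a rough planning estimate. The following is a minimal sketch, not an MDZ tool: the category names, the function `working_days`, and the assumption that several scanners simply add their daily capacity are illustrative choices, and the 200 pages/day figure is an upper bound in the text.

```python
import math

# Average pages per 6-hour working day, taken from the rates quoted in the text.
# The dictionary keys are hypothetical labels introduced for this sketch.
PAGES_PER_DAY = {
    "fragile": 200,     # difficult, sensitive objects ("up to ca. 200")
    "manual": 380,      # manually operated book scanner
    "scanrobot": 1000,  # ScanRobot
}


def working_days(total_pages: int, category: str, scanners: int = 1) -> int:
    """Estimate the whole working days needed to scan total_pages.

    Assumes (for illustration only) that each additional scanner adds the
    same daily capacity and that set-up effects are already averaged in.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if scanners < 1:
        raise ValueError("at least one scanner is required")
    daily_capacity = PAGES_PER_DAY[category] * scanners
    return math.ceil(total_pages / daily_capacity)


if __name__ == "__main__":
    # A hypothetical 600-page volume under each scanning regime.
    for category in PAGES_PER_DAY:
        print(category, working_days(600, category))
```

Rounding up to whole days reflects that a partly used day is still a scheduled day; a real project plan would also have to account for preprocessing, postprocessing and archiving, which the rates above do not isolate.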
Recent examples of such development partnerships with scanner manufacturers are:
- The adaptation of an automatic book scanner, the ScanRobot®, for the scanning of 16th century rare books, in partnership with the Treventus company in Vienna
- The book2net Cobra scanner developed by the Microbox company, Bad Nauheim (Germany). This scanner will increase the throughput in manual scanning of manuscripts and rare books.

Figure 17. The Cobra scanner, product of a development partnership.

Storage, indexing and digital long-term preservation

During the digitization process it is essential to save the generated digital master files and their derivatives, such as images, metadata and full text, in an appropriate storage solution that serves as primary storage. After the completion of the digitization process, the files are organized and managed in two copies in two separate archive systems, with the secondary and tertiary data storage serving long-term preservation.

The ZEND workflow requires the following: after the final quality assurance by the scan operators, the digital master files are stored on a redundant array of independent disks (RAID) system (primary storage). They are then collected overnight by the ZEND-Collector and transferred to one of the many MDZ production servers (currently over 80). An automated process is used to register the files, and a semi-automated process to generate the metadata for the digital objects:
1) The digital master files are separated according to their ZEND-ID, and all images of an object are stored in their own directory.
2) The master TIFF files are then converted for standard Web presentation into JPEG (Joint Photographic Experts Group) format in two different resolution versions.
3) Next, the simple structural metadata of the object is generated in TEI XML (P5) to describe the logical and physical structure of the object.
With this description, the book can later be browsed, and edited online with the ZEND Table-of-Contents Editor.
4) If possible, a full text is generated using an automated OCR process. Today this is only implemented for titles printed in Latin script. German Gothic or blackletter scripts are problematic, in that there are no cost-effective OCR programs for the creation of full text from these printing types, which were in use from the Middle Ages up to 1943. However, as soon as a cost-effective OCR solution is available, the objects can be retrieved from the archival system and the images reprocessed with Gothic OCR software to create full text from these documents.

As a last step, the digital master files for images and texts are transferred in their long-term stable file formats (TIFF and XML) to long-term archival systems in two setups: to a Network Attached Storage (NAS) system, which currently has 165 terabytes of net capacity, and to the Tape Library of the Leibniz Supercomputing Center in Munich-Garching, which runs under IBM Tivoli Storage Manager software. The files are also stored in another copy in another location. With the progress of the mass digitization of data, we expect the following increases in the long-term archive:

Figure 18. MDZ data storage and long-term preservation: status and forecast (2004 to 2013).

Today, with the continuous drop in prices for storage hardware, the large amount of data is no longer the challenge; as we have learnt, the new challenge is data management. New problems include the reading and writing speed of the hardware and the required network infrastructure. The entire network infrastructure from the BSB to the Leibniz Supercomputing Center (LRZ) is continuously connected by a 1-gigabit Ethernet connection, which is expandable up to 10 gigabit.
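To see why network speed becomes a data management issue at this scale, the ideal transfer time over the link can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `transfer_seconds`, the `efficiency` parameter, and the use of decimal units (1 megabyte = 10^6 bytes, 1 gigabit = 10^9 bits) are assumptions, since the text does not specify them, and real transfers add protocol and storage overhead.

```python
# Decimal units are an assumption; the text does not say whether
# "megabyte" and "terabyte" mean powers of 10 or powers of 2.
MEGABYTE = 10**6
TERABYTE = 10**12
GIGABIT_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Return the ideal time in seconds to move size_bytes over a link.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal link rate that is actually usable; 1.0 ignores all overhead.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    largest_image = 800 * MEGABYTE  # upper end of the per-image sizes in the text
    archive_volume = 165 * TERABYTE  # a volume equal to the NAS net capacity
    for label, rate in (("1 Gbit/s", GIGABIT_PER_SECOND),
                        ("10 Gbit/s", 10 * GIGABIT_PER_SECOND)):
        image_s = transfer_seconds(largest_image, rate)
        volume_days = transfer_seconds(archive_volume, rate) / SECONDS_PER_DAY
        print(f"{label}: {image_s:.2f} s per image, {volume_days:.1f} days per volume")
```

Under these idealized assumptions a single 800-megabyte image needs 6.4 seconds on the 1-gigabit link, and a volume of 165 terabytes roughly 15 days, which suggests why the expandability to 10 gigabit matters for bulk retrieval and reprocessing.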
It is important to note that within the ZEND workflow, the retrieval of the digital master files (TIFF) from the archive system has just been implemented. In 2010, the MDZ started a project together with the Bavarian Library Network and the LRZ for the implementation of the Rosetta Digital Preservation System from Ex Libris, which should support the management of an OAIS-compliant digital archive.

Presentation and access

After the final quality assurance in the ZEND system, the object is published on the internet immediately. A message that the object is ready on the Web and available under its persistent internet address is sent to the local library system's online public access catalog (OPAC) as well as to other reference systems (union catalog, Europeana, etc.). Internet access is thus possible in multiple ways, including via the Digital Collections of BSB/MDZ, catalog systems, search engines, etc. In addition, there is always a free and full PDF download available for every object, which enables data transfer to a user's PC and facilitates offline work. Furthermore, the bibliographical metadata is exported for re-use in other information environments, for example through an OAI interface for Europeana or the World Digital Library.

Figure 19. Presentation of a manuscript in Europeana.

In addition to providing data through various 2D e-book viewers, since October 2009 the MDZ has offered the possibility to read manuscripts and rare books online with a 3D viewer. The 3D viewer allows the user to see the entire book with its exterior binding as a 3D model.
Recently, the provision of data for mobile devices was added, including the presentation of 52 high-value manuscripts with iPad and iPhone applications.

Figure 20. iPad/iPhone application "Famous books: Treasures of the Bavarian State Library".

Figure 21. 3D version of an illuminated Renaissance manuscript.

Project examples of mass digitization of 16th century books

The Bavarian State Library, as part of Germany's virtual national library, coordinates the so-called VD16, a bibliography of books printed in German-speaking countries in the 16th century. In 2006, the Bavarian State Library started the mass digitization of its unique holdings from the 16th century as part of the project "VD16/1 digital", funded by the DFG. For this project, based on the state of scanning technology at the time, the digitization of selected German prints from 1500 to 1517 was implemented with manually operated, conventional book scanners, equipped with additional book cradles. Due to the strict preventive conservation guidelines, 70 percent of the books had an allowable maximum opening angle of 90 degrees. Therefore, all recto and verso pages had to be scanned separately in two processes, using different cradles in order to respect the individual integrity of the books. At the end of the scanning process, the left and right pages were assembled into the entire e-book by a software program.

Figure 22. Manually operated book scanner with angle bracket (1 scan = 1 image).

This production method significantly reduced throughput and increased costs. Another serious aspect was the especially high error rate. There is no pagination in most 16th century books.
This created an additional complexity for the scan operators, even though the scan software helped them to sort the pages in sequential order. All these facts limited the throughput dramatically: in the 24-month project duration, only approximately 4,300 books with approximately 600,000 pages could be digitized at three (manual) scanning workstations.

Based on the VD16/1 experience, the search for alternatives to manual scanning equipment started, with the goal of finding high-quality and non-damaging scanning technologies with high throughput. The result was a worldwide novelty: the use of automated book scanners, or robotic scanners, for the scanning of 16th century rare books. In 2006, the Vienna-based company Treventus developed a completely new and innovative scanning robot system, the ScanRobot®. In 2007 this system was further modified as part of a development partnership with the IBR and the MDZ and adapted to the specific requirements of rare books, with special consideration of high preventive conservation requirements.

Scanning with the ScanRobot® is carried out as follows: when the scan starts, the scanning unit with a glass prism is gently lowered into the opened book, which rests on a flexible book cradle adjustable from 60 to 140 degrees. A prism with a slit in the middle is mounted on the top of the scanning unit. While the scanning unit moves slowly upwards, both the left and right pages are attached by a gentle air flow from the prism slit and scanned simultaneously in strips at short flash intervals. When the scanning unit reaches the end of the pages, a gentle air blast turns the pages to the left and the process starts again with the next two pages. Every turned page increases the book's thickness on the left part of the book cradle, and as a consequence the book moves in small steps from right to left until the scanning is finished. In this process, the ScanRobot® performs the scanning and the turning of the pages in a single operation.
The steady air flow between the paper and the device ensures almost completely contact-free scanning. As part of the development cooperation with Treventus, the ScanRobot® is continuously revised and updated for the specific characteristics of rare books, always meeting the high conservation requirements. This has led to very gentle mechanical handling by the ScanRobot®. The robot is very sensitive to the material and technical individuality of rare books, and immediately stops the scanning process if necessary to prevent damage.

Figure 23. ScanRobot with book cradle (1 scan = 2 images).

By using this innovative and careful scanning technology, in which two pages are digitally recorded with only one scan, about 9,000 works with over 1 million pages were scanned within 18 months (Brantl, Ceynowa, Fabian et al. 2009).

Conclusion and outlook

In digitization, scanning is only one step in the long and complex process that ultimately leads to the presentation of books on the internet. In the scanning of manuscripts, rare books, and special collections, strict preventive conservation requirements and a high throughput seem at first to be a contradiction. Through their close cooperation over the years, the Bavarian State Library's Munich Digitization Center and the Institute of Book and Manuscript Conservation have collected extensive experience and know-how in the implementation of digitization projects dedicated especially to valuable documents, in a wide variety of document formats. Furthermore, the MDZ and IBR have gained significant experience in mass digitization, which in turn has enabled new developments together with scanner manufacturers. Thus, cooperation has allowed the optimization of existing scanning systems, as well as the creation of new systems, while meeting strict conservation requirements.
Beyond this, especially in combination with new technological approaches, the new systems are leading to lower production costs and will continue to reduce the costs even more in the future. Therefore, our vision, that the entire written heritage that the Bavarian State Library has collected and preserved over the centuries will be provided via the internet to all interested parties for free, has a solid basis for becoming reality in a reasonable time.

References

Brantl, M., "Mass Digitization and Long-term Preservation. Processes and Production at Munich Digitization Center", http://www.digitale-sammlungen.de/mdz/content/service/docs/2008-06_Brantl_en.pdf.
Brantl, M., Ceynowa, K., Fabian, C., Meßmer, G., and Schäfer, I., "Massendigitalisierung deutscher Drucke des 16. Jahrhunderts. Ein Erfahrungsbericht der Bayerischen Staatsbibliothek", Zeitschrift für Bibliothekswesen und Bibliographie 56 (2009), 327-338.
Ceynowa, K., "Mass Digitization for Research and Study: the Digitization Strategy of the Bavarian State Library", IFLA Journal 35.1 (2009), 17-24, http://comminfo.rutgers.edu/~tefko/Courses/e553/Readings/Ceynowa%20IFLA%20J%202009.pdf.
Griebel, R., and Ceynowa, K. (eds.), 450 Jahre Bayerische Staatsbibliothek, Berlin-New York 2008.
Milne, R., "A Move from "Boutique" to Mass Digitization: the Google Library Project at Oxford", in R.A. Earnshaw and J. Vince (eds.), Digital Convergence, Libraries of the Future, London 2008, 3-10.
Schäfer, I., "Restaurieren für die Wissenschaft - Das Institut für Buch- und Handschriftenrestaurierung", in R. Griebel and K. Ceynowa (eds.), 450 Jahre Bayerische Staatsbibliothek, Berlin-New York 2008, 225-240.
Waters, D.J., "What Are Digital Libraries?", CLIR Issues 4 (1998), http://www.clir.org/pubs/issues/issues04.html#dlf.