Development of a Topographical Transcription Method

Introduction
In past years, digitization meant transforming analog documents into digital representations. Problems of color management, lightness scale, resolution and geometrical distortion had to be solved, and they have been. Today, the methods needed to create digital images of analog documents can be considered well elaborated and satisfactory. Numerous digitization initiatives have therefore led to web portals that make available digital facsimiles, the corresponding metadata, and tools to search and browse them. However, only few tools are available to uncover the full potential of these digital facsimiles for humanities research.

Noticing this deficit, we are developing SALSAH, a Virtual Research Environment (VRE) for the humanities. SALSAH (System for Annotation and Linkage of Sources in Arts and Humanities) is a collaborative research platform for the visualization, annotation and linkage of digital resources in the humanities. The application is completely web-based and makes it possible to use digital resources in humanities research directly on the web. In this way, research data such as annotations and links emerges in born-digital form.

When thinking about digital resources such as digital facsimiles and about methods to make use of them in the humanities, it is also necessary to think of methods for transcription and text constitution in the digital medium. Furthermore, we have to consider the digital facsimile as a representation of an analog document that offers text but possibly also pictorial information such as illustrations. Putting the focus on pictorial aspects, we can also conceive of text as being pictorial in the first place. For this reason, we are developing a topographical transcription method for digital facsimiles within SALSAH.
This article first briefly describes the general purpose, the functionality and the data model of SALSAH. It then presents general thoughts about the benefit of digital facsimiles and describes the topographical transcription method currently being developed as an extension of SALSAH’s core functionality.

SALSAH

SALSAH has been developed at the Imaging & Media Lab of the University of Basel since summer 2009. It originated in an art-historical context and today encompasses further humanities disciplines. Besides the “Narrenschiff” project[1], SALSAH is used by two edition projects: the “Anton Webern Gesamtausgabe”[2] and the “Kritische Robert Walser-Ausgabe”[3]. SALSAH is designed as a general VRE for the humanities (Schweizer 2011: 147ff.). Currently, SALSAH offers methods to work with digital facsimiles; support for audio and moving images is planned. SALSAH offers the functionality to:

- visualize various digital resources simultaneously
- annotate digital resources and share these annotations collaboratively
- create links between digital resources and annotate them
- create Regions of Interest (ROI) within digital resources and annotate and link them
- access external repositories of digital resources and apply SALSAH’s functionality to them

[1] Together with Prof. Barbara Schellewald, Institute of Art History, University of Basel
[2] Institute of Musicology, University of Basel
[3] Institute of German Philology, University of Basel

Through SALSAH’s annotation and linkage functionality, the research data emerges in the digital medium and is directly connected to the digital resources it refers to. Figure 1 shows an example from the art-historical “Narrenschiff” project. The elliptical shapes represent digital objects (here: books and pages), while annotations are indicated by rectangles. The arrows show how the digital objects and annotations are related to each other. The book is characterized by two annotations: title and date of publication.
A book is a compound object, which means that it consists of other objects: single pages. Each page belonging to the “Narrenschiff” thus refers to the digital object representing the book. Like the book itself, each page can be annotated[4]. Because pages have a certain order, they are annotated with a pagination. Furthermore, each page can be described by composing a page description. Relations between digital objects can be expressed by creating links between them. Each link is in turn treated as a digital object that can be annotated (here with a description). In this way, the link’s semantics can be expressed. By annotating and linking digital objects, the research knowledge emerges as a network-like structure which can be browsed and extended by other researchers.

Figure 1 Structure of the Research Data within SALSAH (a Book annotated with Title: Das Narrenschiff and Date of Publication: 3rd March 1495; Pages belonging to the Book, one of them annotated with Pagination: 1 verso and Page Description: This page shows a fool; a Link annotated with Description: Interesting illustrations of a fool)

By working on the same digital corpus, humanities researchers can collaborate with each other, either within a working group or even in an interdisciplinary setting. The annotations and even the digital objects available can be defined specifically for each project. Each project within SALSAH can define the semantics of its digital objects and which annotations they may have. Digital objects may have a digital representation, that is, digital data representing some physical aspects of the analog object (e.g. the digital data of a digital facsimile represents the local reflectance of a page of text). But they can also be abstract constructs (e.g. a person, who may be characterized by name and birthdate, but for whom there is no further digital data representing physical aspects).

[4] In terms of the data model, annotations and metadata are not distinguished: metadata are also annotations. In the Graphical User Interface (GUI), however, metadata will be presented separately from the annotations, since metadata are quite definite while annotations can be regarded as more subjective and thus open to discussion.

For digital facsimiles, we have recently developed a method to define Regions of Interest. These regions are geometrically described areas on the digital facsimile and can be annotated and linked like other digital objects. This functionality makes it possible to reference parts of pictorial resources directly.

Figure 2 Creation of a Region of Interest

Figure 2 shows the creation of a Region of Interest consisting of two polygon shapes. The region can be annotated with a comment. Art historians could use this functionality to describe specifically defined areas of pages of the “Narrenschiff”. Each region consists of one or more geometrical shapes and of annotations which can be configured according to the research project’s needs.

All of this functionality can also be applied to remote resources not stored in SALSAH’s local database. We have already implemented a connection to the assets of the e-codices project[5]. Due to SALSAH’s flexible data model, the facsimiles of e-codices can be annotated as if they were stored locally in SALSAH. In fact, only the annotations created in SALSAH are stored locally; the remote facsimiles are merely referenced in the SALSAH database. SALSAH is thus designed as a shared system: remote resources can be accessed and annotated within the SALSAH environment. Conversely, all the annotations and links stored in SALSAH could be made available to the outside through an interface accessible via a web service. We are currently working on such an interface to export SALSAH’s data. It would also allow other websites to connect to SALSAH online. For example, the e-codices website could then indicate whether annotations created in SALSAH exist for certain manuscripts.
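The data model described in this section can be illustrated with a minimal Python sketch. It is not SALSAH’s actual implementation: the class names (DigitalObject, Link, Region), the field names and the file path are assumptions made up for this illustration. Only the example values are taken from Figure 1. The sketch shows that books, pages, links and regions are all digital objects which carry annotations, that a page refers to its book, and that an abstract object simply has no digital representation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # simple id generator, for this illustration only


@dataclass
class DigitalObject:
    """Anything that can be annotated: a book, a page, a link, a region, a person."""
    kind: str                                    # project-defined object type
    annotations: dict[str, str] = field(default_factory=dict)
    part_of: Optional["DigitalObject"] = None    # compound objects: page -> book
    representation: Optional[str] = None         # local path or remote URL; None if abstract
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Link(DigitalObject):
    """A link is itself a digital object, so it can be annotated too."""
    targets: list[DigitalObject] = field(default_factory=list)


@dataclass
class Region(DigitalObject):
    """A Region of Interest: one or more polygons on a facsimile."""
    shapes: list[list[tuple[float, float]]] = field(default_factory=list)


def pages_of(book: DigitalObject, objects: list[DigitalObject]) -> list[DigitalObject]:
    """Return all pages that refer to the given book."""
    return [o for o in objects if o.kind == "page" and o.part_of is book]


# Example values from Figure 1; the file path is hypothetical.
book = DigitalObject("book", {"title": "Das Narrenschiff",
                              "date_of_publication": "3rd March 1495"})
page = DigitalObject("page", {"pagination": "1 verso",
                              "page_description": "This page shows a fool"},
                     part_of=book, representation="facsimiles/page_1v.jpg")
# A region with two polygon shapes (coordinates relative to the image size).
region = Region("region", {"comment": "example comment"}, part_of=page,
                shapes=[[(0.1, 0.2), (0.4, 0.2), (0.4, 0.5)],
                        [(0.5, 0.6), (0.8, 0.6), (0.8, 0.9), (0.5, 0.9)]])
link = Link("link", {"description": "Interesting illustrations of a fool"},
            targets=[page, region])
# An abstract object: annotated, but without any digital representation.
person = DigitalObject("person", {"name": "Diebold Schilling"})
```

Because metadata and annotations share one dictionary here, the sketch mirrors the statement that the data model does not distinguish between them; a user interface could still present a project-defined subset of keys as metadata.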
[5] The project can be accessed here: http://www.e-codices.ch. It currently (last access: 23rd November 2011) encompasses 833 manuscripts from 34 different libraries.

Transcribing Digital Facsimiles

With digital images of analog sources and digital tools to address them at hand, we are able to conceive a method to transcribe digital facsimiles and subsequently to constitute texts. First, a brief outline of the importance of facsimiles is given. Then a method is described which allows transcriptions to be created directly in the SALSAH environment.

Importance of Facsimiles

The digitization of analog documents makes them available as digital images, that is, as digital facsimiles. These digital images represent the analog documents with regard to their visual appearance. This allows for the examination of illustrations and all other kinds of pictorial elements contained in these documents. Unlike text-based representations, digital images represent the original material in a non-abstract way[6]: they do not presuppose that textual information has been separated from the document itself by identifying textual characters. In the (digital) facsimile, the surface of the document and the textual information are still one entity (Gabler 2007: 198).

Take the example of the Burgunderchronik of the fifteenth-century scribe Diebold Schilling from Bern. Its most easily accessible edition is the purely text-based edition by Gustav Tobler (Tobler 1897 and Tobler 1901), which presents the manuscript kept in Zurich (known as the Zürcher Schilling) in the edition text and the official chronicle from Bern (known as the Berner Schilling) as a variant in the critical apparatus. The illuminations of both manuscripts are only briefly described in a register in the appendix of the edition.
The editor seems to have assumed that the text of the Zürcher Schilling is more authentic because it is thought of as a more original version, while the text of the official Berner Schilling is conceived as a censored copy (Tobler 1901: 347). In this respect, the editor’s interest is directed not towards the documents themselves (their reception etc.) but towards finding the best available text. As a consequence, both manuscripts are presented in one edition (implying that they are manifestations of the same text), but not without establishing a hierarchy between them: the text of one manuscript is presented in the edition text, the other manuscript’s text in the critical apparatus.

The printed facsimile editions that exist for both manuscripts (available only in few libraries and archives) give a very different overall impression of the two manuscripts. While they offer a very similar text where they converge[7], their illuminations are of very different kinds. They thus constitute very different relations between text and pictorial elements, which significantly influences the perception of the manuscripts. Apart from the original documents, only facsimile editions reveal these aspects. But since these print editions are expensive and not widely available, their benefit is limited[8].

Looking at the more recent history of editing in German philology, we can see a paradigm change in the 1970s towards an editing technique that consistently integrates facsimiles. The Frankfurter Hölderlin-Ausgabe (FHA), realized by Dietrich E. Sattler (Sattler 1975-2008), applied a novel way of editing.
Instead of presenting the constituted text as the edition text accompanied by a critical apparatus containing its variants, this edition made visible the entire analytical process, beginning with the facsimile and ending with a constituted text (Martens 1982: 52ff.). The edited text, representing the final state in the constitution process, can thus be conceived as the result of an analytical process openly presented to the reader through the consistent integration of the facsimile, its diplomatic transcription and a phase analysis (Martens 1982: 53f.). The integration of the facsimile in the edition ensures the transparency of the analytical process the edition has undertaken. The reference to the facsimile also emphasizes the status of the document on which the transcription and the process of text constitution are based (Gabler 2007: 199).

[6] Of course, the making of digital images can also be conceived as an abstraction from the original, implying decisions about perspective, resolution, color adjustment etc.
[7] The manuscripts are of different temporal extent.
[8] In fact, e-codices has already digitized the Berner Schilling. Once it is made available on their website, the accessibility of this manuscript will be unproblematic.

Topographical Transcription Method

The transcription method being developed in SALSAH is topographically oriented. The transcription process begins by defining visually coherent areas on the digital facsimile (using SALSAH’s functionality to create geometrical figures on the facsimile), which are then encoded into textual characters line by line. Manuscripts often do not offer one overall text area but several distinct areas of textual information (text blocks, annotations, notes, marginalia, glosses etc.). Addressing them topographically makes it possible to transcribe them individually. Transcribing these areas line by line preserves the correspondence between the encoded text and the facsimile.
Figure 3 Diplomatic Transcription of a Page of the “Narrenschiff” in SALSAH

Figure 3[9] shows SALSAH’s transcription tool, which is still in an early state of development. The facsimile is displayed on the left-hand side. The regions defined on the facsimile are shown correspondingly on the right-hand side as rectangular shapes. Each of these rectangles offers an editable area where the transcription of the corresponding part of the facsimile can be entered. While the facsimile can be freely zoomed and panned, the transcription area on the right-hand side always shows the whole page, because it is thought of as a typification of the textual information given by the facsimile.

[9] This is an example from the “Narrenschiff”, which often combines textual and pictorial information. The transcription method will soon be used in the Webern project to transcribe supplementary material such as letters, notes etc.

While the facsimile is conceived as an image (even though it offers textual information), the transcription area on the right-hand side requires the transcription to be encoded as textual characters. Because this is done area by area, the sequential relation between the areas can be left open at first. The characters within an area have to be entered in a linear order, but no such order is presupposed between the areas themselves.

Properties can be assigned to the transcription text of each area. As in a word processor, the user is able to select text and to choose a property (such as bold or underline). Because SALSAH is designed as a generic and general system for the humanities, the available properties can be defined specifically for each project. These properties represent visual attributes of the transcribed text. Furthermore, structural relations can be defined, either within a single transcription area or between several such areas.
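The following Python sketch illustrates how transcription areas with project-defined properties might be represented. It is an illustration under stated assumptions, not SALSAH’s code: the names TranscriptionArea, TextProperty, mark and text_in_sequence, the bounding-box format and the sample text are all invented for this example. It shows that characters are ordered within an area, that properties are attached to selected character ranges, and that the order between areas is only fixed once a sequence is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class TextProperty:
    """A visual attribute attached to a character range of one line."""
    name: str    # project-defined, e.g. "bold" or "underline"
    line: int    # index of the line within the area
    start: int   # first character (inclusive)
    end: int     # last character (exclusive)


@dataclass
class TranscriptionArea:
    """A visually coherent area on the facsimile, transcribed line by line."""
    area_id: str
    bbox: tuple[float, float, float, float]  # x, y, width, height (relative to the page)
    lines: list[str]
    properties: list[TextProperty] = field(default_factory=list)

    def mark(self, name: str, line: int, start: int, end: int,
             allowed: set[str]) -> None:
        """Assign a property to a text selection, like in a word processor."""
        if name not in allowed:
            raise ValueError(f"property {name!r} is not defined for this project")
        if not 0 <= start < end <= len(self.lines[line]):
            raise ValueError("selection lies outside the line")
        self.properties.append(TextProperty(name, line, start, end))

    def marked_text(self, prop: TextProperty) -> str:
        """Return the characters a property applies to."""
        return self.lines[prop.line][prop.start:prop.end]


def text_in_sequence(areas: dict[str, TranscriptionArea],
                     sequence: list[str]) -> str:
    """Join the areas' lines in a chosen order; the order is a later decision."""
    return "\n".join(line for area_id in sequence for line in areas[area_id].lines)


# Properties available in a hypothetical project.
PROJECT_PROPERTIES = {"bold", "underline"}

# Invented sample text; two areas without any order between them.
areas = {
    "heading": TranscriptionArea("heading", (0.1, 0.05, 0.8, 0.1), ["Das Narrenschiff"]),
    "margin": TranscriptionArea("margin", (0.85, 0.3, 0.1, 0.2), ["a note", "in the margin"]),
}
areas["heading"].mark("bold", line=0, start=4, end=16, allowed=PROJECT_PROPERTIES)
```

Calling text_in_sequence(areas, ["heading", "margin"]) or text_in_sequence(areas, ["margin", "heading"]) yields two different linear texts from the same diplomatic transcription, which is why the sequence is kept outside the areas themselves.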
In the current state of development, we are thinking of two basic operations, insertion and deletion, which could then be combined into a substitution or a transposition. In practice, textual dynamics could be expressed by defining structural relations that result in alternative sequences of textual characters. For example, we could think of characters being overwritten by others: an initial set of characters would then have been substituted by another. Or we could think of additional text which could be considered an insertion.

The transcription of a facsimile as described above possibly offers more than one linear text. The sequential combination of transcription areas and the definition of structural relations[10] (deletions, insertions, substitutions, transpositions) allow multiple readings to be built. A reading is thought of as an unambiguous sequence of characters representing a certain interpretation of the facsimile. Each reading may be annotated (e.g. with a comment) by other researchers, and various readings could be related to each other in order to express their semantic difference. For example, several readings could represent different states in the genesis of a text. These different states are based on the analysis of corrections (insertions, deletions, substitutions, transpositions) present in the digital facsimile. Each reading built with this transcription method is transparent because it can be traced back to the diplomatic transcription, which is directly related to the facsimile.

The described method is not a special tool offered by SALSAH but an integral application of its annotation and linkage possibilities. In that sense, the constitution of texts representing the content of documents can be seen as a task not fundamentally different from the constitution of research knowledge among sources within SALSAH as a VRE.
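How alternative readings could follow from structural relations can be sketched in Python as follows. This is only one possible interpretation of the approach described here, not SALSAH’s implementation: the names Insertion, Deletion, substitution, apply_operations and readings, as well as the sample text, are assumptions made for this illustration. The sketch further assumes that all offsets refer to the diplomatic transcription and that corrections do not overlap. Each correction is either applied or not, so a single insertion already yields two readings.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Union


@dataclass(frozen=True)
class Insertion:
    pos: int     # offset in the diplomatic transcription
    text: str    # characters added at that offset


@dataclass(frozen=True)
class Deletion:
    start: int   # first deleted character (inclusive)
    end: int     # last deleted character (exclusive)


Operation = Union[Insertion, Deletion]
Correction = tuple[Operation, ...]   # operations that belong together


def substitution(start: int, end: int, text: str) -> Correction:
    """A substitution combines a deletion with an insertion after the deleted span."""
    return (Deletion(start, end), Insertion(end, text))


def _offset(op: Operation) -> int:
    return op.pos if isinstance(op, Insertion) else op.start


def apply_operations(base: str, ops: list[Operation]) -> str:
    """Apply operations from right to left so that earlier offsets stay valid."""
    result = base
    for op in sorted(ops, key=_offset, reverse=True):
        if isinstance(op, Insertion):
            result = result[:op.pos] + op.text + result[op.pos:]
        else:
            result = result[:op.start] + result[op.end:]
    return result


def readings(base: str, corrections: list[Correction]) -> list[str]:
    """Build one reading for every subset of the given corrections."""
    result = []
    for n in range(len(corrections) + 1):
        for chosen in combinations(corrections, n):
            ops = [op for correction in chosen for op in correction]
            result.append(apply_operations(base, ops))
    return result


# Invented sample text with one insertion and one substitution.
diplomatic = "a ship of fools"
corrections = [
    (Insertion(2, "great "),),        # added text
    substitution(10, 15, "jesters"),  # "fools" overwritten by "jesters"
]
all_readings = readings(diplomatic, corrections)
```

In this sketch, the first reading is the uncorrected sequence and the last one applies every correction. A transposition could be expressed in the same way, as a deletion at one offset combined with an insertion of the same characters at another. Every reading remains traceable, since it is fully determined by the diplomatic transcription and the chosen corrections.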
Like any other form of knowledge within SALSAH, the process of transcribing and the constitution of readings can be reconstructed in their genesis as well as criticized by annotation.

[10] Even a simple insertion offers two alternative sequences: a reading without it and another including it.

Bibliography

GABLER Hans Walter (2007), ‘The Primacy of the Document in Editing’, Ecdotica, vol. 4, pp. 197-207.

MARTENS Gunter (1982), ‘Texte ohne Varianten? Überlegungen zur Bedeutung der Frankfurter Hölderlin-Ausgabe in der gegenwärtigen Situation der Editionsphilologie’, Zeitschrift für deutsche Philologie, vol. 101, Sonderheft: Probleme neugermanistischer Edition, pp. 43-64.

SATTLER Dietrich E. (ed.) (1975-2008), Friedrich Hölderlin. Sämtliche Werke. ‘Frankfurter Ausgabe’, Frankfurt am Main, 20 vols plus supplements.

SCHWEIZER Tobias, ROSENTHALER Lukas (2011), ‘SALSAH – eine virtuelle Forschungsumgebung für die Geisteswissenschaften’, in Konferenzband. EVA 2011 Berlin. Elektronische Medien & Kunst, Kultur, Historie, Berlin, pp. 147-153.

TOBLER Gustav (ed.) (1897-1901), Die Berner-Chronik des Diebold Schilling 1468-1484, 2 vols, Bern.