4_Documentation and metadata
Documentation and metadata are all about clarity and context: what is the data, and what is the story of its creation? When researchers submit good documentation and metadata, archives can create rich and detailed metadata that allows people to get the best from the data. Good, detailed, comprehensive data documentation leads to good data re-use. This is good for science and good for your archive.

Documentation

Documentation is contextual material generated in the course of data creation, analysis, and archiving. Without it, data cannot be reused reliably: obtaining valid answers to new research questions requires knowing the who, what, where, when and why of the data’s creation. A rich set of documentation should therefore include information on:

Objectives of data creation: hypotheses, operationalization, and funding proposals.
Methods of data collection: questionnaires, interview schedules, instructions, consent procedures.
Structure and relationship of files: organization of data and naming conventions.
Quality assurance procedures: problems encountered and solutions applied, cleaning and data verification procedures.
Data manipulations: anonymisation undertaken, recoding.

The European Values Study Longitudinal Data File 1981-2008 (EVS, 2011) provides a good example of study documentation.

Metadata

Good metadata enables data discovery and data re-use, which is why metadata is critical to archives and repositories. Expressed in an agreed set of standards, it allows humans and machines to discover, comprehend, and evaluate data across time and distance without having to access the data itself. Furthermore, it facilitates bibliographic data citation and is indispensable in the long-term curation of data, providing information on provenance and on technical and legal aspects. The bulk of the metadata associated with archived datasets is created in the ingest process.
However, this metadata is not static and can change even after a resource has been archived, for example to document a preservation action that changes the data (e.g. migration to a different format) or to document corrections of mistakes.

Consortium of European Social Science Data Archives (CESSDA AS), CESSDA House, Parkveien 20, 5007 Bergen, Norway. Phone: +47 55 58 21 18; e-mail: cessda@cessda.net; www.cessda.net

The main types of metadata are:

Descriptive: provides information on the intellectual content of a data collection. Researchers are critical to producing descriptive metadata, providing information on the fundamental nature, structure, context and sources of data. This metadata can take different forms, applicable at study, case, and variable level.
Administrative: includes information that helps archives and repositories ingest and manage data for preservation. Archives and repositories, along with researchers, are important creators of administrative metadata. Examples of administrative metadata include records of data formats, copyright ownership, and the terms of re-use licenses. As the National Information Standards Organization (NISO) points out, rights management metadata and preservation metadata are subsets of administrative metadata that are often listed separately (NISO, 2004).
Structural: concerns (physical or logical) links between objects or between parts of a complex object, for example the relationship between variables and cases in a dataset, different waves in a longitudinal study, pages in an interview, chapters in a book, or images in a video.
Technical: gives information about MIME types, file formats, file size, encoding, or storage.

Preservation Description Information

In the OAIS reference model, an important function of metadata is to provide Preservation Description Information (PDI). PDI is all the information needed to preserve data.
At the same time, PDI can engender trust, as it focuses particularly on “describing the past and present states” of a given resource, “ensuring it is uniquely identifiable, and ensuring it has not been unknowingly altered” (CCSDS, 2012, p. 4-29). PDI comprises five different types of metadata (see box).

Preservation Description Information

Reference Information: information that helps unambiguously identify a resource, for example a DOI®, an archive-internal study number, call number, ISSN, or bibliographic description.
Context Information: information on how a resource relates to its environment. This includes information on related documents or datasets, but also the reasons the resource was created.
Provenance Information: information documenting changes made to data since its creation. Typically, this takes the form of “a set of accumulative, chronologically ordered records that describe the events in the life of the content data” (Factor et al., 2009, no pag.).
Fixity Information: provides the means to detect unauthorized (i.e. undocumented) changes to a resource, for example by means of checksums or hash functions. It enables archives to monitor the stability, integrity, and authenticity of their digital assets.
Access Rights Information: information on the license and legal conditions under which resources are preserved, accessed, and disseminated.

Metadata schemes

The metadata regarded as “suitable” and necessary for describing a resource can differ depending on subject discipline, the type of resources to be described, intended uses, or user communities. Accordingly, many types of metadata schemes (also referred to as “element sets”) exist for differing needs and disciplines. The semantics of a scheme provide definitions and meaning for each element. In addition, there are established rules for formulating content and representation (for example, capitalization of terms).
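As a rough illustration of how an element set combines definitions with content rules, the following Python sketch defines a tiny, made-up scheme and checks a record against it. The element names, the definitions, and the two rules are assumptions chosen for this example; they are not taken from DDI, Dublin Core, or any other published scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Element:
    """One element of a hypothetical metadata scheme."""
    name: str
    definition: str               # semantics: what the element means
    rule: Callable[[str], bool]   # content/representation rule
    rule_text: str                # human-readable form of the rule


def starts_with_capital(value: str) -> bool:
    return bool(value) and value[0].isupper()


def is_four_digit_year(value: str) -> bool:
    return len(value) == 4 and value.isdigit()


# Hypothetical element set, invented for this illustration only.
SCHEME: Dict[str, Element] = {
    e.name: e
    for e in (
        Element("title", "Name given to the data collection",
                starts_with_capital, "must start with a capital letter"),
        Element("creator", "Person or organization that produced the data",
                starts_with_capital, "must start with a capital letter"),
        Element("year", "Year the data collection was published",
                is_four_digit_year, "must be a four-digit year"),
    )
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Elements that the scheme does not define are reported first.
    for name in record:
        if name not in SCHEME:
            problems.append(f"{name}: not an element of the scheme")
    # Every element of the scheme must be present and satisfy its rule.
    for name, element in SCHEME.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not element.rule(record[name]):
            problems.append(f"{name}: {element.rule_text}")
    return problems


if __name__ == "__main__":
    record = {"title": "european values study", "creator": "EVS", "year": "2011"}
    for problem in validate(record):
        print(problem)
```

In this sketch the lower-case title is the only value that breaks a rule, so `validate` reports a single problem for `title`. Real schemes publish their element definitions and rules in their own specifications, which should be consulted rather than hard-coded in this way.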
There may also be a controlled vocabulary for the values of elements.

Here are some examples of metadata schemes:

Data Documentation Initiative (DDI; http://www.ddialliance.org/): social science archiving.
Dublin Core® Metadata Initiative (DCMI; http://dublincore.org/): focuses on networked resources.
DataCite Metadata Schema (http://schema.datacite.org/): metadata properties chosen for the accurate and consistent identification of data for citation and retrieval purposes.
Metadata Encoding and Transmission Standard (METS; http://www.loc.gov/standards/mets/): for librarianship.
Preservation Metadata Maintenance Activity (PREMIS; http://www.loc.gov/standards/premis/): Data Dictionary for Preservation Metadata.
ISO 19115 (http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020): geographic data and associated services, covering spatial-temporal information, data quality, and access and rights to use.
ISO 11179 (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35343): organizational metadata; managing elements in a registry to create a common understanding of data across and between organizations.
Statistical Data and Metadata Exchange (SDMX; http://sdmx.org/): statistical data standard and harmonization.

This list is far from comprehensive. Furthermore, these schemes are not mutually exclusive. For example, DDI aligns with other metadata standards such as Dublin Core, ISO 11179, SDMX, and ISO 19115. ISO 19115 influences the coding of the spatial coverage of a study, and SDMX maps to DDI 3 because its fields are closely related.

References

CCSDS. (2012). Reference Model for an Open Archival Information System (OAIS). Recommended Practice (No. CCSDS 650.0-M-2). Retrieved from http://public.ccsds.org/publications/archive/650x0m2.pdf
EVS. (2011). European Values Study Longitudinal Data File 1981-2008 (EVS 1981-2008), Version 2.0.0. GESIS Data Archive, Cologne. ZA4804.
doi:10.4232/1.11005
Factor, M., Henis, E., Naor, D., Rabinovici-Cohen, S., Reshef, P., Ronen, S., … Guercio, M. (2009). Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage. Retrieved from http://static.usenix.org/event/tapp09/tech/full_papers/factor/factor.pdf
National Information Standards Organization (NISO). (2004). Understanding Metadata. Retrieved from http://www.niso.org/publications/press/UnderstandingMetadata.pdf

Further Reading

Bailey, Jefferson (2012): File Fixity and Digital Preservation Storage: More Results from the NDSA Storage Survey. In: The Signal. Digital Preservation. http://blogs.loc.gov/digitalpreservation/2012/03/file-fixity-and-digital-preservation-storagemore-results-from-the-ndsa-storage-survey/
Zwaard, Kate (2011): Hashing Out Digital Trust. In: The Signal. Digital Preservation. http://blogs.loc.gov/digitalpreservation/2011/11/hashing-out-digital-trust/

This work is licensed under a Creative Commons Attribution 4.0 International License.
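To make the idea of fixity information from the Preservation Description Information box concrete, here is a minimal Python sketch that records a checksum for a file and later checks whether the file still matches it. The function names `compute_checksum` and `verify_fixity`, the sample file, and the choice of SHA-256 are assumptions for illustration; archives may use other hash functions and workflows.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return compute_checksum(path) == recorded_checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical data file, created only for this demonstration.
        data_file = Path(folder) / "survey.csv"
        data_file.write_text("id,answer\n1,yes\n", encoding="utf-8")

        recorded = compute_checksum(data_file)      # stored as fixity metadata
        print(verify_fixity(data_file, recorded))   # file unchanged

        data_file.write_text("id,answer\n1,no\n", encoding="utf-8")
        print(verify_fixity(data_file, recorded))   # file altered
```

A mismatch only shows that the file differs from its recorded state. Whether the change was authorized has to be checked against the provenance records, which is why fixity and provenance information are kept together.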