Preservation Metadata for Crystallography Data, JISC eCrystals Federation Project, WP4: Repositories, Preservation and Sustainability

Manjula Patel

    Research output: Book/Report › Commissioned report


    The aim of the eCrystals Federation project is to enhance the management of crystallography data at the institution level, incorporating data generated in departments, laboratories and by individual researchers or practitioners. WP4 of the project is concerned with the development of approaches to the preservation and curation of crystallography data in open repositories. In terms of the crystallography community, the long-term provision of data is particularly important since structure determination can only be truly repeated or verified when the raw data is available. In addition, the availability of raw data is extremely useful for reanalysis and reprocessing as improved methods for performing these tasks emerge.
    Metadata is essentially any information that documents the characteristics and attributes of a resource. The term is often defined as “structured data about a resource”. A metadata vocabulary supports a wide variety of functions, for example description, identification, discovery, retrieval, rights management and preservation. Metadata is consequently pivotal in the management of all types of resources, helping to ensure that they will survive and continue to be accessible and usable into the future. A structured set of metadata elements is normally organised into a schema, representing a data model and the attributes associated with the entities within it.
    We consider that preservation activities should be viewed as an integral part of sound data management practice. As a result, metadata that supports curation and preservation should be embedded into the core metadata for managing crystallography data. However, metadata creation, capture and maintenance is generally regarded as being tedious, time-consuming, subjective and labour-intensive and therefore very costly. Given the critical role that metadata plays in the management and curation of electronic resources it is important to craft the structure and architecture of the metadata correctly from the outset so that adequate and appropriate information is recorded.
    We examine work that has already been done in the area of preservation metadata, in particular the influence of the Open Archival Information System (OAIS) Reference Model (PDI – Preservation Description Information) and the PREMIS (PREservation Metadata: Implementation Strategies) working group which has produced a Data Dictionary for Preservation Metadata – a core set of preservation metadata i.e. “the information most preservation repositories need to know to preserve digital materials over the long-term”.
    Preservation metadata is information that supports and documents the digital preservation process. It is sometimes considered a subset of technical or administrative metadata and incorporates:
    Provenance: Who has had custody or ownership of the digital object?
    Authenticity: Is the digital object what it purports to be?
    Preservation Activity: What has been done to preserve the digital object?
    Technical Environment: What is required to render and use the digital object?
    Rights Management: What intellectual property rights must be observed?
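The five categories above can be pictured as fields of a single record. The following is only a minimal sketch: the class names, field names and the `missing_categories` helper are hypothetical illustrations, not part of PREMIS, OAIS or any schema described in the report.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class PreservationEvent:
    """One preservation activity applied to a digital object (hypothetical shape)."""
    date: str    # when the action took place, e.g. an ISO 8601 date
    action: str  # what was done, e.g. "format migration"
    agent: str   # who or what performed the action


@dataclass
class PreservationRecord:
    """Hypothetical record with one field per category listed above."""
    provenance: List[str]  # provenance: custody or ownership history
    checksum: str          # authenticity: fixity value for the object
    events: List[PreservationEvent] = field(default_factory=list)  # preservation activity
    environment: List[str] = field(default_factory=list)  # technical environment
    rights: str = ""       # rights management statement


def missing_categories(record: PreservationRecord) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [name for name, value in asdict(record).items() if not value]


if __name__ == "__main__":
    # Placeholder values only; they do not describe a real dataset.
    record = PreservationRecord(provenance=["Example Laboratory"], checksum="abc123")
    print(missing_categories(record))
```

A check of this kind is one way a repository might flag records whose preservation metadata is incomplete before ingest.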
    The primary aim of preservation metadata is to support preservation activities; consequently, differing preservation strategies are likely to demand that distinct types of information be recorded. For example, a preservation plan based on migration activities will require different information to that of one based on emulation. Hence, the preservation plans and policies of a particular repository will heavily influence the additional specific metadata that is to be recorded.
    The technical aspects of digital curation and preservation are only one facet of a multidimensional problem; curatorial issues further encompass social, cultural, political, organisational, financial and legal factors as well. Community consensus and the development of standards and guidelines for best practice underpin the longevity, effective management, preservation, sharing and reuse of science data. To this end, a collaborative venture named Towards an International Data Commons for Crystallography (TIDCC) emerged as a result of discussions between participants in the TARDIS (The Australian Repositories for Diffraction Images), eCrystals Federation and DataMINX Projects and the Australian Research Council’s Molecular & Materials Structure Network (MMSN) in September 2008. The intention of the TIDCC is to develop a community-derived metadata schema capable of describing all types of crystallography data related to single crystal diffraction.
    We use the notion of an Application Profile (AP) in contemplating the metadata required to manage and preserve crystallography data, building on several schemas including that of the eBank-UK AP, the TARDIS schema and the CCLRC Scientific Metadata Model (CSMD). The purpose of an AP is to adapt or combine existing schemas into a package that is tailored to the functional requirements of a particular application, whilst retaining interoperability with the original base schemas. This offers the potential for digital materials to be accessed, used and curated effectively both within and beyond the communities in which they were created.
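The idea of an application profile can be sketched as a selection of elements drawn from several base schemas, each keeping a reference to the schema it came from so that it can still be mapped back. The schema prefixes and element names below are hypothetical placeholders and are not taken from the eBank-UK AP, TARDIS, CSMD or TMAP definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    """A metadata element that remembers which base schema it was drawn from."""
    schema: str  # prefix identifying the base schema (hypothetical)
    name: str    # element name within that schema (hypothetical)

    @property
    def qualified_name(self) -> str:
        # Keeping the schema prefix is what preserves the link to the base schema.
        return f"{self.schema}:{self.name}"


def build_profile(selection: Dict[str, List[str]]) -> List[Element]:
    """Combine the chosen elements of several base schemas into one profile."""
    return [
        Element(schema, name)
        for schema, names in selection.items()
        for name in names
    ]


def elements_from(profile: List[Element], schema: str) -> List[str]:
    """Project the profile back onto a single base schema."""
    return [element.name for element in profile if element.schema == schema]


if __name__ == "__main__":
    profile = build_profile({
        "ebank": ["title"],                 # illustrative element names only
        "csmd": ["investigation", "sample"],
    })
    print([element.qualified_name for element in profile])
    print(elements_from(profile, "csmd"))
```

Because every element carries its source prefix, a consumer that understands only one of the base schemas can still extract the subset it recognises.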
    The TIDCC Metadata Application Profile (TMAP), which is presented in an Appendix, was a first attempt at constructing an over-arching AP for crystallography data which would facilitate the exchange of not only metadata, but also the data itself. However, following several meetings it has become apparent that a more effective way forward is to adapt the ICAT data model (a simpler version of the CSMD) and schema to cater for curatorial and preservation activities, since ICAT is presently being used by a growing proportion of the science community for managing their data. As of the completion of this report, the work is still in progress; it is expected that the preservation metadata proposed in the TMAP will feed into the new development.
    It is clear that the crystallography community recognises the importance of high quality metadata for all the functions that it can support, including the long-term accessibility and reuse of scientific data. Although there is still a considerable way to go along the path to formulating community-agreed metadata for the curation and preservation of crystallography data, the work outlined in this report shows that the crystallography community appreciates the benefits and does not lack the motivation to achieve such a goal.
    Original language: English
    Publisher: National Crystallography Service, University of Southampton
    Number of pages: 35
    Publication status: Published - 3 Sept 2009

    Bibliographical note

    Report originally published on eCrystals Federation Project wiki


    • crystallography data
    • digital curation
    • preservation

