Metadata and semantic enrichment

(Gerald Hiebel, Edeltraud Aspöck)

    Figure 1: Main categories of physical reality, documentation and digitising processes.
    Table 1: Metadata created for TD resources and documented object.

To retrieve resources from the TD archive for archaeological research, a data model was required to organise and integrate the metadata of the digital and analogue resources for querying. Entity-relationship models have been widely used to model archaeological data in relational databases such as Postgres, MySQL or MS Access. However, archaeological documentation is complex: it involves relations between physical objects, the processes of excavating, documenting and interpreting them, and the documentation created about both the objects and the processes. For this reason, ontologies are increasingly used as data models in archaeology, and semantic technologies are employed to implement the ontology and store the data.

We decided to use the CIDOC CRM ontology (Le Boeuf, P. et al. 2015) as the conceptual background to model TD resources, the physical reality they document and the process of creating digital documentation from analogue sources. The CIDOC CRM is an ISO standard for cultural heritage documentation and has been extended in recent years to model archaeological excavations (CRMarchaeo), scientific observation (CRMsci) and digital provenance (CRMdig) (CIDOC CRM 2016).

Analysing the available documentation and the process of digitising analogue sources, we identified five main categories that are distinct in nature: excavation areas, archaeological features and finds, documentation (analogue and digital), physical storage, and digital secondary documentation (Figure 1).

In very simple terms, archaeological features and finds (archaeological objects created in the distant past) were found in specific excavation areas (excavation objects such as squares and areas, created by the archaeologists in the not so distant past) and are documented in analogue and digital documentation. In the A Puzzle in 4D project we create homogeneous metadata to answer the following questions:

  • Which files document an excavation area? For example, which resources document the archaeological evidence and finds in area F/1, square j/21, Planum 3?
  • Which files document archaeological features/finds of a specific type (e.g. graves)?
  • Which files document a specific archaeological feature/find (e.g. grave 5 and walls in area F/1 square j/21)?
  • Which archaeological features/finds of a specific type and material are documented in an excavation area?
  • What stratum was assigned to archaeological features/finds?
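Questions of this kind can be read as simple lookups over statements that link files, features and areas. The following is a minimal sketch in Python, not the project's implementation: the file names, the example statements and the predicate names (`documents`, `has_type`, `found_in`) are illustrative assumptions and are not actual CIDOC CRM properties.

```python
# Hypothetical metadata statements as (subject, predicate, object) triples.
# File names and predicate names are illustrative placeholders only.
TRIPLES = [
    ("TD_F1_j21_grave5", "has_type", "grave"),
    ("TD_F1_j21_grave5", "found_in", "TD_F1_j21"),
    ("drawing_001.tif", "documents", "TD_F1_j21_grave5"),
    ("photo_002.jpg", "documents", "TD_F1_j21"),
]


def subjects(predicate, obj):
    """Return all subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def files_documenting_type(feature_type):
    """Files that document any feature or find of the given type."""
    features = subjects("has_type", feature_type)
    return {f for feat in features for f in subjects("documents", feat)}


def files_documenting_area(area):
    """Files that document the area itself or anything found in it."""
    targets = {area} | subjects("found_in", area)
    return set().union(*(subjects("documents", t) for t in targets))


print(sorted(files_documenting_type("grave")))
print(sorted(files_documenting_area("TD_F1_j21")))
```

In a semantic-technology setting the same lookups would typically be expressed as queries against a triple store; the in-memory list here only illustrates the shape of the questions.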

In order to create homogeneous metadata we had to define processes to create identifiers for instances of the main categories. This meant finding out in which context a name used in the documentation is unique and adding the necessary context to the name. For example, “grave 5” is unique within a square such as “j/21”, which is unique in area “F/1”. Hence, the unique identifier for grave 5 would be “TD_F1_j21_grave5” (TD being the acronym for the site). The naming conventions used in the TD excavations were very elaborate and allow the creation of unique identifiers if context information is added. For archaeological features and finds that were not given names in their local context we assigned “puzzle4D” identifiers. For resources that had not been assigned the conventional TD identifiers (such as inventory numbers) during excavation, we applied the same procedure and added the new identifiers to the analogue documentation or, in the case of digital documentation, named the files with the identifiers. The thesaurus was extended with terms for documentation and digitising processes. Up to October 2017, metadata for the following resources and physical and conceptual objects have been created (Table 1).
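The construction of context-qualified identifiers can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the normalisation rule (dropping slashes and spaces, joining with underscores) is inferred from the single example “TD_F1_j21_grave5” rather than from a documented project convention.

```python
import re

SITE_PREFIX = "TD"  # acronym for the site, as given in the text


def normalise(part: str) -> str:
    """Drop slashes and whitespace, e.g. 'F/1' -> 'F1', 'grave 5' -> 'grave5'.

    Assumed rule, inferred from the example identifier in the text.
    """
    return re.sub(r"[\s/]+", "", part)


def make_identifier(*context: str) -> str:
    """Join the site prefix and the context chain, outermost context first."""
    if not context:
        raise ValueError("at least one context element is required")
    return "_".join([SITE_PREFIX, *(normalise(p) for p in context)])


# Area, then square, then the locally unique name.
print(make_identifier("F/1", "j/21", "grave 5"))  # TD_F1_j21_grave5
```

Passing the context from the outermost unit inwards mirrors the reasoning in the text: each name only needs to be unique within the unit that precedes it.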

The main aim of the remaining 2.5 years of 4DP is to create a workflow and metadata entry forms for all types of TD digital and analogue resources.