Storage

Identification

The Colectica Repository is modeled on the ISO/IEC naming and identification principles outlined in ISO/IEC 11179-5 (2005). When an administered item is placed into the repository, it is registered with its International Registration Data Identifier (IRDI). The IRDI is referred to as the identifier triple throughout the Colectica platform because it consists of three parts:

Registration Authority Identifier (RAI)

When the Colectica repository is used with DDI 3, the registration authority identifier comes from an organization’s DDI agency id. Currently, the DDI Alliance issues these identifiers upon request. Other RAI issuers such as ARK and DOI are also supported, but they use a different resolution system. A single Colectica repository can host multiple Registration Authorities simultaneously.

Note

Implementers Note: The repository can also host read-only copies of items for which it is not the RAI. This allows the construction of a complete object graph of all related metadata, even if some administered items reference external resources.

Data Identifier (DI)

The Data Identifier used in the Colectica repository is based on ISO/IEC 9834-8, which describes the creation of globally unique identifiers. Because these identifiers are globally unique, users can create their own identifiers without a centralized DI issuance service within the organization. This enables offline use of all Colectica tools and allows Colectica repositories to be federated within or across Registration Authorities.
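As a minimal sketch of decentralized identifier creation, the snippet below mints data identifiers with Python's standard `uuid` module. The function name `new_data_identifier` is hypothetical, and the choice of random (version 4) UUIDs is an assumption for illustration; the repository may generate its identifiers differently.

```python
import uuid


def new_data_identifier() -> str:
    """Return a fresh random (version 4) UUID as a lowercase string.

    Hypothetical helper: no central issuance service or network
    access is needed to call it.
    """
    return str(uuid.uuid4())


# Two tools working offline can each mint identifiers without coordination.
id_a = new_data_identifier()
id_b = new_data_identifier()
```

Because no registry is consulted, the same call works in a disconnected desktop tool and in a server-side repository.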

Since many organizations already have identifiers that do not follow ISO/IEC 9834-8, or use other unique id systems such as ARK, OID, or DOI/Handle, the Colectica Repository can create mappings between unique id systems. When using DDI 3.1 with the repository, these identifier mappings are serialized in DDI’s UserID elements, along with the identification system used, for persistence.

DDI 3.1 Note: Mapping between ISO/IEC 9834-8 identifiers and other identification systems also ensures valid DDI 3.1 generation, since DDI restricts the characters allowed in an identifier. Identifiers from systems like ARK and DOI must be mapped to UserIDs to validate against the schema, since they make use of characters that are not valid in DDI identifiers.
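The mapping idea can be sketched as a small two-way lookup table. Everything here is illustrative: the `IdentifierMap` class, the `DDI_SAFE` pattern (a simplified stand-in, not the actual DDI schema rule), and the DOI value are assumptions made for this example.

```python
import re
import uuid

# Simplified stand-in for the DDI identifier character rule. This is an
# assumption for illustration, not the pattern from the DDI 3.1 schema.
DDI_SAFE = re.compile(r"^[A-Za-z0-9-]+$")


class IdentifierMap:
    """Hypothetical two-way mapping between external ids and UUID data identifiers."""

    def __init__(self):
        self._to_uuid = {}      # (system, external_id) -> UUID string
        self._to_external = {}  # UUID string -> list of (system, external_id)

    def register(self, system: str, external_id: str) -> str:
        """Return the UUID mapped to an external id, minting one on first use."""
        key = (system, external_id)
        if key not in self._to_uuid:
            di = str(uuid.uuid4())
            self._to_uuid[key] = di
            self._to_external.setdefault(di, []).append(key)
        return self._to_uuid[key]

    def external_ids(self, di: str) -> list:
        """Return the (system, external_id) pairs recorded for a UUID."""
        return list(self._to_external.get(di, []))


ids = IdentifierMap()
# Made-up DOI used only as sample input.
di = ids.register("DOI", "10.1234/example.5678")
```

In this sketch the UUID would serve as the item's identifier, while each `(system, external_id)` pair corresponds to what the text describes as a UserID entry carrying the identification system used.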

Version Identifier (VI)

The Colectica repository uses a write-once scheme to ensure provenance tracking and data protection. Each change to an administered item therefore increases its revision number within the repository. The revision number corresponds to the ISO Version Identifier. The repository can retrieve the latest version or the full version history of an administered item.
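The write-once behavior and the identifier triple can be illustrated with a small in-memory sketch. The names `IdentifierTriple` and `WriteOnceStore`, the agency string, and the XML snippets are hypothetical; this is not the repository's API, only one way the described semantics could look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierTriple:
    agency: str      # Registration Authority Identifier (RAI)
    identifier: str  # Data Identifier (DI)
    version: int     # Version Identifier (VI)


class WriteOnceStore:
    """Hypothetical append-only store: revisions are added, never overwritten."""

    def __init__(self):
        # (agency, identifier) -> list of contents; list index i holds version i + 1
        self._revisions = {}

    def register(self, agency: str, identifier: str, content: str) -> IdentifierTriple:
        """Append a new revision and return its full identifier triple."""
        history = self._revisions.setdefault((agency, identifier), [])
        history.append(content)
        return IdentifierTriple(agency, identifier, len(history))

    def get(self, triple: IdentifierTriple) -> str:
        """Return the content stored for one exact revision."""
        return self._revisions[(triple.agency, triple.identifier)][triple.version - 1]

    def latest(self, agency: str, identifier: str) -> IdentifierTriple:
        """Return the triple of the most recent revision."""
        return IdentifierTriple(agency, identifier, len(self._revisions[(agency, identifier)]))

    def history(self, agency: str, identifier: str) -> list:
        """Return the triples of all revisions, oldest first."""
        count = len(self._revisions[(agency, identifier)])
        return [IdentifierTriple(agency, identifier, v) for v in range(1, count + 1)]


store = WriteOnceStore()
v1 = store.register("example.agency", "item-1", "<Item>draft</Item>")
v2 = store.register("example.agency", "item-1", "<Item>final</Item>")
```

After the second call, the first revision is still retrievable through `store.get(v1)`, which is the property that supports provenance tracking.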

Note

Implementers Note: Many organizations use version numbers in different ways, such as incrementing the version of a survey instrument when a data collection wave is done. The Colectica repository supports this type of change recording through the use of tags. Tags in the Colectica repository work the same way as tags in a version control system. The user can specify a tag, such as 1.1 or Beta 2, and that tag will be associated with the correct revisions of administered items in the repository. In this way, both the write-once scheme and an organization’s own version control can be accommodated.
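A tag of this kind can be thought of as a named snapshot that pins each item to one revision. The sketch below assumes a hypothetical `TagRegistry` class and made-up agency and item names; it illustrates the concept rather than the repository's actual tagging interface.

```python
class TagRegistry:
    """Hypothetical registry that pins a tag name to specific item revisions."""

    def __init__(self):
        # tag name -> {(agency, identifier): version}
        self._tags = {}

    def tag(self, name: str, triples) -> None:
        """Record a tag over an iterable of (agency, identifier, version) tuples."""
        if name in self._tags:
            # Like a version control tag, an existing tag is not silently moved.
            raise ValueError(f"tag already exists: {name}")
        self._tags[name] = {(agency, ident): version for agency, ident, version in triples}

    def resolve(self, name: str, agency: str, identifier: str) -> int:
        """Return the revision number that the tag pins for one item."""
        return self._tags[name][(agency, identifier)]


tags = TagRegistry()
# The organization's "1.1" release spans items at different repository revisions.
tags.tag("1.1", [
    ("example.agency", "questionnaire", 3),
    ("example.agency", "question-a", 7),
])
```

Later revisions of either item leave the tag untouched, so `tags.resolve("1.1", "example.agency", "question-a")` keeps returning the revision that was current when the tag was made.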

Serialization

Administered Items are stored in the Colectica Repository in an XML serialization. Any format can be accommodated. The repository is currently used to store DDI 3.1 for the Colectica Designer.

When an item is registered with the repository, it may be deserialized and indexed for identification, relationships, and searching. This deserialization is built in for DDI 3.1 items and includes multilingual support. Repository plugins can be made for other standards or item types.
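The registration step described above can be sketched as a lookup of an indexer by item type, followed by extraction of identification, relationships, and multilingual labels. The XML format, the `INDEXERS` registry, and the function names below are hypothetical simplifications; the sample is not DDI 3.1, and the repository's real plugin interface may look quite different.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified item format for illustration; this is not DDI 3.1.
SAMPLE = """<Item id="q1">
  <Label xml:lang="en">Age</Label>
  <Label xml:lang="fr">Âge</Label>
  <Reference target="c1"/>
</Item>"""


def index_simple_item(root: ET.Element) -> dict:
    """Extract identification, multilingual labels, and references from an item."""
    return {
        "id": root.get("id"),
        "labels": {el.get(XML_LANG): el.text for el in root.findall("Label")},
        "references": [el.get("target") for el in root.findall("Reference")],
    }


# Hypothetical plugin registry: root element name -> indexing function.
INDEXERS = {"Item": index_simple_item}


def register_item(xml_text: str):
    """Index an item if a plugin knows its type; otherwise return None."""
    root = ET.fromstring(xml_text)
    indexer = INDEXERS.get(root.tag)
    return indexer(root) if indexer else None


entry = register_item(SAMPLE)
```

Supporting another standard in this sketch means adding one more entry to `INDEXERS`; items with no matching indexer would still be stored, just without an index entry.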