Elasticsearch Indices¶
Colectica Portal uses Elasticsearch for two optional features: full-text, faceted search and data extraction.
This document describes the Elasticsearch indices, documents, and properties used by Colectica Portal. For information on populating Elasticsearch, see Elastic Indexer.
Elasticsearch for Search¶
Index¶
Colectica Portal uses a single Elasticsearch index to power its search functionality. This index is named {prefix}_registered_item, where {prefix} is the value of the Elasticsearch:IndexName setting in the appsettings.json configuration file. See Configuration for details.
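As an illustration of the naming rule, the following minimal sketch derives the search index name from a configuration document. It assumes that the colon in Elasticsearch:IndexName denotes nesting in appsettings.json (an Elasticsearch section containing an IndexName key); the prefix value "colectica" and the helper names are hypothetical.

```python
import json


def get_setting(config: dict, key: str):
    """Resolve a colon-delimited key such as 'Elasticsearch:IndexName'."""
    node = config
    for part in key.split(":"):
        node = node[part]
    return node


def search_index_name(config: dict) -> str:
    """Build the search index name: {prefix}_registered_item."""
    prefix = get_setting(config, "Elasticsearch:IndexName")
    return f"{prefix}_registered_item"


# Hypothetical appsettings.json content; "colectica" is a made-up prefix.
APPSETTINGS = '{"Elasticsearch": {"IndexName": "colectica"}}'

config = json.loads(APPSETTINGS)
print(search_index_name(config))  # colectica_registered_item
```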
Documents and Identification¶
The search index contains one Elasticsearch document for each registered item in Colectica Repository, except for Category and VariableStatistic items.
Only the latest version of each registered item is included as an Elasticsearch document.
The Elasticsearch document identifier (_id) is the concatenation of the registered item’s agency identifier and identifier, separated by a colon. It always refers to the latest version of the item. For example, int.example:120adbcf-8e2f-406a-b8d0-64ec7cfcc5f6.
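The identifier format can be sketched in a few lines. The function name is hypothetical; the colon separator is taken from the example above.

```python
def document_id(agency: str, identifier: str) -> str:
    """Build the Elasticsearch _id for a registered item: agency:identifier."""
    return f"{agency}:{identifier}"


# Reproduces the example identifier from the text.
print(document_id("int.example", "120adbcf-8e2f-406a-b8d0-64ec7cfcc5f6"))
# int.example:120adbcf-8e2f-406a-b8d0-64ec7cfcc5f6
```

Because the version is not part of the _id, indexing a newer version of an item under this identifier replaces the existing document rather than adding a second one.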
Properties¶
All documents contain the following properties.
- compositeId
The concatenated identifier of the registered item, in the form agency:identifier:version.
- itemVersion
The numeric version number of the item.
- itemType
The type of the item. See Item Type Identifiers for a table of identifiers for all the supported item types.
- versionDate
The date and time the item was registered with the repository.
- DOI
The DOI identifier of the item, from the item’s DOI user id.
- uri
The URI identifier of the item in RDF, from the item’s URI user id.
- userIds
Any other identifiers of the item found in the item’s user id.
- parentIds
Identifiers of parent items of the registered item. This only includes items of type Group, StudyUnit, and PhysicalInstance.
- name_*
The name of the item, or the Title if the item has Dublin Core Citation information.
- label_*
The label of the item.
- description_*
The description of the item.
- dc_title_*
The title of an item that is described using Dublin Core.
- dc_subtitle_*
The subtitle of an item that is described using Dublin Core.
- dc_subject_*
The subject of an item that is described using Dublin Core.
- dc_identifiers
Any identifiers of the item found in the item’s Dublin Core identifiers collection.
- dc_contributors
Any contributors of the item found in the item’s Dublin Core contributors collection.
- dc_creators
Any creators of the item found in the item’s Dublin Core creators collection.
- dc_publishers
Any publishers of the item found in the item’s Dublin Core publishers collection.
- customFieldString
Any strings present in custom fields.
- customFieldMultilingualString
Any multilingual strings present in custom fields.
- extraText_*
Concepts, Universes, and Organizations that are associated with the item.
- breadcrumbs
A model that includes labels and identification information about the item’s parents. For example, which dataset a variable is contained within, or which survey instruments a question is part of.
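A client reading these documents may want to split the compositeId property back into its parts. The following is a minimal sketch, assuming the agency:identifier:version layout described above and an integer version; the type and function names are hypothetical, and the version number 3 is made up for the example.

```python
from typing import NamedTuple


class CompositeId(NamedTuple):
    agency: str
    identifier: str
    version: int


def parse_composite_id(value: str) -> CompositeId:
    """Split 'agency:identifier:version' into its three parts."""
    # Split from the right so that only the last two colons act as separators.
    parts = value.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected agency:identifier:version, got {value!r}")
    agency, identifier, version = parts
    return CompositeId(agency, identifier, int(version))


item = parse_composite_id("int.example:120adbcf-8e2f-406a-b8d0-64ec7cfcc5f6:3")
print(item.agency, item.identifier, item.version)
```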
Note
The wildcards shown above generally indicate fields that may be indexed in multiple languages. For example, label_* may actually have properties such as label_en, label_fr, and label_no. Each property’s Elasticsearch mapping definition specifies the corresponding language, to ensure proper text analysis.
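To show how the language-suffixed properties might be queried together, the sketch below builds an Elasticsearch multi_match request body that uses wildcard field names. This is an illustrative query only, not necessarily the query Colectica Portal issues; the function name and the choice of fields are assumptions.

```python
import json


def build_text_query(text: str, base_fields=("name", "label", "description")) -> dict:
    """Build a multi_match query across all language variants of the given fields."""
    # multi_match accepts wildcards in field names, so "label_*" covers
    # label_en, label_fr, label_no, and any other language suffix.
    fields = [f"{base}_*" for base in base_fields]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = build_text_query("household income")
print(json.dumps(body, indent=2))
```

The resulting body could be sent to the _search endpoint of the {prefix}_registered_item index.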
Special Properties for Series and Study Items¶
For Series and Study items, several additional properties are used.
- abstract_*
The abstract of the Series or Study.
- purpose_*
The purpose of the Series or Study.
- subjects
The subjects of the Series or Study.
- subjects_cv_*
The subjects of a study when multilingual code values are used.
- keywords
The keywords of the Series or Study.
- keywords_cv_*
The keywords of a study when multilingual code values are used.
Special Properties for Question and Variable Items¶
For Question and Variable items, several additional properties are used.
- questionText_*
The question text for the Question.
- interviewInstructions_*
Any interviewer instructions contained within the question.
Custom indexing for Items¶
Colectica versions 6.2 and later allow additional custom fields to be added to the Elasticsearch index. Indexing addins can be created using the Colectica SDK and the IItemIndexer addin type.
Elasticsearch for Data Extraction¶
Note
Colectica may change the storage mechanism used to power its custom data extract functionality in the future.
See also
For details on Elasticsearch column store capabilities, see https://www.elastic.co/blog/elasticsearch-as-a-column-store
Elastic Index¶
An Elasticsearch index is created for each PhysicalInstance that has its corresponding data file ingested for the custom data extract functionality. The name of the index is {prefix}_physicalinstance_{identifier}.
{prefix} is the value of the Elasticsearch:IndexName setting in the appsettings.json configuration file. See Configuration for details.
{identifier} is the identifier of the PhysicalInstance.
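The naming rule for data indices can be sketched as follows. The function names, the prefix "colectica", and the example identifier are hypothetical. The wildcard pattern relies on Elasticsearch accepting * in index names for search and management requests.

```python
def data_index_name(prefix: str, identifier: str) -> str:
    """Build the data index name: {prefix}_physicalinstance_{identifier}."""
    return f"{prefix}_physicalinstance_{identifier}"


def data_index_pattern(prefix: str) -> str:
    """Build a wildcard pattern matching every data index for this prefix."""
    return f"{prefix}_physicalinstance_*"


# Hypothetical prefix and PhysicalInstance identifier.
print(data_index_name("colectica", "5a1c9f0e-0000-0000-0000-000000000000"))
print(data_index_pattern("colectica"))
```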
Documents and Identification of Data¶
Each row of the data file is indexed as an Elasticsearch document. The identifier of the document is the value in the column(s) corresponding to the variable(s) specified as the CaseIdentifier.
Properties for Data¶
Each variable in the PhysicalInstance is mapped as a property, with its type specified as either text or numeric. The content of each property is the datum at the appropriate row and column of the dataset that correspond to the Elasticsearch document and property, respectively.
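The row-to-document layout can be pictured with a small sketch. It rests on several assumptions that are not stated in this document: numeric variables are shown with the Elasticsearch double type, a single variable serves as the CaseIdentifier, and the variable names and values are invented. How values are combined when several variables form the CaseIdentifier is not specified here, so that case is left out.

```python
def build_mapping(variable_types: dict) -> dict:
    """Build an index mapping with one property per variable.

    variable_types maps a variable name to "text" or "numeric".
    """
    # Assumption: "numeric" is represented by the Elasticsearch "double" type.
    es_types = {"text": "text", "numeric": "double"}
    properties = {
        name: {"type": es_types[kind]} for name, kind in variable_types.items()
    }
    return {"mappings": {"properties": properties}}


def row_to_document(row: dict, case_id_column: str):
    """Turn one data row into an (_id, document body) pair."""
    # Assumption: a single column holds the case identifier.
    return str(row[case_id_column]), dict(row)


# Hypothetical variables and one hypothetical data row.
mapping = build_mapping({"caseid": "numeric", "age": "numeric", "region": "text"})
doc_id, doc = row_to_document({"caseid": 1001, "age": 42, "region": "North"}, "caseid")
print(mapping)
print(doc_id, doc)
```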
Running on alternative versions of Elasticsearch¶
Colectica currently supports Elasticsearch version 7.x. Historically, Elasticsearch has not maintained compatibility across its major versions.
Starting with Elasticsearch 8.x, the Elasticsearch server provides some backwards compatibility through a version header in its REST API calls.
To try a newer version of Elasticsearch with Colectica, you can enable this version header by setting ElasticIndexer:EnableApiVersioningHeader to true in the ElasticIndexer and Elasticsearch:EnableApiVersioningHeader to true in the Colectica Repository.
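For context, Elasticsearch's REST API compatibility mode is requested through a versioned media type in the Accept and Content-Type headers. The sketch below builds those headers for a client that speaks the 7.x API to an 8.x server. It illustrates the Elasticsearch mechanism only; that the Colectica settings send exactly these headers is an assumption, and the function name is hypothetical.

```python
def compatibility_headers(major_version: int = 7) -> dict:
    """Build the headers that request REST API compatibility with a major version."""
    media_type = f"application/vnd.elasticsearch+json;compatible-with={major_version}"
    # Content-Type only matters for requests that carry a body.
    return {"Accept": media_type, "Content-Type": media_type}


print(compatibility_headers())
```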