Data Concordance
Overview
Colectica allows you to describe how variables measure the same information at different times or among different populations.
Metadata Structure for Variables
Colectica uses three levels of items to describe how variables from different points in time or different datasets correspond to each other.

| Item Type | Description |
|---|---|
| Variable | A column in a dataset |
| Represented Variable | Describes how a variable is measured; the data type. This may be consistent across rounds, or may change. |
| Conceptual Variable | Describes a measurement of a person, firm, or other thing, without specifying the data type. The most generic way to describe something that is measured. |

Consider a dataset that measures marital status in three different years: 2000, 2005, and 2010. The dataset may look like:

| ID | marstat2000 | marstat2005 | marstat2010 |
|---|---|---|---|
| 1 | Married | Divorced | Married |
| 2 | Divorced | Divorced | Divorced |
| 3 | Married | Married | Widowed |

In the first two years, the data were represented by two choices: Married and Divorced. In the third year, a new option was added, giving three choices: Married, Divorced, and Widowed. Two different representation types are therefore used for this variable over time.
This dataset can be documented with three variables (aside from the ID): `marstat2000`, `marstat2005`, and `marstat2010`. Since a variable corresponds to a single column in a single data file, three variables are necessary.
Since there are two representation types, we will use two represented variables; let’s call them `marstat` and `marstat-plus`.
Finally, a conceptual variable is used to describe the common information among the three variables. This can be named `marstat`, and should be referenced by both the represented variables.
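As a rough illustration of the three levels, the marital status example can be modeled with a few plain Python classes. This is only a sketch: the class and field names below are hypothetical and are not part of any Colectica API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConceptualVariable:
    # Most generic level: what is measured, with no data type.
    name: str


@dataclass(frozen=True)
class RepresentedVariable:
    # How the concept is measured: here, the list of allowed choices.
    name: str
    categories: Tuple[str, ...]
    conceptual: ConceptualVariable


@dataclass(frozen=True)
class Variable:
    # A single column in a single data file.
    name: str
    represented: RepresentedVariable


concept = ConceptualVariable("marstat")
marstat = RepresentedVariable("marstat", ("Married", "Divorced"), concept)
marstat_plus = RepresentedVariable(
    "marstat-plus", ("Married", "Divorced", "Widowed"), concept
)

variables = [
    Variable("marstat2000", marstat),
    Variable("marstat2005", marstat),
    Variable("marstat2010", marstat_plus),
]

# All three columns resolve to the same conceptual variable,
# even though they use two different represented variables.
conceptual_names = {v.represented.conceptual.name for v in variables}
represented_names = {v.represented.name for v in variables}
```

Here `conceptual_names` contains a single name while `represented_names` contains two, which mirrors the structure described above.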
The following diagram visualizes these items and their relationships.
Variable Concordance Views in Colectica Portal
By specifying the variables in this way, Colectica Portal is able to create concordance views that show a comparison of the variables.
For coded variables, Portal can also show a comparison of the codes used for each variable.
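The idea of a code comparison can be sketched in a few lines of Python. This is not how Portal computes its view; the `compare_codes` function is a hypothetical helper, and the category labels are taken from the marital status example above.

```python
def compare_codes(code_lists):
    """Return, for each label, which variables use it.

    code_lists maps a variable name to its list of category labels.
    """
    # Collect labels in first-seen order so the output is stable.
    all_labels = []
    for labels in code_lists.values():
        for label in labels:
            if label not in all_labels:
                all_labels.append(label)

    return {
        label: {name: label in labels for name, labels in code_lists.items()}
        for label in all_labels
    }


comparison = compare_codes({
    "marstat2000": ["Married", "Divorced"],
    "marstat2005": ["Married", "Divorced"],
    "marstat2010": ["Married", "Divorced", "Widowed"],
})
```

In this sketch, `comparison["Widowed"]` is true only for `marstat2010`, which is the kind of difference a code comparison is meant to surface.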
Metadata Structure Used to Define Concordance
In Colectica Portal, the Explore page shows concordance tables that allow users to browse for variables by topic across many datasets. To enable this functionality, information can be created and published following the structure described here.
See also
For information on configuring the concordance tables in Colectica Portal, see Configuration.
Organizing by data file with one file per round
Colectica uses the following metadata structure to display the Explore page.
- Series (Group, in DDI terms)
  - Metadata Package (1..1)
    - Concept Set (1..1)
      - Concept (0..n)
    - Conceptual Variable Set (1..1)
      - Conceptual Variable Group (1..1) (container group)
        - Conceptual Variable Group (0..n)
  - Study (0..n)
    - PhysicalInstance (0..n) (PhysicalInstance, in DDI terms)
      - Data Relationship (1..1)
        - Variable (0..n)

- Series
The Series is used to organize a collection of related Studies, such as a series of surveys or a collection of related datasets.
- Metadata Package
The metadata package is the container for much of the metadata used to define concordance.
- Concept Set and Concepts
The concept set is used to define the navigation that is displayed on the left side of the Explore view.
- Conceptual Variable Set and container Conceptual Variable Group
There must be a single Conceptual Variable Group within the Conceptual Variable Set. This group acts as a container for the next level of groups. The group structure inside the container should mirror the Concept Set described above. The Conceptual Variable Groups under the container Conceptual Variable Group must reference `DefiningConcept`s that exist within the Concept Set.
- Study
Studies under the Series are used to organize a repeated project into rounds or waves. Each Study can have one or more data files within it.
- PhysicalInstance and DataRelationship
The PhysicalInstance and DataRelationship describe a data file.
- Variable
Variables should use the metadata structure for variables, described above.
The concordance tables are created with the following logic:
- One table is built per Conceptual Variable Group.
- Each Conceptual Variable in the group gets a row.
- All Variables that reference the Conceptual Variable, or that reference a Represented Variable that references the Conceptual Variable, are gathered.
- For each gathered Variable, all PhysicalInstances that reference the Variable are gathered.
- One column is created for each distinct gathered PhysicalInstance.
- Each Conceptual Variable-to-PhysicalInstance cell is filled with links to the Variables that reference the Conceptual Variable and that are referenced by the PhysicalInstance.
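The logic above can be sketched in Python as follows. This is a simplified illustration, not Portal's implementation: metadata items are reduced to plain dictionaries, `build_table` is a hypothetical function, and the file names `wave2000`, `wave2005`, and `wave2010` are invented for the example.

```python
def build_table(conceptual_group, represented_to_conceptual,
                variables, physical_instances):
    """Build one concordance table for one Conceptual Variable Group.

    conceptual_group: list of conceptual variable names (the rows)
    represented_to_conceptual: represented variable -> conceptual variable
    variables: variable name -> dict with optional "conceptual" and
               "represented" references
    physical_instances: data file name -> variable names it references
    """
    def concept_of(var_name):
        info = variables[var_name]
        # A variable may reference the conceptual variable directly...
        if info.get("conceptual"):
            return info["conceptual"]
        # ...or indirectly, through its represented variable.
        return represented_to_conceptual.get(info.get("represented"))

    table = {}
    for cv in conceptual_group:
        gathered = [v for v in variables if concept_of(v) == cv]
        row = {}
        for pi_name, members in physical_instances.items():
            links = [v for v in gathered if v in members]
            if links:
                row[pi_name] = links
        table[cv] = row

    # One column per distinct PhysicalInstance that was gathered.
    columns = sorted({pi for row in table.values() for pi in row})
    return columns, table


columns, table = build_table(
    conceptual_group=["marstat"],
    represented_to_conceptual={"marstat": "marstat", "marstat-plus": "marstat"},
    variables={
        "marstat2000": {"represented": "marstat"},
        "marstat2005": {"represented": "marstat"},
        "marstat2010": {"represented": "marstat-plus"},
    },
    physical_instances={
        "wave2000": ["marstat2000"],
        "wave2005": ["marstat2005"],
        "wave2010": ["marstat2010"],
    },
)
```

With this input the table has a single `marstat` row and one column per hypothetical data file, each cell holding the variable measured in that file.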
Organizing with a single data file for all rounds
When all rounds are stored in a single data file, Colectica uses this metadata structure to display the Explore page.
- Series (Group, in DDI terms)
  - Metadata Package (1..1)
    - Concept Set (1..1)
      - Concept (0..n)
    - Conceptual Variable Set (1..1)
      - Conceptual Variable Group (1..1) (container group)
        - Conceptual Variable Group (0..n)
  - PhysicalInstance (1..1)
    - Data Relationship (1..1)
      - Variable (0..n)
    - VariableGroup (1..1) with Group Type set to “Rounds”
      - Variable (0..n)

- Series
The Series is used to organize a collection of related Studies, such as a series of surveys or a collection of related datasets.
- Metadata Package
The metadata package is the container for much of the metadata used to define concordance.
- Concept Set and Concepts
The concept set is used to define the navigation that is displayed on the left side of the Explore view.
- Conceptual Variable Set and container Conceptual Variable Group
There must be a single Conceptual Variable Group within the Conceptual Variable Set. This group acts as a container for the next level of groups. The group structure inside the container should mirror the Concept Set described above. The Conceptual Variable Groups under the container Conceptual Variable Group must reference `DefiningConcept`s that exist within the Concept Set.
- PhysicalInstance and DataRelationship
The PhysicalInstance and DataRelationship describe a data file.
- Variable
Variables should use the metadata structure for variables, described above.
- VariableGroup
Variables can be organized into groups. These groups can be used to organize variables collected in the same round of a survey. The Group Type should be set to “Rounds” to indicate that the variables in the group are collected in the same round.
The concordance tables are created with the following logic:
- One table is built per Conceptual Variable Group.
- Each Conceptual Variable in the group gets a row.
- All Variables that reference the Conceptual Variable, or that reference a Represented Variable that references the Conceptual Variable, are gathered.
- One column is created for each distinct VariableGroup.
- Each Conceptual Variable-to-VariableGroup cell is filled with links to the Variables that reference the Conceptual Variable and that are included in the VariableGroup.
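For the single-file case, the same idea can be sketched with VariableGroups as columns. This is an illustrative sketch only: `build_round_table` is a hypothetical function, the group names `2000`, `2005`, and `2010` are invented, and filtering on a `"Rounds"` group type is an assumption about how the Group Type setting is applied.

```python
def build_round_table(conceptual_group, variable_concepts, variable_groups):
    """Build one concordance table with one column per round.

    conceptual_group: list of conceptual variable names (the rows)
    variable_concepts: variable name -> conceptual variable name, already
                       resolved directly or through a represented variable
    variable_groups: list of dicts with "name", "group_type", "variables"
    """
    # Assumption: only groups whose Group Type is "Rounds" become columns.
    rounds = [g for g in variable_groups if g["group_type"] == "Rounds"]

    table = {}
    for cv in conceptual_group:
        gathered = {v for v, c in variable_concepts.items() if c == cv}
        # Each cell lists the gathered variables included in that group.
        table[cv] = {
            g["name"]: [v for v in g["variables"] if v in gathered]
            for g in rounds
        }

    columns = [g["name"] for g in rounds]
    return columns, table


columns, table = build_round_table(
    conceptual_group=["marstat"],
    variable_concepts={
        "marstat2000": "marstat",
        "marstat2005": "marstat",
        "marstat2010": "marstat",
    },
    variable_groups=[
        {"name": "2000", "group_type": "Rounds", "variables": ["marstat2000"]},
        {"name": "2005", "group_type": "Rounds", "variables": ["marstat2005"]},
        {"name": "2010", "group_type": "Rounds", "variables": ["marstat2010"]},
    ],
)
```

This matches the marital status dataset shown earlier, where all three rounds are columns of one data file and the round is expressed only through the grouping.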
Organizing by variable sets
Concordance tables can also be built from an additional metadata structure based on VariableGroups, which does not require PhysicalInstances to exist. The concorded items can be stored in VariableGroups in ResourcePackages that exist either under a Group or a StudyUnit.
- Series or Study
  - Metadata Package (1..1)
    - Concept Set (1..1)
      - Concept (0..n)
    - Conceptual Variable Set (1..1)
      - Conceptual Variable Group (container group) (1..1)
        - Conceptual Variable Group (0..n)
    - Variable Set
      - Variable Group (0..n)
        - Variable (0..n)
  - Study (0..n)
    - Metadata Package (0..n)
      - Variable Scheme
        - Variable Group (0..n)
          - Variable (0..n)

The following logic is used to build these concordance tables:
- One table is built per ConceptualVariableGroup (same as existing).
- Each ConceptualVariable in the group gets a row (same as existing).
- All Variables that reference the ConceptualVariable are gathered.
- For each gathered Variable, all VariableGroups and VariableSchemes that reference the Variable are gathered.
- One column is created for each distinct gathered VariableGroup or VariableScheme.
- Each ConceptualVariable (row)-to-VariableGroup cell is filled with links to the Variables that reference the ConceptualVariable and that are contained in the VariableGroup.
Concordance Tables in Colectica Portal
With metadata in the structure described above, Colectica Portal will display concordance tables similar to the following:
Note
Statistical comparisons are only available when using the PhysicalInstance approach.
Describe Concordance in Colectica Designer
To assign a represented variable to a variable:
Navigate to the variable editor for the variable.
On the Concept tab, search for a represented variable to assign, or create a new one.
See also
For instructions on assigning or creating the represented variable, see Reference a Single Item.
Assign the same represented variable to any other variables, as appropriate.
Drill into the represented variable.
Using the represented variable’s editor, assign or create a conceptual variable.
After assigning represented and conceptual variables, you can view the list of all referenced variables by using the Links view from the represented variable.