The Contextual Data Specification

About
What are ontologies and how do they improve data quality?
The iMicroSeq Contextual Data Specification Package
- Version Control
- Package Contents
Contacts
License
Acknowledgements

About

The iMicroSeq Contextual Data Specification defines a standardized, ontology-informed, modular framework for capturing contextual data (metadata) associated with microbial sequencing data generated through the iMicroSeq program. The specification is designed to support harmonized, interoperable data across environmental, public health, and One Health use cases, with a particular focus on environmental water-based monitoring. This repository contains the authoritative data schema, along with supporting documentation, mappings to downstream repositories, and guidance materials. This repository is not used for data submission to the iMicroSeq Portal. To submit data, or for questions related to data access and workflows, please visit the iMicroSeq Data Portal.

What are ontologies and how do they improve data quality?

Labs collect, encode and store information in different ways. They use different fields, terms and formats, and they categorize variables differently. The meanings of words also change depending on the focus of the organization. Take the word “plant”: to someone in agriculture it could mean an organism that carries out photosynthesis, while a food regulator might understand it to mean a factory where food products are made. This variability makes comparing, integrating and analyzing data generated by different organizations like trying to compare apples, oranges and bananas.

Ontologies are collections of controlled vocabulary arranged in a hierarchy, in which all terms are linked using logical relationships. Ontologies are open source and meant to represent “universal truth” as much as possible (so they are not tied to one organization’s vocabulary or use case). Ontologies encode synonyms, which enables mapping between the specific languages used by different organizations, and every term in an ontology is assigned a globally unique and persistent identifier. Using ontology terms to standardize iMicroSeq contextual data not only makes data more interoperable by providing a common language, it also helps to make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
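To make the synonym-mapping idea concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the `EX:` identifiers, labels, synonyms and function names are hypothetical placeholders, not terms from any real ontology or from the iMicroSeq templates. It only shows how free-text values from different labs could resolve to shared identifiers, and how an ambiguous word such as “plant” surfaces as more than one candidate.

```python
# Hypothetical mini-vocabulary. The "EX:" IDs, labels and synonyms are
# placeholders for illustration only; they are not real ontology terms.
TERMS = {
    "EX:0000001": {
        "label": "plant (organism)",
        "synonyms": ["plant", "photosynthetic organism"],
    },
    "EX:0000002": {
        "label": "food processing plant",
        "synonyms": ["plant", "food factory"],
    },
}


def build_index(terms):
    """Map every label and synonym (case-insensitive) to the term IDs using it."""
    index = {}
    for term_id, term in terms.items():
        for name in [term["label"], *term["synonyms"]]:
            index.setdefault(name.casefold(), set()).add(term_id)
    return index


def lookup(index, text):
    """Return the sorted list of candidate term IDs for a free-text value."""
    return sorted(index.get(text.strip().casefold(), set()))


INDEX = build_index(TERMS)

# Two labs using different words can land on the same identifier ...
print(lookup(INDEX, "Food Factory"))
# ... while an ambiguous word yields several candidates for a curator to resolve.
print(lookup(INDEX, "plant"))
```

Returning every candidate, rather than silently picking one, keeps the ambiguity visible so that it can be resolved during curation.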

The iMicroSeq Contextual Data Specification Package

This specification is implemented via a DataHarmonizer validation template, with accompanying Field and Term reference guides (which provide definitions and additional specific guidance) and a curation Standard Operating Procedure (SOP). New terms and/or term changes can be requested using issue request forms, with additional guidance on how to do so outlined in the New Term Request (NTR) SOP. These resources and their locations are listed below under Package Contents.

Package Contents

Data Collection Template

Pathogen Genomics Package
- Template schema files can be found as .yaml/.json/.tsv under pathogen-genomics-package/templates/
DataHarmonizer App
- The DataHarmonizer is a standardized browser-based spreadsheet editor and validator.
- Instructions on "Getting Started" downloading and using the application can be found under DataHarmonizer Instructions and SOP below.
- Further information about application functionality can be found on the DataHarmonizer Wiki.

Field and Term Reference Guides

XLSX version TBD
PDF version
- Field Reference Guide TBD
- Term Reference Guide TBD

Curation SOP

PDF version TBD
Online version TBD

New Term Request (NTR) SOP

Version Control

Please note that development of the specification is ongoing, and it will be updated periodically to address user needs. Versions follow the format x.y.z, where:

x = Field level changes
y = Term value / ID level changes
z = Definition, guidance, example, formatting, or other uncategorized changes
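As a rough illustration of how the x.y.z scheme can be read programmatically, the Python sketch below parses two version strings and reports the most significant kind of change between them. The function names and the example version numbers are hypothetical, and the rule that the leftmost changed position determines the category is an assumption of this sketch, not something the specification states.

```python
import re
from typing import NamedTuple


class SpecVersion(NamedTuple):
    x: int  # field-level changes
    y: int  # term value / ID level changes
    z: int  # definition, guidance, example, formatting, or other changes


def parse_version(text: str) -> SpecVersion:
    """Parse a version string of the form 'x.y.z' into three integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not an x.y.z version: {text!r}")
    return SpecVersion(*(int(part) for part in match.groups()))


def describe_change(old: str, new: str) -> str:
    """Name the most significant kind of change between two versions.

    Assumption: the leftmost position that differs decides the category.
    """
    a, b = parse_version(old), parse_version(new)
    if a.x != b.x:
        return "field-level changes"
    if a.y != b.y:
        return "term value / ID level changes"
    if a.z != b.z:
        return "definition, guidance, example, formatting, or other changes"
    return "no change"


# Hypothetical version numbers, for illustration only.
print(describe_change("1.2.3", "1.3.0"))
```

A check like this could help a downstream user decide whether a template update is likely to affect column layouts (x), picklist values (y), or documentation only (z).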

Descriptions of changes are provided in [release notes](https://github.com/cidgoh//releases) for every new version.

Contacts

For more information and/or assistance, contact at or submit a repository issue request.

License

Pending / To Be Determined

Acknowledgements

Brought to you by the Centre for Infectious Disease Genomics and One Health as part of the iMicroSeq Project, a Genome Canada-funded project under the eDNA Surveillance Program.
