|
2 | 2 |
|
3 | 3 | ## Overview |
4 | 4 |
|
5 | | -A `.janno` file is a tabular, tab-separated (`.tsv`) file. A base set of `.janno` file columns are specified in the Poseidon package specification [here](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv), including information on which columns are mandatory, which ones are list columns that can hold multiple entries, and which ones limit the allowed set of entries to a strict enumeration. Beyond that the `.janno` file can include any number and type of additional columns to hold project- and context-specific variables. |
| 5 | +A `.janno` file is a tabular, tab-separated (`.tsv`) file. A base set of `.janno` file columns is specified in the Poseidon package specification [here](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv), including information on which columns are mandatory, which ones are list columns that can hold multiple entries, and which ones limit the allowed set of entries to a strict enumeration. Beyond that, the `.janno` file can include any number and type of additional columns to hold project- and context-specific variables. These additional columns should be named so that they do not conflict with the base set. The Poseidon tooling does not validate them (they are treated as free-form text), but it preserves them in the Poseidon package and propagates them in operations like `trident forge`. |
6 | 6 |
|
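Because a `.janno` file is plain TSV, any TSV parser can read it. The following Python sketch, using only the standard library, reads `.janno` content and separates base-set columns from additional columns. The `BASE_COLUMNS` set is a hypothetical, incomplete subset chosen for illustration; real code should take the column list from `janno_columns.tsv` in the schema repository. The function names are likewise hypothetical and not part of any Poseidon tool.

```python
import csv
import io

# Hypothetical, incomplete subset of the base column set, for illustration only.
# The authoritative list is janno_columns.tsv in the poseidon-schema repository.
BASE_COLUMNS = {
    "Poseidon_ID", "Group_Name", "Alternative_IDs", "Alternative_IDs_Context",
    "Collection_ID", "Publication", "Keywords",
}


def read_janno(text):
    """Parse tab-separated .janno content into (header, rows)."""
    # QUOTE_NONE: treat quote characters in free-form text as ordinary characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    return reader.fieldnames, rows


def split_columns(header, base_columns=BASE_COLUMNS):
    """Split the header into base-set columns and additional columns, keeping order."""
    base = [name for name in header if name in base_columns]
    extra = [name for name in header if name not in base_columns]
    return base, extra


janno = "Poseidon_ID\tGroup_Name\tMy_Lab_Batch\nS1\tPopA\tbatch7\n"
header, rows = read_janno(janno)
print(split_columns(header))  # (['Poseidon_ID', 'Group_Name'], ['My_Lab_Batch'])
print(rows[0]["My_Lab_Batch"])  # batch7
```

Here `My_Lab_Batch` stands in for a project-specific additional column: the sketch passes it through untouched instead of validating it, mirroring how the text describes the treatment of such columns.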
7 | 7 | The following documentation includes additional background information on the base set. This should make it easier to understand and use the columns for both published and unpublished data. A `.pdf` of the latest version of this document is available for download [here](https://github.com/poseidon-framework/poseidon-framework.github.io/blob/master/janno_details.pdf). |
8 | 8 |
|
@@ -30,7 +30,7 @@ The column `Alternative_IDs` provides a way to list other IDs used for the respe |
30 | 30 |
|
31 | 31 | The column `Alternative_IDs_Context` (introduced in Poseidon v3.0.0) documents the context of each `Alternative_IDs` entry. It is a list column with the same length and order as the `Alternative_IDs` list column, and each entry must name the respective source database, e.g. `AADRv62`. For common non-scientific names used in media and public discussion, the term `popular` can be entered. |
32 | 32 |
|
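Since `Alternative_IDs` and `Alternative_IDs_Context` are parallel list columns, the i-th context entry describes the i-th ID. A minimal Python sketch of pairing them, assuming `;` as the list separator (as the document states for `Group_Name` and `Keywords`); the function names and the example values are hypothetical:

```python
def split_list(cell, sep=";"):
    """Split a list-column cell into stripped entries; an empty cell gives []."""
    return [entry.strip() for entry in cell.split(sep)] if cell.strip() else []


def pair_alternative_ids(ids_cell, context_cell):
    """Zip Alternative_IDs with Alternative_IDs_Context, position by position.

    Raises ValueError when the two list columns differ in length, because
    both columns must have the same length and order.
    """
    ids = split_list(ids_cell)
    contexts = split_list(context_cell)
    if len(ids) != len(contexts):
        raise ValueError(f"{len(ids)} Alternative_IDs but {len(contexts)} context entries")
    return list(zip(ids, contexts))


# Hypothetical example values.
pairs = pair_alternative_ids("XYZ001;The Example Man", "AADRv62;popular")
print(pairs)  # [('XYZ001', 'AADRv62'), ('The Example Man', 'popular')]
print([alt_id for alt_id, ctx in pairs if ctx == "popular"])  # ['The Example Man']
```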
33 | | -The `Collection_ID` column stores additional, secondary identifiers used by collaboration partners (archaeologists, museums, collections) that provide the specimen for archaeogenetic research. These identifiers can have a very heterogenous structure and may not be unique across different projects or institutions. The `Collection_ID` column is therefore a free-form text list column. |
| 33 | +The `Collection_ID` column stores additional, secondary identifiers used by collaboration partners (archaeologists, museums, collections) that provide the specimen for archaeogenetic research (see also `Custodian_Institution` below). These identifiers can have a very heterogeneous structure and may not be unique across different projects or institutions. The `Collection_ID` column is therefore a free-form text list column. |
34 | 34 |
|
35 | 35 | The `Group_Name` column contains one or multiple group or population names for each sample, separated by `;`. The first entry must be identical to the one used in the genotype data for the respective sample in a Poseidon package. Especially for the first entry, it is recommended to use only the ASCII characters `A-Za-z0-9_-.`. Whitespace is not allowed in any of the entries. The names can follow the geographic-temporal nomenclature proposed by [@Eisenmann2018](https://doi.org/10.1038/s41598-018-31123-z), or communicate additional categories that are meaningful for groupings in specific analyses, such as cultural labels, outlier status or relatedness to other samples. |
36 | 36 |
|
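The `Group_Name` rules can be checked mechanically. The Python sketch below treats a mismatch with the genotype data and any whitespace as errors, and characters outside the recommended ASCII set in the first entry as a warning, since that part is only a recommendation. The genotype group is passed in as a plain string (reading it from the actual genotype files is out of scope), and the function name is hypothetical.

```python
import re

# Recommended characters for the first entry: A-Z, a-z, 0-9, "_", "-" and ".".
RECOMMENDED = re.compile(r"[A-Za-z0-9_.-]+")


def check_group_name(cell, genotype_group):
    """Return error/warning messages for one Group_Name cell (empty list if clean)."""
    messages = []
    entries = cell.split(";")
    # Whitespace is not allowed in any entry.
    for entry in entries:
        if re.search(r"\s", entry):
            messages.append(f"error: whitespace in entry {entry!r}")
    # The first entry must match the group name in the genotype data.
    first = entries[0]
    if first != genotype_group:
        messages.append(
            f"error: first entry {first!r} does not match genotype group {genotype_group!r}"
        )
    # Restricting the first entry to the ASCII set is a recommendation only.
    if not RECOMMENDED.fullmatch(first):
        messages.append(f"warning: first entry {first!r} uses characters outside A-Za-z0-9_-.")
    return messages


print(check_group_name("Denmark_BA;outlier", "Denmark_BA"))  # []
print(check_group_name("Denmark BA", "Denmark BA"))
```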
@@ -69,15 +69,15 @@ For each entry in `Relation_To` there must be a corresponding entry in `Relation |
69 | 69 |
|
70 | 70 | Unlike `Relation_Degree`, `Relation_Type` can be left empty even if there are entries in `Relation_To`. But if it is filled, then the number of values must be equal to the number of entries in both `Relation_To` and `Relation_Degree`. |
71 | 71 |
|
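The length rules for the relation columns can be summarised in a small check. A sketch in Python, assuming `;`-separated list columns and treating an empty cell as an empty list; the function name and the example values are hypothetical:

```python
def split_list(cell, sep=";"):
    """Split a list-column cell into stripped entries; an empty cell gives []."""
    return [entry.strip() for entry in cell.split(sep)] if cell.strip() else []


def check_relations(relation_to, relation_degree, relation_type):
    """Return messages for violated length rules across the three relation columns."""
    to = split_list(relation_to)
    degree = split_list(relation_degree)
    rel_type = split_list(relation_type)
    messages = []
    # Every Relation_To entry needs a corresponding Relation_Degree entry.
    if len(degree) != len(to):
        messages.append(f"{len(to)} Relation_To entries but {len(degree)} Relation_Degree entries")
    # Relation_Type may stay empty, but if filled it must match in length.
    if rel_type and len(rel_type) != len(to):
        messages.append(f"{len(to)} Relation_To entries but {len(rel_type)} Relation_Type entries")
    return messages


print(check_relations("S2;S3", "first;second", ""))  # []
print(check_relations("S2;S3", "first;second", "sibling"))
# ['2 Relation_To entries but 1 Relation_Type entries']
```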
72 | | -## Archaeological context |
| 72 | +## Cultural and archaeological context |
73 | 73 |
|
74 | | -Cultural_Era |
| 74 | +Poseidon v3.0.0 introduced the following four columns to add archaeological context information for a given sample, at least at the level of era and archaeological culture attribution. Given the nature of human behaviour and archaeological inference, these attributions should not be understood as absolute, objective classifications, but rather as preliminary model assumptions and interpretative tools. |
75 | 75 |
|
76 | | -Cultural_Era_URL |
| 76 | +The `Cultural_Era` column lists one or multiple cultural eras approximating the period in which the sampled individual lived, for example "Danish Bronze Age" or "Pre-Pottery Neolithic A". If possible, these classes should be taken from an established space-time gazetteer like ChronOntology (https://chronontology.dainst.org) or PeriodO (https://perio.do), so that relevant background information about the referenced phenomena, such as their spatiotemporal extent and research history, is linked. |
77 | 77 |
|
78 | | -Archaeological_Culture |
| 78 | +The `Cultural_Era_URL` column complements the human-readable era terms given in `Cultural_Era` with persistent URLs pointing to definitions of these entities. The length and order of both columns must therefore match. For example, https://n2t.net/ark:/99152/p0zj6g8ks9s points to an entry for "Danish Bronze Age", and https://chronontology.dainst.org/period/Gx4uxaeTCbbg to one for "Pre-Pottery Neolithic A". Note that the entries in these gazetteers go back to an authoritative source, e.g. an archaeological publication presenting a typo-chronological scheme. Most archaeological and archaeogenetic publications implicitly or explicitly adopt such a scheme for the spatiotemporal context they work on. Ideally, the scheme referenced in the Poseidon package and the one in the publication should match, but in practice this may be difficult to ascertain. |
79 | 79 |
|
80 | | -Archaeological_Culture_URL |
| 80 | +The column pair `Archaeological_Culture` and `Archaeological_Culture_URL` works just like the cultural era pair, but at a more fine-grained level. It attributes a given ancient individual to specific archaeological cultures, technocomplexes, pottery styles or political entities, for example the "Hallstatt culture in Hungary" (https://n2t.net/ark:/99152/p0nxc78fxgt) or the "Neo-Assyrian Empire" (https://chronontology.dainst.org/period/bvLwqFcGyoaL). |
81 | 81 |
|
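Both the era pair and the culture pair follow the same pattern: a list of human-readable terms and a parallel list of URLs with matching length and order. The Python sketch below pairs them and checks that each URL is an absolute `http(s)` URL. Two points are assumptions not stated in the text: that list entries are separated by `;`, and that an empty URL column is acceptable (the URL column is described as a complement, not as mandatory). The function name is hypothetical.

```python
from urllib.parse import urlparse


def split_list(cell, sep=";"):
    """Split a list-column cell into stripped entries; an empty cell gives []."""
    return [entry.strip() for entry in cell.split(sep)] if cell.strip() else []


def pair_terms_with_urls(terms_cell, urls_cell):
    """Pair e.g. Cultural_Era with Cultural_Era_URL entries, position by position."""
    terms = split_list(terms_cell)
    urls = split_list(urls_cell)
    if not urls:
        # Assumption: an empty URL column is allowed; terms stay unlinked.
        return [(term, None) for term in terms]
    if len(terms) != len(urls):
        raise ValueError(f"{len(terms)} terms but {len(urls)} URLs")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return list(zip(terms, urls))


print(pair_terms_with_urls("Danish Bronze Age", "https://n2t.net/ark:/99152/p0zj6g8ks9s"))
# [('Danish Bronze Age', 'https://n2t.net/ark:/99152/p0zj6g8ks9s')]
print(pair_terms_with_urls(
    "Neo-Assyrian Empire",
    "https://chronontology.dainst.org/period/bvLwqFcGyoaL",
))
```

The same function serves `Archaeological_Culture` and `Archaeological_Culture_URL`, since the text describes that pair as working just like the era pair.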
82 | 82 | ## Spatial position |
83 | 83 |
|
@@ -210,6 +210,8 @@ The `Genetic_Source_Accession_IDs` column was introduced to link the derived gen |
210 | 210 |
|
211 | 211 | The `Primary_Contact` column is a free-form text field that stores the name of the main or the corresponding author of the respective paper for published data. |
212 | 212 |
|
| 213 | +The `Custodian_Institution` column (introduced in Poseidon v3.0.0) documents one or multiple institutions that curated the sampled remains at the time of sampling. Each institution should be given with its name, city and country. The `Collection_ID` column can link the sample to the internal bookkeeping of these institutions. |
| 214 | + |
213 | 215 | The `Publication` column holds either the value `unpublished` for (yet) unpublished samples or -- for published data -- one or multiple citation-keys of the form `AuthorJournalYear` without any spaces or special characters. These keys have to be identical to the [BibTeX](http://www.bibtex.org) citation-keys identifying the respective entries in the `.bib` file of the package. BibTeX is a file format to store bibliographic information, where each entry (article, book, website, ...) is defined by a series of parameters (authors, year of publication, journal, ...). Here's an example `.bib` file with two entries for [@Cassidy2015](https://doi.org/10.1073/pnas.1518445113) and [@Feldman2019](https://doi.org/10.1126/sciadv.aax0061): |
214 | 216 |
|
215 | 217 | ```default |
@@ -252,10 +254,8 @@ The string `CassidyPNAS2015` is the citation-key of the first entry. To cite bot |
252 | 254 |
|
253 | 255 | When creating a new Poseidon package, the `.bib` file should be filled in together with the `Publication` column. One of the simplest ways to obtain the BibTeX entries is to request them by DOI from the [doi2bib](https://doi2bib.org) web app. The result may need to be adjusted manually, though. The citation-key, for example, has to be replaced by the one used in the `Publication` column. |
254 | 256 |
|
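Because every citation-key in `Publication` must match a key in the package's `.bib` file, a quick consistency check is possible. The Python sketch below extracts keys with a regular expression and reports unmatched ones. It assumes multiple keys are separated by `;`, and the regex is a simplification that only recognises entry headers of the form `@type{key,`; a real BibTeX parser would be more robust. The function names and `MissingKey2020` are hypothetical.

```python
import re

# Matches entry headers such as "@article{CassidyPNAS2015," and captures the key.
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text):
    """Return the set of citation-keys declared in .bib text."""
    return set(BIB_KEY.findall(bib_text))


def missing_citation_keys(publication_cell, keys):
    """List Publication entries that are neither 'unpublished' nor a known key."""
    entries = [entry.strip() for entry in publication_cell.split(";") if entry.strip()]
    return [entry for entry in entries if entry != "unpublished" and entry not in keys]


bib = """@article{CassidyPNAS2015,
  doi = {10.1073/pnas.1518445113}
}
"""
keys = bib_keys(bib)
print(keys)  # {'CassidyPNAS2015'}
print(missing_citation_keys("CassidyPNAS2015;MissingKey2020", keys))  # ['MissingKey2020']
print(missing_citation_keys("unpublished", keys))  # []
```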
255 | | -The `Note` column is a free-form text field that can contain small amounts of additional information that is not yet expressed in a more systematic form in the the other `.janno` file columns. |
| 257 | +The `Note` column is a free-form text field that can contain small amounts of additional information that is not yet expressed in a more systematic form in the other `.janno` file columns. |
256 | 258 |
|
257 | 259 | The `Keywords` column was introduced to allow for tagging individuals with arbitrary keywords. This should simplify sorting and filtering in personal Poseidon package repositories. Each keyword is a string and multiple keywords can be separated with `;`. |
258 | 260 |
|
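As an illustration of filtering by keyword, here is a short Python sketch that works on rows represented as dictionaries keyed by column name (e.g. as produced by `csv.DictReader`). The function names and the keyword values are hypothetical, and matching is exact and case-sensitive by design choice.

```python
def keywords(cell):
    """Return the set of ;-separated keywords in a Keywords cell."""
    return {kw.strip() for kw in cell.split(";") if kw.strip()}


def filter_by_keyword(rows, keyword):
    """Keep the rows whose Keywords column contains the given keyword."""
    return [row for row in rows if keyword in keywords(row.get("Keywords", ""))]


rows = [
    {"Poseidon_ID": "S1", "Keywords": "pilot;reanalysis"},
    {"Poseidon_ID": "S2", "Keywords": ""},
    {"Poseidon_ID": "S3", "Keywords": "pilot"},
]
print([row["Poseidon_ID"] for row in filter_by_keyword(rows, "pilot")])  # ['S1', 'S3']
```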
259 | | -Arbitrary additional columns can be included in a `.janno` file, but they should be named in a way that they do not conflict with the Poseidon package specification. These columns will not be validated (assumed free-form text), but they will be preserved in the Poseidon package, and propagated during operations with `trident forge`. |
260 | | - |
261 | 261 | --- |