|
1 | 1 | # Background |
2 | 2 |
|
3 | | -Archaeogenetics has become a fast accelerating field, with new data coming out faster than many individual researchers can keep track of and co-analyze. Recently, we have surpassed the threshold of genome-wide data for [10,000 ancient human individuals](https://www.nature.com/articles/d41586-023-01403-4). In addition, for many of those samples we also have rich metadata ranging from archaeological information to radiocarbon dating. |
| 3 | +Archaeogenetics has become a fast accelerating field, with new data coming out faster than many individual researchers can keep track of and co-analyze. Already in 2023 we have surpassed the threshold of genome-wide data for [10,000 ancient human individuals](https://www.nature.com/articles/d41586-023-01403-4). In addition, for many of those samples we also have rich metadata ranging from archaeological information to radiocarbon dating. |
4 | 4 |
|
5 | | -The way data is currently shared and published via academic papers, at least from genetic analyses, is mainly via releasing raw sequencing data into public repositories such as the [ENA](https://www.ebi.ac.uk/ena), while providing partial metadata on samples via often poorly formatted Excel tables in the Supplement. This creates (at least) the following problems: |
| 5 | +The way data is currently shared and published via academic papers, at least from genetic analyses, is mainly via releasing raw sequencing data into public repositories such as the [ENA](https://www.ebi.ac.uk/ena), while providing partial metadata on samples via often poorly formatted Excel tables in the supplementary materials. This creates (at least) the following problems: |
6 | 6 |
|
7 | 7 | 1. Intermediate data such as genotypes are often not released at all, making it hard for others to reproduce analyses. |
8 | 8 | 2. The connection between individuals, contextual information, and genetic data becomes hard to maintain, bridging between very different repositories and sources (Excel vs. personal homepages vs. public repositories) |
9 | 9 | 3. Meta-analyses spanning datasets require enormous amounts of work on data collection and curation. |
10 | 10 |
|
11 | | -A major initiative to address these problems in human archaeogenetics is the [Allen Ancient DNA Resource](https://doi.org/10.1101/2023.04.06.535797) ("AADR"), which is a curated dataset of public ancient DNA data generated, curated and bundled by David Reich's ancient DNA laboratory at Harvard University. In many ways, our initiative is inpiried by and deriving from this resource. In particular, the AADR currently (April 2023) is arguably the most complete resource world-wide that provides genome-wide genotype data for ancient human individuals from nearly all publications in the field. |
| 11 | +A major initiative to address these problems in human archaeogenetics is the [Allen Ancient DNA Resource](http://dx.doi.org/10.1038/s41597-024-03031-7) ("AADR"), which is a curated dataset of public ancient DNA data generated, curated and bundled by David Reich's ancient DNA laboratory at Harvard University. In many ways, our initiative is inspired by and derives from this resource. In particular, the AADR currently (April 2023) is arguably the most complete resource world-wide that provides genome-wide genotype data for ancient human individuals from nearly all publications in the field. |
12 | 12 |
|
13 | 13 | Our [public archives](archive_overview) derive to a large extent directly from the AADR, while many curated packages, in particular from 2019 and later, contain data compiled and generated by us. But our initiative also differs in important aspects from the AADR: |
14 | 14 |