Commit b59c4f7

Update paper.md
Removed unintended duplications of text in Statement of Need section.
1 parent 3a24dc7 commit b59c4f7

1 file changed

Lines changed: 5 additions & 5 deletions

joss/paper.md

@@ -46,17 +46,17 @@ authors:
4646
corresponding: true
4747
affiliation: "3, 7"
4848
affiliations:
49-
- name: Netherlands eScience Center, Netherlands
49+
- name: Netherlands eScience Center, the Netherlands
5050
index: 1
5151
- name: RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200, Brno, Czech Republic
5252
index: 2
53-
- name: Bioinformatics Group, Wageningen University & Research, Netherlands
53+
- name: Bioinformatics Group, Wageningen University & Research, the Netherlands
5454
index: 3
5555
- name: Naicons Srl, Milan, Italy
5656
index: 4
5757
- name: Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Germany
5858
index: 5
59-
- name: Newcastle University, Biosciences Institute, Newcastle upon Tyne, UK
59+
- name: Newcastle University, Biosciences Institute, Newcastle upon Tyne, United Kingdom
6060
index: 6
6161
- name: Department of Biochemistry, University of Johannesburg, 2006 Johannesburg, South Africa
6262
index: 7
@@ -70,8 +70,8 @@ Natural product discovery increasingly relies on the integration of multi-omics
7070

7171
# Statement of need
7272
Omics datasets have become a key resource for natural products discovery, enabling the systematic exploration of specialized metabolites, the refinement of knowledge of known natural products, and the identification of novel bioactive compounds or metabolic enzymes. Paired omics analyses combine complementary genomics (e.g., biosynthetic gene clusters (BGCs)) and metabolomics (e.g., mass spectra and mass fragmentation or MS/MS spectra) datasets to elucidate gene-metabolite relationships, accelerating the discovery process [@goering_metabologenomics_2016; @leao_npomix_2022; @hooft_linking_2020].
73-
Several computational strategies have been developed to propose such gene cluster-mass spectral links, including i) feature-based approaches that match predicted structural or substructure information between genomes and tandem mass spectrometry (.e.g., Pep2Path, MetaRiPPquest, MetaMiner, DeepRiPP) [@medema2014pep2path; @mohimani2017metarippquest; @cao2019metaminer; @merwin2020deepripp], ii) correlation-based “metabologenomics” that infer co-occurrence patterns across strains or samples [@goering_metabologenomics_2016], and iii) hybrid frameworks such as NPLinker and the machine learning classifier-based NPOmix [@eldjarn_ranking_2021; @leao_npomix_2022], with recently proposed phylogeny-aware extensions to reduce false positive associations [@boldt2026phylogeny]. However, the connected omics data structures, preproccessing pipelines, resources, and annotation tools are constantly being improved: curated BGC repositories such as MIBiG are continually expanded with new entries and annotation fields [@zdouc_mibig_2025], while community MS/MS resources such as GNPS spectral libraries keep growing [@wang_sharing_2016]. Together with the constant expansion of experimental datasets, this puts a strain on downstream frameworks that integrate processed omics data and link results. Hence, natural products discovery would benefit from up-to-date and user-friendly software packages that parse processed omics data and connect it with algorithms returning ranked, queryable gene cluster - mass spectra links to prioritize links to further investigate manually. Here, we redesigned NPLinker to provide such an integrative omics tool that guides both users and developers in paired omics mining with its modular setup. For example, recent developments in omics processing, annotation tools, and ranking metrics could be added to the framework [@louwen_enhanced_2023; @louwen_ipresto_2023]. 
Moreover, several of such linking scores could then be used together with the currently implemented strain correlation score to further improve ranking results.
74-
Omics datasets have become a key resource for natural products discovery, enabling the systematic exploration of specialized metabolites, the refinement of knowledge of known natural products, and the identification of novel bioactive compounds or metabolic enzymes. Paired omics analyses combine complementary genomics (e.g., biosynthetic gene clusters (BGCs)) and metabolomics (e.g., mass spectra) datasets to elucidate gene-metabolite relationships, accelerating the discovery process [@goering_metabologenomics_2016; @leao_npomix_2022; @hooft_linking_2020]. However, omics data structures, preproccessing pipelines, resources, and annotation tools are constantly being improved. For example, newer releases of MIBiG contain more validated BGCs and new annotation fields [@zdouc_mibig_2025], while mass spectral libraries are growing in size and information as well [@wang_sharing_2016]. Besides, newer versions of omics clustering tools have different output file formats. Together with the constant expansion of available experimental datasets, this puts a strain on downstream frameworks that integrate the data and results. Hence, researchers working in the natural products discovery field, or anyone with paired genomics and metabolomics data, would benefit from up-to-date and user-friendly software packages that parse processed omics data and connect it with algorithms returning ranked, queryable gene cluster - mass spectra links to prioritize links to further investigate manually. Here, we redesigned NPLinker to provide such an integrative omics tool that guides users in paired omics mining with its modular setup. The tool is also of interest to developers in this field, as recent developments in omics processing, annotation tools, and ranking metrics could be added to the framework [@louwen_enhanced_2023; @louwen_ipresto_2023]. 
Moreover, several of such linking scores could then be used together with the currently implemented strain correlation score to further improve ranking results.
73+
Several computational strategies have been developed to propose such gene cluster-mass spectral links, including i) feature-based approaches that match predicted structural or substructure information between genomes and tandem mass spectrometry (.e.g., Pep2Path, MetaRiPPquest, MetaMiner, DeepRiPP) [@medema2014pep2path; @mohimani2017metarippquest; @cao2019metaminer; @merwin2020deepripp], ii) correlation-based “metabologenomics” that infer co-occurrence patterns across strains or samples [@goering_metabologenomics_2016], and iii) hybrid frameworks such as NPLinker and the machine learning classifier-based NPOmix [@eldjarn_ranking_2021; @leao_npomix_2022], with recently proposed phylogeny-aware extensions to reduce false positive associations [@boldt2026phylogeny]. However, the connected omics data structures, preproccessing pipelines, resources, and annotation tools are constantly being improved: curated BGC repositories such as MIBiG are continually expanded at every release with new entries and new annotation fields [@zdouc_mibig_2025], while community mass spectral (MS/MS) resources such as GNPS spectral libraries keep growing in size and information as well [@wang_sharing_2016]. Besides, newer versions of omics clustering tools have different output file formats. Together with the constant expansion of experimental datasets, this puts a strain on downstream frameworks that integrate processed omics data and link results.
74+
Hence, researchers working in the natural products discovery field, or anyone with paired genomics and metabolomics data, would benefit from up-to-date and user-friendly software packages that parse processed omics data and connect it with algorithms returning ranked, queryable gene cluster - mass spectra links to prioritize links to further investigate manually. Here, we redesigned NPLinker to provide such an integrative omics tool that guides users in paired omics mining with its modular setup. The tool is also of interest to developers in this field, as recent developments in omics processing, annotation tools, and ranking metrics could be added to the framework [@louwen_enhanced_2023; @louwen_ipresto_2023]. Moreover, several of such linking scores could then be used together with the currently implemented strain correlation score to further improve ranking results.
7575

7676
![The NPLinker 2 framework. The current pipeline consists of five main components: 1. Initiating an analysis with an input block that includes configuration file and optional input data; 2. Preparing dataset by automatically downloading or generating data; 3. Loading and parsing data from data files; 4. Scoring and linking data; 5. Creating an output for analysis and visualization of results.\label{fig:1}](fig1.png)
7777
