- name: Bioinformatics Group, Wageningen University & Research, the Netherlands
  index: 3
- name: Naicons Srl, Milan, Italy
  index: 4
- name: Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), University of Tübingen, Germany
  index: 5
- name: Newcastle University, Biosciences Institute, Newcastle upon Tyne, United Kingdom
  index: 6
- name: Department of Biochemistry, University of Johannesburg, 2006 Johannesburg, South Africa
  index: 7
@@ -70,8 +70,8 @@ Natural product discovery increasingly relies on the integration of multi-omics
# Statement of need

Omics datasets have become a key resource for natural products discovery, enabling the systematic exploration of specialized metabolites, a more refined understanding of known natural products, and the identification of novel bioactive compounds and metabolic enzymes. Paired omics analyses combine complementary genomics data, such as biosynthetic gene clusters (BGCs), with metabolomics data, such as mass spectra and tandem mass spectrometry (MS/MS) fragmentation spectra, to elucidate gene-metabolite relationships and thereby accelerate the discovery process [@goering_metabologenomics_2016; @leao_npomix_2022; @hooft_linking_2020].

Several computational strategies have been developed to propose such gene cluster-mass spectrum links, including i) feature-based approaches that match structural or substructural information predicted from genomes against tandem mass spectrometry data (e.g., Pep2Path, MetaRiPPquest, MetaMiner, DeepRiPP) [@medema2014pep2path; @mohimani2017metarippquest; @cao2019metaminer; @merwin2020deepripp], ii) correlation-based “metabologenomics” approaches that infer co-occurrence patterns across strains or samples [@goering_metabologenomics_2016], and iii) hybrid frameworks such as NPLinker and the machine learning classifier-based NPOmix [@eldjarn_ranking_2021; @leao_npomix_2022], with recently proposed phylogeny-aware extensions to reduce false-positive associations [@boldt2026phylogeny]. However, the underlying omics data structures, preprocessing pipelines, resources, and annotation tools are constantly evolving: curated BGC repositories such as MIBiG gain new entries and annotation fields with every release [@zdouc_mibig_2025], community MS/MS resources such as the GNPS spectral libraries keep growing in size and information content [@wang_sharing_2016], and newer versions of omics clustering tools may produce different output file formats. Together with the continuous growth of experimental datasets, these changes put a strain on downstream frameworks that integrate processed omics data and linking results.

Hence, researchers in natural products discovery, as well as anyone else with paired genomics and metabolomics data, would benefit from up-to-date, user-friendly software that parses processed omics data and connects it with algorithms returning ranked, queryable gene cluster-mass spectrum links, so that the most promising links can be prioritized for manual investigation. Here, we redesigned NPLinker to provide such an integrative omics tool, whose modular setup guides users through paired omics mining. The tool is also of interest to developers in this field, as recent developments in omics processing, annotation tools, and ranking metrics could be added to the framework [@louwen_enhanced_2023; @louwen_ipresto_2023]. Moreover, such additional linking scores could then be combined with the currently implemented strain correlation score to further improve the ranking.
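
To illustrate the idea behind correlation-based linking, the Python sketch below scores every gene cluster-spectrum pair by how consistently the two are observed across strains, then returns the pairs as a ranked list. It is a simplified, hypothetical example: the function names `strain_correlation_score` and `rank_links`, the toy strain data, and the per-strain weights are assumptions made for illustration and do not reproduce NPLinker's API or its exact strain correlation score.

```python
from itertools import product

# Hypothetical per-strain weights, keyed by
# (gene cluster present, spectrum present). Illustrative values only.
WEIGHTS = {
    (True, True): 10.0,    # observed together: supports a link
    (True, False): -10.0,  # gene cluster without spectrum: evidence against
    (False, True): 0.0,    # spectrum without gene cluster: treated as neutral
    (False, False): 1.0,   # both absent: weak support
}


def strain_correlation_score(
    cluster_strains: set[str], spectrum_strains: set[str], all_strains: set[str]
) -> float:
    """Sum the per-strain weights for one gene cluster-spectrum pair."""
    return sum(
        WEIGHTS[(strain in cluster_strains, strain in spectrum_strains)]
        for strain in all_strains
    )


def rank_links(
    clusters: dict[str, set[str]],
    spectra: dict[str, set[str]],
    all_strains: set[str],
) -> list[tuple[str, str, float]]:
    """Score all gene cluster-spectrum pairs and sort them from best to worst."""
    scored = [
        (cluster_id, spectrum_id, strain_correlation_score(c, s, all_strains))
        for (cluster_id, c), (spectrum_id, s) in product(clusters.items(), spectra.items())
    ]
    return sorted(scored, key=lambda link: link[2], reverse=True)


if __name__ == "__main__":
    # Toy data: which strains each gene cluster and spectrum was observed in.
    strains = {"S1", "S2", "S3", "S4"}
    clusters = {"cluster_A": {"S1", "S2"}, "cluster_B": {"S3"}}
    spectra = {"spectrum_1": {"S1", "S2"}, "spectrum_2": {"S3", "S4"}}
    # The first printed link is ('cluster_A', 'spectrum_1', 22.0).
    for link in rank_links(clusters, spectra, strains):
        print(link)
```

Because these raw sums grow with the number of strains and ignore how common each gene cluster or spectrum is, practical scores typically add a normalization step, which this sketch omits for brevity.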