A Nextflow DSL2 pipeline for inferring enhancer-driven gene regulatory networks (eGRNs) at single-cell resolution using SCENIC+ from paired scRNA-seq and scATAC-seq data.
Overview
SCENIC+ identifies enhancer-driven gene regulatory networks (eGRNs) by linking transcription factor (TF) binding motifs in accessible chromatin regions to target gene expression. This pipeline orchestrates the full SCENIC+ workflow from raw inputs to a final annotated MuData object.
Input: scRNA-seq + scATAC-seq (peak matrix or fragment files)
Output: scplusmdata.h5mu, an annotated MuData object with eGRNs, AUCell scores, and cistromes
results/
├── create_cistopic_object/
│   └── cistopic_obj.pkl
├── topic_modeling/
│   └── models/
├── topic_modeling_evaluate/
│   ├── cistopic_obj.pkl        # updated with selected model
│   └── region_sets/
│       ├── topics_otsu/        # one BED file per topic (Otsu threshold)
│       ├── topics_top3k/       # one BED file per topic (top 3000 regions)
│       └── DARs/               # differentially accessible regions per cell type
├── prepare_gex_acc/
│   └── ACC_GEX.h5mu
├── motif_enrichment_cistarget/
│   └── cistarget_results.hdf5
├── motif_enrichment_dem/
│   └── dem_results.hdf5
├── egrn/
│   ├── direct_egrn.tsv
│   └── extended_egrn.tsv
├── auccell/
│   ├── auccell_direct.h5ad
│   └── auccell_extended.h5ad
└── create_scplus_mudata/
    └── scplusmdata.h5mu        # ← final output
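After a run, a quick check that every file in the tree above was written can catch a silently skipped step. The following is a minimal sketch, not part of the pipeline: the helper name `missing_outputs` is hypothetical, and it assumes the output directory is named `results` as shown in the tree.

```python
from pathlib import Path

# Relative file paths copied from the output tree above.
EXPECTED_OUTPUTS = [
    "create_cistopic_object/cistopic_obj.pkl",
    "topic_modeling_evaluate/cistopic_obj.pkl",
    "prepare_gex_acc/ACC_GEX.h5mu",
    "motif_enrichment_cistarget/cistarget_results.hdf5",
    "motif_enrichment_dem/dem_results.hdf5",
    "egrn/direct_egrn.tsv",
    "egrn/extended_egrn.tsv",
    "auccell/auccell_direct.h5ad",
    "auccell/auccell_extended.h5ad",
    "create_scplus_mudata/scplusmdata.h5mu",
]


def missing_outputs(results_dir):
    """Return the expected files that are absent under results_dir (hypothetical helper)."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the pipeline wrote its outputs to ./results
    missing = missing_outputs("results")
    if missing:
        print("Missing outputs:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected outputs are present.")
```

The check only confirms that the files exist; it does not validate their contents.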
Module Reference
| Module | Resource label | Description |
| --- | --- | --- |
| CREATE_CISTOPIC_OBJECT | process_medium | Builds CistopicObject from h5ad or fragment files |
| TOPIC_MODELING | process_long | LDA topic modeling (one parallel job per topic count) |
| TOPIC_MODELING_EVALUATE | process_medium | Model selection, topic binarization, DAR computation |
| PREPARE_GEX_ACC | process_medium | Assembles paired GEX+ACC MuData object |
| DOWNLOAD_GENOME_ANNOTATION | process_low | Fetches gene annotation and chromosome sizes via Biomart |
| GET_SEARCH_SPACE | process_medium | Computes region→gene search windows |
| MOTIF_ENRICHMENT_CISTARGET | process_high | cisTarget motif ranking enrichment |
| MOTIF_ENRICHMENT_DEM | process_high | Differential enrichment method (DEM) scoring |
| PREPARE_MENR | process_low | Merges enrichment results; extracts TF name list |
| TF_TO_GENE | process_grn | TF→gene importance scoring (GBM/RF via arboreto) |
| REGION_TO_GENE | process_grn | Region→gene importance + correlation scoring |
| EGRN_DIRECT / EGRN_EXTENDED | process_grn | eGRN assembly (direct and orthology annotations) |
| AUCCELL_DIRECT / AUCCELL_EXTENDED | process_medium | AUCell regulon activity scoring |
| CREATE_SCPLUS_MUDATA | process_medium | Assembles final annotated MuData output |
Resource labels (defined in conf/base.config):
| Label | CPUs | Memory | Queue |
| --- | --- | --- | --- |
| process_low | 2 | 16 GB | normal |
| process_medium | 4 | 64 GB | normal |
| process_high | 8 | 200 GB | normal |
| process_long | 4 | 200 GB | long |
| process_grn | 24 | 150 GB | long |
Barcode Alignment
ATAC and RNA barcodes frequently differ in format (e.g. suffix additions). Use --bc_transform_func to supply a Python lambda that transforms ATAC barcodes to match RNA barcodes:
# Strip a sample suffix added to ATAC barcodes
--bc_transform_func "lambda x: x.split('___')[0]"

# Add a sample prefix
--bc_transform_func "lambda x: 'SAMPLE1_' + x"
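Before launching a long run, it can help to try the lambda on a handful of barcodes and confirm that the transformed ATAC barcodes actually appear among the RNA barcodes. The sketch below is illustrative only: the barcodes are made up, and it assumes the pipeline evaluates the option string as a Python expression, which this README implies but does not state outright.

```python
# Hypothetical barcodes for illustration; substitute a few from your own data.
atac_barcodes = [
    "AAACGAAAGTCGACTT-1___sampleA",
    "AAACGAACATTGCGGA-1___sampleA",
]
rna_barcodes = {
    "AAACGAAAGTCGACTT-1",
    "AAACGAACATTGCGGA-1",
    "TTTGTCATCGGCTATA-1",
}

# The exact string you plan to pass to --bc_transform_func.
transform_src = "lambda x: x.split('___')[0]"

# eval() mimics passing the lambda as a string (an assumption about how the
# pipeline consumes it); only evaluate strings you wrote yourself.
transform = eval(transform_src)

transformed = [transform(bc) for bc in atac_barcodes]
matched = [bc for bc in transformed if bc in rna_barcodes]
overlap = len(matched) / len(transformed)

print(f"{len(matched)}/{len(transformed)} ATAC barcodes match RNA barcodes ({overlap:.0%})")
```

A low match fraction usually means the lambda does not yet reproduce the RNA barcode format, for example because a `-1` suffix is present on one side only.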
cisTarget Databases
Pre-built cisTarget and DEM databases for human (hg38/hg19) and mouse (mm10) can be downloaded from the Aerts lab resources portal.
Citation
If you use this pipeline, please cite:
Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancer-driven gene regulatory networks. Nature Methods 20, 1355–1367 (2023). https://doi.org/10.1038/s41592-023-01938-4
License
This pipeline is distributed under the MIT License. See LICENSE for details.