Identify enhancer regulons (eRegulons) from jointly profiled scRNA-seq and scATAC-seq data

Most transcriptional regulation analyses, including prediction of transcription factor (TF) binding sites and identification of enhancer-gene relations, require the discovery of enhancer regulons (eRegulons) that are active in a set of cells. An eRegulon is a TF together with its target enhancers and target genes. This function takes as input a Seurat object containing both scRNA-seq and scATAC-seq assays and simultaneously identifies eRegulons and the cell sets in which they are active. It also determines the number of eRegulons in the dataset by submodular optimization.
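The description does not state which objective STREAM optimizes. As a hedged illustration of the general idea only, the sketch below greedily selects HBCs to maximize the number of distinct cells they cover (a standard monotone submodular objective). It adds up to `step` HBCs per round, in the spirit of `submod.step`, and stops once a round adds no new cells. The function `greedy_select_hbcs`, its arguments, and the coverage objective are assumptions, not STREAM's implementation.

```r
# Hypothetical sketch: greedy maximization of cell coverage, a monotone
# submodular objective. NOT STREAM's actual objective or code.
# hbcs: a list of character vectors, each holding the cells of one HBC.
greedy_select_hbcs <- function(hbcs, step = 30, min.n = 1) {
  selected <- integer(0)
  covered <- character(0)
  remaining <- seq_along(hbcs)
  while (length(remaining) > 0) {
    # Greedily build a batch of up to `step` HBCs, each time taking the one
    # that adds the most not-yet-covered cells
    batch <- integer(0)
    batch_cov <- covered
    for (k in seq_len(min(step, length(remaining)))) {
      cand <- setdiff(remaining, batch)
      gains <- vapply(cand, function(i) length(setdiff(hbcs[[i]], batch_cov)),
                      numeric(1))
      best <- cand[which.max(gains)]
      batch <- c(batch, best)
      batch_cov <- union(batch_cov, hbcs[[best]])
    }
    # Discard the batch and stop if it covers no new cells, once at least
    # min.n HBCs have been kept
    if (length(batch_cov) == length(covered) && length(selected) >= min.n) break
    selected <- c(selected, batch)
    covered <- batch_cov
    remaining <- setdiff(remaining, batch)
  }
  selected
}

hbcs <- list(c("a", "b", "c"), c("c", "d"), c("a", "b"), "e")
greedy_select_hbcs(hbcs, step = 1)  # 1 2 4
```

With `step = 2` the same input returns `c(1L, 2L, 4L, 3L)`: a larger step checks the stopping rule less often, so the redundant HBC 3 is kept as part of the last accepted round. This is the usual trade-off of a coarser step, fewer evaluations at the cost of a less precise cut-off.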

run_stream(
  obj = NULL,
  qubic.path = "/users/PAS1475/liyang/software/QUBIC2/qubic",
  candidate.TFs = NULL,
  peak.assay = "ATAC",
  var.genes = 3000,
  top.peaks = 3000,
  ifPutativeTFs = FALSE,
  min.cells = 10,
  out.dir = "./",
  org = "hg38",
  top.ngenes = 5,
  c.cutoff = 1,
  n.blocks = 100,
  distance = 5e+05,
  filter_peaks_for_cicero = FALSE,
  ifWeighted = TRUE,
  cicero.covar = -Inf,
  signac.score = -Inf,
  signac.pval = Inf,
  intra.cutoff = 1,
  inter.cutoff = 0.8,
  peak.cutoff = 0.8,
  score.cutoff = 1,
  KL = "min.exp",
  quantile.cutoff = 4,
  BlockOverlap = 0.5,
  Extension = 0.9,
  submod.step = 30,
  min.eRegs = 100,
  url.link = "https://figshare.com/ndownloader/files/38794185"
)

Arguments

obj: A Seurat object composed of both scRNA-seq and scATAC-seq assays.
qubic.path: The path of the QUBIC2 binary executable, e.g., "/users/PAS1475/liyang/software/QUBIC2/qubic".
candidate.TFs: The candidate TFs used to identify eRegulons and enhancer-driven gene regulatory networks (eGRNs), NULL by default.
peak.assay: The name of the scATAC-seq assay, "ATAC" by default.
var.genes: The number of highly variable genes used to identify the core part of hybrid biclusters (HBCs), 3000 by default.
top.peaks: The number of top-ranked peaks used to identify the core part of HBCs, 3000 by default.
min.cells: The cutoff of minimum number of cells for quality control (QC), 10 by default.
out.dir: The directory to save the intermediate or final results, "./" by default.
org: The genome assembly of the organism, "hg38" by default.
top.ngenes: The number of genes composing the core part of an HBC, 5 by default.
c.cutoff: The consistency cutoff used during the hybrid biclustering process, 1.0 by default.
n.blocks: The maximum number of blocks output by IRIS-FGM, 100 by default.
distance: The distance cutoff (in base pairs) used to build enhancer-enhancer relations, 5e+05 by default.
filter_peaks_for_cicero: Whether to filter the peaks in a neighborhood of the Signac results before running Cicero, FALSE by default.
intra.cutoff: The cutoff used to calculate pairwise similarity among HBCs associated with the same TF, 1.0 by default.
inter.cutoff: The cutoff to compute pairwise similarity among genes in HBCs associated with different TFs, 0.80 by default.
peak.cutoff: The cutoff to quantify pairwise similarity among enhancers in HBCs associated with different TFs, 0.80 by default.
KL: The method used to score HBCs, "min.exp" by default, which takes the smaller of the numbers of genes and cells in an HBC.
quantile.cutoff: The quantile cutoff on the ratio of HBC cells in which enhancers are accessible, 4 by default, indicating the top 25% of ranks.
BlockOverlap: The cutoff of maximum overlap between blocks output by IRIS-FGM, 0.50 by default.
Extension: The consistency level used to expand a block by IRIS-FGM, 0.90 by default.
submod.step: The step size for submodular optimization, i.e., the number of HBCs added in each iteration, 30 by default.
min.eRegs: The minimum number of enhancer regulons (eRegulons) to output, 100 by default.
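The argument descriptions leave the similarity measures and the exact use of `quantile.cutoff` unspecified. The base-R sketch below illustrates three of the ideas under stated assumptions: scoring an HBC with the `min.exp` rule as described for `KL`; discarding HBCs whose gene sets overlap an already kept, higher-scoring HBC by at least a cutoff, using the overlap coefficient (an assumed measure) in the role of `inter.cutoff`; and reading `quantile.cutoff = 4` as keeping the top 1/4 of values. All function names are hypothetical and not part of the package.

```r
# Hypothetical helpers; names and the similarity measure are illustrative only.

# KL = "min.exp": score an HBC by the smaller of its gene and cell counts
hbc_score_min_exp <- function(hbc) {
  min(length(hbc$genes), length(hbc$cells))
}

# Assumed similarity: overlap coefficient |A intersect B| / min(|A|, |B|)
overlap_coef <- function(a, b) {
  length(intersect(a, b)) / min(length(a), length(b))
}

# Visit HBCs in decreasing score order and drop any whose genes overlap an
# already kept HBC by at least `cutoff` (cf. inter.cutoff = 0.8)
filter_redundant_hbcs <- function(hbcs, cutoff = 0.8) {
  ord <- order(vapply(hbcs, hbc_score_min_exp, numeric(1)), decreasing = TRUE)
  kept <- list()
  for (i in ord) {
    sims <- vapply(kept, function(k) overlap_coef(hbcs[[i]]$genes, k$genes),
                   numeric(1))
    # all() of an empty vector is TRUE, so the top-scoring HBC is always kept
    if (all(sims < cutoff)) kept[[length(kept) + 1]] <- hbcs[[i]]
  }
  kept
}

# Assumed reading of quantile.cutoff = q: threshold at the top 1/q of values
top_quantile_threshold <- function(x, q = 4) {
  unname(quantile(x, probs = 1 - 1 / q))
}

hbcs <- list(
  list(genes = c("g1", "g2", "g3", "g4"), cells = paste0("c", 1:10)),
  list(genes = c("g1", "g2", "g3"), cells = paste0("c", 1:5)),
  list(genes = c("g7", "g8"), cells = paste0("c", 1:20))
)
length(filter_redundant_hbcs(hbcs, cutoff = 0.8))  # 2
```

The second HBC is dropped because all of its genes appear in the first, higher-scoring one; the third shares no genes and is kept.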

Value

When run on a Seurat object, returns a nested list of eRegulons, each of which contains the following elements:

terminal: The IRIS-FGM block used to predict the eRegulon.
Tier: The tier of the TF-enhancer relations: 1 represents JASPAR annotations; 2 denotes motif scanning.
TF: The TF of the eRegulon.
genes: Genes of the eRegulon.
peaks: Enhancers of the eRegulon.
cells: Cells where the eRegulon is active.
atac.ratio: The ratio of cells in which the eRegulon enhancers are accessible to cells in which the eRegulon genes are expressed.
score: The eRegulon score.
weight: The eRegulon weight.
links: The enhancer-gene relations, stored as a GRanges object.
seed: The seed used to obtain the eRegulon.
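Beyond the element names above, the container type of each eRegulon is not specified. Assuming each eRegulon is a named list with those elements, the following base-R sketch tabulates a result. The object `eregs` is mock data with made-up values, and `links`, a GRanges object in real output, is set to NULL so the example needs no Bioconductor packages. `summarize_eregulons` is a hypothetical helper, not part of the package, and it assumes `Tier` is stored as a number.

```r
# Mock eRegulons shaped like the Value section; all values are made up.
eregs <- list(
  list(terminal = 1, Tier = 1, TF = "TF_A",
       genes = c("gene1", "gene2", "gene3"),
       peaks = c("chr1-1000-1500", "chr1-5000-5600"),
       cells = paste0("cell", 1:40), atac.ratio = 0.6,
       score = 3, weight = 1, links = NULL, seed = 1),
  list(terminal = 2, Tier = 2, TF = "TF_B",
       genes = c("gene2", "gene4"),
       peaks = "chr2-200-900",
       cells = paste0("cell", 21:50), atac.ratio = 0.4,
       score = 2, weight = 1, links = NULL, seed = 2)
)

# One row per eRegulon: its TF, tier, and the sizes of its gene, enhancer,
# and cell sets
summarize_eregulons <- function(eregs) {
  data.frame(
    TF = vapply(eregs, function(e) e$TF, character(1)),
    Tier = vapply(eregs, function(e) e$Tier, numeric(1)),
    n.genes = vapply(eregs, function(e) length(e$genes), integer(1)),
    n.peaks = vapply(eregs, function(e) length(e$peaks), integer(1)),
    n.cells = vapply(eregs, function(e) length(e$cells), integer(1)),
    atac.ratio = vapply(eregs, function(e) e$atac.ratio, numeric(1)),
    stringsAsFactors = FALSE
  )
}

df <- summarize_eregulons(eregs)
df[df$Tier == 1, ]  # eRegulons supported by JASPAR annotations
```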

References

Li, Y., Ma, A., Wang, Y., Wang, C., Chen, S., Fu, H., Liu, B. and Ma, Q., 2022. Enhancer-driven gene regulatory networks inference from single-cell RNA-seq and ATAC-seq data. bioRxiv, pp.2022-12.

Chang, Y., Allen, C., Wan, C., Chung, D., Zhang, C., Li, Z. and Ma, Q., 2021. IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis. Bioinformatics, 37(18), pp.3045-3047.

Xie, J., Ma, A., Zhang, Y., Liu, B., Cao, S., Wang, C., ... and Ma, Q., 2020. QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data. Bioinformatics, 36(4), pp.1143-1149.

Stuart, T., Srivastava, A., Madad, S., Lareau, C.A. and Satija, R., 2021. Single-cell chromatin state analysis with Signac. Nature Methods, 18(11), pp.1333-1341.

Pliner, H.A., Packer, J.S., McFaline-Figueroa, J.L., Cusanovich, D.A., Daza, R.M., Aghamirzaie, D., ... and Trapnell, C., 2018. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Molecular Cell, 71(5), pp.858-871.

Castro-Mondragon, J.A., Riudavets-Puig, R., Rauluseviciute, I., Berhanu Lemma, R., Turchi, L., Blanc-Mathieu, R., ... and Mathelier, A., 2022. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Research, 50(D1), pp.D165-D173.