Most transcriptional regulation analyses, including prediction of transcription factor (TF) binding sites and identification of enhancer-gene relations, require the discovery of enhancer regulons (eRegulons) that are active in a set of cells. An eRegulon is a TF together with its set of target enhancers and target genes. This function takes as input a Seurat object (containing both scRNA-seq and scATAC-seq assays) and simultaneously identifies eRegulons and the cell sets in which they are active. In addition, it determines the number of eRegulons in a dataset by submodular optimization.

run_stream(
  obj = NULL,
  qubic.path = "/users/PAS1475/liyang/software/QUBIC2/qubic",
  candidate.TFs = NULL,
  peak.assay = "ATAC",
  var.genes = 3000,
  top.peaks = 3000,
  ifPutativeTFs = FALSE,
  min.cells = 10,
  out.dir = "./",
  org = "hg38",
  top.ngenes = 5,
  c.cutoff = 1,
  n.blocks = 100,
  distance = 5e+05,
  filter_peaks_for_cicero = FALSE,
  ifWeighted = TRUE,
  cicero.covar = -Inf,
  signac.score = -Inf,
  signac.pval = Inf,
  intra.cutoff = 1,
  inter.cutoff = 0.8,
  peak.cutoff = 0.8,
  score.cutoff = 1,
  KL = "min.exp",
  quantile.cutoff = 4,
  BlockOverlap = 0.5,
  Extension = 0.9,
  submod.step = 30,
  min.eRegs = 100,
  url.link = "https://figshare.com/ndownloader/files/38794185"
)

Arguments

obj

A Seurat object composed of both scRNA-seq and scATAC-seq assays.

qubic.path

The path of the binary executable file of QUBIC2, e.g., "/users/PAS1475/liyang/software/QUBIC2/qubic"

candidate.TFs

The list of candidate TFs used to identify eRegulons and eGRNs, NULL by default.

peak.assay

The scATAC-seq assay, "ATAC" by default.

var.genes

The number of highly variable genes used to identify the core part of hybrid biclusters (HBCs), 3000 by default.

top.peaks

The number of top-ranked peaks to identify the core part of hybrid biclusters (HBCs), 3000 by default.

min.cells

The cutoff of minimum number of cells for quality control (QC), 10 by default.

out.dir

The directory to save the intermediate or final results, "./" by default.

org

The genome assembly of the organism, "hg38" by default.

top.ngenes

The number of genes composing the core part of an HBC, 5 by default.

c.cutoff

The cutoff of consistency during hybrid biclustering process, 1.0 by default.

n.blocks

The maximum number of blocks output by IRIS-FGM, 100 by default.

distance

The distance cutoff to build enhancer-enhancer relations, 5e+05 by default.

filter_peaks_for_cicero

Whether to filter the peaks in a neighborhood of the Signac results before running Cicero, FALSE by default.

intra.cutoff

The cutoff to calculate pairwise similarity among HBCs associated with the same TFs, 1.0 by default.

inter.cutoff

The cutoff to compute pairwise similarity among genes in HBCs associated with different TFs, 0.80 by default.

peak.cutoff

The cutoff to quantify pairwise similarity among enhancers in HBCs associated with different TFs, 0.80 by default.

KL

The method used to score HBCs, "min.exp" by default, i.e., the smaller of the number of genes and the number of cells in an HBC.
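
As a rough illustration of the "min.exp" score described above, the sketch below computes the smaller of the gene and cell counts for a toy HBC. The list layout and the helper name hbc_min_exp are hypothetical and are not part of the package.

```r
# Hypothetical toy HBC: a set of genes and the cells in which they co-occur.
hbc <- list(
  genes = c("GATA1", "TAL1", "KLF1"),
  cells = paste0("cell", 1:8)
)

# Hypothetical helper: the "min.exp" score as described in the text,
# i.e., the smaller of the number of genes and the number of cells.
hbc_min_exp <- function(hbc) {
  min(length(hbc$genes), length(hbc$cells))
}

score <- hbc_min_exp(hbc)
print(score)  # 3 (three genes versus eight cells)
```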

quantile.cutoff

The quantile cutoff on the ratio of HBC cells in which enhancers are accessible, 4 by default, i.e., the top 25% by rank.
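
One way to read this setting is that a value of 4 keeps the top quarter of ratios. The sketch below illustrates that reading on made-up numbers; the mapping from quantile.cutoff to a probability of 1 - 1/quantile.cutoff is an assumption, and the package may compute the threshold differently.

```r
# Made-up ratios of HBC cells in which enhancers are accessible.
ratios <- c(0.10, 0.25, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95)

# Assumed interpretation: quantile.cutoff = 4 selects the upper quartile.
quantile.cutoff <- 4
threshold <- quantile(ratios, probs = 1 - 1 / quantile.cutoff, names = FALSE)

# Keep only the ratios at or above the threshold.
kept <- ratios[ratios >= threshold]
print(kept)  # keeps 0.90 and 0.95, i.e., 2 of the 8 values
```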

BlockOverlap

The maximum overlap allowed between blocks output by IRIS-FGM, 0.50 by default.

Extension

The consistency level to expand a block by IRIS-FGM, 0.90 by default.

submod.step

The step size for submodular optimization, i.e., the number of HBCs to add in each iteration, 30 by default.
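
To make the role of a step size concrete, the sketch below runs a generic greedy selection on a coverage objective, a standard example of a submodular function, adding up to step candidates per round and stopping when a round adds nothing. The objective, the toy sets, and the function name greedy_cover are assumptions for illustration and do not reproduce the package's actual objective or stopping rule.

```r
# Toy candidates: each hypothetical HBC covers a set of item IDs.
hbcs <- list(
  a = c(1, 2, 3, 4),
  b = c(3, 4, 5),
  c = c(6, 7),
  d = c(1, 2),
  e = c(8)
)

# Generic greedy coverage: in each round, add up to `step` sets, each time
# picking the set with the largest marginal gain in newly covered items.
greedy_cover <- function(sets, step = 2) {
  selected <- character(0)
  covered <- numeric(0)
  repeat {
    gain.round <- 0
    for (i in seq_len(step)) {
      remaining <- setdiff(names(sets), selected)
      if (length(remaining) == 0) break
      gains <- vapply(sets[remaining],
                      function(s) length(setdiff(s, covered)),
                      integer(1))
      if (max(gains) == 0) break
      best <- remaining[which.max(gains)]
      selected <- c(selected, best)
      covered <- union(covered, sets[[best]])
      gain.round <- gain.round + max(gains)
    }
    # Stop once a full round contributes no new coverage.
    if (gain.round == 0) break
  }
  list(selected = selected, n.covered = length(covered))
}

res <- greedy_cover(hbcs, step = 2)
print(res$selected)
print(res$n.covered)
```

In this toy run the redundant set d is never selected, because everything it covers is already covered by a.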

min.eRegs

The minimum number of enhancer regulons (eRegulons) to output, 100 by default.

Value

When run on a Seurat object, returns a nested list of eRegulons, each of which contains the following attributes:

  • terminal: The IRIS-FGM block used to predict the eRegulon.

  • Tier: The tier of the TF-enhancer relations: 1 represents JASPAR annotations; 2 denotes motif scanning.

  • TF: The TF of the eRegulon.

  • genes: Genes of the eRegulon.

  • peaks: Enhancers of the eRegulon.

  • cells: Cells where the eRegulon is active.

  • atac.ratio: The ratio of cells where the eRegulon enhancers are accessible against cells in which the eRegulon genes are expressed.

  • score: The eRegulon score.

  • weight: The eRegulon weight.

  • links: The enhancer-gene relations saved in a GRanges object.

  • seed: The seed to obtain the eRegulon.
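
The attributes listed above can be summarized with base R once the nested list is available. The sketch below uses a hand-built mock carrying only a subset of the documented fields; the TF names, gene names, peak coordinates, and scores are made up for illustration, and a real result would also carry the remaining attributes (for example links as a GRanges object).

```r
# Hypothetical mock of the returned nested list (subset of documented fields).
eregs <- list(
  list(TF = "GATA1",
       genes = c("HBB", "ALAS2"),
       peaks = c("chr11-5225000-5226000"),
       cells = c("c1", "c2", "c3"),
       score = 2.5),
  list(TF = "SPI1",
       genes = c("CSF1R"),
       peaks = c("chr5-150050000-150051000", "chr5-150060000-150061000"),
       cells = c("c4", "c5"),
       score = 1.2)
)

# One row per eRegulon: its TF, the sizes of its gene, peak, and cell sets,
# and its score.
ereg.summary <- data.frame(
  TF = vapply(eregs, function(x) x$TF, character(1)),
  n.genes = vapply(eregs, function(x) length(x$genes), integer(1)),
  n.peaks = vapply(eregs, function(x) length(x$peaks), integer(1)),
  n.cells = vapply(eregs, function(x) length(x$cells), integer(1)),
  score = vapply(eregs, function(x) x$score, numeric(1))
)

# Order eRegulons from highest to lowest score.
ereg.summary <- ereg.summary[order(-ereg.summary$score), ]
print(ereg.summary)
```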

References

Li, Y., Ma, A., Wang, Y., Wang, C., Chen, S., Fu, H., Liu, B., & Ma, Q. (2022). Enhancer-driven gene regulatory networks inference from single-cell RNA-seq and ATAC-seq data. bioRxiv, 2022-12.

Chang, Y., Allen, C., Wan, C., Chung, D., Zhang, C., Li, Z., & Ma, Q. (2021). IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis. Bioinformatics, 37(18), 3045-3047.

Xie, J., Ma, A., Zhang, Y., Liu, B., Cao, S., Wang, C., ... & Ma, Q. (2020). QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data. Bioinformatics, 36(4), 1143-1149.

Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature methods, 18(11), 1333-1341.

Pliner, H. A., Packer, J. S., McFaline-Figueroa, J. L., Cusanovich, D. A., Daza, R. M., Aghamirzaie, D., ... & Trapnell, C. (2018). Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Molecular cell, 71(5), 858-871.

Castro-Mondragon, J. A., Riudavets-Puig, R., Rauluseviciute, I., Berhanu Lemma, R., Turchi, L., Blanc-Mathieu, R., ... & Mathelier, A. (2022). JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic acids research, 50(D1), D165-D173.