R/run_stream.R
run_stream.Rd
Most transcriptional regulation analyses, including prediction of transcription factor
(TF) binding sites and identification of enhancer-gene relations, require the discovery of enhancer
regulons (eRegulons), which are active in a set of cells. An eRegulon consists of a TF together with
its set of target enhancers and target genes. This function takes as input a Seurat
object (containing both scRNA-seq and scATAC-seq assays) and simultaneously identifies eRegulons and
the cell sets in which they are active. In addition, this function determines the number of eRegulons
in a dataset by submodular optimization.
run_stream(
obj = NULL,
qubic.path = "/users/PAS1475/liyang/software/QUBIC2/qubic",
candidate.TFs = NULL,
peak.assay = "ATAC",
var.genes = 3000,
top.peaks = 3000,
ifPutativeTFs = FALSE,
min.cells = 10,
out.dir = "./",
org = "hg38",
top.ngenes = 5,
c.cutoff = 1,
n.blocks = 100,
distance = 5e+05,
filter_peaks_for_cicero = FALSE,
ifWeighted = TRUE,
cicero.covar = -Inf,
signac.score = -Inf,
signac.pval = Inf,
intra.cutoff = 1,
inter.cutoff = 0.8,
peak.cutoff = 0.8,
score.cutoff = 1,
KL = "min.exp",
quantile.cutoff = 4,
BlockOverlap = 0.5,
Extension = 0.9,
submod.step = 30,
min.eRegs = 100,
url.link = "https://figshare.com/ndownloader/files/38794185"
)
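The default `qubic.path` in the usage above points to a user-specific location, so a call will typically need to override it. The following is a minimal sketch of a pre-flight check that could be run before calling `run_stream()`. The helper `check_stream_paths()` and the example paths are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): collect path problems
# before starting a potentially long run_stream() job.
check_stream_paths <- function(qubic.path, out.dir = "./") {
  problems <- character(0)
  if (!file.exists(qubic.path)) {
    problems <- c(problems, paste0("QUBIC2 binary not found: ", qubic.path))
  }
  if (!dir.exists(out.dir)) {
    problems <- c(problems, paste0("Output directory not found: ", out.dir))
  }
  problems
}

# Example with a placeholder path that is assumed not to exist.
problems <- check_stream_paths(
  qubic.path = file.path(tempdir(), "no_such_dir", "qubic"),
  out.dir = tempdir()
)
print(problems)

# If no problems are reported, the call itself might look like this
# (`obj` is assumed to be a Seurat object with RNA and ATAC assays):
# eregs <- run_stream(obj = obj, qubic.path = "/path/to/qubic",
#                     peak.assay = "ATAC", org = "hg38", out.dir = "./")
```

Returning the problems as a character vector, rather than stopping at the first one, lets all path issues be reported in a single pass.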
obj: A Seurat object composed of both scRNA-seq and scATAC-seq assays.
qubic.path: The path of the binary executable file of QUBIC2, e.g., "/users/PAS1475/liyang/software/QUBIC2/qubic".
candidate.TFs: The list of candidate TFs used to identify eRegulons and enhancer-driven gene regulatory networks (eGRNs), NULL by default.
peak.assay: The scATAC-seq assay, "ATAC" by default.
var.genes: The number of highly variable genes used to identify the core part of hybrid biclusters (HBCs), 3000 by default.
top.peaks: The number of top-ranked peaks used to identify the core part of HBCs, 3000 by default.
min.cells: The minimum number of cells required for quality control (QC), 10 by default.
out.dir: The directory in which to save the intermediate and final results, "./" by default.
org: The organism version, "hg38" by default.
top.ngenes: The number of genes composing the core part of an HBC, 5 by default.
c.cutoff: The consistency cutoff used during the hybrid biclustering process, 1.0 by default.
n.blocks: The maximum number of blocks output by IRIS-FGM, 100 by default.
distance: The distance cutoff used to build enhancer-enhancer relations, 5e+05 by default.
filter_peaks_for_cicero: Whether to filter the peaks in a neighborhood of the Signac results before running cicero, FALSE by default.
intra.cutoff: The cutoff used to calculate pairwise similarity among HBCs associated with the same TFs, 1.0 by default.
inter.cutoff: The cutoff used to compute pairwise similarity among genes in HBCs associated with different TFs, 0.80 by default.
peak.cutoff: The cutoff used to quantify pairwise similarity among enhancers in HBCs associated with different TFs, 0.80 by default.
KL: The method used to measure the score of HBCs, "min.exp" by default, i.e., the smaller of the number of genes and the number of cells in an HBC.
quantile.cutoff: The quantile cutoff on the ratio of HBC cells in which enhancers are accessible, 4 by default, indicating the top 25% among ranks.
BlockOverlap: The maximum overlap allowed between blocks output by IRIS-FGM, 0.50 by default.
Extension: The consistency level used by IRIS-FGM to expand a block, 0.90 by default.
submod.step: The step size for submodular optimization, i.e., the number of HBCs to add in each iteration, 30 by default.
min.eRegs: The minimum number of enhancer regulons (eRegulons) to output, 100 by default.
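To make the "min.exp" score and the quantile cutoff of 4 more concrete, here is a small base-R sketch on toy data. It assumes, as described above, that the "min.exp" score of an HBC is the smaller of its gene and cell counts, and it reads a quantile cutoff of 4 as keeping the top 1/4 of values. The helper names and the toy HBCs are hypothetical, and the package's internal implementation may differ.

```r
# Toy HBCs: each has a set of genes and a set of cells (made-up names).
hbcs <- list(
  list(genes = paste0("g", 1:5),  cells = paste0("c", 1:20)),
  list(genes = paste0("g", 1:12), cells = paste0("c", 1:8)),
  list(genes = paste0("g", 1:7),  cells = paste0("c", 1:7))
)

# "min.exp"-style score: the smaller of the gene count and the cell count.
min_exp_score <- function(hbc) min(length(hbc$genes), length(hbc$cells))
scores <- vapply(hbcs, min_exp_score, numeric(1))
print(scores)  # 5 8 7

# Assumed reading of the quantile cutoff: keep values in the top
# 1 / quantile.cutoff fraction (top 25% when quantile.cutoff = 4).
keep_top_fraction <- function(x, quantile.cutoff = 4) {
  threshold <- stats::quantile(x, probs = 1 - 1 / quantile.cutoff,
                               names = FALSE)
  x >= threshold
}

# Made-up accessibility ratios for eight HBCs.
ratios <- c(0.10, 0.35, 0.50, 0.62, 0.71, 0.80, 0.88, 0.95)
print(ratios[keep_top_fraction(ratios)])  # 0.88 0.95
```

With eight toy ratios, the top 25% corresponds to the two largest values, which is what the filter keeps.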
When running on a Seurat object, returns a nested list of eRegulons, each of which contains the following attributes:
terminal: The IRIS-FGM block used to predict the eRegulon.
Tier: The tier of the TF-enhancer relations: 1 represents JASPAR annotations; 2 denotes motif scanning.
TF: The TF of the eRegulon.
genes: Genes of the eRegulon.
peaks: Enhancers of the eRegulon.
cells: Cells where the eRegulon is active.
atac.ratio: The ratio of cells where the eRegulon enhancers are accessible against cells in which the eRegulon genes are expressed.
score: The eRegulon score.
weight: The eRegulon weight.
links: The enhancer-gene relations saved in a GRanges object.
seed: The seed to obtain the eRegulon.
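As an illustration of working with the returned nested list, the sketch below builds a mock result containing a few of the attributes listed above and summarises it in base R. The mock values (TF names, genes, peaks, scores) are made up for illustration and are not output from `run_stream()`.

```r
# Mock eRegulons with a subset of the documented attributes (made-up values).
eregs <- list(
  list(TF = "TF_A", genes = c("GENE1", "GENE2", "GENE3"),
       peaks = c("chr1-100-600", "chr1-5000-5500"),
       cells = paste0("cell", 1:40), atac.ratio = 0.8, score = 3),
  list(TF = "TF_B", genes = c("GENE2", "GENE4"),
       peaks = c("chr2-200-700"),
       cells = paste0("cell", 21:50), atac.ratio = 0.4, score = 2)
)

# One row per eRegulon: TF, size of each component, and the score.
summary_df <- data.frame(
  TF      = vapply(eregs, function(e) e$TF, character(1)),
  n.genes = vapply(eregs, function(e) length(e$genes), integer(1)),
  n.peaks = vapply(eregs, function(e) length(e$peaks), integer(1)),
  n.cells = vapply(eregs, function(e) length(e$cells), integer(1)),
  score   = vapply(eregs, function(e) e$score, numeric(1))
)
print(summary_df)

# Select eRegulons by TF, then collect their target genes.
tf_a <- Filter(function(e) e$TF == "TF_A", eregs)
tf_a_genes <- unique(unlist(lapply(tf_a, function(e) e$genes)))
print(tf_a_genes)
```

Using `vapply()` with an explicit return type makes the summary fail early if an eRegulon is missing one of the expected attributes.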
Li, Y., Ma, A., Wang, Y., Wang, C., Chen, S., Fu, H., Liu, B., & Ma, Q. (2022). Enhancer-driven gene regulatory networks inference from single-cell RNA-seq and ATAC-seq data. bioRxiv, 2022-12.
Chang, Y., Allen, C., Wan, C., Chung, D., Zhang, C., Li, Z., & Ma, Q. (2021). IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis. Bioinformatics, 37(18), 3045-3047.
Xie, J., Ma, A., Zhang, Y., Liu, B., Cao, S., Wang, C., ... & Ma, Q. (2020). QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data. Bioinformatics, 36(4), 1143-1149.
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature methods, 18(11), 1333-1341.
Pliner, H. A., Packer, J. S., McFaline-Figueroa, J. L., Cusanovich, D. A., Daza, R. M., Aghamirzaie, D., ... & Trapnell, C. (2018). Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Molecular cell, 71(5), 858-871.
Castro-Mondragon, J. A., Riudavets-Puig, R., Rauluseviciute, I., Berhanu Lemma, R., Turchi, L., Blanc-Mathieu, R., ... & Mathelier, A. (2022). JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic acids research, 50(D1), D165-D173.