R/dtu_analysis.R
priming_bias_detection_probability.Rd
Estimate transcript detection probability for 3'- or 5'-biased data
priming_bias_detection_probability(counts, gtf, tx2gene, one_to_one = NULL, priming_enrichment = "3", genes = NULL, add_to_table = NULL, BPPARAM = BiocParallel::SerialParam())
Argument | Description |
---|---|
counts | A (sparse) count matrix, where each column represents a sample or cell and each row represents a single transcript isoform. This data is used to infer each gene's reference transcript. |
gtf | A GTF file with gene- and exon-level information. Can be a file path or a previously imported GTF file (as GRanges or data frame). It is advised to read in the file like this: |
tx2gene | Data frame, where the first column consists of feature identifiers and the second column consists of the corresponding gene identifiers. Feature identifiers must match the row names of the counts object. |
one_to_one | Specify |
priming_enrichment | Specify which end of the mRNA is expected to be enriched by your (single-cell) RNA-seq protocol. Can be either '3' or '5', for the 3'-end or the 5'-end respectively. |
genes | (Optional) Specify certain genes that shall be analysed. If |
add_to_table | (Optional) add the |
BPPARAM | If multicore processing should be used, specify a |
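
To make the expected input shapes concrete, the following is a minimal sketch of a toy `counts` matrix and `tx2gene` data frame. All identifiers and values are invented for illustration, and the annotation path in the commented call is a placeholder, not a real file. The call itself is left commented out because it needs the DTUrtle package and a matching annotation.

```r
# Toy count matrix: rows are transcript isoforms, columns are samples / cells.
# All names and values are invented for illustration.
counts <- matrix(
  c(10, 0, 5,
     2, 8, 0,
     0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3"), c("cell_a", "cell_b", "cell_c"))
)

# First column: feature (transcript) identifiers, second column: gene identifiers.
tx2gene <- data.frame(
  transcript_id = c("tx1", "tx2", "tx3"),
  gene_id       = c("geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# The feature identifiers must match the row names of the count matrix.
ids_match <- all(rownames(counts) %in% tx2gene[[1]])

# Not run: requires the DTUrtle package and a real annotation file.
# res <- priming_bias_detection_probability(
#   counts = counts, gtf = "path/to/annotation.gtf", tx2gene = tx2gene,
#   priming_enrichment = "3"
# )
```

Checking `ids_match` before calling the function is a cheap way to catch mismatched identifiers, for example transcript IDs with and without version suffixes.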
A data frame with the columns:
gene
: A gene identifier.
tx
: A transcript identifier.
detection_probability
: The calculated detection probability score.
used_as_ref
: Logical vector indicating which transcript was used as the reference transcript for each gene.
If a valid data frame is provided in add_to_table, this data frame is returned with the added detection_probability and used_as_ref columns.
Many (single-cell) RNA-seq protocols do not produce reads from the full length of the mRNA, but instead favour fragments from the 3' or 5' end of the mRNA. Such protocols limit the ability to detect DTU events for specific transcripts, e.g. for transcripts of the same gene where the first exon-level difference is close to the non-favoured priming end. This function estimates which transcripts might not show up in a DTU analysis because of this effect.
First, this function sets the transcript with the highest expression proportion as the reference transcript for each gene. If no count information is available, the first transcript is chosen as the reference.
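The reference selection described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper `pick_reference()` is hypothetical, and summing counts over all samples before taking proportions is an assumed way of determining the dominant transcript.

```r
# Toy annotation and counts; all names and values are invented.
tx2gene <- data.frame(
  tx   = c("tx1", "tx2", "tx3", "tx4"),
  gene = c("geneA", "geneA", "geneB", "geneB"),
  stringsAsFactors = FALSE
)
counts <- matrix(
  c(10, 4,
     2, 8,
     0, 0,
     0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(tx2gene$tx, c("cell_a", "cell_b"))
)

# Hypothetical helper: pick the transcript with the largest share of the
# gene's total counts; fall back to the first transcript without counts.
pick_reference <- function(tx_ids, counts) {
  totals <- rowSums(counts[tx_ids, , drop = FALSE])
  if (sum(totals) == 0) {
    return(tx_ids[1])
  }
  tx_ids[which.max(totals / sum(totals))]
}

# One reference transcript per gene, as a named character vector.
reference <- vapply(
  split(tx2gene$tx, tx2gene$gene),
  pick_reference,
  character(1),
  counts = counts
)

# Flag the chosen transcripts, mirroring the used_as_ref column.
tx2gene$used_as_ref <- tx2gene$tx == unname(reference[tx2gene$gene])
```

In this toy data, geneB has no counts at all, so its first listed transcript becomes the reference by the fallback rule.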
Then, for each other transcript of that gene, the first exon-level difference compared to the reference transcript is detected and a probability score is calculated based on the exonic distance between that difference and the favoured priming end.
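To illustrate what an exonic distance to the first exon-level difference means, here is a simplified base R sketch. It assumes a plus-strand gene and 3' priming, so the favoured end is the largest coordinate, and it compares transcripts base by base. Both helper functions and all coordinates are hypothetical; the package may well operate on exon ranges in a different way.

```r
# Expand exon intervals (1-based, inclusive) into a vector of exonic positions.
exonic_positions <- function(starts, ends) {
  unlist(Map(seq, starts, ends))
}

# Hypothetical plus-strand transcripts of one gene; they share the first and
# last exon but differ in the middle exon.
ref_pos   <- exonic_positions(c(100, 300, 500), c(150, 350, 600))
other_pos <- exonic_positions(c(100, 400, 500), c(150, 450, 600))

# Walk from the favoured (3') end towards the 5' end and count the exonic
# bases both transcripts share before the first difference.
exonic_distance_to_first_difference <- function(ref, other) {
  ref   <- sort(ref, decreasing = TRUE)
  other <- sort(other, decreasing = TRUE)
  n <- min(length(ref), length(other))
  differs <- ref[seq_len(n)] != other[seq_len(n)]
  if (any(differs)) which(differs)[1] - 1L else n
}

distance <- exonic_distance_to_first_difference(ref_pos, other_pos)
```

Here the shared last exon spans positions 500 to 600, so 101 exonic bases lie between the 3' end and the first difference. Intronic bases are skipped because only exonic positions are compared.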
The probability score ranges from 0 to 1, where 1 indicates no influence of the protocol's priming bias and 0 indicates an extremely heavy influence. Thus, DTU effects for transcripts with a low score are less likely to be detectable with the given data.
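One way to use the score is to flag non-reference transcripts whose DTU effects are unlikely to be detectable. The sketch below works on an invented result in the documented column layout; the cut-off `min_probability` is a hypothetical value chosen for illustration, not a recommendation from the package.

```r
# Invented result in the documented column layout (values are made up).
res <- data.frame(
  gene = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  tx   = c("tx1", "tx2", "tx3", "tx4", "tx5"),
  detection_probability = c(1, 0.92, 0.08, 1, 0.41),
  used_as_ref = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical cut-off; choose a value suited to your protocol and data.
min_probability <- 0.5

# Non-reference transcripts with a low score are candidates for missed DTU.
low_detectability <- res[
  !res$used_as_ref & res$detection_probability < min_probability,
]
```

Interpreting a negative DTU result for the flagged transcripts then calls for extra caution, since the absence of a signal may reflect the priming bias rather than the biology.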
Other DTUrtle DTU: combine_to_matrix(), import_counts(), posthoc_and_stager(), run_drimseq()