Import quantification results from many RNA-seq quantifiers for DGE analysis, including alevin and bustools for single-cell data.
This is most likely the first step in your DTUrtle DGE analysis.
import_dge_counts(files, type, ...)
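The files argument can be a named character vector, in which case the names are kept as sample names. The sketch below only builds such a vector. The sample names and the salmon/<sample>/quant.sf directory layout are hypothetical, and the actual import call is left as a comment because it needs DTUrtle, real quantification files and a tx2gene data frame.

```r
# Hypothetical sample names and salmon output layout; adjust to your own files.
samples <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
files <- file.path("salmon", samples, "quant.sf")
names(files) <- samples  # these names are kept as the sample names

print(files)

# With DTUrtle loaded and a tx2gene data frame available, the import could then be:
# dge <- import_dge_counts(files, type = "salmon", tx2gene = tx2gene)
```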
Arguments
| files |
Vector of files to be imported. Can optionally be named to keep the sample names. |
| type |
Type of the quantification data. All tools supported by tximport can be selected, in addition to the newly implemented bustools support for single-cell data. For single-cell data, alevin or bustools is recommended.
'salmon'
'alevin'
'kallisto'
'bustools'
'rsem'
'stringtie'
'sailfish'
'none'
|
| ... |
Arguments passed on to tximport::tximport
txIn: logical, whether the incoming files are transcript level (default TRUE)
txOut: logical, whether the function should just output transcript-level data (default FALSE)
countsFromAbundance: character, either "no" (default), "scaledTPM", "lengthScaledTPM", or "dtuScaledTPM". Whether to generate estimated counts using abundance estimates: scaled up to library size (scaledTPM), scaled using the average transcript length over samples and then the library size (lengthScaledTPM), or scaled using the median transcript length among isoforms of a gene, and then the library size (dtuScaledTPM). dtuScaledTPM is designed for DTU analysis in combination with txOut=TRUE, and it requires specifying a tx2gene data.frame. dtuScaledTPM works such that within a gene, values from all samples and all transcripts get scaled by the same fixed median transcript length. If using scaledTPM, lengthScaledTPM, or geneLengthScaledTPM, the counts are no longer correlated across samples with transcript length, and so the length offset matrix should not be used.
tx2gene: a two-column data.frame linking transcript id (column 1) to gene id (column 2). The column names are not relevant, but this column order must be used. This argument is required for gene-level summarization, and the tximport vignette describes how to construct this data.frame (see Details below). An automated solution to avoid having to create tx2gene if one has quantified with Salmon or alevin with human or mouse transcriptomes is to use the tximeta function from the tximeta Bioconductor package.
varReduce: whether to reduce per-sample inferential replicates information into a matrix of sample variances (default FALSE). alevin computes inferential variance by default for bootstrap inferential replicates, so this argument is ignored (not necessary) for alevin.
dropInfReps: whether to skip reading in inferential replicates (default FALSE). For alevin, tximport will still read in the inferential variance matrix if it exists.
infRepStat: a function to re-compute counts and abundances from the inferential replicates, e.g. matrixStats::rowMedians to re-compute counts as the median of the inferential replicates. The order of operations is: first counts are re-computed, then abundances are re-computed. Following this, if countsFromAbundance is not "no", tximport will again re-compute counts from the re-computed abundances. infRepStat should operate on rows of a matrix (default NULL).
ignoreTxVersion: logical, whether to split the tx id on the '.' character to remove version information to facilitate matching with the tx id in tx2gene (default FALSE)
ignoreAfterBar: logical, whether to split the tx id on the '|' character to facilitate matching with the tx id in tx2gene (default FALSE)
geneIdCol: name of column with gene id. If missing, the tx2gene argument can be used.
txIdCol: name of column with tx id
abundanceCol: name of column with abundances (e.g. TPM or FPKM)
countsCol: name of column with estimated counts
lengthCol: name of column with feature length information
importer: a function used to read in the files
existenceOptional: logical, whether tximport should skip checking that files exist before attempting import (default FALSE, meaning files must exist according to file.exists)
sparse: logical, whether to try to import data sparsely (default FALSE). Initial implementation for txOut=TRUE, countsFromAbundance="no" or "scaledTPM", and no inferential replicates. Only the counts matrix is returned (and the abundance matrix if using "scaledTPM").
sparseThreshold: the minimum threshold for including a count as a non-zero count during sparse import (default 1)
readLength: numeric, the read length used to calculate counts from StringTie's output of coverage. Default value (from StringTie) is 75. The formula used to calculate counts is: cov * transcript length / read length
alevinArgs: named list, with logical elements filterBarcodes, tierImport, forceSlow. See the Details section of the tximport::tximport documentation for definitions.
|
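The three non-default countsFromAbundance options of tximport::tximport differ only in the per-transcript weight applied to the abundances before rescaling each sample to its library size. The following is a minimal sketch of that description on hypothetical toy matrices. It is written for illustration and is not tximport's own source code; the helper name counts_from_abundance and all input values are assumptions.

```r
# Toy data (hypothetical): tx1 and tx2 belong to geneA, tx3 to geneB; two samples.
ids <- list(c("tx1", "tx2", "tx3"), c("s1", "s2"))
counts    <- matrix(c(100, 50, 30, 200, 80, 40), ncol = 2, dimnames = ids)
abundance <- matrix(c(60, 55, 25, 90, 70, 30), ncol = 2, dimnames = ids)  # TPM-like
lengths   <- matrix(c(1500, 800, 1200, 1520, 790, 1180), ncol = 2, dimnames = ids)
tx2gene   <- data.frame(tx = c("tx1", "tx2", "tx3"),
                        gene = c("geneA", "geneA", "geneB"),
                        stringsAsFactors = FALSE)

# Hypothetical helper sketching the three options; not the tximport implementation.
counts_from_abundance <- function(counts, abundance, lengths, method, tx2gene = NULL) {
  per_tx <- switch(method,
    # scaledTPM: abundances as they are; only the library-size rescaling below applies
    scaledTPM = abundance,
    # lengthScaledTPM: weight each transcript by its average length over samples
    lengthScaledTPM = abundance * rowMeans(lengths),
    # dtuScaledTPM: one fixed weight per gene, the median of its isoforms' average lengths
    dtuScaledTPM = {
      gene <- as.character(tx2gene[[2]])[match(rownames(lengths), tx2gene[[1]])]
      med_len <- tapply(rowMeans(lengths), gene, median)
      abundance * as.vector(med_len[gene])
    },
    stop("unknown method: ", method)
  )
  # Rescale every sample (column) so it sums to that sample's original library size.
  sweep(per_tx, 2, colSums(counts) / colSums(per_tx), `*`)
}

for (m in c("scaledTPM", "lengthScaledTPM", "dtuScaledTPM")) {
  cat(m, "\n")
  print(round(counts_from_abundance(counts, abundance, lengths, m, tx2gene), 1))
}
```

Because dtuScaledTPM multiplies all isoforms of a gene by the same median length, the ratio between isoforms of one gene in a sample stays equal to their abundance ratio, which is what a DTU analysis compares.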
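The infRepStat argument of tximport::tximport expects a function that operates on the rows of a matrix (matrixStats::rowMedians is the documented example), and varReduce summarizes each sample's inferential replicates as variances. The sketch below simulates hypothetical replicate matrices and computes both kinds of row-wise summary in base R, producing one row per transcript and one column per sample. The helpers simulate_reps, row_median and row_var are illustrative assumptions, not tximport functions.

```r
set.seed(42)

# Hypothetical inferential replicates: transcripts x draws, each row around its own mean.
simulate_reps <- function(means, n_draws = 20) {
  matrix(rpois(length(means) * n_draws, lambda = rep(means, times = n_draws)),
         nrow = length(means), dimnames = list(names(means), NULL))
}
means <- c(tx1 = 100, tx2 = 20, tx3 = 5)
reps_per_sample <- list(s1 = simulate_reps(means), s2 = simulate_reps(2 * means))

# Row-wise statistics, i.e. functions that operate on rows of a matrix.
row_median <- function(m) apply(m, 1, median)  # base-R stand-in for matrixStats::rowMedians
row_var    <- function(m) apply(m, 1, var)

# One row per transcript, one column per sample.
median_counts <- sapply(reps_per_sample, row_median)
inf_variance  <- sapply(reps_per_sample, row_var)
print(median_counts)
print(round(inf_variance, 1))
```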
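The ignoreAfterBar and ignoreTxVersion arguments of tximport::tximport make quantified transcript ids match the ids in tx2gene by cutting at the first '|' and at the '.' character, respectively. A minimal base-R sketch of that string handling follows; the ids are illustrative (the second mimics a shortened GENCODE-style FASTA header) and the helper names are hypothetical.

```r
# Illustrative ids: a versioned Ensembl-style id and a shortened GENCODE-style header.
tx_ids <- c("ENST00000456328.2",
            "ENST00000450305.2|ENSG00000223972.5|DDX11L1-201|DDX11L1|")

# ignoreAfterBar-like step: keep everything before the first '|'
strip_after_bar <- function(ids) sub("\\|.*$", "", ids)
# ignoreTxVersion-like step: drop everything from the first '.' onwards
strip_version <- function(ids) sub("\\..*$", "", ids)

print(strip_after_bar(tx_ids))
print(strip_version(strip_after_bar(tx_ids)))
```

Cutting at the first '.' assumes the identifiers contain no other dots. For annotations whose ids do, a pattern anchored to a trailing version number such as "\\.[0-9]+$" would be the safer choice in a sketch like this.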
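For type 'stringtie', counts are derived from coverage with the readLength formula given above (cov * transcript length / read length). The sketch below applies that formula to made-up coverage values; stringtie_counts is a hypothetical helper name.

```r
# counts = coverage * transcript length / read length (default read length 75)
stringtie_counts <- function(cov, tx_length, read_length = 75) {
  cov * tx_length / read_length
}

print(stringtie_counts(cov = c(12.5, 3), tx_length = c(1500, 900)))                     # 250 36
print(stringtie_counts(cov = c(12.5, 3), tx_length = c(1500, 900), read_length = 150))  # 125 18
```

Since the read length sits in the denominator, passing the wrong readLength scales all StringTie-derived counts by the same factor.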
Value
For bulk data: A list containing a count matrix, a matrix of average effective transcript lengths, and a flag indicating how counts were inferred from abundance estimates.
For single-cell data: A list of count matrices, one per sample. These should be combined, and can optionally be added to a Seurat object, with combine_to_matrix().
Details
It is necessary to specify a tx2gene data frame as a parameter.
This data frame must be a two-column data frame linking transcript id (column 1) to gene id/name (column 2).
Please see import_gtf(), move_columns_to_front() and one_to_one_mapping() to help with tx2gene creation.
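The tx2gene requirement (transcript id in column 1, gene id or name in column 2) can be met from any transcript-level annotation table. The sketch below uses plain base R on a hypothetical annotation data frame; it does not reproduce the behavior of DTUrtle's import_gtf(), move_columns_to_front() or one_to_one_mapping(), which remain the documented helpers.

```r
# Hypothetical transcript-level annotation table, e.g. derived from a GTF file.
annotation <- data.frame(
  gene_id       = c("geneA_id", "geneA_id", "geneB_id", "geneB_id"),
  gene_name     = c("GENEA", "GENEA", "GENEB", "GENEB"),
  transcript_id = c("tx1", "tx2", "tx3", "tx3"),  # tx3 appears twice (e.g. one row per exon)
  stringsAsFactors = FALSE
)

# Transcript id must be column 1 and gene id/name column 2; keep one row per transcript.
tx2gene <- unique(annotation[, c("transcript_id", "gene_name")])
rownames(tx2gene) <- NULL
print(tx2gene)
```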
See also combine_to_matrix(), when output is a list of single-cell runs.
See also