Main filtering and DTU statistical computations

Perform customizable filtering and the main DTU calling with DRIMSeq.

run_drimseq(
  counts,
  tx2gene,
  pd,
  id_col = NULL,
  cond_col,
  cond_levels = NULL,
  filtering_strategy = "bulk",
  add_pseudocount = FALSE,
  BPPARAM = BiocParallel::SerialParam(),
  force_dense = TRUE,
  subset_feature = NULL,
  subset_sample = NULL,
  carry_over_metadata = TRUE,
  filter_only = FALSE,
  ...
)

Arguments

counts	Can be either: (1) a (sparse) matrix of feature counts, where the rows correspond to features (e.g. transcripts) and there is one column per sample/cell (the names of these columns must match the identifiers in `id_col`); or (2) a Seurat object with a transcript level assay as `active.assay` (most likely the result object from `combine_to_matrix()`).
tx2gene	Data frame, where the first column consists of feature identifiers and the second column consists of the corresponding gene identifiers. Feature identifiers must match the rownames of the counts object. If a Seurat object is provided in `counts` and `tx2gene` was provided in `combine_to_matrix()`, a vector with the column names of the feature and gene identifiers is sufficient.
pd	Data frame with at least a column of sample/cell identifiers (rownames or specified in `id_col`) and the comparison group definition (specified in `cond_col`).
id_col	Name of the column in `pd`, where unique sample/cell identifiers are stored. If `NULL` (default), use rownames of `pd`.
cond_col	Name of the column in `pd`, where the comparison groups/conditions are defined. If more than 2 levels/groups are present, the groups that should be used must be specified in `cond_levels`.
cond_levels	Define the two levels/groups of `cond_col` that should be compared. The order of the levels states the comparison formula (i.e. `cond_levels[1]-cond_levels[2]`).
filtering_strategy	Define the filtering strategy to reduce noise and increase statistical power. `'bulk'`: Predefined strategy for bulk RNAseq data (default): Features must contribute at least 5% of the total expression in at least 50% of the samples of the smallest group. Additionally, the total gene expression must be 5 or more for at least 50% of the samples of the smallest group. `'sc'`: Predefined strategy for single-cell RNAseq data: Features must contribute at least 5% of the total expression in at least 5% of the cells of the smallest group. `'own'`: Can be used to specify a user-defined strategy via the `...` argument (using the parameters of `dmFilter`).
add_pseudocount	Set to `TRUE` if a very small pseudocount shall be added to transcripts with zero expression in one group. Adding the pseudocount enables statistical analysis for comparisons where one group's proportion is completely zero.
BPPARAM	If multicore processing should be used, specify a `BiocParallelParam` object here. Among others, can be `SerialParam()` (default) for non-multicore processing or `MulticoreParam('number_cores')` for multicore processing. See `BiocParallel` for more information.
force_dense	If you do not want to use a sparse Matrix for DRIMSeq calculations, you can force a dense conversion by specifying `TRUE`. Increases memory usage, but also reduces runtime drastically (currently).
subset_feature	Subsets the provided count matrix to only specified features. Can be names, indices or logicals.
subset_sample	Subsets the provided count matrix to only specified samples. Can be names, indices or logicals.
carry_over_metadata	Specify if compatible additional columns of `tx2gene` shall be carried over to the gene and transcript level `meta_table` in the results. Columns with `NA` values are not carried over.
filter_only	If `TRUE`, return the filtered (sparse) matrix without performing the DRIMSeq statistical computations.
...	Arguments passed on to `sparseDRIMSeq::dmFilter`. `x`: `dmDSdata` or `dmSQTLdata` object.
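The input requirements above can be illustrated with a small base R sketch. All transcript, gene, sample and column names below are invented for illustration, and the checks only mirror the matching rules described for `counts`, `tx2gene`, `pd` and `cond_levels`; they are not the package's own validation code.

```r
# Hypothetical toy inputs; every name and number here is made up.
counts <- matrix(
  c(10, 12,  3,  4,
     5,  6, 11, 13,
     8,  7,  9,  8,
     2,  3,  2,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4), paste0("s", 1:4))
)

# First column: feature identifiers, second column: gene identifiers.
tx2gene <- data.frame(
  transcript_id = paste0("tx", 1:4),
  gene_id = c("geneA", "geneA", "geneB", "geneB")
)

# Sample table with an identifier column and a condition column.
pd <- data.frame(
  sample = paste0("s", 1:4),
  condition = c("treated", "treated", "control", "control")
)

# The two groups to compare, in comparison order.
cond_levels <- c("treated", "control")

# Checks mirroring the documented requirements.
features_match <- all(rownames(counts) %in% tx2gene[[1]])  # rownames vs. tx2gene
ids_match <- all(colnames(counts) %in% pd$sample)          # colnames vs. id column
ids_unique <- anyDuplicated(pd$sample) == 0                # identifiers are unique
levels_present <- all(cond_levels %in% pd$condition)       # levels exist in cond_col
stopifnot(features_match, ids_match, ids_unique, levels_present)
```

With inputs shaped like these, a call following the signature above might look like `run_drimseq(counts = counts, tx2gene = tx2gene, pd = pd, id_col = "sample", cond_col = "condition", cond_levels = cond_levels)`. Whether such a tiny toy data set passes filtering and model fitting is not something this sketch checks.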

Value

`dturtle` object with the key results, which can be used in the subsequent DTUrtle steps. The object is an easily accessible list with the following items:

meta_table_gene: Data frame of the expressed-in ratio of all genes. Expressed-in is defined as expression > 0. Can be used to add gene level meta-information for plotting.
meta_table_tx: Data frame of the expressed-in ratio of all transcripts. Expressed-in is defined as expression > 0. Can be used to add transcript level meta-information for plotting.
meta_table_sample: Data frame of the provided sample level information (pd). Can be used to add sample level meta-information for plotting.
drim: Results of the DRIMSeq statistical computations (dmTest()).
design: Design matrix generated from the specified pd columns.
group: Vector stating which comparison group each sample/cell belongs to.
used_filtering_options: List of the used filtering options.
add_pseudocount: Records whether a pseudocount was added for the comparison.

If filter_only=TRUE, only the filtered (sparse) matrix is returned.

Details

Run the main DRIMSeq pipeline, including generation of a design matrix, gene/feature filtering and running the statistical computations of DRIMSeq (dmPrecision(), dmFit() and dmTest()).
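To make the filtering step more concrete, the following base R sketch applies the two criteria described for the `'bulk'` strategy to a toy matrix. It is not the package's implementation. It assumes that the sample threshold is derived from the size of the smallest group and then counted across all samples (in the style of DRIMSeq's `dmFilter`), and all data and object names are invented.

```r
# Toy counts: 5 transcripts (rows) x 6 samples (columns); all values are made up.
counts <- matrix(
  c(50, 60, 55, 40, 45, 50,   # tx1 (geneA)
    50, 40, 45, 60, 55, 50,   # tx2 (geneA)
     1,  0,  2,  0,  1,  0,   # tx3 (geneA), minor isoform
     2,  1,  0,  3,  1,  2,   # tx4 (geneB), lowly expressed gene
     1,  2,  1,  0,  2,  1),  # tx5 (geneB)
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("tx", 1:5), paste0("s", 1:6))
)
tx2gene <- data.frame(
  tx = paste0("tx", 1:5),
  gene = c("geneA", "geneA", "geneA", "geneB", "geneB")
)
group <- c("a", "a", "a", "b", "b", "b")

# Assumed threshold: 50% of the samples of the smallest group.
min_samps <- 0.5 * min(table(group))

# Total gene expression per sample, expanded back to one row per transcript.
gene_of_tx <- tx2gene$gene[match(rownames(counts), tx2gene$tx)]
gene_totals <- rowsum(counts, group = gene_of_tx)[gene_of_tx, , drop = FALSE]

# Gene criterion: total gene expression of 5 or more in enough samples.
gene_ok <- unname(rowSums(gene_totals >= 5) >= min_samps)

# Feature criterion: transcript contributes at least 5% in enough samples.
prop <- counts / gene_totals
prop[is.nan(prop)] <- 0  # genes with zero total expression in a sample
feature_ok <- unname(rowSums(prop >= 0.05) >= min_samps)

# Keep only transcripts passing both criteria.
filtered <- counts[gene_ok & feature_ok, , drop = FALSE]
print(rownames(filtered))
```

In this toy example, `tx3` is dropped because it never reaches 5% of its gene's expression, and `tx4` and `tx5` are dropped because their gene never reaches a total expression of 5.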
