Perform customizable filtering and the main DTU calling with DRIMSeq.

run_drimseq(
  counts,
  tx2gene,
  pd,
  id_col = NULL,
  cond_col,
  cond_levels = NULL,
  filtering_strategy = "bulk",
  add_pseudocount = FALSE,
  BPPARAM = BiocParallel::SerialParam(),
  force_dense = TRUE,
  subset_feature = NULL,
  subset_sample = NULL,
  carry_over_metadata = TRUE,
  filter_only = FALSE,
  ...
)

Arguments

counts

Can be either:

  1. (sparse) matrix with feature counts, where rows correspond to features (e.g. transcripts) and each column holds the counts of one sample/cell. The column names must match the identifiers specified via id_col.

  2. Seurat object with a transcript-level assay as active.assay (most likely the result object from combine_to_matrix()).

tx2gene

Data frame, where the first column consists of feature identifiers and the second column consists of the corresponding gene identifiers. Feature identifiers must match the rownames of the counts object. If a Seurat object is provided in counts and tx2gene was provided in combine_to_matrix(), a vector with the column names of the feature and gene identifiers is sufficient.

pd

Data frame with at least a column of sample/cell identifiers (rownames or specified in id_col) and the comparison group definition (specified in cond_col).

id_col

Name of the column in pd, where unique sample/cell identifiers are stored. If NULL (default), use rownames of pd.

cond_col

Name of the column in pd, where the comparison groups/conditions are defined. If more than 2 levels/groups are present, the groups that should be used must be specified in cond_levels.

cond_levels

Define the two levels/groups of cond_col that should be compared. The order of the levels determines the direction of the comparison (i.e. cond_levels[1]-cond_levels[2]).
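
As a rough illustration of how two levels can be picked from a condition column with more than two groups, the following base R sketch uses a hypothetical sample table. All column names and values are invented for the example, and the code is not DTUrtle's internal implementation.

```r
# Hypothetical sample table with three conditions (all values are made up).
pd <- data.frame(
  id = paste0("s", 1:6),
  group = c("ctrl", "ctrl", "treatA", "treatA", "treatB", "treatB"),
  stringsAsFactors = FALSE
)

# Two levels to compare; the order gives the direction: treatA - ctrl.
cond_levels <- c("treatA", "ctrl")

# Keep only the samples that belong to one of the two chosen levels.
pd_sub <- pd[pd$group %in% cond_levels, ]

# Encode the groups as a factor whose level order follows cond_levels.
pd_sub$group <- factor(pd_sub$group, levels = cond_levels)
table(pd_sub$group)
```

Samples of the third group (treatB) are left out of the comparison in this sketch.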

filtering_strategy

Define the filtering strategy to reduce noise and increase statistical power.

  • 'bulk': Predefined strategy for bulk RNAseq data (default): Features must contribute at least 5% of the total expression in at least 50% of the samples of the smallest group. Additionally the total gene expression must be 5 or more for at least 50% of the samples of the smallest group.

  • 'sc': Predefined strategy for single-cell RNAseq data: Features must contribute at least 5% of the total expression in at least 5% of the cells of the smallest group.

  • 'own': Can be used to specify a user-defined strategy via the ... argument (using the parameters of dmFilter).
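
To make the 'bulk' thresholds more concrete, here is a minimal base R sketch that applies them to a toy count matrix. The data, the object names and the simplified logic are assumptions for illustration only; the actual filtering is done by dmFilter and may differ in detail.

```r
# Toy counts: 5 transcripts (2 genes) in 4 samples; all values are made up.
counts <- matrix(
  c(10, 12,  8,  9,
    90, 88, 92, 91,
     0,  0,  1,  0,
     1,  0,  2,  1,
     1,  1,  0,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("tx", 1:5), paste0("s", 1:4))
)
tx2gene <- data.frame(
  tx = rownames(counts),
  gene = c("gA", "gA", "gA", "gB", "gB"),
  stringsAsFactors = FALSE
)
group <- c("a", "a", "b", "b")

# Required number of samples: 50% of the smallest group.
n_min <- 0.5 * min(table(group))

# Total expression per gene and sample, expanded to one row per transcript.
gene_totals <- rowsum(counts, tx2gene$gene)
gene_per_tx <- gene_totals[tx2gene$gene, , drop = FALSE]

# Proportion of each transcript within its gene, per sample.
prop <- counts / gene_per_tx

# Gene expression of 5 or more, and a feature proportion of at least 5%,
# each in at least n_min samples.
gene_ok <- rowSums(gene_totals >= 5) >= n_min
tx_ok <- rowSums(prop >= 0.05, na.rm = TRUE) >= n_min

keep <- unname(tx_ok & gene_ok[tx2gene$gene])
rownames(counts)[keep]
```

In this toy data, tx3 is removed because its proportion never reaches 5%, and both transcripts of gB are removed because the gene never reaches a total expression of 5.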

add_pseudocount

Set to TRUE if a very small pseudocount should be added to transcripts with zero expression in one group. Adding the pseudocount enables statistical analysis for comparisons where one group's proportion is completely zero.
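
The idea can be sketched in base R as follows. The toy data, the helper names and in particular the pseudocount value of 1e-6 are assumptions chosen for illustration; the value and mechanism used internally may differ.

```r
# Toy counts: tx_b is not expressed at all in group "b" (values are made up).
counts <- matrix(
  c(5, 7, 6, 8,
    3, 4, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("tx_a", "tx_b"), c("s1", "s2", "s3", "s4"))
)
group <- c("a", "a", "b", "b")

# Total expression of every transcript within each group (columns = groups).
group_sums <- sapply(split(seq_along(group), group), function(idx) {
  rowSums(counts[, idx, drop = FALSE])
})

# Flag transcripts whose expression is zero in a whole group.
zero_in_group <- group_sums == 0

pseudo <- 1e-6  # illustrative value, not necessarily the one DTUrtle uses
adjusted <- counts
for (g in colnames(zero_in_group)) {
  rows <- zero_in_group[, g]
  if (any(rows)) {
    # Add the pseudocount only to the affected transcripts in that group.
    adjusted[rows, group == g] <- adjusted[rows, group == g] + pseudo
  }
}
adjusted
```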

BPPARAM

If multicore processing should be used, specify a BiocParallelParam object here. Among others, can be SerialParam() (default) for non-multicore processing or MulticoreParam('number_cores') for multicore processing. See BiocParallel for more information.

force_dense

Set to TRUE to force conversion to a dense matrix for the DRIMSeq calculations instead of using a sparse Matrix. This increases memory usage, but currently also reduces runtime drastically.

subset_feature

Subsets the provided count matrix to only specified features. Can be names, indices or logicals.

subset_sample

Subsets the provided count matrix to only specified samples. Can be names, indices or logicals.
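
The three selector types behave like ordinary R matrix indexing. The following base R sketch shows them on a small made-up matrix; it only illustrates the indexing semantics and does not call run_drimseq().

```r
# Small made-up count matrix: 4 features, 3 samples.
counts <- matrix(
  1:12, nrow = 4,
  dimnames = list(paste0("tx", 1:4), paste0("s", 1:3))
)

# The same two features selected by name, by index and by logical vector.
by_name    <- counts[c("tx1", "tx3"), , drop = FALSE]
by_index   <- counts[c(1, 3), , drop = FALSE]
by_logical <- counts[c(TRUE, FALSE, TRUE, FALSE), , drop = FALSE]

# All three selections yield the same sub-matrix.
identical(by_name, by_index) && identical(by_index, by_logical)

# Samples are selected the same way, along the columns.
sample_sub <- counts[, c("s1", "s2"), drop = FALSE]
dim(sample_sub)
```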

carry_over_metadata

Specify if compatible additional columns of tx2gene shall be carried over to the gene and transcript level meta_table in the results. Columns with NA values are not carried over.

filter_only

Return filtered (sparse) matrix, without performing DRIMSeq statistical computations.

...

Arguments passed on to sparseDRIMSeq::dmFilter

x

dmDSdata or dmSQTLdata object.

Value

dturtle object with the key results, which can be used in the subsequent DTUrtle steps. The object is simply an easily accessible list with the following items:

  • meta_table_gene: Data frame of the expressed-in ratio of all genes. Expressed-in is defined as expression > 0. Can be used to add gene level meta-information for plotting.

  • meta_table_tx: Data frame of the expressed-in ratio of all transcripts. Expressed-in is defined as expression > 0. Can be used to add transcript level meta-information for plotting.

  • meta_table_sample: Data frame of the provided sample level information (pd). Can be used to add sample level meta-information for plotting.

  • drim: Results of the DRIMSeq statistical computations (dmTest()).

  • design: Design matrix generated from the specified pd columns.

  • group: Vector indicating which comparison group each sample/cell belongs to.

  • used_filtering_options: List of the used filtering options.

  • add_pseudocount: Keeps track of whether a pseudocount was added in the comparison.

If filter_only=TRUE, only the filtered (sparse) matrix is returned.

Details

Run the main DRIMSeq pipeline, including generation of a design matrix, gene/feature filtering and running the statistical computations of DRIMSeq (dmPrecision(), dmFit() and dmTest()).

See also