Perform differential gene expression analysis with DESeq2

  id_col = NULL,
  cond_levels = NULL,
  lfc_threshold = 0,
  sig_threshold = 0.01,
  dge_calling_strategy = "bulk",
  subset_feature = NULL,
  subset_sample = NULL,
  deseq_opts = list(),
  lfc_shrink_opts = list(),
  return_dds = FALSE,
  BPPARAM = BiocParallel::SerialParam()



Can be either:

  1. A (sparse) matrix of gene counts, with genes as rows and one column of count data per sample/cell. The names of these columns must match the identifiers in id_col.

  2. A results object from tximport (preferred if import_dge_counts() was used with bulk data).

  3. A Seurat object with a gene-level assay as active.assay (most likely the result object from combine_to_matrix()).


Data frame with at least a column of sample/cell identifiers (rownames or specified in id_col) and the comparison group definition (specified in cond_col).


Name of the column in pd, where unique sample/cell identifiers are stored. If NULL (default), use rownames of pd.


Name of the column in pd, where the comparison groups/conditions are defined. If more than 2 levels/groups are present, the groups that should be used must be specified in cond_levels.


Define the two levels/groups of cond_col that should be compared. The order of the levels defines the comparison formula (i.e. cond_levels[1]-cond_levels[2]).
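The relationship between the count matrix, pd, id_col, cond_col and cond_levels can be sketched in base R. This is only an illustration with made-up data: the sample names, column names and the explicit checks are assumptions, not the function's internal code.

```r
# Toy count matrix: genes as rows, one column per sample (illustrative values).
counts <- matrix(
  c(10, 0, 3, 25, 12, 1, 4, 30, 50, 2, 0, 8),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("s1", "s2", "s3", "s4"))
)

# Sample table with an identifier column and a condition column
# (the column names are hypothetical).
pd <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  condition = c("treated", "treated", "control", "other"),
  stringsAsFactors = FALSE
)

id_col      <- "sample_id"
cond_col    <- "condition"
cond_levels <- c("treated", "control")  # three groups exist, so pick two

# Every count column needs a matching identifier in pd.
ids_match <- all(colnames(counts) %in% pd[[id_col]])

# Keep only the samples that belong to the two chosen groups.
pd_sub <- pd[pd[[cond_col]] %in% cond_levels, ]
pd_sub[[cond_col]] <- factor(pd_sub[[cond_col]], levels = cond_levels)
counts_sub <- counts[, pd_sub[[id_col]], drop = FALSE]

# The order of cond_levels gives the direction of the comparison.
comparison <- paste(cond_levels, collapse = " - ")
```

Swapping the two entries of cond_levels would reverse the direction of the comparison and therefore the sign of the reported log2 fold changes.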


Specify a log2 fold change threshold to test against. 0 implies no threshold.


Specify a significance threshold for the adjusted p-values or s-values of the results. 1 implies no threshold.


Should be either 'bulk' or 'sc'. Specify the type of the provided data (bulk or single-cell RNA-seq) so that appropriate parameters can be applied.


Subsets the provided count matrix to only specified features. Can be names, indices or logicals.


Subsets the provided count matrix to only specified samples. Can be names, indices or logicals.
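The three accepted subset forms (names, indices, logicals) behave like ordinary R matrix indexing. The following base R sketch uses a made-up matrix to show that they can select the same rows; it does not reproduce the function's own subsetting code.

```r
# Toy count matrix (illustrative values only).
counts <- matrix(
  1:12,
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("s1", "s2", "s3", "s4"))
)

# Feature subsets: three equivalent ways to select geneA and geneC.
by_name    <- counts[c("geneA", "geneC"), , drop = FALSE]
by_index   <- counts[c(1, 3), , drop = FALSE]
by_logical <- counts[c(TRUE, FALSE, TRUE), , drop = FALSE]

# Sample subset: select two samples by name.
samples_sub <- counts[, c("s1", "s2"), drop = FALSE]
```

Names are usually the safest choice, because indices and logical vectors silently select different features or samples if the matrix order changes.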


Manually specify parameters for the DESeq function. Will overwrite recommended parameters if necessary.


Manually specify parameters for the lfcShrink function. Will overwrite recommended parameters if necessary.
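One common way to let user-supplied options overwrite preset ones is modifyList(). The sketch below assumes this pattern purely for illustration: the preset values are hypothetical and are not claimed to be the parameters this function actually recommends, although test, useT, minmu and quiet are real arguments of DESeq2::DESeq().

```r
# Hypothetical preset parameters (illustrative, not the package defaults).
recommended <- list(test = "LRT", useT = TRUE, minmu = 1e-6)

# Options as a user might pass them in deseq_opts.
deseq_opts <- list(minmu = 1e-3, quiet = TRUE)

# User entries replace presets with the same name; new entries are appended.
final_opts <- modifyList(recommended, deseq_opts)
```

A list merged like this could then be handed to the target function with do.call(), together with the data object.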


Should the DESeqDataSet object be returned?


To use multicore processing, specify a BiocParallelParam object here. Among others, this can be SerialParam() (default) for non-multicore processing or MulticoreParam('number_cores') for multicore processing. See BiocParallel for more information.


A list with the analysis results and parameters:

  • results_all: Data frame of the DGE test results for all analyzed genes.

  • results_sig: Data frame of the significant DEG test results, according to the specified parameters (sig_threshold, lfc_threshold).

  • dds: The DESeqDataSet of the analysis, if return_dds=TRUE.

  • drim: Results of the DRIMSeq statistical computations (dmTest()).

  • sval_threshold / adjp_threshold: The given significance threshold used to either filter s-values or adjusted p-values.

  • comparison: A string representation of the performed comparison.

  • condition1: The first condition of the performed comparison.

  • condition2: The second condition of the performed comparison.

  • sample_table: The (filtered) sample table (pd) - including the condition column used for comparison.

  • deseq_opts: A list of used DESeq parameters.

  • lfc_shrink_opts: A list of used lfcShrink parameters.
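How results_sig relates to results_all can be sketched with a toy table. The column names follow the DESeq2 results() convention (log2FoldChange, padj); the data are made up, and the strict "<" comparison is an assumption about the filter rather than the function's documented behaviour. Because lfc_threshold is a value to test against, it is treated here as already reflected in the adjusted p-values.

```r
# Toy results table (illustrative values only).
results_all <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.1, -0.2, -1.8, 0.9),
  padj           = c(0.001, 0.5, 0.004, NA),
  stringsAsFactors = FALSE
)

sig_threshold <- 0.01

# DESeq2 may report NA adjusted p-values (e.g. after independent filtering),
# so NA entries are excluded explicitly.
keep <- !is.na(results_all$padj) & results_all$padj < sig_threshold
results_sig <- results_all[keep, ]
```

With sig_threshold = 1, every gene with a non-missing adjusted p-value below 1 would pass this filter, which matches the description of 1 as "no threshold" in practice.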


Offers functionality to perform a DGE analysis for bulk and single-cell data with DESeq2, automatically applying recommended models and parameter settings. It is strongly advised to provide 'raw' count data, as imported with import_dge_counts(). Installing the package 'apeglm' is recommended for LFC-shrinkage; for single-cell data, the package 'glmGamPoi' is additionally recommended. For questions about DESeq2, LFC-shrinkage or s-values, please refer to the DESeq2 vignette.

See also

import_dge_counts() for correct import of gene-level counts. combine_to_matrix() to summarize scRNA counts to one matrix.

