Perform differential gene expression analysis with DESeq2

run_deseq2(
  counts,
  pd,
  id_col = NULL,
  cond_col,
  cond_levels = NULL,
  lfc_threshold = 0,
  sig_threshold = 0.01,
  dge_calling_strategy = "bulk",
  subset_feature = NULL,
  subset_sample = NULL,
  deseq_opts = list(),
  lfc_shrink_opts = list(),
  return_dds = FALSE,
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

counts

One of:

  1. A (sparse) matrix of gene counts, with one row per gene and one column per sample/cell. The column names must match the identifiers in id_col (or the rownames of pd, if id_col is NULL).

  2. A results object from tximport (preferred for bulk data imported with import_dge_counts()).

  3. A Seurat object with a gene-level assay set as active.assay (typically the result object of combine_to_matrix()).
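For orientation, the following sketch builds the first input form, a dense and a sparse gene-by-sample count matrix, using hypothetical toy data. It only illustrates the expected layout (genes in rows, sample/cell identifiers as column names). The sparse conversion uses the Matrix package, which ships with standard R installations.

```r
# Toy example (hypothetical data): 4 genes x 4 samples of raw integer counts.
# matrix() fills column-wise, so each group of four values is one sample.
counts <- matrix(
  c(10L, 0L, 25L, 3L,    # s1
    12L, 1L, 30L, 0L,    # s2
    50L, 4L, 22L, 2L,    # s3
    47L, 2L, 28L, 5L),   # s4
  nrow = 4,
  dimnames = list(
    c("geneA", "geneB", "geneC", "geneD"),  # genes in rows
    c("s1", "s2", "s3", "s4")               # sample/cell identifiers in columns
  )
)

# Sparse representation, as is common for single-cell data.
counts_sparse <- Matrix::Matrix(counts, sparse = TRUE)
```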

pd

A data frame containing at least the sample/cell identifiers (as rownames or in the column named by id_col) and the comparison groups (in the column named by cond_col).

id_col

Name of the column in pd that stores unique sample/cell identifiers. If NULL (default), the rownames of pd are used.
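The two ways of supplying identifiers can be sketched in base R as follows. The helper resolve_ids() is hypothetical and not part of DTUrtle; it merely mirrors the rule described for id_col (use the named column, otherwise the rownames of pd). The sample table and identifiers are toy data.

```r
# Hypothetical sample table with identifiers in a column named "sample_id".
pd <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4"),
  group = c("control", "control", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (illustration only): take ids from id_col, or rownames if NULL.
resolve_ids <- function(pd, id_col = NULL) {
  if (is.null(id_col)) rownames(pd) else pd[[id_col]]
}

ids_from_col <- resolve_ids(pd, id_col = "sample_id")

# Equivalent table with the identifiers stored as rownames (id_col = NULL).
pd_rn <- data.frame(group = pd$group, row.names = pd$sample_id,
                    stringsAsFactors = FALSE)
ids_from_rownames <- resolve_ids(pd_rn)

# The column names of the count matrix are expected to match these identifiers.
count_colnames <- c("s1", "s2", "s3", "s4")
all(count_colnames %in% ids_from_col)
```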

cond_col

Name of the column in pd that defines the comparison groups/conditions. If this column contains more than two levels/groups, the two groups to compare must be specified in cond_levels.

cond_levels

The two levels/groups of cond_col that should be compared. Their order determines the comparison (i.e. cond_levels[1] - cond_levels[2] on the log2 scale), so positive log2 fold changes indicate higher expression in cond_levels[1].
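The sign convention implied by the level order can be illustrated with a simple ratio of group means. This is a sketch with hypothetical numbers. DESeq2 estimates fold changes from a fitted model rather than from raw means, so only the direction of the comparison carries over. It is an assumption that the order corresponds to a DESeq2 contrast of the form c(cond_col, cond_levels[1], cond_levels[2]).

```r
# Hypothetical normalized mean expression of one gene per group.
mean_expr <- c(control = 50, treated = 200)

# cond_levels = c("treated", "control") means "treated vs. control", i.e.
# log2(treated / control): a positive value means higher expression in "treated".
cond_levels <- c("treated", "control")
lfc <- log2(mean_expr[[cond_levels[1]]] / mean_expr[[cond_levels[2]]])
lfc          # 2

# Reversing the order of the levels flips the sign.
lfc_swapped <- log2(mean_expr[[cond_levels[2]]] / mean_expr[[cond_levels[1]]])
lfc_swapped  # -2
```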

lfc_threshold

Log2 fold change threshold to test against. 0 means no threshold.
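Because the threshold is given on the log2 scale, a desired linear fold change must be converted first. The short sketch below shows that conversion. Since the threshold is "tested against", a nonzero value changes the null hypothesis (in DESeq2 terms, |LFC| <= threshold) rather than acting as a simple post hoc cutoff.

```r
# Convert desired linear fold changes into log2-scale values for lfc_threshold.
fold_changes <- c(1, 1.5, 2, 4)
lfc_thresholds <- log2(fold_changes)
round(lfc_thresholds, 3)  # 0.000 0.585 1.000 2.000

# Converting back recovers the linear fold changes.
2^lfc_thresholds
```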

sig_threshold

Significance threshold for the adjusted p-values or s-values of the results. 1 means no threshold.
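The effect of sig_threshold on a results table can be sketched as a simple filter. The table below is hypothetical and only uses DESeq2-style column names. The use of `<=` (so that 1 keeps every tested gene) and the handling of NA values as "not significant" are assumptions, not DTUrtle's actual code. The same idea applies when s-values are filtered instead of adjusted p-values.

```r
# Hypothetical results table with DESeq2-style column names.
res <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9),
  padj = c(0.001, 0.20, 0.004, NA),
  stringsAsFactors = FALSE
)

sig_threshold <- 0.01

# Keep genes whose adjusted p-value does not exceed the threshold; NA values
# (e.g. genes removed by independent filtering) are treated as not significant.
res_sig <- res[!is.na(res$padj) & res$padj <= sig_threshold, ]
res_sig$gene  # "geneA" "geneC"
```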

dge_calling_strategy

Either 'bulk' or 'sc'. Specifies the type of the provided data (bulk or single-cell RNA-seq), so that appropriate parameters can be applied.

subset_feature

Subset the provided count matrix to the specified features. Can be given as names, indices or logicals.

subset_sample

Subset the provided count matrix to the specified samples. Can be given as names, indices or logicals.
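The three accepted forms correspond to ordinary R indexing. The base R sketch below uses a toy matrix and assumes that subset_sample selects columns and subset_feature selects rows, as the count matrix layout suggests.

```r
# Toy count matrix: 3 genes x 4 samples.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))

# Selecting samples (columns) by name, index or logical gives the same result.
by_name    <- counts[, c("s1", "s3")]
by_index   <- counts[, c(1, 3)]
by_logical <- counts[, c(TRUE, FALSE, TRUE, FALSE)]

# Features (rows) are selected the same way.
genes_by_name <- counts[c("geneA", "geneC"), ]
```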

deseq_opts

Manually specify parameters for the DESeq() function. Where they overlap, these values overwrite the recommended parameters.

lfc_shrink_opts

Manually specify parameters for the lfcShrink() function. Where they overlap, these values overwrite the recommended parameters.
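The "overwrite if necessary" behaviour can be pictured as merging two named lists. In the sketch below, utils::modifyList() stands in for that merge. The "recommended" values are loosely modeled on the single-cell suggestions in the DESeq2 vignette and may differ from what DTUrtle actually applies. The merge mechanism itself is also an assumption. The argument names (test, useT, minmu, minReplicatesForReplace, fitType, type, svalue) are genuine DESeq()/lfcShrink() arguments.

```r
# Hypothetical "recommended" DESeq() settings (may differ from DTUrtle's defaults).
recommended <- list(test = "LRT", useT = TRUE, minmu = 1e-6,
                    minReplicatesForReplace = Inf)

# User-supplied deseq_opts: overrides minmu and adds fitType.
deseq_opts <- list(minmu = 1e-5, fitType = "glmGamPoi")

# modifyList() replaces entries with matching names and appends new ones.
used_opts <- utils::modifyList(recommended, deseq_opts)
str(used_opts)

# The same pattern for lfcShrink() options, e.g. requesting s-values.
used_shrink_opts <- utils::modifyList(list(type = "apeglm"), list(svalue = TRUE))
```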

return_dds

Should the DESeqDataSet object be returned?

BPPARAM

A BiocParallelParam object, used if multicore processing is desired. Among others, this can be SerialParam() (default) for single-core processing or MulticoreParam(workers = n) for multicore processing with n cores. See BiocParallel for more information.
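A possible way to construct a BPPARAM object is shown below. The sketch keeps one core free and falls back to NULL if BiocParallel is not installed. Because forked (multicore) processing is not available on Windows, it uses SnowParam() there instead. The choice of worker count is only an example.

```r
# Choose a worker count, leaving one core free; detectCores() may return NA.
n_workers <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

bpparam <- if (!requireNamespace("BiocParallel", quietly = TRUE)) {
  NULL                                            # BiocParallel not installed
} else if (.Platform$OS.type == "windows") {
  BiocParallel::SnowParam(workers = n_workers)    # no forking on Windows
} else {
  BiocParallel::MulticoreParam(workers = n_workers)
}

# The object would then be passed on, e.g. run_deseq2(..., BPPARAM = bpparam).
```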

Value

A list with the analysis results and parameters:

  • results_all: Data frame of the DGE test results for all analyzed genes.

  • results_sig: Data frame of the significant DGE test results, filtered according to the specified parameters (sig_threshold, lfc_threshold).

  • dds: The DESeqDataSet of the analysis, if return_dds=TRUE.

  • drim: Results of the DRIMSeq statistical computations (dmTest()).

  • sval_threshold / adjp_threshold: The given significance threshold, used to filter either s-values or adjusted p-values.

  • comparison: A string representation of the performed comparison.

  • condition1: The first condition of the performed comparison.

  • condition2: The second condition of the performed comparison.

  • sample_table: The (filtered) sample table (pd), including the condition column used for the comparison.

  • deseq_opts: A list of used DESeq parameters.

  • lfc_shrink_opts: A list of used lfcShrink parameters.

Details

Performs a DGE analysis of bulk or single-cell data with DESeq2, automatically applying recommended models and parameter settings. It is strongly advised to provide 'raw' (unnormalized) count data, as imported with import_dge_counts(). Installing the package 'apeglm' is recommended for LFC shrinkage; for single-cell data, the package 'glmGamPoi' is additionally recommended. For questions about DESeq2, LFC shrinkage or s-values, please refer to the excellent DESeq2 vignette.
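Before running the analysis, you might check the recommended optional packages and confirm that a plain count matrix looks like raw counts. The helper is_raw_count_matrix() below is hypothetical. It only checks for non-negative whole numbers, which is a rough heuristic and not a guarantee. Estimated counts from tximport are typically non-integer and are handled through the tximport input form instead.

```r
# Check whether the recommended optional packages are available.
recommended_pkgs <- c("apeglm", "glmGamPoi")
installed <- vapply(recommended_pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Hypothetical heuristic: raw counts are non-negative whole numbers, whereas
# normalized or transformed values (e.g. CPM, log counts) usually are not.
is_raw_count_matrix <- function(m) {
  all(m >= 0, na.rm = TRUE) && all(m == round(m), na.rm = TRUE)
}

is_raw_count_matrix(matrix(c(0, 3, 10, 7), nrow = 2))     # TRUE
is_raw_count_matrix(matrix(c(0.5, 3.2, 1, 7), nrow = 2))  # FALSE
```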

See also

import_dge_counts() for the correct import of gene-level counts. combine_to_matrix() to combine scRNA-seq counts into one matrix.

Other DTUrtle DGE: import_dge_counts()