
This function takes a count matrix (genes/features as rows, samples as columns) and sample-level metadata. It returns a list containing a stats::prcomp object, the metadata, the percent variance explained by each principal component, and the genes (features) chosen for the PCA.

Usage

run_pca(
  feature_by_sample,
  meta,
  method = "prcomp",
  ntop = 1000,
  hvg_selection = "scran",
  hvg_force = NULL,
  feature_scale = TRUE,
  feature_center = TRUE,
  normalization = TRUE,
  sample_scale = "cpm",
  log1p = TRUE,
  remove_regex = "^MT|^RPS|^RPL",
  irlba_n = 50
)

Arguments

feature_by_sample

Raw feature (gene) count matrix, with genes/features as rows and samples as columns.

meta

Sample-level metadata. Its rows must match the columns of feature_by_sample.
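This page does not say whether rows are matched to columns by name or by position. The minimal sketch below assumes name-based matching and checks that the metadata row names equal the count-matrix column names in the same order. The helper `check_meta_alignment()` and the toy data are hypothetical and not part of the package.

```r
# Hypothetical toy inputs: 5 genes x 3 samples, plus one metadata row per sample
counts <- matrix(c(10, 0, 3, 25, 7,
                   12, 1, 0, 30, 5,
                    8, 2, 4, 22, 9),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3")))
meta <- data.frame(group = c("A", "A", "B"),
                   row.names = c("s1", "s2", "s3"))

# Hypothetical helper (assumes name-based matching): same count and same order
check_meta_alignment <- function(feature_by_sample, meta) {
  nrow(meta) == ncol(feature_by_sample) &&
    identical(rownames(meta), colnames(feature_by_sample))
}

check_meta_alignment(counts, meta)              # TRUE
check_meta_alignment(counts, meta[c(2, 1, 3), , drop = FALSE])  # FALSE: order differs
```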

method

Defaults to "prcomp". Use "irlba" for faster computation on large matrices.

ntop

Number of highly variable genes/features to use in the PCA. Defaults to 1000.

hvg_selection

Either "classic" or "scran"; controls how the ntop features are selected. "classic" simply takes the ntop features with the highest variance. "scran" uses the scran package's strategy of adjusting variance for expression level, because highly expressed features/genes also tend to have higher variance and thus may be less useful for distinguishing samples.
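The "classic" option can be sketched in base R as ranking features by their variance across samples and keeping the top ntop. This is a minimal illustration on simulated data, not the package's code; the helper name `select_top_var()` is hypothetical, and the "scran" strategy is not reproduced here.

```r
# Toy expression matrix: 200 features x 6 samples (simulated, not real data)
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical "classic"-style selection: top ntop features by variance
select_top_var <- function(x, ntop) {
  feature_var <- apply(x, 1, var)        # one variance per row (feature)
  ntop <- min(ntop, nrow(x))             # cannot select more features than exist
  names(sort(feature_var, decreasing = TRUE))[seq_len(ntop)]
}

hvg <- select_top_var(expr, ntop = 50)
length(hvg)                              # 50
```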

hvg_force

Optional vector of features/genes that must be included in the stats::prcomp input.
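One plausible way to honor hvg_force is to take the union of the selected highly variable features and the forced ones. The sketch below assumes that forced names missing from the matrix are silently dropped. The page does not specify that behavior, and the helper `add_forced_features()` is hypothetical.

```r
# Hypothetical helper: combine selected HVGs with user-forced features
add_forced_features <- function(selected, forced, available) {
  if (is.null(forced)) return(selected)    # hvg_force = NULL: nothing to add
  forced <- intersect(forced, available)   # assumption: drop names not in the matrix
  union(selected, forced)                  # no duplicates; selected features come first
}

selected  <- c("gene3", "gene7", "gene1")
available <- paste0("gene", 1:10)
add_forced_features(selected, c("gene7", "gene9", "geneX"), available)
# returns c("gene3", "gene7", "gene1", "gene9")
```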

feature_scale

Default is TRUE, which means features (genes) are scaled with the base::scale() function.

feature_center

Default is TRUE, which means features (genes) are centered (mean-subtracted).
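stats::prcomp treats rows as observations, so a features-by-samples matrix has to be transposed (samples as rows) before the call. The toy matrix below is already in that orientation. The sketch shows that base::scale() records the per-feature means and standard deviations it used as attributes. It also shows that running prcomp on a pre-scaled matrix gives the same components (up to sign) as letting prcomp center and scale. This is an illustration only, not the package's code.

```r
set.seed(2)
# Toy matrix oriented samples x features, as stats::prcomp expects
x <- matrix(rnorm(6 * 4, mean = 10), nrow = 6,
            dimnames = list(paste0("s", 1:6), paste0("gene", 1:4)))

# Center and scale each feature (column); scale() stores the values it used
x_scaled    <- scale(x, center = TRUE, scale = TRUE)
center_vals <- attr(x_scaled, "scaled:center")   # per-feature means
scale_vals  <- attr(x_scaled, "scaled:scale")    # per-feature standard deviations

# Same PCs either way; abs() ignores arbitrary sign flips of components
pca_a <- prcomp(x_scaled, center = FALSE, scale. = FALSE)
pca_b <- prcomp(x, center = TRUE, scale. = TRUE)
all.equal(abs(pca_a$x), abs(pca_b$x))            # TRUE
```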

normalization

Default is TRUE. If set to FALSE, sample_scale and log1p are ignored and no sample scaling is performed.

sample_scale

Default is "cpm", which performs counts-per-million (CPM) scaling on the samples with the scuttle::calculateCPM() function.

log1p

Default is TRUE; applies a log1p (log(x + 1)) transformation to the count matrix.
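The default normalization can be approximated in base R as CPM scaling followed by log1p. This sketch makes three assumptions: CPM means each count divided by its sample's library size (column sum) times 1e6, CPM is applied before log1p, and log1p is R's natural-log log1p(). scuttle::calculateCPM() may handle size factors differently, so its output can differ. The helper `cpm_base()` is hypothetical.

```r
# Toy raw counts: 4 genes x 3 samples with different library sizes
counts <- matrix(c(100, 50,   0,  850,
                    20, 10,   5,  165,
                   300,  0, 200, 1500),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Hypothetical CPM: divide each sample (column) by its library size, times 1e6
cpm_base <- function(x) {
  t(t(x) / colSums(x)) * 1e6
}

normalized <- log1p(cpm_base(counts))   # log(CPM + 1), natural log
colSums(cpm_base(counts))               # every sample sums to 1e6
```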

remove_regex

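Regular expression matching features (genes) to remove before the PCA. The default, '^MT|^RPS|^RPL', targets mitochondrial (MT) and ribosomal protein (RPS/RPL) genes. Set to '' to skip.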
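A minimal base-R sketch of regex-based feature removal follows. It is not the package's code, and `drop_features_by_regex()` is a hypothetical helper. Two behaviors of the pattern itself are worth knowing. First, an empty pattern matches every string in grepl(), so '' needs an explicit skip. Second, '^MT' matches any name that starts with "MT", so a gene such as MTOR is removed along with MT- genes.

```r
genes  <- c("MT-CO1", "RPS6", "RPL13", "GAPDH", "ACTB", "MTOR")
counts <- matrix(1:12, nrow = 6, dimnames = list(genes, c("s1", "s2")))

# Hypothetical helper: drop rows whose names match the pattern
drop_features_by_regex <- function(x, remove_regex) {
  if (!nzchar(remove_regex)) return(x)   # '' means skip (grepl("", .) is always TRUE)
  x[!grepl(remove_regex, rownames(x)), , drop = FALSE]
}

rownames(drop_features_by_regex(counts, "^MT|^RPS|^RPL"))
# returns c("GAPDH", "ACTB"); MTOR is removed by '^MT' as well
```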

irlba_n

Default is 50. Only used when method = "irlba"; sets the number of principal components returned.
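The difference between the two methods is that irlba computes only the first few components instead of a full decomposition. The sketch below calls irlba::prcomp_irlba() with n set to the number of components when the irlba package is installed. Otherwise it falls back to stats::prcomp() with `rank.`, which still does the full decomposition but keeps only that many components. How run_pca passes irlba_n internally is an assumption here.

```r
set.seed(3)
x <- matrix(rnorm(100 * 40), nrow = 100)   # toy: 100 samples x 40 features
irlba_n <- 5

if (requireNamespace("irlba", quietly = TRUE)) {
  # Truncated PCA: only the first irlba_n components are computed
  pca <- irlba::prcomp_irlba(x, n = irlba_n, center = TRUE, scale. = TRUE)
} else {
  # Fallback for this sketch: full prcomp, returning irlba_n components
  pca <- prcomp(x, center = TRUE, scale. = TRUE, rank. = irlba_n)
}
dim(pca$x)                                 # 100 samples x 5 PCs
```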

Value

A named list with the prcomp output in the $PCA slot, the supplied metadata in the $meta slot, the percent variance explained by each PC in the $percentVar slot, a list containing the scaled data's "center" and "scale" values (for use with the metamoRph function), and the parameters used in the $params slot.
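The sketch below illustrates two of these returned pieces with base R. It uses one common definition of percent variance: each component's squared standard deviation as a share of the total. It also shows how stored center and scale values let new samples be projected onto existing PCs; stats::predict() on a prcomp object performs the same calculation. This page does not confirm that run_pca computes $percentVar this way or that metamoRph projects exactly like this. With method = "irlba", only irlba_n standard deviations exist, so such a ratio would be relative to those components only.

```r
set.seed(4)
x <- matrix(rnorm(8 * 5), nrow = 8,
            dimnames = list(paste0("s", 1:8), paste0("gene", 1:5)))
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Percent of total variance carried by each PC
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Project hypothetical new samples using the stored center/scale values
new_x <- matrix(rnorm(2 * 5), nrow = 2,
                dimnames = list(c("n1", "n2"), colnames(x)))
projected <- scale(new_x, center = pca$center, scale = pca$scale) %*% pca$rotation
dim(projected)                         # 2 new samples x 5 PCs
```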