Subsamples a dataset. Three modes are available: randomly select a desired number of cells from the whole dataset (DEFAULT); select an exact number of cells from each sample (specify divide.by and give one target per sample); or select the same number of cells from every sample, set by the sample with the lowest count (specify divide.by and set min.per = TRUE). Useful for reducing the total number of cells before generating dimensionality reduction plots (tSNE/UMAP).
Arguments
- dat
NO DEFAULT. Input dataframe with cells (rows) vs markers (columns).
- targets
NO DEFAULT. Subsampling target(s). If divide.by is not specified, a single number of cells to draw from the whole dataset. If divide.by is specified, a vector of targets in the same order as the unique divide.by entries (e.g. unique(dat[[divide.by]])). Alternatively, a data.table or data.frame whose first column holds the unique divide.by entries and whose second column holds the corresponding targets; in this case the rows do not have to follow the order in which the entries appear in the dataset, but divide.by must be set.
- divide.by
DEFAULT = NULL. Character. Name of the column that defines the groupings of cells (sample names, group names, etc.) when subsampling per group.
- min.per
DEFAULT = FALSE. Logical. If TRUE and divide.by is specified, each sample contributes the same number of cells, equal to the cell count of the smallest sample.
- seed
DEFAULT = 42. Numeric. Seed for reproducibility.
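As an illustration of the two forms of targets described above, the following base R sketch builds both for a toy data frame. The toy data, its Sample column, and the target numbers are made up for this example; it only shows how the inputs are shaped and assumes nothing about how Spectre processes them internally.

```r
# Toy data: 3 samples with 5, 3 and 4 cells (hypothetical values)
toy <- data.frame(
  Sample = rep(c("A", "B", "C"), times = c(5, 3, 4)),
  CD4    = seq_len(12),
  stringsAsFactors = FALSE
)

# Form 1: plain vector, in the order the samples first appear in the data
sample.order <- unique(toy[["Sample"]])
targets.vec  <- c(2, 1, 3)   # A -> 2, B -> 1, C -> 3

# Form 2: two-column table (sample name, target); row order is free
targets.tab <- data.frame(
  Sample = c("C", "A", "B"),
  Target = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Both forms describe the same request once the table is matched to the data order
targets.from.tab <- targets.tab$Target[match(sample.order, targets.tab$Sample)]
stopifnot(identical(targets.from.tab, targets.vec))
```

The table form is the safer choice when the sample order in the dataset is not known in advance, since each target is tied to a sample name and not to a position.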
Author
Thomas Ashhurst, thomas.ashhurst@sydney.edu.au; Felix Marsh-Wakefield, felix.marsh-wakefield@sydney.edu.au
Examples
# Subsample 10,000 cells randomly from the total dataset
sub.dat <- Spectre::do.subsample(dat = Spectre::demo.clustered,
                                 targets = 10000)

# Subsample based on the sample with the smallest number of cells
sub.dat.sample <- Spectre::do.subsample(dat = Spectre::demo.clustered,
                                        divide.by = "FileName",
                                        min.per = TRUE)
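
The three subsampling modes can also be sketched in base R on a toy data frame. This is a minimal illustration of the sampling logic only, not Spectre's implementation: the toy data and the helper sample_per_group are hypothetical names introduced for this example.

```r
set.seed(42)  # fixed seed so the same rows are drawn on each run

# Toy data: 3 samples with 50, 20 and 30 cells (hypothetical values)
toy <- data.frame(
  FileName = rep(c("s1", "s2", "s3"), times = c(50, 20, 30)),
  Marker   = rnorm(100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw n[i] rows without replacement from the i-th group
sample_per_group <- function(df, group.col, n) {
  groups <- unique(df[[group.col]])
  n <- rep(n, length.out = length(groups))  # recycle a single target if needed
  picked <- lapply(seq_along(groups), function(i) {
    rows <- which(df[[group.col]] == groups[i])
    rows[sample.int(length(rows), n[i])]
  })
  df[unlist(picked), , drop = FALSE]
}

# Mode 1: random draw from the whole dataset
sub.all <- toy[sample.int(nrow(toy), 30), ]

# Mode 2: exact target per sample, in the order the samples appear
sub.exact <- sample_per_group(toy, "FileName", c(10, 5, 8))

# Mode 3: every sample contributes as many cells as the smallest sample
smallest <- min(table(toy$FileName))
sub.min  <- sample_per_group(toy, "FileName", smallest)

# Cells kept per sample in modes 2 and 3
table(sub.exact$FileName)
table(sub.min$FileName)
```

Note that mode 1 keeps samples roughly in proportion to their original sizes, whereas mode 3 equalises them, which can matter when small samples would otherwise be under-represented in a tSNE/UMAP plot.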