This function runs the 'Harmony' data alignment algorithm on single-cell or cytometry data stored in a data.table.
Arguments
- dat
NO default. A data.table containing all of the data you wish to align.
- align.cols
NO default. The columns you wish to align. For cytometry data, these can be the markers themselves or principal components. For single-cell sequencing data, principal components are recommended.
- batch.col
NO default. The column that denotes the batch or dataset that each cell belongs to.
- append.name
DEFAULT = '_aligned'. Text that will be appended to the names of the new columns containing aligned data.
- do_pca
DEFAULT = TRUE. Whether to perform PCA on the input matrix.
- npcs
If performing PCA on the input matrix, the number of PCs to compute.
- theta
Diversity clustering penalty parameter. Specify for each variable in vars_use. Default theta=2. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters.
- lambda
Ridge regression penalty parameter. Specify for each variable in vars_use. Default lambda=1. Lambda must be strictly positive. Smaller values result in more aggressive correction.
- sigma
Width of soft k-means clusters. Default sigma=0.1. Sigma scales the distance from a cell to the cluster centroids. Larger values of sigma result in cells being assigned to more clusters. Smaller values of sigma make soft k-means clustering approach hard clustering.
- nclust
Number of clusters in the model. nclust=1 is equivalent to simple linear regression.
- tau
Protection against overclustering small datasets with large ones. tau is the expected number of cells per cluster.
- block.size
The proportion of cells to update during clustering. Must be between 0 and 1; default 0.05. Larger values may be faster but less accurate.
- max.iter.harmony
Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.
- max.iter.cluster
Maximum number of rounds to run clustering at each round of Harmony.
- epsilon.cluster
Convergence tolerance for clustering round of Harmony. Set to -Inf to never stop early.
- epsilon.harmony
Convergence tolerance for Harmony. Set to -Inf to never stop early.
- plot_convergence
Whether to print the convergence plot of the clustering objective function. TRUE to plot, FALSE to suppress. This can be useful for debugging.
- return_object
(Advanced Usage) Whether to return the Harmony object or only the corrected PCA embeddings.
- verbose
DEFAULT = FALSE. Whether to print progress messages. TRUE to print, FALSE to suppress.
- reference_values
(Advanced Usage) Defines the reference dataset(s). Cells whose batch variable values match reference_values will not be moved.
- cluster_prior
(Advanced Usage) Provides user-defined clusters for cluster initialization. If the number of provided clusters C is less than K, Harmony will initialize the remaining K-C clusters with k-means. C cannot exceed K.
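The effect of sigma described above can be illustrated with a small base-R sketch of soft k-means assignment. This is a simplified illustration only, not Harmony's actual implementation: it assumes squared Euclidean distance and ignores the diversity penalty (theta). The helper name soft_assign and the toy centroids are hypothetical.

```r
# Soft assignment of one cell to K centroids.
# Weights are proportional to exp(-distance / sigma), then normalised to sum to 1.
# Hypothetical helper for illustration; not part of Harmony.
soft_assign <- function(cell, centroids, sigma) {
  # Squared Euclidean distance from the cell to each centroid (one row per centroid)
  cell.mat <- matrix(cell, nrow = nrow(centroids), ncol = length(cell), byrow = TRUE)
  d <- rowSums((centroids - cell.mat)^2)
  # Subtract min(d) before exponentiating for numerical stability
  w <- exp(-(d - min(d)) / sigma)
  w / sum(w)
}

# Three toy centroids in two dimensions, and one cell closest to the first centroid
centroids <- rbind(c(0, 0), c(1, 0), c(0, 1))
cell <- c(0.4, 0.1)

# Small sigma: nearly all weight goes to the nearest centroid (close to hard clustering)
soft_small <- soft_assign(cell, centroids, sigma = 0.01)

# Large sigma: weight is spread across all centroids
soft_large <- soft_assign(cell, centroids, sigma = 10)

print(round(soft_small, 3))
print(round(soft_large, 3))
```

In both cases the nearest centroid receives the largest weight; sigma only controls how sharply the weight concentrates on it.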
Author
Thomas M Ashhurst, thomas.ashhurst@sydney.edu.au
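Examples
A minimal usage sketch with the three required arguments. The function name is not stated in this section; the call below assumes it is exposed as run.harmony() in the Spectre package, and the toy marker columns (CD3, CD4, CD8) and the Batch column are made up for illustration. The call is guarded so that the data setup still runs if Spectre is not installed. Argument availability (for example do_pca) may differ between versions of the underlying harmony package.

```r
library(data.table)

set.seed(42)
n <- 200  # cells per batch

# Toy cytometry-like data: two batches, with batch "B" shifted by a constant offset
dat <- data.table(
  CD3   = c(rnorm(n, mean = 2), rnorm(n, mean = 3)),
  CD4   = c(rnorm(n, mean = 1), rnorm(n, mean = 2)),
  CD8   = c(rnorm(n, mean = 0), rnorm(n, mean = 1)),
  Batch = rep(c("A", "B"), each = n)
)

# Columns to align, and the column that identifies each cell's batch
align.cols <- c("CD3", "CD4", "CD8")
batch.col  <- "Batch"

# Assumption: the function is available as Spectre::run.harmony()
if (requireNamespace("Spectre", quietly = TRUE)) {
  res <- Spectre::run.harmony(
    dat         = dat,
    align.cols  = align.cols,
    batch.col   = batch.col,
    append.name = "_aligned",  # suffix for the new aligned columns
    do_pca      = FALSE,       # align the marker columns directly
    verbose     = FALSE
  )
  str(res)
}
```

Setting do_pca = FALSE here reflects the note under align.cols that cytometry markers can be aligned directly; for single-cell sequencing data, principal components would normally be supplied instead.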