Overview
The batch alignment and analysis workflow builds on the ‘simple discovery’ workflow by adding a step to facilitate batch alignment. This workflow allows for the correction of technical variation or shifts in signal levels in samples stained and/or acquired across multiple batches. To do this, we have implemented the CytoNorm algorithm (Van Gassen 2020). CytoNorm uses reference control samples that are prepared and recorded along with each batch of samples to identify and correct technical variations between individual batches, while preserving biologically relevant differences. For more information on CytoNorm, see Van Gassen et al 2020, and for more information on our implementation in Spectre, see Ashhurst et al 2021.
The demo dataset used for this worked example are cells extracted from mock- or virally-infected mouse bone marrow, measured by flow cytometry. Expression level values in these datasets have been manipulated to simulate acquisition over two batches.
Reference controls for CytoNorm
An example of reference control samples are aliquots of peripheral blood mononuclear cells (PBMCs) that are derived from a single donor at one time point, and cryopreserved (i.e. multiple aliquots of a biologically identical sample). Each time a set of PBMC samples from the study cohort are thawed, stained, and recorded, a reference controls is also thawed, stained, and recorded. Differences in signal level between the reference controls allows CytoNorm to learn the differences in signal levels due to the batches, and correct them, while preserving biological differences between the individual samples. In our demo dataset, we are using bone marrow samples derived from separate mice. Though not derived from the same mouse, these are similar enough that they can be used successfully as reference controls.
Correction of batch effects with CytoNorm
Reference samples are clustered together using FlowSOM, where we attempt to captures cells from matching populations in each batch in a single metacluster (e.g. neutrophils from batch 1 and 2 are captured in metacluster 1, etc). This assumes a reasonably low level of batch effect, consisting of small shifts in the expression levels for one or more markers. Because the proportion of cells in each population of the reference samples are identical, FlowSOM can use quantile distributions for each marker on each metacluser to create a model which will adjust the data values, removing technical variation between batches. This model is then applied to all samples in each batch.
Requirements
The reference controls can only correct batch effects for markers which are actually expressed on the reference controls. For example, some activation markers may be expressed on samples from the ‘disease’ group, but won’t be present on reference controls derived from healthy donors. In this case, we would simply not attempt to perform alignment on that marker. Typically, stable phenotyping markers (e.g. CD4, CD8) would be expressed strongly enough in reference samples, and would be suitable for alignment.
Other approaches
If you have samples derived from multiple batches, but do not have reference controls processed with each batch, then other forms of batch alignment might be possible – see our data integration options for more information.
Citation
If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2022). Cytometry Part A 101 (3), 237-253. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work.
Sample methods blurb
Here is a sample methods blurb for this workflow. You may need to adapt this text to reflect any changes made in your analysis.
Computational analysis of data was performed using the Spectre R package (Ashhurst et al., 2022), with instructions and source code provided at https://github.com/ImmuneDynamics/spectre. Samples were initially prepared in FlowJo, and the population of interest was exported as raw value CSV files. Arcsinh transformation was performed on the data in R using a co-factor of 15 to redistribute the data on a linear scale and compress low end values near zero. The dataset was then merged into a single data.table, with keywords denoting the sample, group, and other factors added to each row (cell). The CytoNorm algorithm was used to correct for batch effects in the data using matched reference samples processed in each batch. The FlowSOM algorithm (Van Gassen et al., 2015) was then run on the merged dataset to cluster the data, where every cell is assigned to a specific cluster and metacluster. Subsequently, the data was downsampled and analysed by the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP) (McInnes, Healy, Melville, 2018) for cellular visualisation.
Create a master folder with a meaningful name. Then inside that folder, insert the following:
Example:
BM analysis
/data
-- Contains data files, one CSV or FCS per sample
/metadata
-- Contains a CSV containing sample metadata (group, batch, etc)
/Spectre CytoNorm
-- CytoNorm workflow.R
You can download the CytoNorm script from this
link – place this inside the Spectre CytoNorm
folder.
If you would to use the demo data as a test run for the CytoNorm workflow, nothing to do at this step. Simply follow the relevant instructions further down this page to download the demo data (under 2. Import and prep data).
If you would like to use your own data, add your data and metadata files:
data
folder
you created above.metadata
folder you
created above.Please see this page for detailed instructions on exporting data for Spectre and setting up a metadata file.
Load the Spectre and other required libraries
Running library(Spectre) will load the Spectre package. We can then
use package.check()
to see if the standard dependency
packages are installed, and package.load()
to load those
packages.
### Load libraries
library(Spectre)
Spectre::package.check() # Check that all required packages are installed
Spectre::package.load() # Load required packages
### Install CytoNorm package
if(!require('remotes')) {install.packages('remotes')}
library('remotes')
remotes::install_github(repo = "saeyslab/CytoNorm")
Set ‘PrimaryDirectory’
Initially, we will set the location of the script as
PrimaryDirectory
. We’ll use this as a sort of ‘home page’
for where our analysis is going to be performed – including where to
find our input data, metadata, and where our output data will go.
### Set PrimaryDirectory
dirname(rstudioapi::getActiveDocumentContext()$path) # Finds the directory where this script is located
setwd(dirname(rstudioapi::getActiveDocumentContext()$path)) # Sets the working directory to where the script is located
getwd()
PrimaryDirectory <- getwd()
PrimaryDirectory
Set ‘InputDirectory’
Next we need to set the location of the ‘data’ folder – where our
samples for analysis are stored. In this example they are stored in a
sub-folder called data
.
### Set 'input' directory
setwd(PrimaryDirectory)
dir.create('../data', showWarnings = FALSE)
setwd("../data/")
InputDirectory <- getwd()
setwd(PrimaryDirectory)
Set ‘MetaDirectory’
We need to set the location of the ‘metadata’ folder. This is where
we can store a CSV file that contains any relevant metadata that we want
to embed in our samples. In this example, it is located in a sub-folder
called metadata.
### Set 'metadata' directory
setwd(PrimaryDirectory)
dir.create('../metadata', showWarnings = FALSE)
setwd("../metadata/")
MetaDirectory <- getwd()
setwd(PrimaryDirectory)
Create ‘OutputDirectory’
We need to create a folder where our output data can go once our
analysis is finished. In this example we will call this
Output_Spectre
.
### Create output directory
setwd(PrimaryDirectory)
dir.create("Output_Spectre", showWarnings = FALSE)
setwd("Output_Spectre")
OutputDirectory <- getwd()
setwd(PrimaryDirectory)
If you need the demo dataset, uncomment the following code in the analysis script (select all, CMD+SHIFT+C) and run to download. If you are using your own datasets, then skip this step.
This code will download the demo dataset files and metadata file, and
place them in the data
and metadata
folders
respectively.
# setwd(PrimaryDirectory)
# setwd("../")
# getwd()
# download.file(url = "https://github.com/ImmuneDynamics/data/blob/main/simBatches.zip?raw=TRUE", destfile = 'simBatches.zip', mode = 'wb')
# unzip(zipfile = 'simBatches.zip')
# for(i in list.files('simBatches/data', full.names = TRUE)){
# file.rename(from = i, to = gsub('simBatches/', '', i))
# }
# for(i in list.files('simBatches/metadata', full.names = TRUE)){
# file.rename(from = i, to = gsub('simBatches/', '', i))
# }
# unlink(c('simBatches/', 'simBatches.zip', '__MACOSX'), recursive = TRUE)
To begin, we will change our working directory to ‘InputDirectory’ and list all the files in that directory – CSV files in this example.
### Import data
setwd(InputDirectory)
list.files(InputDirectory, ".csv")
## [1] "Mock_01_A.csv" "Mock_02_A.csv" "Mock_03_A.csv" "Mock_04_A.csv"
## [5] "Mock_05_B.csv" "Mock_06_B.csv" "Mock_07_B.csv" "Mock_08_B.csv"
## [9] "Virus_01_A.csv" "Virus_02_A.csv" "Virus_03_A.csv" "Virus_04_A.csv"
## [13] "Virus_05_B.csv" "Virus_06_B.csv" "Virus_07_B.csv" "Virus_08_B.csv"
We can then read in all of our samples (in this example, one CSV file
per sample) into a list called data.list.
Spectre uses the
data.table
framework to store data, which reads, writes,
and performs operations on data very quickly.
data.list <- Spectre::read.files(file.loc = InputDirectory,
file.type = ".csv",
do.embed.file.names = TRUE)
Next we will review the data files by running the
do.list.summary()
function.
### Check the data
check <- do.list.summary(data.list)
check$name.table # Review column names and their subsequent values
## X1 X2 X3 X4 X5 X6
## 1 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 2 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 3 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 4 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 5 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 6 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 7 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 8 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 9 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 10 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 11 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 12 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 13 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 14 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 15 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 16 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## X7 X8 X9 X10
## 1 BUV737 B220 BV605 Ly6C FileName FileNo
## 2 BUV737 B220 BV605 Ly6C FileName FileNo
## 3 BUV737 B220 BV605 Ly6C FileName FileNo
## 4 BUV737 B220 BV605 Ly6C FileName FileNo
## 5 BUV737 B220 BV605 Ly6C FileName FileNo
## 6 BUV737 B220 BV605 Ly6C FileName FileNo
## 7 BUV737 B220 BV605 Ly6C FileName FileNo
## 8 BUV737 B220 BV605 Ly6C FileName FileNo
## 9 BUV737 B220 BV605 Ly6C FileName FileNo
## 10 BUV737 B220 BV605 Ly6C FileName FileNo
## 11 BUV737 B220 BV605 Ly6C FileName FileNo
## 12 BUV737 B220 BV605 Ly6C FileName FileNo
## 13 BUV737 B220 BV605 Ly6C FileName FileNo
## 14 BUV737 B220 BV605 Ly6C FileName FileNo
## 15 BUV737 B220 BV605 Ly6C FileName FileNo
## 16 BUV737 B220 BV605 Ly6C FileName FileNo
check$ncol.check # Review number of columns (features, markers) in each sample
## [,1]
## [1,] 10
## [2,] 10
## [3,] 10
## [4,] 10
## [5,] 10
## [6,] 10
## [7,] 10
## [8,] 10
## [9,] 10
## [10,] 10
## [11,] 10
## [12,] 10
## [13,] 10
## [14,] 10
## [15,] 10
## [16,] 10
check$nrow.check # Review number of rows (cells) in each sample
## [,1]
## [1,] 10000
## [2,] 10000
## [3,] 10000
## [4,] 10000
## [5,] 10000
## [6,] 10000
## [7,] 10000
## [8,] 10000
## [9,] 10000
## [10,] 10000
## [11,] 10000
## [12,] 10000
## [13,] 10000
## [14,] 10000
## [15,] 10000
## [16,] 10000
You can review the first sample by running
data.list[[1]]
.
data.list[[1]]
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.6403 1239.360 -681.257 1752.270 2600.430
## 2: 58.8824 -140.921 -267.039 1188.640 392.962
## 3: 60.0077 2191.260 -362.088 1676.960 1441.300
## 4: 313.9580 2532.700 -438.388 478.109 1809.220
## 5: -83.0391 1603.390 -1200.350 564.994 3946.200
## ---
## 9996: -121.9210 4626.660 7377.120 1600.950 197.503
## 9997: 5635.1700 807.513 -831.751 1750.210 2388.820
## 9998: 132.8540 185.348 780.195 611.077 1471.310
## 9999: -78.8966 2564.800 10150.800 2161.430 264.904
## 10000: 17.2606 531.683 -801.073 350.229 1963.560
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo
## <num> <num> <num> <char> <int>
## 1: 627.259 17763.6000 128.9340 Mock_01_A 1
## 2: 304.615 11834.8000 907.3650 Mock_01_A 1
## 3: 4740.710 185.9470 3915.8200 Mock_01_A 1
## 4: 9440.380 1056.2900 6221.6700 Mock_01_A 1
## 5: 1636.110 4579.8700 274.5270 Mock_01_A 1
## ---
## 9996: 12732.300 -18.9127 3826.8000 Mock_01_A 1
## 9997: 1106.890 -394.7910 79.3191 Mock_01_A 1
## 9998: 823.702 6442.3800 -345.6880 Mock_01_A 1
## 9999: 12933.400 -106.9610 8881.1600 Mock_01_A 1
## 10000: 793.921 8161.8700 288.1360 Mock_01_A 1
Merge data.tables
We can merge the data into a single data.table using do.merge.files(). By default, columns with matching names will be aligned in the new table, and any columns that are present in some samples, but not others, will be added and filled with ‘NA’ for any samples that didn’t have that column initially.
### Merge data
cell.dat <- Spectre::do.merge.files(dat = data.list)
cell.dat
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.6403 1239.3600 -681.257 1752.270 2600.430
## 2: 58.8824 -140.9210 -267.039 1188.640 392.962
## 3: 60.0077 2191.2600 -362.088 1676.960 1441.300
## 4: 313.9580 2532.7000 -438.388 478.109 1809.220
## 5: -83.0391 1603.3900 -1200.350 564.994 3946.200
## ---
## 159996: 106.3880 1661.1200 -114.892 2138.700 8311.230
## 159997: 1750.2300 65.4275 -843.984 19655.900 7874.900
## 159998: -159.2870 5363.3900 17389.300 2578.650 -362.093
## 159999: -62.6836 3132.7100 1941.660 1548.130 577.625
## 160000: 650.9440 1991.4900 254.429 1677.730 2209.210
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo
## <num> <num> <num> <char> <int>
## 1: 627.259 17763.600 128.934 Mock_01_A 1
## 2: 304.615 11834.800 907.365 Mock_01_A 1
## 3: 4740.710 185.947 3915.820 Mock_01_A 1
## 4: 9440.380 1056.290 6221.670 Mock_01_A 1
## 5: 1636.110 4579.870 274.527 Mock_01_A 1
## ---
## 159996: 127.068 258.936 6161.070 Virus_08_B 16
## 159997: 122.179 550.596 132.445 Virus_08_B 16
## 159998: 12091.600 121.659 4257.980 Virus_08_B 16
## 159999: 12904.000 338.244 1030.080 Virus_08_B 16
## 160000: 22242.300 789.639 12467.000 Virus_08_B 16
Read in sample metadata.
### Read in metadata
setwd(MetaDirectory)
meta.dat <- fread("sample.details.csv")
meta.dat
## FileName Sample Group Batch
## <char> <char> <char> <char>
## 1: Mock_01_A.csv Mock_01_A Mock A
## 2: Mock_02_A.csv Mock_02_A Mock A
## 3: Mock_03_A.csv Mock_03_A Mock A
## 4: Mock_04_A.csv Mock_04_A Mock A
## 5: Virus_01_A.csv WNV_01_A Virus A
## 6: Virus_02_A.csv WNV_02_A Virus A
## 7: Virus_03_A.csv WNV_03_A Virus A
## 8: Virus_04_A.csv WNV_04_A Virus A
## 9: Mock_05_B.csv Mock_05_B Mock B
## 10: Mock_06_B.csv Mock_06_B Mock B
## 11: Mock_07_B.csv Mock_07_B Mock B
## 12: Mock_08_B.csv Mock_08_B Mock B
## 13: Virus_05_B.csv WNV_05_B Virus B
## 14: Virus_06_B.csv WNV_06_B Virus B
## 15: Virus_07_B.csv WNV_07_B Virus B
## 16: Virus_08_B.csv WNV_08_B Virus B
Before we perform clustering etc, we need to meaningfully transform the data. For more information on why this is necessary, please see this page.
setwd(OutputDirectory)
dir.create("Output 1 - transformed plots")
## Warning in dir.create("Output 1 - transformed plots"): 'Output 1 - transformed
## plots' already exists
setwd("Output 1 - transformed plots")
First, check the column names of the dataset.
### Arcsinh transformation
as.matrix(names(cell.dat))
## [,1]
## [1,] "PECy5-5 CD3e"
## [2,] "PECy7 CD16-32"
## [3,] "DL800 Ly6G"
## [4,] "AF700 CD45"
## [5,] "APCCy7 CD48"
## [6,] "BUV395 CD11b"
## [7,] "BUV737 B220"
## [8,] "BV605 Ly6C"
## [9,] "FileName"
## [10,] "FileNo"
In this example, the columns we want to apply arcsinh transformation to are the cellular columns – column 1 to column 8. We can specify those columns using the code below.
to.asinh <- names(cell.dat)[c(1:8)]
to.asinh
## [1] "PECy5-5 CD3e" "PECy7 CD16-32" "DL800 Ly6G" "AF700 CD45"
## [5] "APCCy7 CD48" "BUV395 CD11b" "BUV737 B220" "BV605 Ly6C"
Define the cofactor we will use for transformation. As a general recommendation, we suggest using cofactor = 15 for CyTOF data, and cofactor between 100 and 1000 for flow data (we suggest 500 as a starting point). For more detailed guidance on setting the co-factors, see this page.
cofactor <- 500
You can also choose a column to use for plotting the transformed result – ideally something that is expressed on a variety of cell types in your dataset.
plot.against <- "BV605 Ly6C_asinh"
Now we need to apply arcsinh transformation to the data in those columns, using a specific co-factor.
cell.dat <- do.asinh(cell.dat, to.asinh, cofactor = cofactor)
We can then make some plots to see if the arcsinh transformation is appropriate. Check the plots and see if you are happy with the transformation. If happy, the proceed with analysis. Otherwise, go back to the merging of the data.list (to create cell.dat) and try with another co-factor.
transformed.cols <- paste0(to.asinh, "_asinh")
for(i in transformed.cols){
make.colour.plot(do.subsample(cell.dat, 20000), i, plot.against)
}
We also want to attach some sample metadata, to aid with our analysis.
### Add metadata to data.table
meta.dat
## FileName Sample Group Batch
## <char> <char> <char> <char>
## 1: Mock_01_A.csv Mock_01_A Mock A
## 2: Mock_02_A.csv Mock_02_A Mock A
## 3: Mock_03_A.csv Mock_03_A Mock A
## 4: Mock_04_A.csv Mock_04_A Mock A
## 5: Virus_01_A.csv WNV_01_A Virus A
## 6: Virus_02_A.csv WNV_02_A Virus A
## 7: Virus_03_A.csv WNV_03_A Virus A
## 8: Virus_04_A.csv WNV_04_A Virus A
## 9: Mock_05_B.csv Mock_05_B Mock B
## 10: Mock_06_B.csv Mock_06_B Mock B
## 11: Mock_07_B.csv Mock_07_B Mock B
## 12: Mock_08_B.csv Mock_08_B Mock B
## 13: Virus_05_B.csv WNV_05_B Virus B
## 14: Virus_06_B.csv WNV_06_B Virus B
## 15: Virus_07_B.csv WNV_07_B Virus B
## 16: Virus_08_B.csv WNV_08_B Virus B
Here we will select just the columns we would like to add to our
data. In this example we only want to include use first four columns:
Filename
, Sample
, Group
, and
Batch.
We will use Filename
for matching
between cell.dat
and meta.dat
, and the other
three columns will be the information that gets added to
cell.dat
.
sample.info <- meta.dat[,c(1:4)]
sample.info
## FileName Sample Group Batch
## <char> <char> <char> <char>
## 1: Mock_01_A.csv Mock_01_A Mock A
## 2: Mock_02_A.csv Mock_02_A Mock A
## 3: Mock_03_A.csv Mock_03_A Mock A
## 4: Mock_04_A.csv Mock_04_A Mock A
## 5: Virus_01_A.csv WNV_01_A Virus A
## 6: Virus_02_A.csv WNV_02_A Virus A
## 7: Virus_03_A.csv WNV_03_A Virus A
## 8: Virus_04_A.csv WNV_04_A Virus A
## 9: Mock_05_B.csv Mock_05_B Mock B
## 10: Mock_06_B.csv Mock_06_B Mock B
## 11: Mock_07_B.csv Mock_07_B Mock B
## 12: Mock_08_B.csv Mock_08_B Mock B
## 13: Virus_05_B.csv WNV_05_B Virus B
## 14: Virus_06_B.csv WNV_06_B Virus B
## 15: Virus_07_B.csv WNV_07_B Virus B
## 16: Virus_08_B.csv WNV_08_B Virus B
Now we can add this information to cell.dat
.
Essentially, the file names are listed in the metadata table, and we can
use that to add any listed metadata in the table to the corresponding
files in data.list. We can review the data to ensure the metadata has
been correctly embedded.
cell.dat <- do.add.cols(cell.dat, "FileName", sample.info, "FileName", rmv.ext = TRUE)
## Removing '.csv' or '.fcs' extension
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
cell.dat
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.6403 1239.3600 -681.257 1752.270 2600.430
## 2: 58.8824 -140.9210 -267.039 1188.640 392.962
## 3: 60.0077 2191.2600 -362.088 1676.960 1441.300
## 4: 313.9580 2532.7000 -438.388 478.109 1809.220
## 5: -83.0391 1603.3900 -1200.350 564.994 3946.200
## ---
## 159996: 106.3880 1661.1200 -114.892 2138.700 8311.230
## 159997: 1750.2300 65.4275 -843.984 19655.900 7874.900
## 159998: -159.2870 5363.3900 17389.300 2578.650 -362.093
## 159999: -62.6836 3132.7100 1941.660 1548.130 577.625
## 160000: 650.9440 1991.4900 254.429 1677.730 2209.210
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo
## <num> <num> <num> <fctr> <int>
## 1: 627.259 17763.600 128.934 Mock_01_A 1
## 2: 304.615 11834.800 907.365 Mock_01_A 1
## 3: 4740.710 185.947 3915.820 Mock_01_A 1
## 4: 9440.380 1056.290 6221.670 Mock_01_A 1
## 5: 1636.110 4579.870 274.527 Mock_01_A 1
## ---
## 159996: 127.068 258.936 6161.070 Virus_08_B 16
## 159997: 122.179 550.596 132.445 Virus_08_B 16
## 159998: 12091.600 121.659 4257.980 Virus_08_B 16
## 159999: 12904.000 338.244 1030.080 Virus_08_B 16
## 160000: 22242.300 789.639 12467.000 Virus_08_B 16
## PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
## <num> <num> <num>
## 1: 0.1030979 1.6392988 -1.1159989
## 2: 0.1174943 -0.2782380 -0.5114826
## 3: 0.1197291 2.1835405 -0.6723598
## 4: 0.5926148 2.3251843 -0.7915047
## 5: -0.1653241 1.8818839 -1.6097071
## ---
## 159996: 0.2112023 1.9157040 -0.2278085
## 159997: 1.9658468 0.1304844 -1.2947039
## 159998: -0.3134175 3.0680563 4.2423561
## 159999: -0.1250411 2.5345014 2.0660180
## 160000: 1.0796017 2.0905763 0.4891206
## AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
## <num> <num> <num> <num>
## 1: 1.9669670 2.3510881 1.0504123 4.2636438
## 2: 1.6006640 0.7216390 0.5767234 3.8577846
## 3: 1.9247953 1.7806527 2.9452507 0.3638149
## 4: 0.8500738 1.9977594 3.6319912 1.4928782
## 5: 0.9703694 2.7630370 1.9011864 2.9109315
## ---
## 159996: 2.1598848 3.5048055 0.2514770 0.4971396
## 159997: 4.3648336 3.4509812 0.2419893 0.9511485
## 159998: 2.3428298 -0.6723679 3.8792326 0.2409789
## 159999: 1.8484549 0.9870066 3.9442068 0.6332984
## 160000: 1.9252352 2.1914959 4.4884166 1.2379493
## BV605 Ly6C_asinh Sample Group Batch
## <num> <char> <char> <char>
## 1: 0.2550924 Mock_01_A Mock A
## 2: 1.3575720 Mock_01_A Mock A
## 3: 2.7553704 Mock_01_A Mock A
## 4: 3.2159434 Mock_01_A Mock A
## 5: 0.5246514 Mock_01_A Mock A
## ---
## 159996: 3.2061873 WNV_08_B Virus B
## 159997: 0.2618862 WNV_08_B Virus B
## 159998: 2.8385188 WNV_08_B Virus B
## 159999: 1.4702206 WNV_08_B Virus B
## 160000: 3.9097814 WNV_08_B Virus B
Check the column names.
### Define cellular columns
as.matrix(names(cell.dat))
## [,1]
## [1,] "PECy5-5 CD3e"
## [2,] "PECy7 CD16-32"
## [3,] "DL800 Ly6G"
## [4,] "AF700 CD45"
## [5,] "APCCy7 CD48"
## [6,] "BUV395 CD11b"
## [7,] "BUV737 B220"
## [8,] "BV605 Ly6C"
## [9,] "FileName"
## [10,] "FileNo"
## [11,] "PECy5-5 CD3e_asinh"
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"
## [14,] "AF700 CD45_asinh"
## [15,] "APCCy7 CD48_asinh"
## [16,] "BUV395 CD11b_asinh"
## [17,] "BUV737 B220_asinh"
## [18,] "BV605 Ly6C_asinh"
## [19,] "Sample"
## [20,] "Group"
## [21,] "Batch"
Specify columns that represent transformed cellular features (in this
case, the arcsinh transformed data, defined by “
cellular.cols <- names(cell.dat)[c(11:18)]
as.matrix(cellular.cols)
## [,1]
## [1,] "PECy5-5 CD3e_asinh"
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"
## [4,] "AF700 CD45_asinh"
## [5,] "APCCy7 CD48_asinh"
## [6,] "BUV395 CD11b_asinh"
## [7,] "BUV737 B220_asinh"
## [8,] "BV605 Ly6C_asinh"
Additionally, specify the columns that will be used to generate cluster and tSNE/UMAP results. Columns that are not specified here will still be analysed, but won’t contributed to the generation of clusters. There are a couple of strategies to take here: use all cellular columns for clustering to looks for all possible cell types/states, or use only stably expressed markers to cluster stable phenotypes, which can then be examined for changes in more dynamic markers. For more guidance, see this page.
### Define clustering columns
as.matrix(names(cell.dat))
## [,1]
## [1,] "PECy5-5 CD3e"
## [2,] "PECy7 CD16-32"
## [3,] "DL800 Ly6G"
## [4,] "AF700 CD45"
## [5,] "APCCy7 CD48"
## [6,] "BUV395 CD11b"
## [7,] "BUV737 B220"
## [8,] "BV605 Ly6C"
## [9,] "FileName"
## [10,] "FileNo"
## [11,] "PECy5-5 CD3e_asinh"
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"
## [14,] "AF700 CD45_asinh"
## [15,] "APCCy7 CD48_asinh"
## [16,] "BUV395 CD11b_asinh"
## [17,] "BUV737 B220_asinh"
## [18,] "BV605 Ly6C_asinh"
## [19,] "Sample"
## [20,] "Group"
## [21,] "Batch"
cluster.cols <- names(cell.dat)[c(11:18)]
as.matrix(cluster.cols)
## [,1]
## [1,] "PECy5-5 CD3e_asinh"
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"
## [4,] "AF700 CD45_asinh"
## [5,] "APCCy7 CD48_asinh"
## [6,] "BUV395 CD11b_asinh"
## [7,] "BUV737 B220_asinh"
## [8,] "BV605 Ly6C_asinh"
Here we can also specify sample, group, and batch columns.
### Define other key columns
as.matrix(names(cell.dat))
## [,1]
## [1,] "PECy5-5 CD3e"
## [2,] "PECy7 CD16-32"
## [3,] "DL800 Ly6G"
## [4,] "AF700 CD45"
## [5,] "APCCy7 CD48"
## [6,] "BUV395 CD11b"
## [7,] "BUV737 B220"
## [8,] "BV605 Ly6C"
## [9,] "FileName"
## [10,] "FileNo"
## [11,] "PECy5-5 CD3e_asinh"
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"
## [14,] "AF700 CD45_asinh"
## [15,] "APCCy7 CD48_asinh"
## [16,] "BUV395 CD11b_asinh"
## [17,] "BUV737 B220_asinh"
## [18,] "BV605 Ly6C_asinh"
## [19,] "Sample"
## [20,] "Group"
## [21,] "Batch"
exp.name <- "BM experiment"
sample.col <- "Sample"
group.col <- "Group"
batch.col <- "Batch"
Additionally, we want to specify the downsample targets for dimensionality reduction. This influences how many cells will be shown on a tSNE/UMAP plot, and we are specifying the number of cells per group to downsample to. Check for the number of cells (rows) in each group.
### Subsample targets per group
data.frame(table(cell.dat[[group.col]])) # Check number of cells per sample.
## Var1 Freq
## 1 Mock 80000
## 2 Virus 80000
You can then specify the number to downsample to in each group. These
must be lower than the total number of cells in each group, and must be
provided in the order that the group names appear in the dataset when
running as.matrix(unique(cell.dat[[group.col]]))
.
In this example we want 10000 cells from ‘mock’ and 10000 cells from ‘Virus’ so that we have balanced numbers visualised between each group.
as.matrix(unique(cell.dat[[group.col]]))
## [,1]
## [1,] "Mock"
## [2,] "Virus"
sub.targets <- c(10000, 10000) # target subsample numbers from each group
sub.targets
## [1] 10000 10000
Here we need to setup some preferences for batch alignment.
### Extract reference samples
sample.info
## FileName Sample Group Batch
## <char> <char> <char> <char>
## 1: Mock_01_A.csv Mock_01_A Mock A
## 2: Mock_02_A.csv Mock_02_A Mock A
## 3: Mock_03_A.csv Mock_03_A Mock A
## 4: Mock_04_A.csv Mock_04_A Mock A
## 5: Virus_01_A.csv WNV_01_A Virus A
## 6: Virus_02_A.csv WNV_02_A Virus A
## 7: Virus_03_A.csv WNV_03_A Virus A
## 8: Virus_04_A.csv WNV_04_A Virus A
## 9: Mock_05_B.csv Mock_05_B Mock B
## 10: Mock_06_B.csv Mock_06_B Mock B
## 11: Mock_07_B.csv Mock_07_B Mock B
## 12: Mock_08_B.csv Mock_08_B Mock B
## 13: Virus_05_B.csv WNV_05_B Virus B
## 14: Virus_06_B.csv WNV_06_B Virus B
## 15: Virus_07_B.csv WNV_07_B Virus B
## 16: Virus_08_B.csv WNV_08_B Virus B
In this case we need to select which samples will serve as our ‘reference’ samples. These should be individual aliquots from a single biological samples that have been stain/processed in each batch (e.g. cryopreserved PBMCs from a single blood collection event from a single individual). In this example, we have selected one bone marrow sample from a mock infected mice in each batch.
as.matrix(unique(cell.dat[[sample.col]]))
## [,1]
## [1,] "Mock_01_A"
## [2,] "Mock_02_A"
## [3,] "Mock_03_A"
## [4,] "Mock_04_A"
## [5,] "Mock_05_B"
## [6,] "Mock_06_B"
## [7,] "Mock_07_B"
## [8,] "Mock_08_B"
## [9,] "WNV_01_A"
## [10,] "WNV_02_A"
## [11,] "WNV_03_A"
## [12,] "WNV_04_A"
## [13,] "WNV_05_B"
## [14,] "WNV_06_B"
## [15,] "WNV_07_B"
## [16,] "WNV_08_B"
Here we are selecting the two samples: Mock_01_A
, from
batch A and Mock_05_B
from batch B, which will be our
reference samples.
refs <- unique(cell.dat[[sample.col]])[c(1,5)]
refs
## [1] "Mock_01_A" "Mock_05_B"
Now we can create a new dataset (ref.dat) containing only cells from
the reference samples (in this example, Mock_01_A
from
batch A, and Mock_05_B
from batch B)
ref.dat <- do.filter(cell.dat, sample.col, refs)
ref.dat
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.64030 1239.360 -681.2570 1752.270 2600.430
## 2: 58.88240 -140.921 -267.0390 1188.640 392.962
## 3: 60.00770 2191.260 -362.0880 1676.960 1441.300
## 4: 313.95800 2532.700 -438.3880 478.109 1809.220
## 5: -83.03910 1603.390 -1200.3500 564.994 3946.200
## ---
## 19996: -30.65640 158.361 -1988.0000 5905.010 8831.280
## 19997: 148.32100 5819.820 15918.1000 2497.000 751.419
## 19998: 669.96300 3373.540 -33.8201 1144.100 1320.440
## 19999: 320.50700 4386.740 7465.1800 2107.970 -134.778
## 20000: -9.93316 5929.620 11076.1000 1861.070 267.681
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo PECy5-5 CD3e_asinh
## <num> <num> <num> <fctr> <int> <num>
## 1: 627.2590 17763.6000 128.934 Mock_01_A 1 0.10309786
## 2: 304.6150 11834.8000 907.365 Mock_01_A 1 0.11749428
## 3: 4740.7100 185.9470 3915.820 Mock_01_A 1 0.11972914
## 4: 9440.3800 1056.2900 6221.670 Mock_01_A 1 0.59261483
## 5: 1636.1100 4579.8700 274.527 Mock_01_A 1 -0.16532406
## ---
## 19996: -77.0832 9662.6800 568.947 Mock_05_B 5 -0.06127445
## 19997: 13401.7000 -91.9106 2209.480 Mock_05_B 5 0.29245518
## 19998: 4762.0700 781.4270 4849.040 Mock_05_B 5 1.10256164
## 19999: 15413.4000 493.2230 1767.920 Mock_05_B 5 0.60367460
## 20000: 11084.1000 50.4877 1180.210 Mock_05_B 5 -0.01986501
## PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
## <num> <num> <num> <num>
## 1: 1.6392988 -1.11599889 1.9669670 2.3510881
## 2: -0.2782380 -0.51148258 1.6006640 0.7216390
## 3: 2.1835405 -0.67235984 1.9247953 1.7806527
## 4: 2.3251843 -0.79150471 0.8500738 1.9977594
## 5: 1.8818839 -1.60970711 0.9703694 2.7630370
## ---
## 19996: 0.3116525 -2.08887520 3.1638831 3.5653947
## 19997: 3.1494039 4.15399776 2.3112610 1.1963364
## 19998: 2.6077040 -0.06758873 1.5655634 1.6983185
## 19999: 2.8681129 3.39766352 2.1457973 -0.2663940
## 20000: 3.1680273 3.79159306 2.0250211 0.5126149
## BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh Sample Group
## <num> <num> <num> <char> <char>
## 1: 1.0504123 4.2636438 0.2550924 Mock_01_A Mock
## 2: 0.5767234 3.8577846 1.3575720 Mock_01_A Mock
## 3: 2.9452507 0.3638149 2.7553704 Mock_01_A Mock
## 4: 3.6319912 1.4928782 3.2159434 Mock_01_A Mock
## 5: 1.9011864 2.9109315 0.5246514 Mock_01_A Mock
## ---
## 19996: -0.1535622 3.6552341 0.9755986 Mock_05_B Mock
## 19997: 3.9820237 -0.1828014 2.1916151 Mock_05_B Mock
## 19998: 2.9497215 1.2291302 2.9677227 Mock_05_B Mock
## 19999: 4.1217946 0.8717569 1.9755202 Mock_05_B Mock
## 20000: 3.7923143 0.1008046 1.5941070 Mock_05_B Mock
## Batch
## <char>
## 1: A
## 2: A
## 3: A
## 4: A
## 5: A
## ---
## 19996: B
## 19997: B
## 19998: B
## 19999: B
## 20000: B
First, we are going to align the data of only the two reference samples.
Setup a sub-directory for some pre-alignment plots.
### Initial clustering
setwd(OutputDirectory)
dir.create("Output 2 - alignment")
## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists
setwd("Output 2 - alignment")
Run prep.cytonorm
to initialise a cytonorm object,
including clustering with FlowSOM. The number of metaclustes will be
determined automatically, or you can specify a desired number of
metaclusters by using the additional argument meta.k
(e.g. meta.k = 20
).
cytnrm <- prep.cytonorm(dat = ref.dat,
cellular.cols = cellular.cols,
cluster.cols = cluster.cols,
batch.col = batch.col,
sample.col = sample.col)
## Loading required package: CytoNorm
## Loading required package: Biobase
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:igraph':
##
## normalize, path, union
## The following object is masked from 'package:flowCore':
##
## normalize
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
## Step 1/4 - Splitting files for use with original FlowSOM function
## Step 2/4 - Running FlowSOM
## Reading A.fcs
## Reading B.fcs
## Step 3/4 - Preparing FlowSOM object and results data.table
## Step 4/4 - FlowSOM preparation for alignment model complete
cytnrm
## $fsom
## FlowSOM model trained on 20000 cells and 8 markers,
## using a 5x5 grid (25 clusters) and 10 metaclusters.
##
## Markers used: column PECy5-5 CD3e_asinh from dataset <PECy5-5 CD3e_asinh>, column PECy7 CD16-32_asinh from dataset <PECy7 CD16-32_asinh>, column DL800 Ly6G_asinh from dataset <DL800 Ly6G_asinh>, column AF700 CD45_asinh from dataset <AF700 CD45_asinh>, column APCCy7 CD48_asinh from dataset <APCCy7 CD48_asinh>, column BUV395 CD11b_asinh from dataset <BUV395 CD11b_asinh>, column BUV737 B220_asinh from dataset <BUV737 B220_asinh>, column BV605 Ly6C_asinh from dataset <BV605 Ly6C_asinh>
##
## Metacluster cell count:
## MC 1 MC 2 MC 3 MC 4 MC 5 MC 6 MC 7 MC 8 MC 9 MC 10
## 5886 455 83 7267 2389 878 331 1708 717 286
##
## 168 cells (0.84%) are further than 4 MAD from their cluster center.
## $dt
## PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh
## <num> <num> <num> <num>
## 1: -0.073653787 -0.06836993 -0.71957141 -0.2601342
## 2: 0.196819022 1.04051566 0.03595166 0.5919558
## 3: -0.148785651 2.90569115 3.45313573 1.2724129
## 4: 0.189247936 3.06170583 4.11895704 1.9266738
## 5: -0.306349635 0.44284853 0.42922080 0.4019526
## ---
## 19996: 2.784854412 0.39633411 -0.98542446 3.4850731
## 19997: 0.007048962 3.26623678 3.83651400 2.1971622
## 19998: -0.144241706 1.14419019 0.08433798 1.8859901
## 19999: 0.136786833 2.73393393 2.65572119 1.3384684
## 20000: 0.101986513 3.28912425 3.33585572 2.1958396
## APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh
## <num> <num> <num> <num>
## 1: 0.6676006 0.455060869 0.191696376 -0.3804775
## 2: 0.7900302 -0.004049649 4.381132603 -0.2398050
## 3: 0.9693112 3.634680748 0.466979593 2.6836505
## 4: 0.4057303 4.441645622 0.637388706 3.1590574
## 5: -0.5365349 1.400961280 -0.298591286 -0.1681118
## ---
## 19996: 1.9617901 0.664981723 -0.590357363 3.6845517
## 19997: 0.5892904 4.073036194 -0.006483534 2.1069219
## 19998: 3.1991692 -0.560110509 3.579796553 -0.1521181
## 19999: 0.7387957 4.001876354 0.324388862 1.7045259
## 20000: 1.4960654 4.071680546 0.229720220 1.9134706
## File File_scattered Original_ID prep.fsom.cluster prep.fsom.metacluster
## <num> <num> <num> <num> <num>
## 1: 1 1.0370497 2369 17 7
## 2: 1 0.8859173 5273 12 1
## 3: 1 0.9769963 9290 15 4
## 4: 1 1.1308356 1252 9 4
## 5: 1 0.9924445 8826 17 7
## ---
## 19996: 2 1.9423792 6029 21 9
## 19997: 2 2.0499919 6970 4 4
## 19998: 2 1.9680801 8165 11 1
## 19999: 2 1.9309787 2143 15 4
## 20000: 2 1.9343390 3821 4 4
##
## $cellular.cols
## [1] "PECy5-5 CD3e_asinh" "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"
## [4] "AF700 CD45_asinh" "APCCy7 CD48_asinh" "BUV395 CD11b_asinh"
## [7] "BUV737 B220_asinh" "BV605 Ly6C_asinh"
##
## $cluster.cols
## [1] "PECy5-5 CD3e_asinh" "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"
## [4] "AF700 CD45_asinh" "APCCy7 CD48_asinh" "BUV395 CD11b_asinh"
## [7] "BUV737 B220_asinh" "BV605 Ly6C_asinh"
##
## $files
## [1] "A" "B"
##
## $file.nums
## [1] 1 2
Once this is complete, we can subsample the dataset and plot it using UMAP.
cytnrm.sub <- do.subsample(cytnrm$dt, 10000)
cytnrm.sub <- run.umap(cytnrm.sub, use.cols = cluster.cols)
## [2024-05-23 15:43:15.585413] starting umap
## [2024-05-23 15:43:15.60073] creating graph of nearest neighbors
## [2024-05-23 15:43:27.204277] creating initial embedding
## [2024-05-23 15:43:27.74285] optimizing embedding
## [2024-05-23 15:43:39.423743] done
Create a subfolder for plots.
### Initial clustering plots
setwd(OutputDirectory)
setwd("Output 2 - alignment")
dir.create("1 - ref pre-alignment")
## Warning in dir.create("1 - ref pre-alignment"): '1 - ref pre-alignment' already
## exists
setwd("1 - ref pre-alignment")
First, we will colour the cells by which batch they originated from. We can see batch effects evident by the offset between the red and green cells. In this case they are coloured by ‘File’.
make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'File', 'factor')
Then, we need to examine the cellular expression across these cells.
make.multi.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', cellular.cols)
## Check your working directory for a new .png called 'Multi plot.png'
Finally, we need to examine the metaclusters that have been generated, and confirm that they capture matching populations across batches (e.g. in this example, metacluster 1 captures B220+ B cells from both batches. If too many or too few metaclusters are generated, you can try running the prep.cytonorm function again, and specify a desired number of metaclusters using the meta.k argument.
make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'prep.fsom.metacluster', 'factor', add.label = TRUE)
You can see which file number corresponds to each batch by running
cytnrm$files
and cytnrm$file.nums
.
cytnrm$files
## [1] "A" "B"
cytnrm$file.nums
## [1] 1 2
If the metaclusters are suitable, we will proceed to actually
performing alignment. Because the train.cytonorm
and
run.cytonorm
functions involve the writing and reading of
files, we will set the directory beforehand.
Set a directory for the outputs
### Alignment
setwd(OutputDirectory)
dir.create("Output 2 - alignment")
## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists
setwd("Output 2 - alignment")
Firstly, we need to train the model that will perform quantile alignment for each marker on each metacluster.
cytnrm <- train.cytonorm(model = cytnrm, align.cols = cellular.cols)
## Training alignment - setup
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Training alignment - file (batch) preparation
## Training alignment - file (batch) and metacluster splitting
## -- running File '1'
## -- running File '2'
## Training alignment - learning conversions
## Processing cluster 1
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 2
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 3
## Computing Quantiles
## A (FileID 1)
##
## Less then 50 cells in A (34). No quantiles computed.
## B (FileID 2)
##
## Less then 50 cells in B (49). No quantiles computed.
## Computing Splines
## A
## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for A
## The identity function will be used.
## B
## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for B
## The identity function will be used.
## Processing cluster 4
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 5
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 6
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 7
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 8
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 9
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Processing cluster 10
## Computing Quantiles
## A (FileID 1)
##
## B (FileID 2)
##
## Computing Splines
## A
## B
## Training alignment - cleanup
## Training alignment - training complete
Once complete, this model can be used to align the full dataset.
cell.dat <- run.cytonorm(dat = cell.dat, model = cytnrm, batch.col = batch.col)
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Alignment -- splitting files (batches)
## Alignment -- processing files (batches)
## -- Splitting A.fcs
## Warning in dir.create("tmp-target"): 'tmp-target' already exists
## Warning in FlowSOM::NewData(model$fsom, ff): 1258 cells (1.57%) seem far from
## their cluster centers.
## -- Splitting B.fcs
## Warning in dir.create("tmp-target"): 'tmp-target' already exists
## Warning in FlowSOM::NewData(model$fsom, ff): 1648 cells (2.06%) seem far from
## their cluster centers.
## Alignment -- aligning data
## -- Processing cluster 1
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom1.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom1.fcs (B)
## Normalizing B
## -- Processing cluster 2
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom2.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom2.fcs (B)
## Normalizing B
## -- Processing cluster 3
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom3.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom3.fcs (B)
## Normalizing B
## -- Processing cluster 4
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom4.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom4.fcs (B)
## Normalizing B
## -- Processing cluster 5
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom5.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom5.fcs (B)
## Normalizing B
## -- Processing cluster 6
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom6.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom6.fcs (B)
## Normalizing B
## -- Processing cluster 7
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom7.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom7.fcs (B)
## Normalizing B
## -- Processing cluster 8
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom8.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom8.fcs (B)
## Normalizing B
## -- Processing cluster 9
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom9.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom9.fcs (B)
## Normalizing B
## -- Processing cluster 10
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom10.fcs (A)
## Normalizing A
## /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom10.fcs (B)
## Normalizing B
## Alignment -- merging aligned data for each metacluster
## Alignment -- creating final data.table
## Alignment -- complete
To examine the results, we first need to specify the cellular names containing the aligned data (they will have ’_aligned’ appended to the end).
aligned.cols <- paste0(cellular.cols, '_aligned')
Then we can set a new output directory.
### Plotting reference data
setwd(OutputDirectory)
setwd("Output 2 - alignment")
dir.create("2 - ref aligned")
## Warning in dir.create("2 - ref aligned"): '2 - ref aligned' already exists
setwd("2 - ref aligned")
First, we will examine just the reference samples, to see if the alignment looks suitable. We will run UMAP using the new aligned data.
ref.sub <- do.filter(cell.dat, sample.col, refs)
ref.sub
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.64030 1239.360 -681.2570 1752.270 2600.430
## 2: 58.88240 -140.921 -267.0390 1188.640 392.962
## 3: 60.00770 2191.260 -362.0880 1676.960 1441.300
## 4: 313.95800 2532.700 -438.3880 478.109 1809.220
## 5: -83.03910 1603.390 -1200.3500 564.994 3946.200
## ---
## 19996: -30.65640 158.361 -1988.0000 5905.010 8831.280
## 19997: 148.32100 5819.820 15918.1000 2497.000 751.419
## 19998: 669.96300 3373.540 -33.8201 1144.100 1320.440
## 19999: 320.50700 4386.740 7465.1800 2107.970 -134.778
## 20000: -9.93316 5929.620 11076.1000 1861.070 267.681
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo PECy5-5 CD3e_asinh
## <num> <num> <num> <fctr> <int> <num>
## 1: 627.2590 17763.6000 128.934 Mock_01_A 1 0.10309786
## 2: 304.6150 11834.8000 907.365 Mock_01_A 1 0.11749428
## 3: 4740.7100 185.9470 3915.820 Mock_01_A 1 0.11972914
## 4: 9440.3800 1056.2900 6221.670 Mock_01_A 1 0.59261483
## 5: 1636.1100 4579.8700 274.527 Mock_01_A 1 -0.16532406
## ---
## 19996: -77.0832 9662.6800 568.947 Mock_05_B 5 -0.06127445
## 19997: 13401.7000 -91.9106 2209.480 Mock_05_B 5 0.29245518
## 19998: 4762.0700 781.4270 4849.040 Mock_05_B 5 1.10256164
## 19999: 15413.4000 493.2230 1767.920 Mock_05_B 5 0.60367460
## 20000: 11084.1000 50.4877 1180.210 Mock_05_B 5 -0.01986501
## PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
## <num> <num> <num> <num>
## 1: 1.6392988 -1.11599889 1.9669670 2.3510881
## 2: -0.2782380 -0.51148258 1.6006640 0.7216390
## 3: 2.1835405 -0.67235984 1.9247953 1.7806527
## 4: 2.3251843 -0.79150471 0.8500738 1.9977594
## 5: 1.8818839 -1.60970711 0.9703694 2.7630370
## ---
## 19996: 0.3116525 -2.08887520 3.1638831 3.5653947
## 19997: 3.1494039 4.15399776 2.3112610 1.1963364
## 19998: 2.6077040 -0.06758873 1.5655634 1.6983185
## 19999: 2.8681129 3.39766352 2.1457973 -0.2663940
## 20000: 3.1680273 3.79159306 2.0250211 0.5126149
## BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh Sample Group
## <num> <num> <num> <char> <char>
## 1: 1.0504123 4.2636438 0.2550924 Mock_01_A Mock
## 2: 0.5767234 3.8577846 1.3575720 Mock_01_A Mock
## 3: 2.9452507 0.3638149 2.7553704 Mock_01_A Mock
## 4: 3.6319912 1.4928782 3.2159434 Mock_01_A Mock
## 5: 1.9011864 2.9109315 0.5246514 Mock_01_A Mock
## ---
## 19996: -0.1535622 3.6552341 0.9755986 Mock_05_B Mock
## 19997: 3.9820237 -0.1828014 2.1916151 Mock_05_B Mock
## 19998: 2.9497215 1.2291302 2.9677227 Mock_05_B Mock
## 19999: 4.1217946 0.8717569 1.9755202 Mock_05_B Mock
## 20000: 3.7923143 0.1008046 1.5941070 Mock_05_B Mock
## Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
## <char> <num> <num>
## 1: A 0.07550371 1.6243541
## 2: A 0.09086002 -0.4109158
## 3: A 0.19667186 2.3359835
## 4: A 0.78801227 2.4700482
## 5: A -0.20452863 1.8813063
## ---
## 19996: B -0.02770719 0.3994900
## 19997: B 0.25879997 3.0681801
## 19998: B 0.89016247 2.4627149
## 19999: B 0.56146991 2.7699685
## 20000: B -0.04108953 3.0901837
## DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
## <num> <num>
## 1: -0.9853218 2.260301
## 2: -0.5260243 1.783121
## 3: -0.5778462 1.962701
## 4: -0.6863875 1.103076
## 5: -1.3787018 1.213712
## ---
## 19996: -3.0485964 2.892127
## 19997: 4.0582747 2.050607
## 19998: -0.1113533 1.328649
## 19999: 3.2538199 1.860001
## 20000: 3.6525168 1.720933
## APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
## <num> <num>
## 1: 2.5360262 0.45637840
## 2: 1.1129812 0.01138196
## 3: 1.8322810 2.99106073
## 4: 2.0085123 3.73671436
## 5: 2.9657056 1.33816469
## ---
## 19996: 3.3363132 0.44203448
## 19997: 1.0153860 3.88138986
## 19998: 1.5865602 2.91923833
## 19999: -0.4603302 4.03672075
## 20000: 0.2559606 3.69683123
## BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
## <num> <num> <num>
## 1: 4.10898304 0.1759336 1
## 2: 3.71863341 1.2138813 1
## 3: 0.49721161 2.6361940 5
## 4: 1.55262542 3.0890400 5
## 5: 2.83306122 0.4209961 1
## ---
## 19996: 3.82389307 1.1142409 1
## 19997: -0.20893200 2.4836495 4
## 19998: 1.05263329 3.0943832 5
## 19999: 0.82943606 2.2702370 4
## 20000: 0.07291784 1.8810067 4
ref.sub <- do.subsample(ref.sub, 20000)
ref.sub <- run.umap(ref.sub, use.cols = aligned.cols)
## [2024-05-23 15:43:49.365568] starting umap
## [2024-05-23 15:43:49.379667] creating graph of nearest neighbors
## [2024-05-23 15:44:15.804667] creating initial embedding
## [2024-05-23 15:44:17.118676] optimizing embedding
## [2024-05-23 15:44:39.803111] done
Plotting the data by batch reveals that the two batches are now well integrated with eachother.
make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', batch.col, 'factor')
We can also plot the data by metacluster number, to confirm that they look suitable (these are the metaclusters created when we made the alignment model).
make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', 'Alignment_MC_aligned', 'factor', add.label = TRUE)
We can also colour the data by the exact samples (should be one per batch).
make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', sample.col, 'factor')
As a sanity check, we can also check the experimental groups that these cells come from. The reference samples should ideally all be from a single experimental group, so if there are more than one group in the plot below, there may be a problem with the setup.
make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', group.col, 'factor')
If the alignment looks suitable, we can then run UMAP and make plots using a subset of the whole dataset.
### Plotting all data
setwd(OutputDirectory)
setwd("Output 2 - alignment")
dir.create("3 - all aligned")
## Warning in dir.create("3 - all aligned"): '3 - all aligned' already exists
setwd("3 - all aligned")
aligned.sub <- do.subsample(cell.dat, 50000)
aligned.sub <- run.umap(aligned.sub, use.cols = aligned.cols)
## [2024-05-23 15:44:43.279394] starting umap
## [2024-05-23 15:44:43.293578] creating graph of nearest neighbors
## [2024-05-23 15:46:01.775045] creating initial embedding
## [2024-05-23 15:46:05.024758] optimizing embedding
## [2024-05-23 15:47:10.139803] done
Firstly, we can examine the distribution of cells from each batch.
make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', batch.col, 'factor')
We can also check the metacluster assignments.
make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', 'Alignment_MC_aligned', 'factor', add.label = TRUE)
We can then check the distribution of cells from each sample, and each group. In this case the different distribution of cells in each experimental group reflects biologicall relevant changes, and not batch effects.
make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', sample.col, 'factor')
make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', group.col, 'factor')
For checking later, we can save a copy of the subsetted data to disk as well.
fwrite(aligned.sub, 'aligned.sub.csv')
Set a directory for outputs.
setwd(OutputDirectory)
dir.create("Output 3 - clustering")
## Warning in dir.create("Output 3 - clustering"): 'Output 3 - clustering' already
## exists
setwd("Output 3 - clustering")
Adjust our saved column names so that we can perform clustering on the new ‘aligned’ data.
### Re-set cellular and clustering cols
aligned.cellular.cols <- paste0(cellular.cols, '_aligned')
aligned.cellular.cols
## [1] "PECy5-5 CD3e_asinh_aligned" "PECy7 CD16-32_asinh_aligned"
## [3] "DL800 Ly6G_asinh_aligned" "AF700 CD45_asinh_aligned"
## [5] "APCCy7 CD48_asinh_aligned" "BUV395 CD11b_asinh_aligned"
## [7] "BUV737 B220_asinh_aligned" "BV605 Ly6C_asinh_aligned"
aligned.cluster.cols <- paste0(cluster.cols, '_aligned')
aligned.cluster.cols
## [1] "PECy5-5 CD3e_asinh_aligned" "PECy7 CD16-32_asinh_aligned"
## [3] "DL800 Ly6G_asinh_aligned" "AF700 CD45_asinh_aligned"
## [5] "APCCy7 CD48_asinh_aligned" "BUV395 CD11b_asinh_aligned"
## [7] "BUV737 B220_asinh_aligned" "BV605 Ly6C_asinh_aligned"
We can run clustering using the run.flowsom function. In this case we can define the number of desired metaclusters manually, with the meta.k argument (in this case we have chosen 30). This can be increased or decreased as required. Typically, overclustering is preferred, as multiple clusters that represent a single cellular population can always be annotated as such. Subsequently, we can write the clustered dataset to disk.
### Clustering
cell.dat <- run.flowsom(cell.dat, aligned.cluster.cols, meta.k = 30)
## Preparing data
## Starting FlowSOM
## Building SOM
## Mapping data to SOM
## Building MST
## Binding metacluster labels to starting dataset
## Binding cluster labels to starting dataset
fwrite(cell.dat, "clustered.data.csv")
We can then run dimensionality reduction on a subset of the data, allow us to visualise the data and resulting clusters. In this case we have used run.umap, though other options are available, including run.fitsne and run.tsne. As before, this subsampled dataset with DR coordinates is saved to disk.
### Dimensionality reduction
cell.sub <- do.subsample(cell.dat, sub.targets, group.col)
cell.sub <- run.umap(cell.sub, aligned.cluster.cols)
## [2024-05-23 15:47:32.324913] starting umap
## [2024-05-23 15:47:32.341717] creating graph of nearest neighbors
## [2024-05-23 15:47:57.407] creating initial embedding
## [2024-05-23 15:47:58.565374] optimizing embedding
## [2024-05-23 15:48:22.906857] done
fwrite(cell.sub, "clustered.data.DR.csv")
We can visualise the DR data to asses which clusters represent cellular populations
### DR plots
make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", col.type = 'factor', add.label = TRUE)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_label()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_label()`).
make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", aligned.cellular.cols)
## Check your working directory for a new .png called 'Multi plot.png'
We can also generate some multi plots to compare between experimental groups or batches.
make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", group.col, col.type = 'factor')
## Check your working directory for a new .png called 'Multi plot divided by Group.png'
We can also produce expression heatmaps to help guide our interpretation of cluster identities.
### Expression heatmap
exp <- do.aggregate(cell.dat, aligned.cellular.cols, by = "FlowSOM_metacluster")
make.pheatmap(exp, "FlowSOM_metacluster", aligned.cellular.cols)
Set an output directory.
setwd(OutputDirectory)
dir.create("Output 4 - annotation")
## Warning in dir.create("Output 4 - annotation"): 'Output 4 - annotation' already
## exists
setwd("Output 4 - annotation")
Review the cluster labels and marker expression patterns, so you can annotate the clusters. This annotation is optional, as all subsequent steps can be performed on the ‘clusters’ instead of the ‘populations’. Here we can create a list of population names, and then specify which clusters make up that population (e.g. Immature B cells are contained within cluster ‘6’).
### Annotate
annots <- list("Mature neutrophils" = c(24,25,23,27,30),
"Immature neutrophils" = c(22),
"Monocytes" = c(29,26),
"T cells" = c(2,1,8,7),
"Mature B cells" = c(3,4,9,5,10,13),
"Immature B cells" = c(6)
)
Once the annotation list is created, we can switch the list into a table format to annotate our data.
annots <- do.list.switch(annots)
names(annots) <- c("Values", "Population")
setorderv(annots, 'Values')
annots
## Values Population
## <num> <char>
## 1: 1 T cells
## 2: 2 T cells
## 3: 3 Mature B cells
## 4: 4 Mature B cells
## 5: 5 Mature B cells
## 6: 6 Immature B cells
## 7: 7 T cells
## 8: 8 T cells
## 9: 9 Mature B cells
## 10: 10 Mature B cells
## 11: 13 Mature B cells
## 12: 22 Immature neutrophils
## 13: 23 Mature neutrophils
## 14: 24 Mature neutrophils
## 15: 25 Mature neutrophils
## 16: 26 Monocytes
## 17: 27 Mature neutrophils
## 18: 29 Monocytes
## 19: 30 Mature neutrophils
Using the do.add.cols function, we can add the population names to the corresponding clusters.
### Add annotations
cell.dat <- do.add.cols(cell.dat, "FlowSOM_metacluster", annots, "Values")
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
cell.dat
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.6403 1239.3600 -681.257 1752.270 2600.430
## 2: 58.8824 -140.9210 -267.039 1188.640 392.962
## 3: 60.0077 2191.2600 -362.088 1676.960 1441.300
## 4: 313.9580 2532.7000 -438.388 478.109 1809.220
## 5: -83.0391 1603.3900 -1200.350 564.994 3946.200
## ---
## 159996: 106.3880 1661.1200 -114.892 2138.700 8311.230
## 159997: 1750.2300 65.4275 -843.984 19655.900 7874.900
## 159998: -159.2870 5363.3900 17389.300 2578.650 -362.093
## 159999: -62.6836 3132.7100 1941.660 1548.130 577.625
## 160000: 650.9440 1991.4900 254.429 1677.730 2209.210
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo
## <num> <num> <num> <fctr> <int>
## 1: 627.259 17763.600 128.934 Mock_01_A 1
## 2: 304.615 11834.800 907.365 Mock_01_A 1
## 3: 4740.710 185.947 3915.820 Mock_01_A 1
## 4: 9440.380 1056.290 6221.670 Mock_01_A 1
## 5: 1636.110 4579.870 274.527 Mock_01_A 1
## ---
## 159996: 127.068 258.936 6161.070 Virus_08_B 16
## 159997: 122.179 550.596 132.445 Virus_08_B 16
## 159998: 12091.600 121.659 4257.980 Virus_08_B 16
## 159999: 12904.000 338.244 1030.080 Virus_08_B 16
## 160000: 22242.300 789.639 12467.000 Virus_08_B 16
## PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
## <num> <num> <num>
## 1: 0.1030979 1.6392988 -1.1159989
## 2: 0.1174943 -0.2782380 -0.5114826
## 3: 0.1197291 2.1835405 -0.6723598
## 4: 0.5926148 2.3251843 -0.7915047
## 5: -0.1653241 1.8818839 -1.6097071
## ---
## 159996: 0.2112023 1.9157040 -0.2278085
## 159997: 1.9658468 0.1304844 -1.2947039
## 159998: -0.3134175 3.0680563 4.2423561
## 159999: -0.1250411 2.5345014 2.0660180
## 160000: 1.0796017 2.0905763 0.4891206
## AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
## <num> <num> <num> <num>
## 1: 1.9669670 2.3510881 1.0504123 4.2636438
## 2: 1.6006640 0.7216390 0.5767234 3.8577846
## 3: 1.9247953 1.7806527 2.9452507 0.3638149
## 4: 0.8500738 1.9977594 3.6319912 1.4928782
## 5: 0.9703694 2.7630370 1.9011864 2.9109315
## ---
## 159996: 2.1598848 3.5048055 0.2514770 0.4971396
## 159997: 4.3648336 3.4509812 0.2419893 0.9511485
## 159998: 2.3428298 -0.6723679 3.8792326 0.2409789
## 159999: 1.8484549 0.9870066 3.9442068 0.6332984
## 160000: 1.9252352 2.1914959 4.4884166 1.2379493
## BV605 Ly6C_asinh Sample Group Batch PECy5-5 CD3e_asinh_aligned
## <num> <char> <char> <char> <num>
## 1: 0.2550924 Mock_01_A Mock A 0.07550371
## 2: 1.3575720 Mock_01_A Mock A 0.09086002
## 3: 2.7553704 Mock_01_A Mock A 0.19667186
## 4: 3.2159434 Mock_01_A Mock A 0.78801227
## 5: 0.5246514 Mock_01_A Mock A -0.20452863
## ---
## 159996: 3.2061873 WNV_08_B Virus B 0.20014960
## 159997: 0.2618862 WNV_08_B Virus B 1.89280009
## 159998: 2.8385188 WNV_08_B Virus B -0.31652537
## 159999: 1.4702206 WNV_08_B Virus B -0.13956173
## 160000: 3.9097814 WNV_08_B Virus B 1.00749552
## PECy7 CD16-32_asinh_aligned DL800 Ly6G_asinh_aligned
## <num> <num>
## 1: 1.62435412 -0.9853218
## 2: -0.41091576 -0.5260243
## 3: 2.33598351 -0.5778462
## 4: 2.47004819 -0.6863875
## 5: 1.88130629 -1.3787018
## ---
## 159996: 1.92336309 -0.1570388
## 159997: 0.02324442 -1.8558453
## 159998: 2.97609425 4.1403537
## 159999: 2.39929986 2.0580065
## 160000: 2.13440585 0.5363485
## AF700 CD45_asinh_aligned APCCy7 CD48_asinh_aligned
## <num> <num>
## 1: 2.260301 2.5360262
## 2: 1.783121 1.1129812
## 3: 1.962701 1.8322810
## 4: 1.103076 2.0085123
## 5: 1.213712 2.9657056
## ---
## 159996: 1.971623 3.3322656
## 159997: 4.071003 3.1807175
## 159998: 2.090997 -0.8144654
## 159999: 1.537023 0.7848281
## 160000: 1.755526 2.0961668
## BUV395 CD11b_asinh_aligned BUV737 B220_asinh_aligned
## <num> <num>
## 1: 0.45637840 4.1089830
## 2: 0.01138196 3.7186334
## 3: 2.99106073 0.4972116
## 4: 3.73671436 1.5526254
## 5: 1.33816469 2.8330612
## ---
## 159996: 0.88995486 0.4703643
## 159997: 0.77705491 0.8439717
## 159998: 3.77656460 0.2098913
## 159999: 3.83918405 0.5993028
## 160000: 4.56159830 1.2319804
## BV605 Ly6C_asinh_aligned Alignment_MC_aligned FlowSOM_cluster
## <num> <num> <num>
## 1: 0.1759336 1 12
## 2: 1.2138813 1 8
## 3: 2.6361940 5 89
## 4: 3.0890400 5 102
## 5: 0.4209961 1 14
## ---
## 159996: 3.3719137 8 100
## 159997: 0.3941718 9 43
## 159998: 3.1250730 4 154
## 159999: 1.7465904 4 162
## 160000: 4.1770687 8 130
## FlowSOM_metacluster Population
## <fctr> <char>
## 1: 5 Mature B cells
## 2: 5 Mature B cells
## 3: 22 Immature neutrophils
## 4: 22 Immature neutrophils
## 5: 6 Immature B cells
## ---
## 159996: 21 <NA>
## 159997: 7 T cells
## 159998: 25 Mature neutrophils
## 159999: 25 Mature neutrophils
## 160000: 22 Immature neutrophils
cell.sub <- do.add.cols(cell.sub, "FlowSOM_metacluster", annots, "Values")
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
cell.sub
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: -116.3130 2099.3600 -52.1773 633.551 -126.840
## 2: 493.0460 3853.7100 230.9840 5479.640 3270.230
## 3: -17.6281 5875.5000 12478.6000 2462.980 448.307
## 4: 121.5190 -40.2623 -250.6370 1289.630 6305.760
## 5: 256.6870 3783.4100 12428.0000 2114.770 415.951
## ---
## 19996: -48.2851 357.1330 -671.5610 1880.030 4578.660
## 19997: 205.7200 2776.6000 64.1222 2991.120 2186.440
## 19998: -298.2410 10829.8000 936.0370 4009.910 4486.420
## 19999: 29.3559 2172.4700 -1229.2500 974.453 2179.840
## 20000: 236.7730 13589.3000 449.5360 4660.120 7047.160
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo PECy5-5 CD3e_asinh
## <num> <num> <num> <fctr> <int> <num>
## 1: 21323.700 462.1640 2659.420 Mock_07_B 7 -0.23057742
## 2: 6902.790 111.1600 176.694 Mock_06_B 6 0.87150487
## 3: 23718.200 56.5674 2174.620 Mock_08_B 8 -0.03524890
## 4: 81.388 5606.4300 276.940 Mock_05_B 5 0.24070684
## 5: 20976.300 357.1200 1481.540 Mock_05_B 5 0.49314179
## ---
## 19996: 1735.620 3867.0800 2118.510 Virus_02_A 10 -0.09642073
## 19997: 6004.450 547.5590 2535.700 Virus_06_B 14 0.40063603
## 19998: 11029.800 188.3320 7611.190 Virus_05_B 13 -0.56580590
## 19999: 551.834 101.0540 1712.870 Virus_02_A 10 0.05867812
## 20000: 18265.900 182.9020 26457.600 Virus_01_A 9 0.45742638
## PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
## <num> <num> <num> <num>
## 1: 2.14181539 -0.1041661 1.058232 -0.2510350
## 2: 2.73951281 0.4469390 3.089409 2.5769483
## 3: 3.15889109 3.9107107 2.297814 0.8063480
## 4: -0.08043783 -0.4823510 1.676272 3.2293260
## 5: 2.72125805 3.9066508 2.148931 0.7573862
## ---
## 19996: 0.66431456 -1.1044717 2.034813 2.9106688
## 19997: 2.41553155 0.1278954 2.488856 2.1813937
## 19998: 3.76912842 1.3849166 2.778928 2.8904402
## 19999: 2.17514614 -1.6317072 1.420549 2.1784468
## 20000: 3.99591535 0.8081770 2.928201 3.3401752
## BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh Sample Group
## <num> <num> <num> <char> <char>
## 1: 4.4462509 0.8268408 2.3731246 Mock_07_B Mock
## 2: 3.3195292 0.2205282 0.3464177 Mock_06_B Mock
## 3: 4.5526481 0.1128948 2.1761101 Mock_08_B Mock
## 4: 0.1620656 3.1121910 0.5288774 Mock_05_B Mock
## 5: 4.4298296 0.6642934 1.8067062 Mock_05_B Mock
## ---
## 19996: 1.9577893 2.7429475 2.1506509 WNV_02_A Virus
## 19997: 3.1805243 0.9470589 2.3263458 WNV_06_B Virus
## 19998: 3.7874084 0.3682822 3.4169910 WNV_05_B Virus
## 19999: 0.9528120 0.2007568 1.9451170 WNV_02_A Virus
## 20000: 4.2915176 0.3581012 4.6619271 WNV_01_A Virus
## Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
## <char> <num> <num>
## 1: B -0.29369602 2.00385475
## 2: B 0.73757279 2.71793723
## 3: B -0.05559657 3.07941747
## 4: B 0.26125368 0.02720127
## 5: B 0.43726748 2.60483789
## ---
## 19996: A -0.15290843 0.56069624
## 19997: B 0.31178620 2.25823164
## 19998: B -0.90997732 3.74038410
## 19999: A 0.06718409 2.29398131
## 20000: A 0.47635502 3.96355510
## DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
## <num> <num>
## 1: -0.1480608 0.7780063
## 2: 0.3374757 2.8061771
## 3: 3.7818818 2.0387959
## 4: -0.4421591 1.4738377
## 5: 3.7771759 1.8637621
## ---
## 19996: -1.0745652 2.3543241
## 19997: 0.1235801 2.6152031
## 19998: 1.3456432 2.6467636
## 19999: -1.6143754 1.5896653
## 20000: 0.7784286 3.0210304
## APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
## <num> <num>
## 1: -0.6031739 4.3643961
## 2: 2.4243431 3.1393259
## 3: 0.5808588 4.5825343
## 4: 3.0247295 0.7374537
## 5: 0.5254776 4.4549651
## ---
## 19996: 3.1603029 1.3455793
## 19997: 2.2317555 3.1402769
## 19998: 2.7106078 3.9415228
## 19999: 2.3983061 0.3903685
## 20000: 3.5182674 4.1516290
## BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
## <num> <num> <num>
## 1: 0.67665219 2.5046093 5
## 2: 0.22383082 0.3302887 10
## 3: 0.08498045 2.4683566 4
## 4: 3.18983173 0.6547109 1
## 5: 0.62932587 2.1033373 4
## ---
## 19996: 2.86870980 2.0855556 2
## 19997: 0.78986788 2.4517956 5
## 19998: 0.33557367 3.6496732 8
## 19999: 0.26035574 2.1166730 6
## 20000: 0.38186085 4.3858743 8
## FlowSOM_cluster FlowSOM_metacluster UMAP_X UMAP_Y
## <num> <fctr> <num> <num>
## 1: 119 22 1.924202 2.700375
## 2: 75 18 4.288721 -2.231453
## 3: 179 25 -4.118332 7.331559
## 4: 70 6 -5.826405 -6.053842
## 5: 180 25 -4.021310 6.415935
## ---
## 19996: 32 9 -0.907486 -7.260880
## 19997: 116 22 3.595872 1.210391
## 19998: 173 27 6.418618 2.405232
## 19999: 87 14 3.245920 -1.075086
## 20000: 185 26 6.894060 2.271706
## Population
## <char>
## 1: Immature neutrophils
## 2: <NA>
## 3: Mature neutrophils
## 4: Immature B cells
## 5: Mature neutrophils
## ---
## 19996: Mature B cells
## 19997: Immature neutrophils
## 19998: Mature neutrophils
## 19999: <NA>
## 20000: Monocytes
Any unannotated clusters will then gain the label ‘Other’.
### Fill in NAs
cell.dat[['Population']][is.na(cell.dat[, 'Population'])] <- 'Other'
cell.dat
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: 51.6403 1239.3600 -681.257 1752.270 2600.430
## 2: 58.8824 -140.9210 -267.039 1188.640 392.962
## 3: 60.0077 2191.2600 -362.088 1676.960 1441.300
## 4: 313.9580 2532.7000 -438.388 478.109 1809.220
## 5: -83.0391 1603.3900 -1200.350 564.994 3946.200
## ---
## 159996: 106.3880 1661.1200 -114.892 2138.700 8311.230
## 159997: 1750.2300 65.4275 -843.984 19655.900 7874.900
## 159998: -159.2870 5363.3900 17389.300 2578.650 -362.093
## 159999: -62.6836 3132.7100 1941.660 1548.130 577.625
## 160000: 650.9440 1991.4900 254.429 1677.730 2209.210
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo
## <num> <num> <num> <fctr> <int>
## 1: 627.259 17763.600 128.934 Mock_01_A 1
## 2: 304.615 11834.800 907.365 Mock_01_A 1
## 3: 4740.710 185.947 3915.820 Mock_01_A 1
## 4: 9440.380 1056.290 6221.670 Mock_01_A 1
## 5: 1636.110 4579.870 274.527 Mock_01_A 1
## ---
## 159996: 127.068 258.936 6161.070 Virus_08_B 16
## 159997: 122.179 550.596 132.445 Virus_08_B 16
## 159998: 12091.600 121.659 4257.980 Virus_08_B 16
## 159999: 12904.000 338.244 1030.080 Virus_08_B 16
## 160000: 22242.300 789.639 12467.000 Virus_08_B 16
## PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
## <num> <num> <num>
## 1: 0.1030979 1.6392988 -1.1159989
## 2: 0.1174943 -0.2782380 -0.5114826
## 3: 0.1197291 2.1835405 -0.6723598
## 4: 0.5926148 2.3251843 -0.7915047
## 5: -0.1653241 1.8818839 -1.6097071
## ---
## 159996: 0.2112023 1.9157040 -0.2278085
## 159997: 1.9658468 0.1304844 -1.2947039
## 159998: -0.3134175 3.0680563 4.2423561
## 159999: -0.1250411 2.5345014 2.0660180
## 160000: 1.0796017 2.0905763 0.4891206
## AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
## <num> <num> <num> <num>
## 1: 1.9669670 2.3510881 1.0504123 4.2636438
## 2: 1.6006640 0.7216390 0.5767234 3.8577846
## 3: 1.9247953 1.7806527 2.9452507 0.3638149
## 4: 0.8500738 1.9977594 3.6319912 1.4928782
## 5: 0.9703694 2.7630370 1.9011864 2.9109315
## ---
## 159996: 2.1598848 3.5048055 0.2514770 0.4971396
## 159997: 4.3648336 3.4509812 0.2419893 0.9511485
## 159998: 2.3428298 -0.6723679 3.8792326 0.2409789
## 159999: 1.8484549 0.9870066 3.9442068 0.6332984
## 160000: 1.9252352 2.1914959 4.4884166 1.2379493
## BV605 Ly6C_asinh Sample Group Batch PECy5-5 CD3e_asinh_aligned
## <num> <char> <char> <char> <num>
## 1: 0.2550924 Mock_01_A Mock A 0.07550371
## 2: 1.3575720 Mock_01_A Mock A 0.09086002
## 3: 2.7553704 Mock_01_A Mock A 0.19667186
## 4: 3.2159434 Mock_01_A Mock A 0.78801227
## 5: 0.5246514 Mock_01_A Mock A -0.20452863
## ---
## 159996: 3.2061873 WNV_08_B Virus B 0.20014960
## 159997: 0.2618862 WNV_08_B Virus B 1.89280009
## 159998: 2.8385188 WNV_08_B Virus B -0.31652537
## 159999: 1.4702206 WNV_08_B Virus B -0.13956173
## 160000: 3.9097814 WNV_08_B Virus B 1.00749552
## PECy7 CD16-32_asinh_aligned DL800 Ly6G_asinh_aligned
## <num> <num>
## 1: 1.62435412 -0.9853218
## 2: -0.41091576 -0.5260243
## 3: 2.33598351 -0.5778462
## 4: 2.47004819 -0.6863875
## 5: 1.88130629 -1.3787018
## ---
## 159996: 1.92336309 -0.1570388
## 159997: 0.02324442 -1.8558453
## 159998: 2.97609425 4.1403537
## 159999: 2.39929986 2.0580065
## 160000: 2.13440585 0.5363485
## AF700 CD45_asinh_aligned APCCy7 CD48_asinh_aligned
## <num> <num>
## 1: 2.260301 2.5360262
## 2: 1.783121 1.1129812
## 3: 1.962701 1.8322810
## 4: 1.103076 2.0085123
## 5: 1.213712 2.9657056
## ---
## 159996: 1.971623 3.3322656
## 159997: 4.071003 3.1807175
## 159998: 2.090997 -0.8144654
## 159999: 1.537023 0.7848281
## 160000: 1.755526 2.0961668
## BUV395 CD11b_asinh_aligned BUV737 B220_asinh_aligned
## <num> <num>
## 1: 0.45637840 4.1089830
## 2: 0.01138196 3.7186334
## 3: 2.99106073 0.4972116
## 4: 3.73671436 1.5526254
## 5: 1.33816469 2.8330612
## ---
## 159996: 0.88995486 0.4703643
## 159997: 0.77705491 0.8439717
## 159998: 3.77656460 0.2098913
## 159999: 3.83918405 0.5993028
## 160000: 4.56159830 1.2319804
## BV605 Ly6C_asinh_aligned Alignment_MC_aligned FlowSOM_cluster
## <num> <num> <num>
## 1: 0.1759336 1 12
## 2: 1.2138813 1 8
## 3: 2.6361940 5 89
## 4: 3.0890400 5 102
## 5: 0.4209961 1 14
## ---
## 159996: 3.3719137 8 100
## 159997: 0.3941718 9 43
## 159998: 3.1250730 4 154
## 159999: 1.7465904 4 162
## 160000: 4.1770687 8 130
## FlowSOM_metacluster Population
## <fctr> <char>
## 1: 5 Mature B cells
## 2: 5 Mature B cells
## 3: 22 Immature neutrophils
## 4: 22 Immature neutrophils
## 5: 6 Immature B cells
## ---
## 159996: 21 Other
## 159997: 7 T cells
## 159998: 25 Mature neutrophils
## 159999: 25 Mature neutrophils
## 160000: 22 Immature neutrophils
cell.sub[['Population']][is.na(cell.sub[, 'Population'])] <- 'Other'
cell.sub
## PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
## <num> <num> <num> <num> <num>
## 1: -116.3130 2099.3600 -52.1773 633.551 -126.840
## 2: 493.0460 3853.7100 230.9840 5479.640 3270.230
## 3: -17.6281 5875.5000 12478.6000 2462.980 448.307
## 4: 121.5190 -40.2623 -250.6370 1289.630 6305.760
## 5: 256.6870 3783.4100 12428.0000 2114.770 415.951
## ---
## 19996: -48.2851 357.1330 -671.5610 1880.030 4578.660
## 19997: 205.7200 2776.6000 64.1222 2991.120 2186.440
## 19998: -298.2410 10829.8000 936.0370 4009.910 4486.420
## 19999: 29.3559 2172.4700 -1229.2500 974.453 2179.840
## 20000: 236.7730 13589.3000 449.5360 4660.120 7047.160
## BUV395 CD11b BUV737 B220 BV605 Ly6C FileName FileNo PECy5-5 CD3e_asinh
## <num> <num> <num> <fctr> <int> <num>
## 1: 21323.700 462.1640 2659.420 Mock_07_B 7 -0.23057742
## 2: 6902.790 111.1600 176.694 Mock_06_B 6 0.87150487
## 3: 23718.200 56.5674 2174.620 Mock_08_B 8 -0.03524890
## 4: 81.388 5606.4300 276.940 Mock_05_B 5 0.24070684
## 5: 20976.300 357.1200 1481.540 Mock_05_B 5 0.49314179
## ---
## 19996: 1735.620 3867.0800 2118.510 Virus_02_A 10 -0.09642073
## 19997: 6004.450 547.5590 2535.700 Virus_06_B 14 0.40063603
## 19998: 11029.800 188.3320 7611.190 Virus_05_B 13 -0.56580590
## 19999: 551.834 101.0540 1712.870 Virus_02_A 10 0.05867812
## 20000: 18265.900 182.9020 26457.600 Virus_01_A 9 0.45742638
## PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
## <num> <num> <num> <num>
## 1: 2.14181539 -0.1041661 1.058232 -0.2510350
## 2: 2.73951281 0.4469390 3.089409 2.5769483
## 3: 3.15889109 3.9107107 2.297814 0.8063480
## 4: -0.08043783 -0.4823510 1.676272 3.2293260
## 5: 2.72125805 3.9066508 2.148931 0.7573862
## ---
## 19996: 0.66431456 -1.1044717 2.034813 2.9106688
## 19997: 2.41553155 0.1278954 2.488856 2.1813937
## 19998: 3.76912842 1.3849166 2.778928 2.8904402
## 19999: 2.17514614 -1.6317072 1.420549 2.1784468
## 20000: 3.99591535 0.8081770 2.928201 3.3401752
## BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh Sample Group
## <num> <num> <num> <char> <char>
## 1: 4.4462509 0.8268408 2.3731246 Mock_07_B Mock
## 2: 3.3195292 0.2205282 0.3464177 Mock_06_B Mock
## 3: 4.5526481 0.1128948 2.1761101 Mock_08_B Mock
## 4: 0.1620656 3.1121910 0.5288774 Mock_05_B Mock
## 5: 4.4298296 0.6642934 1.8067062 Mock_05_B Mock
## ---
## 19996: 1.9577893 2.7429475 2.1506509 WNV_02_A Virus
## 19997: 3.1805243 0.9470589 2.3263458 WNV_06_B Virus
## 19998: 3.7874084 0.3682822 3.4169910 WNV_05_B Virus
## 19999: 0.9528120 0.2007568 1.9451170 WNV_02_A Virus
## 20000: 4.2915176 0.3581012 4.6619271 WNV_01_A Virus
## Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
## <char> <num> <num>
## 1: B -0.29369602 2.00385475
## 2: B 0.73757279 2.71793723
## 3: B -0.05559657 3.07941747
## 4: B 0.26125368 0.02720127
## 5: B 0.43726748 2.60483789
## ---
## 19996: A -0.15290843 0.56069624
## 19997: B 0.31178620 2.25823164
## 19998: B -0.90997732 3.74038410
## 19999: A 0.06718409 2.29398131
## 20000: A 0.47635502 3.96355510
## DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
## <num> <num>
## 1: -0.1480608 0.7780063
## 2: 0.3374757 2.8061771
## 3: 3.7818818 2.0387959
## 4: -0.4421591 1.4738377
## 5: 3.7771759 1.8637621
## ---
## 19996: -1.0745652 2.3543241
## 19997: 0.1235801 2.6152031
## 19998: 1.3456432 2.6467636
## 19999: -1.6143754 1.5896653
## 20000: 0.7784286 3.0210304
## APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
## <num> <num>
## 1: -0.6031739 4.3643961
## 2: 2.4243431 3.1393259
## 3: 0.5808588 4.5825343
## 4: 3.0247295 0.7374537
## 5: 0.5254776 4.4549651
## ---
## 19996: 3.1603029 1.3455793
## 19997: 2.2317555 3.1402769
## 19998: 2.7106078 3.9415228
## 19999: 2.3983061 0.3903685
## 20000: 3.5182674 4.1516290
## BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
## <num> <num> <num>
## 1: 0.67665219 2.5046093 5
## 2: 0.22383082 0.3302887 10
## 3: 0.08498045 2.4683566 4
## 4: 3.18983173 0.6547109 1
## 5: 0.62932587 2.1033373 4
## ---
## 19996: 2.86870980 2.0855556 2
## 19997: 0.78986788 2.4517956 5
## 19998: 0.33557367 3.6496732 8
## 19999: 0.26035574 2.1166730 6
## 20000: 0.38186085 4.3858743 8
## FlowSOM_cluster FlowSOM_metacluster UMAP_X UMAP_Y
## <num> <fctr> <num> <num>
## 1: 119 22 1.924202 2.700375
## 2: 75 18 4.288721 -2.231453
## 3: 179 25 -4.118332 7.331559
## 4: 70 6 -5.826405 -6.053842
## 5: 180 25 -4.021310 6.415935
## ---
## 19996: 32 9 -0.907486 -7.260880
## 19997: 116 22 3.595872 1.210391
## 19998: 173 27 6.418618 2.405232
## 19999: 87 14 3.245920 -1.075086
## 20000: 185 26 6.894060 2.271706
## Population
## <char>
## 1: Immature neutrophils
## 2: Other
## 3: Mature neutrophils
## 4: Immature B cells
## 5: Mature neutrophils
## ---
## 19996: Mature B cells
## 19997: Immature neutrophils
## 19998: Mature neutrophils
## 19999: Other
## 20000: Monocytes
Save data.
### Save data and plots
fwrite(cell.dat, "Annotated.data.csv")
fwrite(cell.sub, "Annotated.data.DR.csv")
Subsequently, we can visualise the population labels on a UMAP plot.
make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", col.type = 'factor', add.label = TRUE)
make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", group.col, col.type = 'factor')
## Check your working directory for a new .png called 'Multi plot divided by Group.png'
We can also generate an expression heatmap to summarise the expression levels of each marker on our populations.
### Expression heatmap
rm(exp)
exp <- do.aggregate(cell.dat, aligned.cellular.cols, by = "Population")
make.pheatmap(exp, "Population", aligned.cellular.cols)
Save FCS for further analysis and evaluation.
### Write FCS files
setwd(OutputDirectory)
setwd("Output 4 - annotation")
dir.create('FCS files')
setwd('FCS files')
write.files(cell.dat,
file.prefix = exp.name,
divide.by = sample.col,
write.csv = FALSE,
write.fcs = TRUE)
Here we can create ‘summary’ data for our experiment. This involves calculating the percentage of each population in each sample, along with the corresponding cell counts if the information is available. In addition, we calculate the MFI for selected markers on each population in each sample.
First, set the working directory, and select which columns we will measure the MFI of. In this case, CD11b_asinh and Ly6C_asinh.
Set output directory.
setwd(OutputDirectory)
dir.create("Output 5 - summary data")
## Warning in dir.create("Output 5 - summary data"): 'Output 5 - summary data'
## already exists
setwd("Output 5 - summary data")
### Setup
variance.test <- 'kruskal.test'
pairwise.test <- "wilcox.test"
comparisons <- list(c("Mock", "Virus"))
comparisons
## [[1]]
## [1] "Mock" "Virus"
grp.order <- c("Mock", "Virus")
grp.order
## [1] "Mock" "Virus"
We can also specify which columns we wish to measure MFI levels on.
### Select columns to measure MFI
as.matrix(aligned.cellular.cols)
## [,1]
## [1,] "PECy5-5 CD3e_asinh_aligned"
## [2,] "PECy7 CD16-32_asinh_aligned"
## [3,] "DL800 Ly6G_asinh_aligned"
## [4,] "AF700 CD45_asinh_aligned"
## [5,] "APCCy7 CD48_asinh_aligned"
## [6,] "BUV395 CD11b_asinh_aligned"
## [7,] "BUV737 B220_asinh_aligned"
## [8,] "BV605 Ly6C_asinh_aligned"
dyn.cols <- aligned.cellular.cols[c(5,8)]
dyn.cols
## [1] "APCCy7 CD48_asinh_aligned" "BV605 Ly6C_asinh_aligned"
Use the new create.sumtable function to generate summary data – a data.table of samples (rows) vs measurements (columns).
### Create summary tables
sum.dat <- create.sumtable(dat = cell.dat,
sample.col = sample.col,
pop.col = "Population",
use.cols = dyn.cols,
annot.cols = c(group.col, batch.col)
)
## Creating sumtable
## -- running some initial tests
## -- calculting cell proportions
## -- calculting expression levels
## -- wrapping up
## -- sumtable complete!
Once the summary data has been generated, we can review it and select which columns to plot. In each case, the column names (i.e. name of each summary measure) are structured as ‘MEASURE TYPE – POPULATION’. This provides a useful structure, as we can use regular expression searches to split the name into just the MEASURE TYPE or POPULATION segment.
### Review summary data
sum.dat
## Sample Group Batch Exp APCCy7 CD48_asinh_aligned -- Immature B cells
## <fctr> <char> <char> <num>
## 1: Mock_01_A Mock A 2.424887
## 2: Mock_02_A Mock A 2.374245
## 3: Mock_03_A Mock A 2.508753
## 4: Mock_04_A Mock A 2.600931
## 5: Mock_05_B Mock B 2.419923
## 6: Mock_06_B Mock B 2.385357
## 7: Mock_07_B Mock B 2.325857
## 8: Mock_08_B Mock B 2.465879
## 9: WNV_01_A Virus A 2.122836
## 10: WNV_02_A Virus A 2.305655
## 11: WNV_03_A Virus A 2.232820
## 12: WNV_04_A Virus A 2.447445
## 13: WNV_05_B Virus B 2.049278
## 14: WNV_06_B Virus B 2.183903
## 15: WNV_07_B Virus B 2.152955
## 16: WNV_08_B Virus B 2.173844
## Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils
## <num>
## 1: 1.572872
## 2: 1.526846
## 3: 1.566277
## 4: 1.682951
## 5: 1.486138
## 6: 1.373131
## 7: 1.466973
## 8: 1.640794
## 9: 1.915684
## 10: 1.904565
## 11: 1.852920
## 12: 1.889313
## 13: 1.843289
## 14: 1.817669
## 15: 1.875538
## 16: 1.859179
## Exp APCCy7 CD48_asinh_aligned -- Mature B cells
## <num>
## 1: 2.412014
## 2: 2.426953
## 3: 2.623877
## 4: 2.714248
## 5: 2.419015
## 6: 2.413756
## 7: 2.362291
## 8: 2.469881
## 9: 2.542520
## 10: 2.643040
## 11: 2.678252
## 12: 2.798805
## 13: 2.205732
## 14: 2.393001
## 15: 2.346441
## 16: 2.400605
## Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils
## <num>
## 1: 0.5129836
## 2: 0.5528726
## 3: 0.5352846
## 4: 0.6251231
## 5: 0.5175413
## 6: 0.5001851
## 7: 0.5884818
## 8: 0.6476299
## 9: 0.8659670
## 10: 1.0622124
## 11: 1.0449080
## 12: 1.1598637
## 13: 0.9772988
## 14: 1.0035599
## 15: 1.1106013
## 16: 1.1197803
## Exp APCCy7 CD48_asinh_aligned -- Monocytes
## <num>
## 1: 3.261353
## 2: 3.254478
## 3: 3.419966
## 4: 3.435929
## 5: 3.250326
## 6: 3.249777
## 7: 3.218665
## 8: 3.316341
## 9: 3.186699
## 10: 3.261326
## 11: 3.347447
## 12: 3.497759
## 13: 3.129637
## 14: 3.258258
## 15: 3.277384
## 16: 3.345560
## Exp APCCy7 CD48_asinh_aligned -- Other
## <num>
## 1: 2.488965
## 2: 2.388345
## 3: 2.613971
## 4: 2.684664
## 5: 2.756410
## 6: 2.649867
## 7: 2.699764
## 8: 2.834465
## 9: 2.715744
## 10: 2.869109
## 11: 2.865703
## 12: 3.140419
## 13: 2.672039
## 14: 2.828839
## 15: 2.879763
## 16: 2.891880
## Exp APCCy7 CD48_asinh_aligned -- T cells
## <num>
## 1: 2.759116
## 2: 2.669215
## 3: 2.970334
## 4: 3.001218
## 5: 2.766417
## 6: 2.683297
## 7: 2.610325
## 8: 2.930828
## 9: 2.764899
## 10: 2.842706
## 11: 2.931075
## 12: 2.937672
## 13: 2.666584
## 14: 2.647255
## 15: 2.766093
## 16: 2.782212
## Exp BV605 Ly6C_asinh_aligned -- Immature B cells
## <num>
## 1: 0.2387287
## 2: 0.2969931
## 3: 0.2263609
## 4: 0.2079162
## 5: 0.2410776
## 6: 0.2074710
## 7: 0.2253116
## 8: 0.2424775
## 9: 0.5038403
## 10: 0.4630924
## 11: 0.4633731
## 12: 0.4460627
## 13: 0.5610659
## 14: 0.3308880
## 15: 0.3395451
## 16: 0.3197377
## Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils
## <num>
## 1: 2.991968
## 2: 2.937831
## 3: 2.823442
## 4: 2.776157
## 5: 2.922687
## 6: 2.740521
## 7: 2.718578
## 8: 2.915434
## 9: 3.335838
## 10: 3.239150
## 11: 3.135962
## 12: 3.318283
## 13: 3.222065
## 14: 3.154325
## 15: 3.258957
## 16: 3.101161
## Exp BV605 Ly6C_asinh_aligned -- Mature B cells
## <num>
## 1: 0.5643588
## 2: 0.4958642
## 3: 0.4701363
## 4: 0.5119049
## 5: 0.6077903
## 6: 0.4059218
## 7: 0.4717474
## 8: 0.4684568
## 9: 1.4602033
## 10: 1.9881967
## 11: 1.7394467
## 12: 2.3235193
## 13: 1.4512756
## 14: 1.3436108
## 15: 1.9374443
## 16: 1.3764834
## Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils
## <num>
## 1: 2.452764
## 2: 2.473359
## 3: 2.320761
## 4: 2.220885
## 5: 2.441565
## 6: 2.423296
## 7: 2.470727
## 8: 2.361716
## 9: 2.228977
## 10: 2.179033
## 11: 2.204247
## 12: 2.225025
## 13: 2.233103
## 14: 2.153610
## 15: 2.110596
## 16: 2.183070
## Exp BV605 Ly6C_asinh_aligned -- Monocytes
## <num>
## 1: 4.540271
## 2: 4.584795
## 3: 4.458234
## 4: 4.307355
## 5: 4.539643
## 6: 4.525972
## 7: 4.572508
## 8: 4.577179
## 9: 4.304258
## 10: 4.264818
## 11: 4.213613
## 12: 4.346373
## 13: 4.148381
## 14: 4.499382
## 15: 4.429550
## 16: 4.306105
## Exp BV605 Ly6C_asinh_aligned -- Other
## <num>
## 1: 0.7109253
## 2: 0.7375577
## 3: 0.6353084
## 4: 0.6839464
## 5: 0.7102996
## 6: 0.7267920
## 7: 0.7004569
## 8: 0.7801929
## 9: 1.4603817
## 10: 1.5642416
## 11: 1.6769586
## 12: 1.5278105
## 13: 1.3646209
## 14: 1.4066820
## 15: 1.3737616
## 16: 1.3836509
## Exp BV605 Ly6C_asinh_aligned -- T cells
## <num>
## 1: 1.5734507
## 2: 0.8458289
## 3: 1.0760717
## 4: 1.1502750
## 5: 1.4533346
## 6: 1.2713361
## 7: 1.3909613
## 8: 3.0236276
## 9: 2.8637326
## 10: 3.0743690
## 11: 3.1711638
## 12: 3.1395780
## 13: 3.3345189
## 14: 2.8204267
## 15: 3.1807756
## 16: 3.2684104
## Percent of sample -- Immature B cells
## <num>
## 1: 18.78
## 2: 20.30
## 3: 20.03
## 4: 20.32
## 5: 20.13
## 6: 16.84
## 7: 13.97
## 8: 22.61
## 9: 3.27
## 10: 4.31
## 11: 9.86
## 12: 7.41
## 13: 1.13
## 14: 6.46
## 15: 6.41
## 16: 4.71
## Percent of sample -- Immature neutrophils
## <num>
## 1: 6.63
## 2: 8.80
## 3: 6.67
## 4: 5.77
## 5: 6.65
## 6: 6.49
## 7: 5.44
## 8: 7.45
## 9: 8.86
## 10: 11.06
## 11: 12.02
## 12: 9.58
## 13: 9.11
## 14: 9.90
## 15: 8.36
## 16: 8.73
## Percent of sample -- Mature B cells Percent of sample -- Mature neutrophils
## <num> <num>
## 1: 12.60 41.92
## 2: 12.54 40.20
## 3: 12.90 39.13
## 4: 12.61 40.79
## 5: 11.26 42.80
## 6: 15.23 41.29
## 7: 17.03 41.20
## 8: 7.52 42.82
## 9: 15.96 36.22
## 10: 15.32 33.94
## 11: 17.60 29.29
## 12: 13.14 40.20
## 13: 15.56 37.60
## 14: 18.27 35.16
## 15: 14.48 37.44
## 16: 17.62 35.51
## Percent of sample -- Monocytes Percent of sample -- Other
## <num> <num>
## 1: 8.01 7.59
## 2: 5.49 8.49
## 3: 8.45 8.82
## 4: 8.79 8.09
## 5: 7.43 8.46
## 6: 8.82 7.56
## 7: 9.10 7.37
## 8: 9.87 7.55
## 9: 15.22 12.12
## 10: 14.52 13.06
## 11: 13.19 11.61
## 12: 9.48 11.31
## 13: 17.94 11.17
## 14: 9.57 11.79
## 15: 12.53 11.42
## 16: 13.04 11.75
## Percent of sample -- T cells
## <num>
## 1: 4.47
## 2: 4.18
## 3: 4.00
## 4: 3.63
## 5: 3.27
## 6: 3.77
## 7: 5.89
## 8: 2.18
## 9: 8.35
## 10: 7.79
## 11: 6.43
## 12: 8.88
## 13: 7.49
## 14: 8.85
## 15: 9.36
## 16: 8.64
as.matrix(names(sum.dat))
## [,1]
## [1,] "Sample"
## [2,] "Group"
## [3,] "Batch"
## [4,] "Exp APCCy7 CD48_asinh_aligned -- Immature B cells"
## [5,] "Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils"
## [6,] "Exp APCCy7 CD48_asinh_aligned -- Mature B cells"
## [7,] "Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils"
## [8,] "Exp APCCy7 CD48_asinh_aligned -- Monocytes"
## [9,] "Exp APCCy7 CD48_asinh_aligned -- Other"
## [10,] "Exp APCCy7 CD48_asinh_aligned -- T cells"
## [11,] "Exp BV605 Ly6C_asinh_aligned -- Immature B cells"
## [12,] "Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils"
## [13,] "Exp BV605 Ly6C_asinh_aligned -- Mature B cells"
## [14,] "Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils"
## [15,] "Exp BV605 Ly6C_asinh_aligned -- Monocytes"
## [16,] "Exp BV605 Ly6C_asinh_aligned -- Other"
## [17,] "Exp BV605 Ly6C_asinh_aligned -- T cells"
## [18,] "Percent of sample -- Immature B cells"
## [19,] "Percent of sample -- Immature neutrophils"
## [20,] "Percent of sample -- Mature B cells"
## [21,] "Percent of sample -- Mature neutrophils"
## [22,] "Percent of sample -- Monocytes"
## [23,] "Percent of sample -- Other"
## [24,] "Percent of sample -- T cells"
Specify which columns we want to plot.
annot.cols <- c(group.col, batch.col)
plot.cols <- names(sum.dat)[c(4:24)]
plot.cols
## [1] "Exp APCCy7 CD48_asinh_aligned -- Immature B cells"
## [2] "Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils"
## [3] "Exp APCCy7 CD48_asinh_aligned -- Mature B cells"
## [4] "Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils"
## [5] "Exp APCCy7 CD48_asinh_aligned -- Monocytes"
## [6] "Exp APCCy7 CD48_asinh_aligned -- Other"
## [7] "Exp APCCy7 CD48_asinh_aligned -- T cells"
## [8] "Exp BV605 Ly6C_asinh_aligned -- Immature B cells"
## [9] "Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils"
## [10] "Exp BV605 Ly6C_asinh_aligned -- Mature B cells"
## [11] "Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils"
## [12] "Exp BV605 Ly6C_asinh_aligned -- Monocytes"
## [13] "Exp BV605 Ly6C_asinh_aligned -- Other"
## [14] "Exp BV605 Ly6C_asinh_aligned -- T cells"
## [15] "Percent of sample -- Immature B cells"
## [16] "Percent of sample -- Immature neutrophils"
## [17] "Percent of sample -- Mature B cells"
## [18] "Percent of sample -- Mature neutrophils"
## [19] "Percent of sample -- Monocytes"
## [20] "Percent of sample -- Other"
## [21] "Percent of sample -- T cells"
Reorder the data such that sample appear in the specify group order.
### Reorder summary data and SAVE
sum.dat <- do.reorder(sum.dat, group.col, grp.order)
sum.dat[,c(1:3)]
## Sample Group Batch
## <fctr> <fctr> <char>
## 1: Mock_01_A Mock A
## 2: Mock_02_A Mock A
## 3: Mock_03_A Mock A
## 4: Mock_04_A Mock A
## 5: Mock_05_B Mock B
## 6: Mock_06_B Mock B
## 7: Mock_07_B Mock B
## 8: Mock_08_B Mock B
## 9: WNV_01_A Virus A
## 10: WNV_02_A Virus A
## 11: WNV_03_A Virus A
## 12: WNV_04_A Virus A
## 13: WNV_05_B Virus B
## 14: WNV_06_B Virus B
## 15: WNV_07_B Virus B
## 16: WNV_08_B Virus B
Save the summary data.
fwrite(sum.dat, 'sum.dat.csv')
We can use the run.autograph function to create violin/scatter plots with embedded statistic – one per population/measurement type.
### Autographs
for(i in plot.cols){
measure <- gsub("\\ --.*", "", i)
measure
pop <- gsub("^[^--]*.-- ", "", i)
pop
make.autograph(sum.dat,
x.axis = group.col,
y.axis = i,
y.axis.label = measure,
violin = FALSE,
colour.by = batch.col,
grp.order = grp.order,
my_comparisons = comparisons,
Variance_test = variance.test,
Pairwise_test = pairwise.test,
title = pop,
subtitle = measure,
filename = paste0(i, '.pdf'))
}
We can also create a global heatmap show the z-score of each population/measurement type against each sample.
### Create a fold change heatmap
## Z-score calculation
sum.dat.z <- do.zscore(sum.dat, plot.cols, replace = TRUE)
## Group
t.first <- match(grp.order, sum.dat.z[[group.col]])
t.first <- t.first -1
t.first
## Make heatmap
make.pheatmap(sum.dat.z,
sample.col = sample.col,
plot.cols = plot.cols,
is.fold = TRUE,
plot.title = 'Z-score',
annot.cols = annot.cols,
dendrograms = 'column',
row.sep = t.first,
cutree_cols = 3)
Create “Output-info” directory and save session data
For the final step of our setup, we want to record the session info our R session, and save this in a folder we’ll call “Output-info”.
### Session info and metadata
setwd(OutputDirectory)
dir.create("Output - info", showWarnings = FALSE)
setwd("Output - info")
sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message"))
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.5.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Biobase_2.62.0 BiocGenerics_0.48.1 CytoNorm_2.0.1
## [4] remotes_2.5.0 viridis_0.6.5 viridisLite_0.4.2
## [7] uwot_0.2.2 umap_0.2.10.0 scattermore_1.2
## [10] scales_1.3.0 Rtsne_0.17 rsvd_1.0.5
## [13] rstudioapi_0.16.0 RColorBrewer_1.1-3 pheatmap_1.0.12
## [16] patchwork_1.2.0 irlba_2.3.5.1 Matrix_1.7-0
## [19] gtools_3.9.5 gridExtra_2.3 ggthemes_5.1.0
## [22] ggpubr_0.6.0 ggpointdensity_0.1.0 FlowSOM_2.10.0
## [25] igraph_2.0.3 flowCore_2.14.2 factoextra_1.0.7
## [28] ggplot2_3.5.1 dendsort_0.3.4 data.table_1.15.4
## [31] colorRamps_2.3.4 Spectre_1.2.0
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.3 magrittr_2.0.3
## [3] matrixStats_1.3.0 compiler_4.4.0
## [5] png_0.1-8 vctrs_0.6.5
## [7] stringr_1.5.1 pkgconfig_2.0.3
## [9] fastmap_1.2.0 backports_1.4.1
## [11] labeling_0.4.3 utf8_1.2.4
## [13] rmarkdown_2.27 purrr_1.0.2
## [15] xfun_0.44 cachem_1.1.0
## [17] jsonlite_1.8.8 highr_0.10
## [19] tweenr_2.0.3 broom_1.0.6
## [21] cluster_2.1.6 R6_2.5.1
## [23] bslib_0.7.0 stringi_1.8.4
## [25] reticulate_1.37.0 car_3.1-2
## [27] jquerylib_0.1.4 Rcpp_1.0.12
## [29] knitr_1.46 tidyselect_1.2.1
## [31] abind_1.4-5 yaml_2.3.8
## [33] curl_5.2.1 lattice_0.22-6
## [35] tibble_3.2.1 withr_3.0.0
## [37] askpass_1.2.0 evaluate_0.23
## [39] polyclip_1.10-6 ConsensusClusterPlus_1.66.0
## [41] pillar_1.9.0 carData_3.0-5
## [43] stats4_4.4.0 generics_0.1.3
## [45] S4Vectors_0.40.2 munsell_0.5.1
## [47] glue_1.7.0 tools_4.4.0
## [49] ggnewscale_0.4.10 RSpectra_0.16-1
## [51] ggsignif_0.6.4 XML_3.99-0.16.1
## [53] grid_4.4.0 tidyr_1.3.1
## [55] RProtoBufLib_2.14.1 colorspace_2.1-0
## [57] ggforce_0.4.2 cli_3.6.2
## [59] fansi_1.0.6 cytolib_2.14.1
## [61] dplyr_1.1.4 gtable_0.3.5
## [63] rstatix_0.7.2 sass_0.4.9
## [65] digest_0.6.35 ggrepel_0.9.5
## [67] farver_2.1.2 htmltools_0.5.8.1
## [69] lifecycle_1.0.4 openssl_2.2.0
## [71] MASS_7.3-60.2
sink()
We will also save the cellular and clustering columns.
write(aligned.cellular.cols, "cellular.cols.txt")
write(aligned.cluster.cols, "cluster.cols.txt")