Spectre - batch alignment and analysis of cytometry data using CytoNorm

Introduction

Overview

The batch alignment and analysis workflow builds on the ‘simple discovery’ workflow by adding a step to facilitate batch alignment. This workflow allows for the correction of technical variation or shifts in signal levels in samples stained and/or acquired across multiple batches. To do this, we have implemented the CytoNorm algorithm (Van Gassen 2020). CytoNorm uses reference control samples that are prepared and recorded along with each batch of samples to identify and correct technical variations between individual batches, while preserving biologically relevant differences. For more information on CytoNorm, see Van Gassen et al 2020, and for more information on our implementation in Spectre, see Ashhurst et al 2021.

The demo dataset used for this worked example are cells extracted from mock- or virally-infected mouse bone marrow, measured by flow cytometry. Expression level values in these datasets have been manipulated to simulate acquisition over two batches.

Reference controls for CytoNorm

An example of reference control samples are aliquots of peripheral blood mononuclear cells (PBMCs) that are derived from a single donor at one time point, and cryopreserved (i.e. multiple aliquots of a biologically identical sample). Each time a set of PBMC samples from the study cohort are thawed, stained, and recorded, a reference controls is also thawed, stained, and recorded. Differences in signal level between the reference controls allows CytoNorm to learn the differences in signal levels due to the batches, and correct them, while preserving biological differences between the individual samples. In our demo dataset, we are using bone marrow samples derived from separate mice. Though not derived from the same mouse, these are similar enough that they can be used successfully as reference controls.

Correction of batch effects with CytoNorm

Reference samples are clustered together using FlowSOM, where we attempt to captures cells from matching populations in each batch in a single metacluster (e.g. neutrophils from batch 1 and 2 are captured in metacluster 1, etc). This assumes a reasonably low level of batch effect, consisting of small shifts in the expression levels for one or more markers. Because the proportion of cells in each population of the reference samples are identical, FlowSOM can use quantile distributions for each marker on each metacluser to create a model which will adjust the data values, removing technical variation between batches. This model is then applied to all samples in each batch.

Requirements

The reference controls can only correct batch effects for markers which are actually expressed on the reference controls. For example, some activation markers may be expressed on samples from the ‘disease’ group, but won’t be present on reference controls derived from healthy donors. In this case, we would simply not attempt to perform alignment on that marker. Typically, stable phenotyping markers (e.g. CD4, CD8) would be expressed strongly enough in reference samples, and would be suitable for alignment.

Other approaches

If you have samples derived from multiple batches, but do not have reference controls processed with each batch, then other forms of batch alignment might be possible – see our data integration options for more information.

Citation and methods

Citation

If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2022). Cytometry Part A 101 (3), 237-253. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work.

Sample methods blurb

Here is a sample methods blurb for this workflow. You may need to adapt this text to reflect any changes made in your analysis.

Computational analysis of data was performed using the Spectre R package (Ashhurst et al., 2022), with instructions and source code provided at https://github.com/ImmuneDynamics/spectre. Samples were initially prepared in FlowJo, and the population of interest was exported as raw value CSV files. Arcsinh transformation was performed on the data in R using a co-factor of 15 to redistribute the data on a linear scale and compress low end values near zero. The dataset was then merged into a single data.table, with keywords denoting the sample, group, and other factors added to each row (cell). The CytoNorm algorithm was used to correct for batch effects in the data using matched reference samples processed in each batch. The FlowSOM algorithm (Van Gassen et al., 2015) was then run on the merged dataset to cluster the data, where every cell is assigned to a specific cluster and metacluster. Subsequently, the data was downsampled and analysed by the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP) (McInnes, Healy, Melville, 2018) for cellular visualisation.

Setup

Directories

Create a master folder with a meaningful name. Then inside that folder, insert the following:

One folder called ‘data’ – this will contain your data CSV or FCS files
One folder called ‘metadata’ – this will contain a CSV containg your sample metadata
One folder called ‘Spectre CytoNorm’ or similar – place this analysis script there

Example:

  BM analysis
    /data
        -- Contains data files, one CSV or FCS per sample
    /metadata
        -- Contains a CSV containing sample metadata (group, batch, etc)
    /Spectre CytoNorm
        -- CytoNorm workflow.R

Analysis script

You can download the CytoNorm script from this link – place this inside the Spectre CytoNorm folder.

Data files

If you would to use the demo data as a test run for the CytoNorm workflow, nothing to do at this step. Simply follow the relevant instructions further down this page to download the demo data (under 2. Import and prep data).

If you would like to use your own data, add your data and metadata files:

Place the sample CSV or FCS files in the data folder you created above.
Place the metadata CSV files in the metadata folder you created above.

Please see this page for detailed instructions on exporting data for Spectre and setting up a metadata file.

1. Packages and directories

Load packages

Load the Spectre and other required libraries

Running library(Spectre) will load the Spectre package. We can then use package.check() to see if the standard dependency packages are installed, and package.load() to load those packages.

    ### Load libraries

        library(Spectre)
        Spectre::package.check()    # Check that all required packages are installed
        Spectre::package.load()     # Load required packages

    ### Install CytoNorm package
        
        if(!require('remotes')) {install.packages('remotes')}
        library('remotes')
        remotes::install_github(repo = "saeyslab/CytoNorm")

Set directories

Set ‘PrimaryDirectory’

Initially, we will set the location of the script as PrimaryDirectory. We’ll use this as a sort of ‘home page’ for where our analysis is going to be performed – including where to find our input data, metadata, and where our output data will go.

    ### Set PrimaryDirectory
        
        dirname(rstudioapi::getActiveDocumentContext()$path)            # Finds the directory where this script is located
        setwd(dirname(rstudioapi::getActiveDocumentContext()$path))     # Sets the working directory to where the script is located
        getwd()
        PrimaryDirectory <- getwd()
        PrimaryDirectory

Set ‘InputDirectory’

Next we need to set the location of the ‘data’ folder – where our samples for analysis are stored. In this example they are stored in a sub-folder called data.

    ### Set 'input' directory
        
        setwd(PrimaryDirectory)
        dir.create('../data', showWarnings = FALSE)
        setwd("../data/")
        InputDirectory <- getwd()
        setwd(PrimaryDirectory)

Set ‘MetaDirectory’

We need to set the location of the ‘metadata’ folder. This is where we can store a CSV file that contains any relevant metadata that we want to embed in our samples. In this example, it is located in a sub-folder called metadata.

    ### Set 'metadata' directory
        
        setwd(PrimaryDirectory)
        dir.create('../metadata', showWarnings = FALSE)
        setwd("../metadata/")
        MetaDirectory <- getwd()
        setwd(PrimaryDirectory)

Create ‘OutputDirectory’

We need to create a folder where our output data can go once our analysis is finished. In this example we will call this Output_Spectre.

    ### Create output directory
        
        setwd(PrimaryDirectory)
        dir.create("Output_Spectre", showWarnings = FALSE)
        setwd("Output_Spectre")
        OutputDirectory <- getwd()
        setwd(PrimaryDirectory)

2. Import and prep data

Demo data

If you need the demo dataset, uncomment the following code in the analysis script (select all, CMD+SHIFT+C) and run to download. If you are using your own datasets, then skip this step.

This code will download the demo dataset files and metadata file, and place them in the data and metadata folders respectively.

# setwd(PrimaryDirectory)
# setwd("../")
# getwd()
# download.file(url = "https://github.com/ImmuneDynamics/data/blob/main/simBatches.zip?raw=TRUE", destfile = 'simBatches.zip', mode = 'wb')
# unzip(zipfile = 'simBatches.zip')
# for(i in list.files('simBatches/data', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('simBatches/', '', i))
# }
# for(i in list.files('simBatches/metadata', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('simBatches/', '', i))
# }
# unlink(c('simBatches/', 'simBatches.zip', '__MACOSX'), recursive = TRUE)

Import data

To begin, we will change our working directory to ‘InputDirectory’ and list all the files in that directory – CSV files in this example.

    ### Import data

        setwd(InputDirectory)
        list.files(InputDirectory, ".csv")

##  [1] "Mock_01_A.csv"  "Mock_02_A.csv"  "Mock_03_A.csv"  "Mock_04_A.csv" 
##  [5] "Mock_05_B.csv"  "Mock_06_B.csv"  "Mock_07_B.csv"  "Mock_08_B.csv" 
##  [9] "Virus_01_A.csv" "Virus_02_A.csv" "Virus_03_A.csv" "Virus_04_A.csv"
## [13] "Virus_05_B.csv" "Virus_06_B.csv" "Virus_07_B.csv" "Virus_08_B.csv"

We can then read in all of our samples (in this example, one CSV file per sample) into a list called data.list. Spectre uses the data.table framework to store data, which reads, writes, and performs operations on data very quickly.

        data.list <- Spectre::read.files(file.loc = InputDirectory,
                                         file.type = ".csv",
                                         do.embed.file.names = TRUE)

Next we will review the data files by running the do.list.summary() function.

Review data

    ### Check the data

        check <- do.list.summary(data.list)

        check$name.table # Review column names and their subsequent values

##              X1            X2         X3         X4          X5           X6
## 1  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 2  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 3  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 4  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 5  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 6  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 7  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 8  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 9  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 10 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 11 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 12 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 13 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 14 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 15 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 16 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
##             X7         X8       X9    X10
## 1  BUV737 B220 BV605 Ly6C FileName FileNo
## 2  BUV737 B220 BV605 Ly6C FileName FileNo
## 3  BUV737 B220 BV605 Ly6C FileName FileNo
## 4  BUV737 B220 BV605 Ly6C FileName FileNo
## 5  BUV737 B220 BV605 Ly6C FileName FileNo
## 6  BUV737 B220 BV605 Ly6C FileName FileNo
## 7  BUV737 B220 BV605 Ly6C FileName FileNo
## 8  BUV737 B220 BV605 Ly6C FileName FileNo
## 9  BUV737 B220 BV605 Ly6C FileName FileNo
## 10 BUV737 B220 BV605 Ly6C FileName FileNo
## 11 BUV737 B220 BV605 Ly6C FileName FileNo
## 12 BUV737 B220 BV605 Ly6C FileName FileNo
## 13 BUV737 B220 BV605 Ly6C FileName FileNo
## 14 BUV737 B220 BV605 Ly6C FileName FileNo
## 15 BUV737 B220 BV605 Ly6C FileName FileNo
## 16 BUV737 B220 BV605 Ly6C FileName FileNo

        check$ncol.check # Review number of columns (features, markers) in each sample

##       [,1]
##  [1,] 10  
##  [2,] 10  
##  [3,] 10  
##  [4,] 10  
##  [5,] 10  
##  [6,] 10  
##  [7,] 10  
##  [8,] 10  
##  [9,] 10  
## [10,] 10  
## [11,] 10  
## [12,] 10  
## [13,] 10  
## [14,] 10  
## [15,] 10  
## [16,] 10

        check$nrow.check # Review number of rows (cells) in each sample

##       [,1] 
##  [1,] 10000
##  [2,] 10000
##  [3,] 10000
##  [4,] 10000
##  [5,] 10000
##  [6,] 10000
##  [7,] 10000
##  [8,] 10000
##  [9,] 10000
## [10,] 10000
## [11,] 10000
## [12,] 10000
## [13,] 10000
## [14,] 10000
## [15,] 10000
## [16,] 10000

You can review the first sample by running data.list[[1]].

        data.list[[1]]

##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:      51.6403      1239.360   -681.257   1752.270    2600.430
##     2:      58.8824      -140.921   -267.039   1188.640     392.962
##     3:      60.0077      2191.260   -362.088   1676.960    1441.300
##     4:     313.9580      2532.700   -438.388    478.109    1809.220
##     5:     -83.0391      1603.390  -1200.350    564.994    3946.200
##    ---                                                             
##  9996:    -121.9210      4626.660   7377.120   1600.950     197.503
##  9997:    5635.1700       807.513   -831.751   1750.210    2388.820
##  9998:     132.8540       185.348    780.195    611.077    1471.310
##  9999:     -78.8966      2564.800  10150.800   2161.430     264.904
## 10000:      17.2606       531.683   -801.073    350.229    1963.560
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo
##               <num>       <num>      <num>    <char>  <int>
##     1:      627.259  17763.6000   128.9340 Mock_01_A      1
##     2:      304.615  11834.8000   907.3650 Mock_01_A      1
##     3:     4740.710    185.9470  3915.8200 Mock_01_A      1
##     4:     9440.380   1056.2900  6221.6700 Mock_01_A      1
##     5:     1636.110   4579.8700   274.5270 Mock_01_A      1
##    ---                                                     
##  9996:    12732.300    -18.9127  3826.8000 Mock_01_A      1
##  9997:     1106.890   -394.7910    79.3191 Mock_01_A      1
##  9998:      823.702   6442.3800  -345.6880 Mock_01_A      1
##  9999:    12933.400   -106.9610  8881.1600 Mock_01_A      1
## 10000:      793.921   8161.8700   288.1360 Mock_01_A      1

Merge data

Merge data.tables

We can merge the data into a single data.table using do.merge.files(). By default, columns with matching names will be aligned in the new table, and any columns that are present in some samples, but not others, will be added and filled with ‘NA’ for any samples that didn’t have that column initially.

    ### Merge data

        cell.dat <- Spectre::do.merge.files(dat = data.list)
        cell.dat

##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <char>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16

Read in metadata

Read in sample metadata.

    ### Read in metadata  
       
        setwd(MetaDirectory)
        
        meta.dat <- fread("sample.details.csv")
        meta.dat

##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

3. Data transformation

Before we perform clustering etc, we need to meaningfully transform the data. For more information on why this is necessary, please see this page.

    setwd(OutputDirectory)
    dir.create("Output 1 - transformed plots")

## Warning in dir.create("Output 1 - transformed plots"): 'Output 1 - transformed
## plots' already exists

    setwd("Output 1 - transformed plots")

First, check the column names of the dataset.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))

##       [,1]           
##  [1,] "PECy5-5 CD3e" 
##  [2,] "PECy7 CD16-32"
##  [3,] "DL800 Ly6G"   
##  [4,] "AF700 CD45"   
##  [5,] "APCCy7 CD48"  
##  [6,] "BUV395 CD11b" 
##  [7,] "BUV737 B220"  
##  [8,] "BV605 Ly6C"   
##  [9,] "FileName"     
## [10,] "FileNo"

In this example, the columns we want to apply arcsinh transformation to are the cellular columns – column 1 to column 8. We can specify those columns using the code below.

        to.asinh <- names(cell.dat)[c(1:8)]
        to.asinh

## [1] "PECy5-5 CD3e"  "PECy7 CD16-32" "DL800 Ly6G"    "AF700 CD45"   
## [5] "APCCy7 CD48"   "BUV395 CD11b"  "BUV737 B220"   "BV605 Ly6C"

Define the cofactor we will use for transformation. As a general recommendation, we suggest using cofactor = 15 for CyTOF data, and cofactor between 100 and 1000 for flow data (we suggest 500 as a starting point). For more detailed guidance on setting the co-factors, see this page.

        cofactor <- 500

You can also choose a column to use for plotting the transformed result – ideally something that is expressed on a variety of cell types in your dataset.

        plot.against <- "BV605 Ly6C_asinh"

Now we need to apply arcsinh transformation to the data in those columns, using a specific co-factor.

        cell.dat <- do.asinh(cell.dat, to.asinh, cofactor = cofactor)

We can then make some plots to see if the arcsinh transformation is appropriate. Check the plots and see if you are happy with the transformation. If happy, the proceed with analysis. Otherwise, go back to the merging of the data.list (to create cell.dat) and try with another co-factor.

        transformed.cols <- paste0(to.asinh, "_asinh")

        for(i in transformed.cols){
          make.colour.plot(do.subsample(cell.dat, 20000), i, plot.against)
        }

4. Add metadata and set some preferences

Add metadata

We also want to attach some sample metadata, to aid with our analysis.

    ### Add metadata to data.table

        meta.dat

##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

Here we will select just the columns we would like to add to our data. In this example we only want to include use first four columns: Filename, Sample, Group, and Batch. We will use Filename for matching between cell.dat and meta.dat, and the other three columns will be the information that gets added to cell.dat.

        sample.info <- meta.dat[,c(1:4)]
        sample.info

##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

Now we can add this information to cell.dat. Essentially, the file names are listed in the metadata table, and we can use that to add any listed metadata in the table to the corresponding files in data.list. We can review the data to ensure the metadata has been correctly embedded.

        cell.dat <- do.add.cols(cell.dat, "FileName", sample.info, "FileName", rmv.ext = TRUE)

## Removing '.csv' or '.fcs' extension

## Step 1/3. Mapping data

## Step 2/3. Merging data

## Step 3/3. Returning data

        cell.dat

##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <fctr>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16
##         PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
##                      <num>               <num>            <num>
##      1:          0.1030979           1.6392988       -1.1159989
##      2:          0.1174943          -0.2782380       -0.5114826
##      3:          0.1197291           2.1835405       -0.6723598
##      4:          0.5926148           2.3251843       -0.7915047
##      5:         -0.1653241           1.8818839       -1.6097071
##     ---                                                        
## 159996:          0.2112023           1.9157040       -0.2278085
## 159997:          1.9658468           0.1304844       -1.2947039
## 159998:         -0.3134175           3.0680563        4.2423561
## 159999:         -0.1250411           2.5345014        2.0660180
## 160000:          1.0796017           2.0905763        0.4891206
##         AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
##                    <num>             <num>              <num>             <num>
##      1:        1.9669670         2.3510881          1.0504123         4.2636438
##      2:        1.6006640         0.7216390          0.5767234         3.8577846
##      3:        1.9247953         1.7806527          2.9452507         0.3638149
##      4:        0.8500738         1.9977594          3.6319912         1.4928782
##      5:        0.9703694         2.7630370          1.9011864         2.9109315
##     ---                                                                        
## 159996:        2.1598848         3.5048055          0.2514770         0.4971396
## 159997:        4.3648336         3.4509812          0.2419893         0.9511485
## 159998:        2.3428298        -0.6723679          3.8792326         0.2409789
## 159999:        1.8484549         0.9870066          3.9442068         0.6332984
## 160000:        1.9252352         2.1914959          4.4884166         1.2379493
##         BV605 Ly6C_asinh    Sample  Group  Batch
##                    <num>    <char> <char> <char>
##      1:        0.2550924 Mock_01_A   Mock      A
##      2:        1.3575720 Mock_01_A   Mock      A
##      3:        2.7553704 Mock_01_A   Mock      A
##      4:        3.2159434 Mock_01_A   Mock      A
##      5:        0.5246514 Mock_01_A   Mock      A
##     ---                                         
## 159996:        3.2061873  WNV_08_B  Virus      B
## 159997:        0.2618862  WNV_08_B  Virus      B
## 159998:        2.8385188  WNV_08_B  Virus      B
## 159999:        1.4702206  WNV_08_B  Virus      B
## 160000:        3.9097814  WNV_08_B  Virus      B

Setting preferences

Check the column names.

    ### Define cellular columns

        as.matrix(names(cell.dat))

##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"

Specify columns that represent transformed cellular features (in this case, the arcsinh transformed data, defined by “_asinh”). In this case, columns 11 to 18.

        cellular.cols <- names(cell.dat)[c(11:18)]
        as.matrix(cellular.cols)

##      [,1]                 
## [1,] "PECy5-5 CD3e_asinh" 
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"   
## [4,] "AF700 CD45_asinh"   
## [5,] "APCCy7 CD48_asinh"  
## [6,] "BUV395 CD11b_asinh" 
## [7,] "BUV737 B220_asinh"  
## [8,] "BV605 Ly6C_asinh"

Additionally, specify the columns that will be used to generate cluster and tSNE/UMAP results. Columns that are not specified here will still be analysed, but won’t contributed to the generation of clusters. There are a couple of strategies to take here: use all cellular columns for clustering to looks for all possible cell types/states, or use only stably expressed markers to cluster stable phenotypes, which can then be examined for changes in more dynamic markers. For more guidance, see this page.

    ### Define clustering columns    
        
        as.matrix(names(cell.dat))

##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"

        cluster.cols <- names(cell.dat)[c(11:18)]
        as.matrix(cluster.cols)

##      [,1]                 
## [1,] "PECy5-5 CD3e_asinh" 
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"   
## [4,] "AF700 CD45_asinh"   
## [5,] "APCCy7 CD48_asinh"  
## [6,] "BUV395 CD11b_asinh" 
## [7,] "BUV737 B220_asinh"  
## [8,] "BV605 Ly6C_asinh"

Here we can also specify sample, group, and batch columns.

    ### Define other key columns
        
        as.matrix(names(cell.dat))

##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"

        exp.name <- "BM experiment"
        
        sample.col <- "Sample"
        group.col <- "Group"
        batch.col <- "Batch"

Set downsampling targets for visualisation

Additionally, we want to specify the downsample targets for dimensionality reduction. This influences how many cells will be shown on a tSNE/UMAP plot, and we are specifying the number of cells per group to downsample to. Check for the number of cells (rows) in each group.

    ### Subsample targets per group

        data.frame(table(cell.dat[[group.col]])) # Check number of cells per sample.

##    Var1  Freq
## 1  Mock 80000
## 2 Virus 80000

You can then specify the number to downsample to in each group. These must be lower than the total number of cells in each group, and must be provided in the order that the group names appear in the dataset when running as.matrix(unique(cell.dat[[group.col]])).

In this example we want 10000 cells from ‘mock’ and 10000 cells from ‘Virus’ so that we have balanced numbers visualised between each group.

        as.matrix(unique(cell.dat[[group.col]]))

##      [,1]   
## [1,] "Mock" 
## [2,] "Virus"

        sub.targets <- c(10000, 10000) # target subsample numbers from each group
        sub.targets

## [1] 10000 10000

5. Batch alignment

Setup

Here we need to setup some preferences for batch alignment.

    ### Extract reference samples
        
        sample.info

##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

In this case we need to select which samples will serve as our ‘reference’ samples. These should be individual aliquots from a single biological samples that have been stain/processed in each batch (e.g. cryopreserved PBMCs from a single blood collection event from a single individual). In this example, we have selected one bone marrow sample from a mock infected mice in each batch.

        as.matrix(unique(cell.dat[[sample.col]]))

##       [,1]       
##  [1,] "Mock_01_A"
##  [2,] "Mock_02_A"
##  [3,] "Mock_03_A"
##  [4,] "Mock_04_A"
##  [5,] "Mock_05_B"
##  [6,] "Mock_06_B"
##  [7,] "Mock_07_B"
##  [8,] "Mock_08_B"
##  [9,] "WNV_01_A" 
## [10,] "WNV_02_A" 
## [11,] "WNV_03_A" 
## [12,] "WNV_04_A" 
## [13,] "WNV_05_B" 
## [14,] "WNV_06_B" 
## [15,] "WNV_07_B" 
## [16,] "WNV_08_B"

Here we are selecting the two samples: Mock_01_A, from batch A and Mock_05_B from batch B, which will be our reference samples.

        refs <- unique(cell.dat[[sample.col]])[c(1,5)]
        refs

## [1] "Mock_01_A" "Mock_05_B"

Now we can create a new dataset (ref.dat) containing only cells from the reference samples (in this example, Mock_01_A from batch A, and Mock_05_B from batch B)

        ref.dat <- do.filter(cell.dat, sample.col, refs)
        ref.dat

##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:     51.64030      1239.360  -681.2570   1752.270    2600.430
##     2:     58.88240      -140.921  -267.0390   1188.640     392.962
##     3:     60.00770      2191.260  -362.0880   1676.960    1441.300
##     4:    313.95800      2532.700  -438.3880    478.109    1809.220
##     5:    -83.03910      1603.390 -1200.3500    564.994    3946.200
##    ---                                                             
## 19996:    -30.65640       158.361 -1988.0000   5905.010    8831.280
## 19997:    148.32100      5819.820 15918.1000   2497.000     751.419
## 19998:    669.96300      3373.540   -33.8201   1144.100    1320.440
## 19999:    320.50700      4386.740  7465.1800   2107.970    -134.778
## 20000:     -9.93316      5929.620 11076.1000   1861.070     267.681
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>    <fctr>  <int>              <num>
##     1:     627.2590  17763.6000    128.934 Mock_01_A      1         0.10309786
##     2:     304.6150  11834.8000    907.365 Mock_01_A      1         0.11749428
##     3:    4740.7100    185.9470   3915.820 Mock_01_A      1         0.11972914
##     4:    9440.3800   1056.2900   6221.670 Mock_01_A      1         0.59261483
##     5:    1636.1100   4579.8700    274.527 Mock_01_A      1        -0.16532406
##    ---                                                                        
## 19996:     -77.0832   9662.6800    568.947 Mock_05_B      5        -0.06127445
## 19997:   13401.7000    -91.9106   2209.480 Mock_05_B      5         0.29245518
## 19998:    4762.0700    781.4270   4849.040 Mock_05_B      5         1.10256164
## 19999:   15413.4000    493.2230   1767.920 Mock_05_B      5         0.60367460
## 20000:   11084.1000     50.4877   1180.210 Mock_05_B      5        -0.01986501
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:           1.6392988      -1.11599889        1.9669670         2.3510881
##     2:          -0.2782380      -0.51148258        1.6006640         0.7216390
##     3:           2.1835405      -0.67235984        1.9247953         1.7806527
##     4:           2.3251843      -0.79150471        0.8500738         1.9977594
##     5:           1.8818839      -1.60970711        0.9703694         2.7630370
##    ---                                                                        
## 19996:           0.3116525      -2.08887520        3.1638831         3.5653947
## 19997:           3.1494039       4.15399776        2.3112610         1.1963364
## 19998:           2.6077040      -0.06758873        1.5655634         1.6983185
## 19999:           2.8681129       3.39766352        2.1457973        -0.2663940
## 20000:           3.1680273       3.79159306        2.0250211         0.5126149
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          1.0504123         4.2636438        0.2550924 Mock_01_A   Mock
##     2:          0.5767234         3.8577846        1.3575720 Mock_01_A   Mock
##     3:          2.9452507         0.3638149        2.7553704 Mock_01_A   Mock
##     4:          3.6319912         1.4928782        3.2159434 Mock_01_A   Mock
##     5:          1.9011864         2.9109315        0.5246514 Mock_01_A   Mock
##    ---                                                                       
## 19996:         -0.1535622         3.6552341        0.9755986 Mock_05_B   Mock
## 19997:          3.9820237        -0.1828014        2.1916151 Mock_05_B   Mock
## 19998:          2.9497215         1.2291302        2.9677227 Mock_05_B   Mock
## 19999:          4.1217946         0.8717569        1.9755202 Mock_05_B   Mock
## 20000:          3.7923143         0.1008046        1.5941070 Mock_05_B   Mock
##         Batch
##        <char>
##     1:      A
##     2:      A
##     3:      A
##     4:      A
##     5:      A
##    ---       
## 19996:      B
## 19997:      B
## 19998:      B
## 19999:      B
## 20000:      B

Initial clustering

First, we are going to align the data of only the two reference samples.

Setup a sub-directory for some pre-alignment plots.

    ### Initial clustering
        
        setwd(OutputDirectory)
        dir.create("Output 2 - alignment")

## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists

        setwd("Output 2 - alignment")

Run prep.cytonorm to initialise a cytonorm object, including clustering with FlowSOM. The number of metaclustes will be determined automatically, or you can specify a desired number of metaclusters by using the additional argument meta.k (e.g. meta.k = 20).

        cytnrm <- prep.cytonorm(dat = ref.dat, 
                                cellular.cols = cellular.cols, 
                                cluster.cols = cluster.cols, 
                                batch.col = batch.col, 
                                sample.col = sample.col)

## Loading required package: CytoNorm

## Loading required package: Biobase

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following object is masked from 'package:gridExtra':
## 
##     combine

## The following objects are masked from 'package:igraph':
## 
##     normalize, path, union

## The following object is masked from 'package:flowCore':
## 
##     normalize

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
##     tapply, union, unique, unsplit, which.max, which.min

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'

## Step 1/3. Mapping data

## Step 2/3. Merging data

## Step 3/3. Returning data

## Step 1/4 - Splitting files for use with original FlowSOM function

## Step 2/4 - Running FlowSOM

## Reading A.fcs

## Reading B.fcs

## Step 3/4 - Preparing FlowSOM object and results data.table

## Step 4/4 - FlowSOM preparation for alignment model complete

        cytnrm

## $fsom
## FlowSOM model trained on 20000 cells and 8 markers, 
##  using a 5x5 grid (25 clusters) and 10 metaclusters.
## 
## Markers used:  column PECy5-5 CD3e_asinh from dataset <PECy5-5 CD3e_asinh>, column PECy7 CD16-32_asinh from dataset <PECy7 CD16-32_asinh>, column DL800 Ly6G_asinh from dataset <DL800 Ly6G_asinh>, column AF700 CD45_asinh from dataset <AF700 CD45_asinh>, column APCCy7 CD48_asinh from dataset <APCCy7 CD48_asinh>, column BUV395 CD11b_asinh from dataset <BUV395 CD11b_asinh>, column BUV737 B220_asinh from dataset <BUV737 B220_asinh>, column BV605 Ly6C_asinh from dataset <BV605 Ly6C_asinh>
## 
## Metacluster cell count:
##  MC 1  MC 2  MC 3  MC 4  MC 5  MC 6  MC 7  MC 8  MC 9 MC 10 
##  5886   455    83  7267  2389   878   331  1708   717   286 
## 
##  168 cells (0.84%) are further than 4 MAD from their cluster center.
## $dt
##        PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh
##                     <num>               <num>            <num>            <num>
##     1:       -0.073653787         -0.06836993      -0.71957141       -0.2601342
##     2:        0.196819022          1.04051566       0.03595166        0.5919558
##     3:       -0.148785651          2.90569115       3.45313573        1.2724129
##     4:        0.189247936          3.06170583       4.11895704        1.9266738
##     5:       -0.306349635          0.44284853       0.42922080        0.4019526
##    ---                                                                         
## 19996:        2.784854412          0.39633411      -0.98542446        3.4850731
## 19997:        0.007048962          3.26623678       3.83651400        2.1971622
## 19998:       -0.144241706          1.14419019       0.08433798        1.8859901
## 19999:        0.136786833          2.73393393       2.65572119        1.3384684
## 20000:        0.101986513          3.28912425       3.33585572        2.1958396
##        APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh
##                    <num>              <num>             <num>            <num>
##     1:         0.6676006        0.455060869       0.191696376       -0.3804775
##     2:         0.7900302       -0.004049649       4.381132603       -0.2398050
##     3:         0.9693112        3.634680748       0.466979593        2.6836505
##     4:         0.4057303        4.441645622       0.637388706        3.1590574
##     5:        -0.5365349        1.400961280      -0.298591286       -0.1681118
##    ---                                                                        
## 19996:         1.9617901        0.664981723      -0.590357363        3.6845517
## 19997:         0.5892904        4.073036194      -0.006483534        2.1069219
## 19998:         3.1991692       -0.560110509       3.579796553       -0.1521181
## 19999:         0.7387957        4.001876354       0.324388862        1.7045259
## 20000:         1.4960654        4.071680546       0.229720220        1.9134706
##         File File_scattered Original_ID prep.fsom.cluster prep.fsom.metacluster
##        <num>          <num>       <num>             <num>                 <num>
##     1:     1      1.0370497        2369                17                     7
##     2:     1      0.8859173        5273                12                     1
##     3:     1      0.9769963        9290                15                     4
##     4:     1      1.1308356        1252                 9                     4
##     5:     1      0.9924445        8826                17                     7
##    ---                                                                         
## 19996:     2      1.9423792        6029                21                     9
## 19997:     2      2.0499919        6970                 4                     4
## 19998:     2      1.9680801        8165                11                     1
## 19999:     2      1.9309787        2143                15                     4
## 20000:     2      1.9343390        3821                 4                     4
## 
## $cellular.cols
## [1] "PECy5-5 CD3e_asinh"  "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"   
## [4] "AF700 CD45_asinh"    "APCCy7 CD48_asinh"   "BUV395 CD11b_asinh" 
## [7] "BUV737 B220_asinh"   "BV605 Ly6C_asinh"   
## 
## $cluster.cols
## [1] "PECy5-5 CD3e_asinh"  "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"   
## [4] "AF700 CD45_asinh"    "APCCy7 CD48_asinh"   "BUV395 CD11b_asinh" 
## [7] "BUV737 B220_asinh"   "BV605 Ly6C_asinh"   
## 
## $files
## [1] "A" "B"
## 
## $file.nums
## [1] 1 2

Once this is complete, we can subsample the dataset and plot it using UMAP.

        cytnrm.sub <- do.subsample(cytnrm$dt, 10000)
        cytnrm.sub <- run.umap(cytnrm.sub, use.cols = cluster.cols)

## [2024-05-23 15:43:15.585413]  starting umap

## [2024-05-23 15:43:15.60073]  creating graph of nearest neighbors

## [2024-05-23 15:43:27.204277]  creating initial embedding

## [2024-05-23 15:43:27.74285]  optimizing embedding

## [2024-05-23 15:43:39.423743]  done

Create a subfolder for plots.

    ### Initial clustering plots
        
        setwd(OutputDirectory)
        setwd("Output 2 - alignment")
        dir.create("1 - ref pre-alignment")

## Warning in dir.create("1 - ref pre-alignment"): '1 - ref pre-alignment' already
## exists

        setwd("1 - ref pre-alignment")

First, we will colour the cells by which batch they originated from. We can see batch effects evident by the offset between the red and green cells. In this case they are coloured by ‘File’.

        make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'File', 'factor')

Then, we need to examine the cellular expression across these cells.

        make.multi.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', cellular.cols)

## Check your working directory for a new .png called 'Multi plot.png'

Finally, we need to examine the metaclusters that have been generated, and confirm that they capture matching populations across batches (e.g. in this example, metacluster 1 captures B220+ B cells from both batches. If too many or too few metaclusters are generated, you can try running the prep.cytonorm function again, and specify a desired number of metaclusters using the meta.k argument.

        make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'prep.fsom.metacluster', 'factor', add.label = TRUE)

You can see which file number corresponds to each batch by running cytnrm$files and cytnrm$file.nums.

        cytnrm$files

## [1] "A" "B"

        cytnrm$file.nums

## [1] 1 2

If the metaclusters are suitable, we will proceed to actually performing alignment. Because the train.cytonorm and run.cytonorm functions involve the writing and reading of files, we will set the directory beforehand.

Full alignment

Set a directory for the outputs

    ### Alignment
        
        setwd(OutputDirectory)
        dir.create("Output 2 - alignment")

## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists

        setwd("Output 2 - alignment")

Firstly, we need to train the model that will perform quantile alignment for each marker on each metacluster.

        cytnrm <- train.cytonorm(model = cytnrm, align.cols = cellular.cols)

## Training alignment - setup

## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'

## Training alignment - file (batch) preparation

## Training alignment - file (batch) and metacluster splitting

##  -- running File '1'

##  -- running File '2'

## Training alignment - learning conversions

## Processing cluster 1

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 2

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 3

## Computing Quantiles

##   A (FileID 1)

##

##   Less then 50 cells in A (34). No quantiles computed.

##   B (FileID 2)

##

##   Less then 50 cells in B (49). No quantiles computed.

## Computing Splines

##   A

## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for A
## The identity function will be used.

##   B

## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for B
## The identity function will be used.

## Processing cluster 4

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 5

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 6

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 7

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 8

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 9

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Processing cluster 10

## Computing Quantiles

##   A (FileID 1)

##

##   B (FileID 2)

##

## Computing Splines

##   A

##   B

## Training alignment - cleanup

## Training alignment - training complete

Once complete, this model can be used to align the full dataset.

        cell.dat <- run.cytonorm(dat = cell.dat, model = cytnrm, batch.col = batch.col)

## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'

## Alignment -- splitting files (batches)

## Alignment -- processing files (batches)

## -- Splitting A.fcs

## Warning in dir.create("tmp-target"): 'tmp-target' already exists

## Warning in FlowSOM::NewData(model$fsom, ff): 1258 cells (1.57%) seem far from
## their cluster centers.

## -- Splitting B.fcs

## Warning in dir.create("tmp-target"): 'tmp-target' already exists

## Warning in FlowSOM::NewData(model$fsom, ff): 1648 cells (2.06%) seem far from
## their cluster centers.

## Alignment -- aligning data

## -- Processing cluster 1

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom1.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom1.fcs (B)

## Normalizing B

## -- Processing cluster 2

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom2.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom2.fcs (B)

## Normalizing B

## -- Processing cluster 3

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom3.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom3.fcs (B)

## Normalizing B

## -- Processing cluster 4

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom4.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom4.fcs (B)

## Normalizing B

## -- Processing cluster 5

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom5.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom5.fcs (B)

## Normalizing B

## -- Processing cluster 6

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom6.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom6.fcs (B)

## Normalizing B

## -- Processing cluster 7

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom7.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom7.fcs (B)

## Normalizing B

## -- Processing cluster 8

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom8.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom8.fcs (B)

## Normalizing B

## -- Processing cluster 9

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom9.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom9.fcs (B)

## Normalizing B

## -- Processing cluster 10

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom10.fcs (A)

## Normalizing A

##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom10.fcs (B)

## Normalizing B

## Alignment -- merging aligned data for each metacluster

## Alignment -- creating final data.table

## Alignment -- complete

Review alignment

To examine the results, we first need to specify the cellular names containing the aligned data (they will have ’_aligned’ appended to the end).

        aligned.cols <- paste0(cellular.cols, '_aligned')

Then we can set a new output directory.

    ### Plotting reference data
        
        setwd(OutputDirectory)
        setwd("Output 2 - alignment")
        dir.create("2 - ref aligned")

## Warning in dir.create("2 - ref aligned"): '2 - ref aligned' already exists

        setwd("2 - ref aligned")

First, we will examine just the reference samples, to see if the alignment looks suitable. We will run UMAP using the new aligned data.

        ref.sub <- do.filter(cell.dat, sample.col, refs)
        ref.sub

##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:     51.64030      1239.360  -681.2570   1752.270    2600.430
##     2:     58.88240      -140.921  -267.0390   1188.640     392.962
##     3:     60.00770      2191.260  -362.0880   1676.960    1441.300
##     4:    313.95800      2532.700  -438.3880    478.109    1809.220
##     5:    -83.03910      1603.390 -1200.3500    564.994    3946.200
##    ---                                                             
## 19996:    -30.65640       158.361 -1988.0000   5905.010    8831.280
## 19997:    148.32100      5819.820 15918.1000   2497.000     751.419
## 19998:    669.96300      3373.540   -33.8201   1144.100    1320.440
## 19999:    320.50700      4386.740  7465.1800   2107.970    -134.778
## 20000:     -9.93316      5929.620 11076.1000   1861.070     267.681
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>    <fctr>  <int>              <num>
##     1:     627.2590  17763.6000    128.934 Mock_01_A      1         0.10309786
##     2:     304.6150  11834.8000    907.365 Mock_01_A      1         0.11749428
##     3:    4740.7100    185.9470   3915.820 Mock_01_A      1         0.11972914
##     4:    9440.3800   1056.2900   6221.670 Mock_01_A      1         0.59261483
##     5:    1636.1100   4579.8700    274.527 Mock_01_A      1        -0.16532406
##    ---                                                                        
## 19996:     -77.0832   9662.6800    568.947 Mock_05_B      5        -0.06127445
## 19997:   13401.7000    -91.9106   2209.480 Mock_05_B      5         0.29245518
## 19998:    4762.0700    781.4270   4849.040 Mock_05_B      5         1.10256164
## 19999:   15413.4000    493.2230   1767.920 Mock_05_B      5         0.60367460
## 20000:   11084.1000     50.4877   1180.210 Mock_05_B      5        -0.01986501
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:           1.6392988      -1.11599889        1.9669670         2.3510881
##     2:          -0.2782380      -0.51148258        1.6006640         0.7216390
##     3:           2.1835405      -0.67235984        1.9247953         1.7806527
##     4:           2.3251843      -0.79150471        0.8500738         1.9977594
##     5:           1.8818839      -1.60970711        0.9703694         2.7630370
##    ---                                                                        
## 19996:           0.3116525      -2.08887520        3.1638831         3.5653947
## 19997:           3.1494039       4.15399776        2.3112610         1.1963364
## 19998:           2.6077040      -0.06758873        1.5655634         1.6983185
## 19999:           2.8681129       3.39766352        2.1457973        -0.2663940
## 20000:           3.1680273       3.79159306        2.0250211         0.5126149
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          1.0504123         4.2636438        0.2550924 Mock_01_A   Mock
##     2:          0.5767234         3.8577846        1.3575720 Mock_01_A   Mock
##     3:          2.9452507         0.3638149        2.7553704 Mock_01_A   Mock
##     4:          3.6319912         1.4928782        3.2159434 Mock_01_A   Mock
##     5:          1.9011864         2.9109315        0.5246514 Mock_01_A   Mock
##    ---                                                                       
## 19996:         -0.1535622         3.6552341        0.9755986 Mock_05_B   Mock
## 19997:          3.9820237        -0.1828014        2.1916151 Mock_05_B   Mock
## 19998:          2.9497215         1.2291302        2.9677227 Mock_05_B   Mock
## 19999:          4.1217946         0.8717569        1.9755202 Mock_05_B   Mock
## 20000:          3.7923143         0.1008046        1.5941070 Mock_05_B   Mock
##         Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
##        <char>                      <num>                       <num>
##     1:      A                 0.07550371                   1.6243541
##     2:      A                 0.09086002                  -0.4109158
##     3:      A                 0.19667186                   2.3359835
##     4:      A                 0.78801227                   2.4700482
##     5:      A                -0.20452863                   1.8813063
##    ---                                                              
## 19996:      B                -0.02770719                   0.3994900
## 19997:      B                 0.25879997                   3.0681801
## 19998:      B                 0.89016247                   2.4627149
## 19999:      B                 0.56146991                   2.7699685
## 20000:      B                -0.04108953                   3.0901837
##        DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
##                           <num>                    <num>
##     1:               -0.9853218                 2.260301
##     2:               -0.5260243                 1.783121
##     3:               -0.5778462                 1.962701
##     4:               -0.6863875                 1.103076
##     5:               -1.3787018                 1.213712
##    ---                                                  
## 19996:               -3.0485964                 2.892127
## 19997:                4.0582747                 2.050607
## 19998:               -0.1113533                 1.328649
## 19999:                3.2538199                 1.860001
## 20000:                3.6525168                 1.720933
##        APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
##                            <num>                      <num>
##     1:                 2.5360262                 0.45637840
##     2:                 1.1129812                 0.01138196
##     3:                 1.8322810                 2.99106073
##     4:                 2.0085123                 3.73671436
##     5:                 2.9657056                 1.33816469
##    ---                                                     
## 19996:                 3.3363132                 0.44203448
## 19997:                 1.0153860                 3.88138986
## 19998:                 1.5865602                 2.91923833
## 19999:                -0.4603302                 4.03672075
## 20000:                 0.2559606                 3.69683123
##        BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
##                            <num>                    <num>                <num>
##     1:                4.10898304                0.1759336                    1
##     2:                3.71863341                1.2138813                    1
##     3:                0.49721161                2.6361940                    5
##     4:                1.55262542                3.0890400                    5
##     5:                2.83306122                0.4209961                    1
##    ---                                                                        
## 19996:                3.82389307                1.1142409                    1
## 19997:               -0.20893200                2.4836495                    4
## 19998:                1.05263329                3.0943832                    5
## 19999:                0.82943606                2.2702370                    4
## 20000:                0.07291784                1.8810067                    4

        ref.sub <- do.subsample(ref.sub, 20000)
        ref.sub <- run.umap(ref.sub, use.cols = aligned.cols)

## [2024-05-23 15:43:49.365568]  starting umap

## [2024-05-23 15:43:49.379667]  creating graph of nearest neighbors

## [2024-05-23 15:44:15.804667]  creating initial embedding

## [2024-05-23 15:44:17.118676]  optimizing embedding

## [2024-05-23 15:44:39.803111]  done

Plotting the data by batch reveals that the two batches are now well integrated with eachother.

        make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', batch.col, 'factor')

We can also plot the data by metacluster number, to confirm that they look suitable (these are the metaclusters created when we made the alignment model).

        make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', 'Alignment_MC_aligned', 'factor', add.label = TRUE)

We can also colour the data by the exact samples (should be one per batch).

        make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', sample.col, 'factor')

As a sanity check, we can also check the experimental groups that these cells come from. The reference samples should ideally all be from a single experimental group, so if there are more than one group in the plot below, there may be a problem with the setup.

        make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', group.col, 'factor')

If the alignment looks suitable, we can then run UMAP and make plots using a subset of the whole dataset.

    ### Plotting all data
        
        setwd(OutputDirectory)
        setwd("Output 2 - alignment")
        dir.create("3 - all aligned")

## Warning in dir.create("3 - all aligned"): '3 - all aligned' already exists

        setwd("3 - all aligned")
        
        aligned.sub <- do.subsample(cell.dat, 50000)
        aligned.sub <- run.umap(aligned.sub, use.cols = aligned.cols)

## [2024-05-23 15:44:43.279394]  starting umap

## [2024-05-23 15:44:43.293578]  creating graph of nearest neighbors

## [2024-05-23 15:46:01.775045]  creating initial embedding

## [2024-05-23 15:46:05.024758]  optimizing embedding

## [2024-05-23 15:47:10.139803]  done

Firstly, we can examine the distribution of cells from each batch.

        make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', batch.col, 'factor')

We can also check the metacluster assignments.

        make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', 'Alignment_MC_aligned', 'factor', add.label = TRUE)

We can then check the distribution of cells from each sample, and each group. In this case the different distribution of cells in each experimental group reflects biologicall relevant changes, and not batch effects.

        make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', sample.col, 'factor')

        make.colour.plot(aligned.sub, 'UMAP_X', 'UMAP_Y', group.col, 'factor')

For checking later, we can save a copy of the subsetted data to disk as well.

        fwrite(aligned.sub, 'aligned.sub.csv')

6. Clustering and dimensionality reduction

Set a directory for outputs.

    setwd(OutputDirectory)
    dir.create("Output 3 - clustering")

## Warning in dir.create("Output 3 - clustering"): 'Output 3 - clustering' already
## exists

    setwd("Output 3 - clustering")

Adjust our saved column names so that we can perform clustering on the new ‘aligned’ data.

    ### Re-set cellular and clustering cols
    
        aligned.cellular.cols <- paste0(cellular.cols, '_aligned')
        aligned.cellular.cols

## [1] "PECy5-5 CD3e_asinh_aligned"  "PECy7 CD16-32_asinh_aligned"
## [3] "DL800 Ly6G_asinh_aligned"    "AF700 CD45_asinh_aligned"   
## [5] "APCCy7 CD48_asinh_aligned"   "BUV395 CD11b_asinh_aligned" 
## [7] "BUV737 B220_asinh_aligned"   "BV605 Ly6C_asinh_aligned"

        aligned.cluster.cols <- paste0(cluster.cols, '_aligned')
        aligned.cluster.cols

## [1] "PECy5-5 CD3e_asinh_aligned"  "PECy7 CD16-32_asinh_aligned"
## [3] "DL800 Ly6G_asinh_aligned"    "AF700 CD45_asinh_aligned"   
## [5] "APCCy7 CD48_asinh_aligned"   "BUV395 CD11b_asinh_aligned" 
## [7] "BUV737 B220_asinh_aligned"   "BV605 Ly6C_asinh_aligned"

We can run clustering using the run.flowsom function. In this case we can define the number of desired metaclusters manually, with the meta.k argument (in this case we have chosen 30). This can be increased or decreased as required. Typically, overclustering is preferred, as multiple clusters that represent a single cellular population can always be annotated as such. Subsequently, we can write the clustered dataset to disk.

    ### Clustering

        cell.dat <- run.flowsom(cell.dat, aligned.cluster.cols, meta.k = 30)

## Preparing data

## Starting FlowSOM

## Building SOM

## Mapping data to SOM

## Building MST

## Binding metacluster labels to starting dataset

## Binding cluster labels to starting dataset

        fwrite(cell.dat, "clustered.data.csv")

We can then run dimensionality reduction on a subset of the data, allow us to visualise the data and resulting clusters. In this case we have used run.umap, though other options are available, including run.fitsne and run.tsne. As before, this subsampled dataset with DR coordinates is saved to disk.

    ### Dimensionality reduction

        cell.sub <- do.subsample(cell.dat, sub.targets, group.col)
        cell.sub <- run.umap(cell.sub, aligned.cluster.cols)

## [2024-05-23 15:47:32.324913]  starting umap

## [2024-05-23 15:47:32.341717]  creating graph of nearest neighbors

## [2024-05-23 15:47:57.407]  creating initial embedding

## [2024-05-23 15:47:58.565374]  optimizing embedding

## [2024-05-23 15:48:22.906857]  done

        fwrite(cell.sub, "clustered.data.DR.csv")

We can visualise the DR data to asses which clusters represent cellular populations

    ### DR plots

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", col.type = 'factor', add.label = TRUE)

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_label()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_label()`).

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", aligned.cellular.cols)

## Check your working directory for a new .png called 'Multi plot.png'

We can also generate some multi plots to compare between experimental groups or batches.

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", group.col, col.type = 'factor')

## Check your working directory for a new .png called 'Multi plot divided by Group.png'

We can also produce expression heatmaps to help guide our interpretation of cluster identities.

    ### Expression heatmap

        exp <- do.aggregate(cell.dat, aligned.cellular.cols, by = "FlowSOM_metacluster")
        make.pheatmap(exp, "FlowSOM_metacluster", aligned.cellular.cols)

7. Annotate clusters

Set an output directory.

    setwd(OutputDirectory)
    dir.create("Output 4 - annotation")

## Warning in dir.create("Output 4 - annotation"): 'Output 4 - annotation' already
## exists

    setwd("Output 4 - annotation")

Review the cluster labels and marker expression patterns, so you can annotate the clusters. This annotation is optional, as all subsequent steps can be performed on the ‘clusters’ instead of the ‘populations’. Here we can create a list of population names, and then specify which clusters make up that population (e.g. Immature B cells are contained within cluster ‘6’).

    ### Annotate

        annots <- list("Mature neutrophils" = c(24,25,23,27,30),
                       "Immature neutrophils" = c(22),
                       "Monocytes" = c(29,26),
                       "T cells" = c(2,1,8,7),
                       "Mature B cells" = c(3,4,9,5,10,13),
                       "Immature B cells" = c(6)
        )

Once the annotation list is created, we can switch the list into a table format to annotate our data.

        annots <- do.list.switch(annots)
        names(annots) <- c("Values", "Population")
        setorderv(annots, 'Values')
        annots

##     Values           Population
##      <num>               <char>
##  1:      1              T cells
##  2:      2              T cells
##  3:      3       Mature B cells
##  4:      4       Mature B cells
##  5:      5       Mature B cells
##  6:      6     Immature B cells
##  7:      7              T cells
##  8:      8              T cells
##  9:      9       Mature B cells
## 10:     10       Mature B cells
## 11:     13       Mature B cells
## 12:     22 Immature neutrophils
## 13:     23   Mature neutrophils
## 14:     24   Mature neutrophils
## 15:     25   Mature neutrophils
## 16:     26            Monocytes
## 17:     27   Mature neutrophils
## 18:     29            Monocytes
## 19:     30   Mature neutrophils

Using the do.add.cols function, we can add the population names to the corresponding clusters.

    ### Add annotations

        cell.dat <- do.add.cols(cell.dat, "FlowSOM_metacluster", annots, "Values")

## Step 1/3. Mapping data

## Step 2/3. Merging data

## Step 3/3. Returning data

        cell.dat

##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <fctr>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16
##         PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
##                      <num>               <num>            <num>
##      1:          0.1030979           1.6392988       -1.1159989
##      2:          0.1174943          -0.2782380       -0.5114826
##      3:          0.1197291           2.1835405       -0.6723598
##      4:          0.5926148           2.3251843       -0.7915047
##      5:         -0.1653241           1.8818839       -1.6097071
##     ---                                                        
## 159996:          0.2112023           1.9157040       -0.2278085
## 159997:          1.9658468           0.1304844       -1.2947039
## 159998:         -0.3134175           3.0680563        4.2423561
## 159999:         -0.1250411           2.5345014        2.0660180
## 160000:          1.0796017           2.0905763        0.4891206
##         AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
##                    <num>             <num>              <num>             <num>
##      1:        1.9669670         2.3510881          1.0504123         4.2636438
##      2:        1.6006640         0.7216390          0.5767234         3.8577846
##      3:        1.9247953         1.7806527          2.9452507         0.3638149
##      4:        0.8500738         1.9977594          3.6319912         1.4928782
##      5:        0.9703694         2.7630370          1.9011864         2.9109315
##     ---                                                                        
## 159996:        2.1598848         3.5048055          0.2514770         0.4971396
## 159997:        4.3648336         3.4509812          0.2419893         0.9511485
## 159998:        2.3428298        -0.6723679          3.8792326         0.2409789
## 159999:        1.8484549         0.9870066          3.9442068         0.6332984
## 160000:        1.9252352         2.1914959          4.4884166         1.2379493
##         BV605 Ly6C_asinh    Sample  Group  Batch PECy5-5 CD3e_asinh_aligned
##                    <num>    <char> <char> <char>                      <num>
##      1:        0.2550924 Mock_01_A   Mock      A                 0.07550371
##      2:        1.3575720 Mock_01_A   Mock      A                 0.09086002
##      3:        2.7553704 Mock_01_A   Mock      A                 0.19667186
##      4:        3.2159434 Mock_01_A   Mock      A                 0.78801227
##      5:        0.5246514 Mock_01_A   Mock      A                -0.20452863
##     ---                                                                    
## 159996:        3.2061873  WNV_08_B  Virus      B                 0.20014960
## 159997:        0.2618862  WNV_08_B  Virus      B                 1.89280009
## 159998:        2.8385188  WNV_08_B  Virus      B                -0.31652537
## 159999:        1.4702206  WNV_08_B  Virus      B                -0.13956173
## 160000:        3.9097814  WNV_08_B  Virus      B                 1.00749552
##         PECy7 CD16-32_asinh_aligned DL800 Ly6G_asinh_aligned
##                               <num>                    <num>
##      1:                  1.62435412               -0.9853218
##      2:                 -0.41091576               -0.5260243
##      3:                  2.33598351               -0.5778462
##      4:                  2.47004819               -0.6863875
##      5:                  1.88130629               -1.3787018
##     ---                                                     
## 159996:                  1.92336309               -0.1570388
## 159997:                  0.02324442               -1.8558453
## 159998:                  2.97609425                4.1403537
## 159999:                  2.39929986                2.0580065
## 160000:                  2.13440585                0.5363485
##         AF700 CD45_asinh_aligned APCCy7 CD48_asinh_aligned
##                            <num>                     <num>
##      1:                 2.260301                 2.5360262
##      2:                 1.783121                 1.1129812
##      3:                 1.962701                 1.8322810
##      4:                 1.103076                 2.0085123
##      5:                 1.213712                 2.9657056
##     ---                                                   
## 159996:                 1.971623                 3.3322656
## 159997:                 4.071003                 3.1807175
## 159998:                 2.090997                -0.8144654
## 159999:                 1.537023                 0.7848281
## 160000:                 1.755526                 2.0961668
##         BUV395 CD11b_asinh_aligned BUV737 B220_asinh_aligned
##                              <num>                     <num>
##      1:                 0.45637840                 4.1089830
##      2:                 0.01138196                 3.7186334
##      3:                 2.99106073                 0.4972116
##      4:                 3.73671436                 1.5526254
##      5:                 1.33816469                 2.8330612
##     ---                                                     
## 159996:                 0.88995486                 0.4703643
## 159997:                 0.77705491                 0.8439717
## 159998:                 3.77656460                 0.2098913
## 159999:                 3.83918405                 0.5993028
## 160000:                 4.56159830                 1.2319804
##         BV605 Ly6C_asinh_aligned Alignment_MC_aligned FlowSOM_cluster
##                            <num>                <num>           <num>
##      1:                0.1759336                    1              12
##      2:                1.2138813                    1               8
##      3:                2.6361940                    5              89
##      4:                3.0890400                    5             102
##      5:                0.4209961                    1              14
##     ---                                                              
## 159996:                3.3719137                    8             100
## 159997:                0.3941718                    9              43
## 159998:                3.1250730                    4             154
## 159999:                1.7465904                    4             162
## 160000:                4.1770687                    8             130
##         FlowSOM_metacluster           Population
##                      <fctr>               <char>
##      1:                   5       Mature B cells
##      2:                   5       Mature B cells
##      3:                  22 Immature neutrophils
##      4:                  22 Immature neutrophils
##      5:                   6     Immature B cells
##     ---                                         
## 159996:                  21                 <NA>
## 159997:                   7              T cells
## 159998:                  25   Mature neutrophils
## 159999:                  25   Mature neutrophils
## 160000:                  22 Immature neutrophils

        cell.sub <- do.add.cols(cell.sub, "FlowSOM_metacluster", annots, "Values")

## Step 1/3. Mapping data

## Step 2/3. Merging data

## Step 3/3. Returning data

        cell.sub

##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:    -116.3130     2099.3600   -52.1773    633.551    -126.840
##     2:     493.0460     3853.7100   230.9840   5479.640    3270.230
##     3:     -17.6281     5875.5000 12478.6000   2462.980     448.307
##     4:     121.5190      -40.2623  -250.6370   1289.630    6305.760
##     5:     256.6870     3783.4100 12428.0000   2114.770     415.951
##    ---                                                             
## 19996:     -48.2851      357.1330  -671.5610   1880.030    4578.660
## 19997:     205.7200     2776.6000    64.1222   2991.120    2186.440
## 19998:    -298.2410    10829.8000   936.0370   4009.910    4486.420
## 19999:      29.3559     2172.4700 -1229.2500    974.453    2179.840
## 20000:     236.7730    13589.3000   449.5360   4660.120    7047.160
##        BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>     <fctr>  <int>              <num>
##     1:    21323.700    462.1640   2659.420  Mock_07_B      7        -0.23057742
##     2:     6902.790    111.1600    176.694  Mock_06_B      6         0.87150487
##     3:    23718.200     56.5674   2174.620  Mock_08_B      8        -0.03524890
##     4:       81.388   5606.4300    276.940  Mock_05_B      5         0.24070684
##     5:    20976.300    357.1200   1481.540  Mock_05_B      5         0.49314179
##    ---                                                                         
## 19996:     1735.620   3867.0800   2118.510 Virus_02_A     10        -0.09642073
## 19997:     6004.450    547.5590   2535.700 Virus_06_B     14         0.40063603
## 19998:    11029.800    188.3320   7611.190 Virus_05_B     13        -0.56580590
## 19999:      551.834    101.0540   1712.870 Virus_02_A     10         0.05867812
## 20000:    18265.900    182.9020  26457.600 Virus_01_A      9         0.45742638
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:          2.14181539       -0.1041661         1.058232        -0.2510350
##     2:          2.73951281        0.4469390         3.089409         2.5769483
##     3:          3.15889109        3.9107107         2.297814         0.8063480
##     4:         -0.08043783       -0.4823510         1.676272         3.2293260
##     5:          2.72125805        3.9066508         2.148931         0.7573862
##    ---                                                                        
## 19996:          0.66431456       -1.1044717         2.034813         2.9106688
## 19997:          2.41553155        0.1278954         2.488856         2.1813937
## 19998:          3.76912842        1.3849166         2.778928         2.8904402
## 19999:          2.17514614       -1.6317072         1.420549         2.1784468
## 20000:          3.99591535        0.8081770         2.928201         3.3401752
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          4.4462509         0.8268408        2.3731246 Mock_07_B   Mock
##     2:          3.3195292         0.2205282        0.3464177 Mock_06_B   Mock
##     3:          4.5526481         0.1128948        2.1761101 Mock_08_B   Mock
##     4:          0.1620656         3.1121910        0.5288774 Mock_05_B   Mock
##     5:          4.4298296         0.6642934        1.8067062 Mock_05_B   Mock
##    ---                                                                       
## 19996:          1.9577893         2.7429475        2.1506509  WNV_02_A  Virus
## 19997:          3.1805243         0.9470589        2.3263458  WNV_06_B  Virus
## 19998:          3.7874084         0.3682822        3.4169910  WNV_05_B  Virus
## 19999:          0.9528120         0.2007568        1.9451170  WNV_02_A  Virus
## 20000:          4.2915176         0.3581012        4.6619271  WNV_01_A  Virus
##         Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
##        <char>                      <num>                       <num>
##     1:      B                -0.29369602                  2.00385475
##     2:      B                 0.73757279                  2.71793723
##     3:      B                -0.05559657                  3.07941747
##     4:      B                 0.26125368                  0.02720127
##     5:      B                 0.43726748                  2.60483789
##    ---                                                              
## 19996:      A                -0.15290843                  0.56069624
## 19997:      B                 0.31178620                  2.25823164
## 19998:      B                -0.90997732                  3.74038410
## 19999:      A                 0.06718409                  2.29398131
## 20000:      A                 0.47635502                  3.96355510
##        DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
##                           <num>                    <num>
##     1:               -0.1480608                0.7780063
##     2:                0.3374757                2.8061771
##     3:                3.7818818                2.0387959
##     4:               -0.4421591                1.4738377
##     5:                3.7771759                1.8637621
##    ---                                                  
## 19996:               -1.0745652                2.3543241
## 19997:                0.1235801                2.6152031
## 19998:                1.3456432                2.6467636
## 19999:               -1.6143754                1.5896653
## 20000:                0.7784286                3.0210304
##        APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
##                            <num>                      <num>
##     1:                -0.6031739                  4.3643961
##     2:                 2.4243431                  3.1393259
##     3:                 0.5808588                  4.5825343
##     4:                 3.0247295                  0.7374537
##     5:                 0.5254776                  4.4549651
##    ---                                                     
## 19996:                 3.1603029                  1.3455793
## 19997:                 2.2317555                  3.1402769
## 19998:                 2.7106078                  3.9415228
## 19999:                 2.3983061                  0.3903685
## 20000:                 3.5182674                  4.1516290
##        BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
##                            <num>                    <num>                <num>
##     1:                0.67665219                2.5046093                    5
##     2:                0.22383082                0.3302887                   10
##     3:                0.08498045                2.4683566                    4
##     4:                3.18983173                0.6547109                    1
##     5:                0.62932587                2.1033373                    4
##    ---                                                                        
## 19996:                2.86870980                2.0855556                    2
## 19997:                0.78986788                2.4517956                    5
## 19998:                0.33557367                3.6496732                    8
## 19999:                0.26035574                2.1166730                    6
## 20000:                0.38186085                4.3858743                    8
##        FlowSOM_cluster FlowSOM_metacluster    UMAP_X    UMAP_Y
##                  <num>              <fctr>     <num>     <num>
##     1:             119                  22  1.924202  2.700375
##     2:              75                  18  4.288721 -2.231453
##     3:             179                  25 -4.118332  7.331559
##     4:              70                   6 -5.826405 -6.053842
##     5:             180                  25 -4.021310  6.415935
##    ---                                                        
## 19996:              32                   9 -0.907486 -7.260880
## 19997:             116                  22  3.595872  1.210391
## 19998:             173                  27  6.418618  2.405232
## 19999:              87                  14  3.245920 -1.075086
## 20000:             185                  26  6.894060  2.271706
##                  Population
##                      <char>
##     1: Immature neutrophils
##     2:                 <NA>
##     3:   Mature neutrophils
##     4:     Immature B cells
##     5:   Mature neutrophils
##    ---                     
## 19996:       Mature B cells
## 19997: Immature neutrophils
## 19998:   Mature neutrophils
## 19999:                 <NA>
## 20000:            Monocytes

Any unannotated clusters will then gain the label ‘Other’.

    ### Fill in NAs
        
        cell.dat[['Population']][is.na(cell.dat[, 'Population'])] <- 'Other'
        cell.dat

##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <fctr>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16
##         PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
##                      <num>               <num>            <num>
##      1:          0.1030979           1.6392988       -1.1159989
##      2:          0.1174943          -0.2782380       -0.5114826
##      3:          0.1197291           2.1835405       -0.6723598
##      4:          0.5926148           2.3251843       -0.7915047
##      5:         -0.1653241           1.8818839       -1.6097071
##     ---                                                        
## 159996:          0.2112023           1.9157040       -0.2278085
## 159997:          1.9658468           0.1304844       -1.2947039
## 159998:         -0.3134175           3.0680563        4.2423561
## 159999:         -0.1250411           2.5345014        2.0660180
## 160000:          1.0796017           2.0905763        0.4891206
##         AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
##                    <num>             <num>              <num>             <num>
##      1:        1.9669670         2.3510881          1.0504123         4.2636438
##      2:        1.6006640         0.7216390          0.5767234         3.8577846
##      3:        1.9247953         1.7806527          2.9452507         0.3638149
##      4:        0.8500738         1.9977594          3.6319912         1.4928782
##      5:        0.9703694         2.7630370          1.9011864         2.9109315
##     ---                                                                        
## 159996:        2.1598848         3.5048055          0.2514770         0.4971396
## 159997:        4.3648336         3.4509812          0.2419893         0.9511485
## 159998:        2.3428298        -0.6723679          3.8792326         0.2409789
## 159999:        1.8484549         0.9870066          3.9442068         0.6332984
## 160000:        1.9252352         2.1914959          4.4884166         1.2379493
##         BV605 Ly6C_asinh    Sample  Group  Batch PECy5-5 CD3e_asinh_aligned
##                    <num>    <char> <char> <char>                      <num>
##      1:        0.2550924 Mock_01_A   Mock      A                 0.07550371
##      2:        1.3575720 Mock_01_A   Mock      A                 0.09086002
##      3:        2.7553704 Mock_01_A   Mock      A                 0.19667186
##      4:        3.2159434 Mock_01_A   Mock      A                 0.78801227
##      5:        0.5246514 Mock_01_A   Mock      A                -0.20452863
##     ---                                                                    
## 159996:        3.2061873  WNV_08_B  Virus      B                 0.20014960
## 159997:        0.2618862  WNV_08_B  Virus      B                 1.89280009
## 159998:        2.8385188  WNV_08_B  Virus      B                -0.31652537
## 159999:        1.4702206  WNV_08_B  Virus      B                -0.13956173
## 160000:        3.9097814  WNV_08_B  Virus      B                 1.00749552
##         PECy7 CD16-32_asinh_aligned DL800 Ly6G_asinh_aligned
##                               <num>                    <num>
##      1:                  1.62435412               -0.9853218
##      2:                 -0.41091576               -0.5260243
##      3:                  2.33598351               -0.5778462
##      4:                  2.47004819               -0.6863875
##      5:                  1.88130629               -1.3787018
##     ---                                                     
## 159996:                  1.92336309               -0.1570388
## 159997:                  0.02324442               -1.8558453
## 159998:                  2.97609425                4.1403537
## 159999:                  2.39929986                2.0580065
## 160000:                  2.13440585                0.5363485
##         AF700 CD45_asinh_aligned APCCy7 CD48_asinh_aligned
##                            <num>                     <num>
##      1:                 2.260301                 2.5360262
##      2:                 1.783121                 1.1129812
##      3:                 1.962701                 1.8322810
##      4:                 1.103076                 2.0085123
##      5:                 1.213712                 2.9657056
##     ---                                                   
## 159996:                 1.971623                 3.3322656
## 159997:                 4.071003                 3.1807175
## 159998:                 2.090997                -0.8144654
## 159999:                 1.537023                 0.7848281
## 160000:                 1.755526                 2.0961668
##         BUV395 CD11b_asinh_aligned BUV737 B220_asinh_aligned
##                              <num>                     <num>
##      1:                 0.45637840                 4.1089830
##      2:                 0.01138196                 3.7186334
##      3:                 2.99106073                 0.4972116
##      4:                 3.73671436                 1.5526254
##      5:                 1.33816469                 2.8330612
##     ---                                                     
## 159996:                 0.88995486                 0.4703643
## 159997:                 0.77705491                 0.8439717
## 159998:                 3.77656460                 0.2098913
## 159999:                 3.83918405                 0.5993028
## 160000:                 4.56159830                 1.2319804
##         BV605 Ly6C_asinh_aligned Alignment_MC_aligned FlowSOM_cluster
##                            <num>                <num>           <num>
##      1:                0.1759336                    1              12
##      2:                1.2138813                    1               8
##      3:                2.6361940                    5              89
##      4:                3.0890400                    5             102
##      5:                0.4209961                    1              14
##     ---                                                              
## 159996:                3.3719137                    8             100
## 159997:                0.3941718                    9              43
## 159998:                3.1250730                    4             154
## 159999:                1.7465904                    4             162
## 160000:                4.1770687                    8             130
##         FlowSOM_metacluster           Population
##                      <fctr>               <char>
##      1:                   5       Mature B cells
##      2:                   5       Mature B cells
##      3:                  22 Immature neutrophils
##      4:                  22 Immature neutrophils
##      5:                   6     Immature B cells
##     ---                                         
## 159996:                  21                Other
## 159997:                   7              T cells
## 159998:                  25   Mature neutrophils
## 159999:                  25   Mature neutrophils
## 160000:                  22 Immature neutrophils

        cell.sub[['Population']][is.na(cell.sub[, 'Population'])] <- 'Other'
        cell.sub

##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:    -116.3130     2099.3600   -52.1773    633.551    -126.840
##     2:     493.0460     3853.7100   230.9840   5479.640    3270.230
##     3:     -17.6281     5875.5000 12478.6000   2462.980     448.307
##     4:     121.5190      -40.2623  -250.6370   1289.630    6305.760
##     5:     256.6870     3783.4100 12428.0000   2114.770     415.951
##    ---                                                             
## 19996:     -48.2851      357.1330  -671.5610   1880.030    4578.660
## 19997:     205.7200     2776.6000    64.1222   2991.120    2186.440
## 19998:    -298.2410    10829.8000   936.0370   4009.910    4486.420
## 19999:      29.3559     2172.4700 -1229.2500    974.453    2179.840
## 20000:     236.7730    13589.3000   449.5360   4660.120    7047.160
##        BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>     <fctr>  <int>              <num>
##     1:    21323.700    462.1640   2659.420  Mock_07_B      7        -0.23057742
##     2:     6902.790    111.1600    176.694  Mock_06_B      6         0.87150487
##     3:    23718.200     56.5674   2174.620  Mock_08_B      8        -0.03524890
##     4:       81.388   5606.4300    276.940  Mock_05_B      5         0.24070684
##     5:    20976.300    357.1200   1481.540  Mock_05_B      5         0.49314179
##    ---                                                                         
## 19996:     1735.620   3867.0800   2118.510 Virus_02_A     10        -0.09642073
## 19997:     6004.450    547.5590   2535.700 Virus_06_B     14         0.40063603
## 19998:    11029.800    188.3320   7611.190 Virus_05_B     13        -0.56580590
## 19999:      551.834    101.0540   1712.870 Virus_02_A     10         0.05867812
## 20000:    18265.900    182.9020  26457.600 Virus_01_A      9         0.45742638
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:          2.14181539       -0.1041661         1.058232        -0.2510350
##     2:          2.73951281        0.4469390         3.089409         2.5769483
##     3:          3.15889109        3.9107107         2.297814         0.8063480
##     4:         -0.08043783       -0.4823510         1.676272         3.2293260
##     5:          2.72125805        3.9066508         2.148931         0.7573862
##    ---                                                                        
## 19996:          0.66431456       -1.1044717         2.034813         2.9106688
## 19997:          2.41553155        0.1278954         2.488856         2.1813937
## 19998:          3.76912842        1.3849166         2.778928         2.8904402
## 19999:          2.17514614       -1.6317072         1.420549         2.1784468
## 20000:          3.99591535        0.8081770         2.928201         3.3401752
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          4.4462509         0.8268408        2.3731246 Mock_07_B   Mock
##     2:          3.3195292         0.2205282        0.3464177 Mock_06_B   Mock
##     3:          4.5526481         0.1128948        2.1761101 Mock_08_B   Mock
##     4:          0.1620656         3.1121910        0.5288774 Mock_05_B   Mock
##     5:          4.4298296         0.6642934        1.8067062 Mock_05_B   Mock
##    ---                                                                       
## 19996:          1.9577893         2.7429475        2.1506509  WNV_02_A  Virus
## 19997:          3.1805243         0.9470589        2.3263458  WNV_06_B  Virus
## 19998:          3.7874084         0.3682822        3.4169910  WNV_05_B  Virus
## 19999:          0.9528120         0.2007568        1.9451170  WNV_02_A  Virus
## 20000:          4.2915176         0.3581012        4.6619271  WNV_01_A  Virus
##         Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
##        <char>                      <num>                       <num>
##     1:      B                -0.29369602                  2.00385475
##     2:      B                 0.73757279                  2.71793723
##     3:      B                -0.05559657                  3.07941747
##     4:      B                 0.26125368                  0.02720127
##     5:      B                 0.43726748                  2.60483789
##    ---                                                              
## 19996:      A                -0.15290843                  0.56069624
## 19997:      B                 0.31178620                  2.25823164
## 19998:      B                -0.90997732                  3.74038410
## 19999:      A                 0.06718409                  2.29398131
## 20000:      A                 0.47635502                  3.96355510
##        DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
##                           <num>                    <num>
##     1:               -0.1480608                0.7780063
##     2:                0.3374757                2.8061771
##     3:                3.7818818                2.0387959
##     4:               -0.4421591                1.4738377
##     5:                3.7771759                1.8637621
##    ---                                                  
## 19996:               -1.0745652                2.3543241
## 19997:                0.1235801                2.6152031
## 19998:                1.3456432                2.6467636
## 19999:               -1.6143754                1.5896653
## 20000:                0.7784286                3.0210304
##        APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
##                            <num>                      <num>
##     1:                -0.6031739                  4.3643961
##     2:                 2.4243431                  3.1393259
##     3:                 0.5808588                  4.5825343
##     4:                 3.0247295                  0.7374537
##     5:                 0.5254776                  4.4549651
##    ---                                                     
## 19996:                 3.1603029                  1.3455793
## 19997:                 2.2317555                  3.1402769
## 19998:                 2.7106078                  3.9415228
## 19999:                 2.3983061                  0.3903685
## 20000:                 3.5182674                  4.1516290
##        BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
##                            <num>                    <num>                <num>
##     1:                0.67665219                2.5046093                    5
##     2:                0.22383082                0.3302887                   10
##     3:                0.08498045                2.4683566                    4
##     4:                3.18983173                0.6547109                    1
##     5:                0.62932587                2.1033373                    4
##    ---                                                                        
## 19996:                2.86870980                2.0855556                    2
## 19997:                0.78986788                2.4517956                    5
## 19998:                0.33557367                3.6496732                    8
## 19999:                0.26035574                2.1166730                    6
## 20000:                0.38186085                4.3858743                    8
##        FlowSOM_cluster FlowSOM_metacluster    UMAP_X    UMAP_Y
##                  <num>              <fctr>     <num>     <num>
##     1:             119                  22  1.924202  2.700375
##     2:              75                  18  4.288721 -2.231453
##     3:             179                  25 -4.118332  7.331559
##     4:              70                   6 -5.826405 -6.053842
##     5:             180                  25 -4.021310  6.415935
##    ---                                                        
## 19996:              32                   9 -0.907486 -7.260880
## 19997:             116                  22  3.595872  1.210391
## 19998:             173                  27  6.418618  2.405232
## 19999:              87                  14  3.245920 -1.075086
## 20000:             185                  26  6.894060  2.271706
##                  Population
##                      <char>
##     1: Immature neutrophils
##     2:                Other
##     3:   Mature neutrophils
##     4:     Immature B cells
##     5:   Mature neutrophils
##    ---                     
## 19996:       Mature B cells
## 19997: Immature neutrophils
## 19998:   Mature neutrophils
## 19999:                Other
## 20000:            Monocytes

Save data.

    ### Save data and plots
        
        fwrite(cell.dat, "Annotated.data.csv")
        fwrite(cell.sub, "Annotated.data.DR.csv")

Subsequently, we can visualise the population labels on a UMAP plot.

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", col.type = 'factor', add.label = TRUE)

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", group.col, col.type = 'factor')

## Check your working directory for a new .png called 'Multi plot divided by Group.png'

We can also generate an expression heatmap to summarise the expression levels of each marker on our populations.

    ### Expression heatmap
        
        rm(exp)
        exp <- do.aggregate(cell.dat, aligned.cellular.cols, by = "Population")
        make.pheatmap(exp, "Population", aligned.cellular.cols)

Save FCS for further analysis and evaluation.

    ### Write FCS files
        
        setwd(OutputDirectory)
        setwd("Output 4 - annotation")
        
        dir.create('FCS files')
        setwd('FCS files')
        
        write.files(cell.dat,
                    file.prefix = exp.name,
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)

8. Summary data and statistical analysis

Here we can create ‘summary’ data for our experiment. This involves calculating the percentage of each population in each sample, along with the corresponding cell counts if the information is available. In addition, we calculate the MFI for selected markers on each population in each sample.

First, set the working directory, and select which columns we will measure the MFI of. In this case, CD11b_asinh and Ly6C_asinh.

Set output directory.

    setwd(OutputDirectory)
    dir.create("Output 5 - summary data")

## Warning in dir.create("Output 5 - summary data"): 'Output 5 - summary data'
## already exists

    setwd("Output 5 - summary data")

    ### Setup
    
        variance.test <- 'kruskal.test'
        pairwise.test <- "wilcox.test"
    
        comparisons <- list(c("Mock", "Virus"))
        comparisons

## [[1]]
## [1] "Mock"  "Virus"

        grp.order <- c("Mock", "Virus")
        grp.order

## [1] "Mock"  "Virus"

We can also specify which columns we wish to measure MFI levels on.

    ### Select columns to measure MFI
    
        as.matrix(aligned.cellular.cols)

##      [,1]                         
## [1,] "PECy5-5 CD3e_asinh_aligned" 
## [2,] "PECy7 CD16-32_asinh_aligned"
## [3,] "DL800 Ly6G_asinh_aligned"   
## [4,] "AF700 CD45_asinh_aligned"   
## [5,] "APCCy7 CD48_asinh_aligned"  
## [6,] "BUV395 CD11b_asinh_aligned" 
## [7,] "BUV737 B220_asinh_aligned"  
## [8,] "BV605 Ly6C_asinh_aligned"

        dyn.cols <- aligned.cellular.cols[c(5,8)]
        dyn.cols

## [1] "APCCy7 CD48_asinh_aligned" "BV605 Ly6C_asinh_aligned"

Use the new create.sumtable function to generate summary data – a data.table of samples (rows) vs measurements (columns).

    ### Create summary tables
    
        sum.dat <- create.sumtable(dat = cell.dat, 
                                   sample.col = sample.col,
                                   pop.col = "Population",
                                   use.cols = dyn.cols, 
                                   annot.cols = c(group.col, batch.col)
                                   )

## Creating sumtable

##  -- running some initial tests

##  -- calculting cell proportions

##  -- calculting expression levels

##  -- wrapping up

##  -- sumtable complete!

Once the summary data has been generated, we can review it and select which columns to plot. In each case, the column names (i.e. name of each summary measure) are structured as ‘MEASURE TYPE – POPULATION’. This provides a useful structure, as we can use regular expression searches to split the name into just the MEASURE TYPE or POPULATION segment.

    ### Review summary data
        
        sum.dat

##        Sample  Group  Batch Exp APCCy7 CD48_asinh_aligned -- Immature B cells
##        <fctr> <char> <char>                                             <num>
##  1: Mock_01_A   Mock      A                                          2.424887
##  2: Mock_02_A   Mock      A                                          2.374245
##  3: Mock_03_A   Mock      A                                          2.508753
##  4: Mock_04_A   Mock      A                                          2.600931
##  5: Mock_05_B   Mock      B                                          2.419923
##  6: Mock_06_B   Mock      B                                          2.385357
##  7: Mock_07_B   Mock      B                                          2.325857
##  8: Mock_08_B   Mock      B                                          2.465879
##  9:  WNV_01_A  Virus      A                                          2.122836
## 10:  WNV_02_A  Virus      A                                          2.305655
## 11:  WNV_03_A  Virus      A                                          2.232820
## 12:  WNV_04_A  Virus      A                                          2.447445
## 13:  WNV_05_B  Virus      B                                          2.049278
## 14:  WNV_06_B  Virus      B                                          2.183903
## 15:  WNV_07_B  Virus      B                                          2.152955
## 16:  WNV_08_B  Virus      B                                          2.173844
##     Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils
##                                                     <num>
##  1:                                              1.572872
##  2:                                              1.526846
##  3:                                              1.566277
##  4:                                              1.682951
##  5:                                              1.486138
##  6:                                              1.373131
##  7:                                              1.466973
##  8:                                              1.640794
##  9:                                              1.915684
## 10:                                              1.904565
## 11:                                              1.852920
## 12:                                              1.889313
## 13:                                              1.843289
## 14:                                              1.817669
## 15:                                              1.875538
## 16:                                              1.859179
##     Exp APCCy7 CD48_asinh_aligned -- Mature B cells
##                                               <num>
##  1:                                        2.412014
##  2:                                        2.426953
##  3:                                        2.623877
##  4:                                        2.714248
##  5:                                        2.419015
##  6:                                        2.413756
##  7:                                        2.362291
##  8:                                        2.469881
##  9:                                        2.542520
## 10:                                        2.643040
## 11:                                        2.678252
## 12:                                        2.798805
## 13:                                        2.205732
## 14:                                        2.393001
## 15:                                        2.346441
## 16:                                        2.400605
##     Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils
##                                                   <num>
##  1:                                           0.5129836
##  2:                                           0.5528726
##  3:                                           0.5352846
##  4:                                           0.6251231
##  5:                                           0.5175413
##  6:                                           0.5001851
##  7:                                           0.5884818
##  8:                                           0.6476299
##  9:                                           0.8659670
## 10:                                           1.0622124
## 11:                                           1.0449080
## 12:                                           1.1598637
## 13:                                           0.9772988
## 14:                                           1.0035599
## 15:                                           1.1106013
## 16:                                           1.1197803
##     Exp APCCy7 CD48_asinh_aligned -- Monocytes
##                                          <num>
##  1:                                   3.261353
##  2:                                   3.254478
##  3:                                   3.419966
##  4:                                   3.435929
##  5:                                   3.250326
##  6:                                   3.249777
##  7:                                   3.218665
##  8:                                   3.316341
##  9:                                   3.186699
## 10:                                   3.261326
## 11:                                   3.347447
## 12:                                   3.497759
## 13:                                   3.129637
## 14:                                   3.258258
## 15:                                   3.277384
## 16:                                   3.345560
##     Exp APCCy7 CD48_asinh_aligned -- Other
##                                      <num>
##  1:                               2.488965
##  2:                               2.388345
##  3:                               2.613971
##  4:                               2.684664
##  5:                               2.756410
##  6:                               2.649867
##  7:                               2.699764
##  8:                               2.834465
##  9:                               2.715744
## 10:                               2.869109
## 11:                               2.865703
## 12:                               3.140419
## 13:                               2.672039
## 14:                               2.828839
## 15:                               2.879763
## 16:                               2.891880
##     Exp APCCy7 CD48_asinh_aligned -- T cells
##                                        <num>
##  1:                                 2.759116
##  2:                                 2.669215
##  3:                                 2.970334
##  4:                                 3.001218
##  5:                                 2.766417
##  6:                                 2.683297
##  7:                                 2.610325
##  8:                                 2.930828
##  9:                                 2.764899
## 10:                                 2.842706
## 11:                                 2.931075
## 12:                                 2.937672
## 13:                                 2.666584
## 14:                                 2.647255
## 15:                                 2.766093
## 16:                                 2.782212
##     Exp BV605 Ly6C_asinh_aligned -- Immature B cells
##                                                <num>
##  1:                                        0.2387287
##  2:                                        0.2969931
##  3:                                        0.2263609
##  4:                                        0.2079162
##  5:                                        0.2410776
##  6:                                        0.2074710
##  7:                                        0.2253116
##  8:                                        0.2424775
##  9:                                        0.5038403
## 10:                                        0.4630924
## 11:                                        0.4633731
## 12:                                        0.4460627
## 13:                                        0.5610659
## 14:                                        0.3308880
## 15:                                        0.3395451
## 16:                                        0.3197377
##     Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils
##                                                    <num>
##  1:                                             2.991968
##  2:                                             2.937831
##  3:                                             2.823442
##  4:                                             2.776157
##  5:                                             2.922687
##  6:                                             2.740521
##  7:                                             2.718578
##  8:                                             2.915434
##  9:                                             3.335838
## 10:                                             3.239150
## 11:                                             3.135962
## 12:                                             3.318283
## 13:                                             3.222065
## 14:                                             3.154325
## 15:                                             3.258957
## 16:                                             3.101161
##     Exp BV605 Ly6C_asinh_aligned -- Mature B cells
##                                              <num>
##  1:                                      0.5643588
##  2:                                      0.4958642
##  3:                                      0.4701363
##  4:                                      0.5119049
##  5:                                      0.6077903
##  6:                                      0.4059218
##  7:                                      0.4717474
##  8:                                      0.4684568
##  9:                                      1.4602033
## 10:                                      1.9881967
## 11:                                      1.7394467
## 12:                                      2.3235193
## 13:                                      1.4512756
## 14:                                      1.3436108
## 15:                                      1.9374443
## 16:                                      1.3764834
##     Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils
##                                                  <num>
##  1:                                           2.452764
##  2:                                           2.473359
##  3:                                           2.320761
##  4:                                           2.220885
##  5:                                           2.441565
##  6:                                           2.423296
##  7:                                           2.470727
##  8:                                           2.361716
##  9:                                           2.228977
## 10:                                           2.179033
## 11:                                           2.204247
## 12:                                           2.225025
## 13:                                           2.233103
## 14:                                           2.153610
## 15:                                           2.110596
## 16:                                           2.183070
##     Exp BV605 Ly6C_asinh_aligned -- Monocytes
##                                         <num>
##  1:                                  4.540271
##  2:                                  4.584795
##  3:                                  4.458234
##  4:                                  4.307355
##  5:                                  4.539643
##  6:                                  4.525972
##  7:                                  4.572508
##  8:                                  4.577179
##  9:                                  4.304258
## 10:                                  4.264818
## 11:                                  4.213613
## 12:                                  4.346373
## 13:                                  4.148381
## 14:                                  4.499382
## 15:                                  4.429550
## 16:                                  4.306105
##     Exp BV605 Ly6C_asinh_aligned -- Other
##                                     <num>
##  1:                             0.7109253
##  2:                             0.7375577
##  3:                             0.6353084
##  4:                             0.6839464
##  5:                             0.7102996
##  6:                             0.7267920
##  7:                             0.7004569
##  8:                             0.7801929
##  9:                             1.4603817
## 10:                             1.5642416
## 11:                             1.6769586
## 12:                             1.5278105
## 13:                             1.3646209
## 14:                             1.4066820
## 15:                             1.3737616
## 16:                             1.3836509
##     Exp BV605 Ly6C_asinh_aligned -- T cells
##                                       <num>
##  1:                               1.5734507
##  2:                               0.8458289
##  3:                               1.0760717
##  4:                               1.1502750
##  5:                               1.4533346
##  6:                               1.2713361
##  7:                               1.3909613
##  8:                               3.0236276
##  9:                               2.8637326
## 10:                               3.0743690
## 11:                               3.1711638
## 12:                               3.1395780
## 13:                               3.3345189
## 14:                               2.8204267
## 15:                               3.1807756
## 16:                               3.2684104
##     Percent of sample -- Immature B cells
##                                     <num>
##  1:                                 18.78
##  2:                                 20.30
##  3:                                 20.03
##  4:                                 20.32
##  5:                                 20.13
##  6:                                 16.84
##  7:                                 13.97
##  8:                                 22.61
##  9:                                  3.27
## 10:                                  4.31
## 11:                                  9.86
## 12:                                  7.41
## 13:                                  1.13
## 14:                                  6.46
## 15:                                  6.41
## 16:                                  4.71
##     Percent of sample -- Immature neutrophils
##                                         <num>
##  1:                                      6.63
##  2:                                      8.80
##  3:                                      6.67
##  4:                                      5.77
##  5:                                      6.65
##  6:                                      6.49
##  7:                                      5.44
##  8:                                      7.45
##  9:                                      8.86
## 10:                                     11.06
## 11:                                     12.02
## 12:                                      9.58
## 13:                                      9.11
## 14:                                      9.90
## 15:                                      8.36
## 16:                                      8.73
##     Percent of sample -- Mature B cells Percent of sample -- Mature neutrophils
##                                   <num>                                   <num>
##  1:                               12.60                                   41.92
##  2:                               12.54                                   40.20
##  3:                               12.90                                   39.13
##  4:                               12.61                                   40.79
##  5:                               11.26                                   42.80
##  6:                               15.23                                   41.29
##  7:                               17.03                                   41.20
##  8:                                7.52                                   42.82
##  9:                               15.96                                   36.22
## 10:                               15.32                                   33.94
## 11:                               17.60                                   29.29
## 12:                               13.14                                   40.20
## 13:                               15.56                                   37.60
## 14:                               18.27                                   35.16
## 15:                               14.48                                   37.44
## 16:                               17.62                                   35.51
##     Percent of sample -- Monocytes Percent of sample -- Other
##                              <num>                      <num>
##  1:                           8.01                       7.59
##  2:                           5.49                       8.49
##  3:                           8.45                       8.82
##  4:                           8.79                       8.09
##  5:                           7.43                       8.46
##  6:                           8.82                       7.56
##  7:                           9.10                       7.37
##  8:                           9.87                       7.55
##  9:                          15.22                      12.12
## 10:                          14.52                      13.06
## 11:                          13.19                      11.61
## 12:                           9.48                      11.31
## 13:                          17.94                      11.17
## 14:                           9.57                      11.79
## 15:                          12.53                      11.42
## 16:                          13.04                      11.75
##     Percent of sample -- T cells
##                            <num>
##  1:                         4.47
##  2:                         4.18
##  3:                         4.00
##  4:                         3.63
##  5:                         3.27
##  6:                         3.77
##  7:                         5.89
##  8:                         2.18
##  9:                         8.35
## 10:                         7.79
## 11:                         6.43
## 12:                         8.88
## 13:                         7.49
## 14:                         8.85
## 15:                         9.36
## 16:                         8.64

        as.matrix(names(sum.dat))

##       [,1]                                                   
##  [1,] "Sample"                                               
##  [2,] "Group"                                                
##  [3,] "Batch"                                                
##  [4,] "Exp APCCy7 CD48_asinh_aligned -- Immature B cells"    
##  [5,] "Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils"
##  [6,] "Exp APCCy7 CD48_asinh_aligned -- Mature B cells"      
##  [7,] "Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils"  
##  [8,] "Exp APCCy7 CD48_asinh_aligned -- Monocytes"           
##  [9,] "Exp APCCy7 CD48_asinh_aligned -- Other"               
## [10,] "Exp APCCy7 CD48_asinh_aligned -- T cells"             
## [11,] "Exp BV605 Ly6C_asinh_aligned -- Immature B cells"     
## [12,] "Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils" 
## [13,] "Exp BV605 Ly6C_asinh_aligned -- Mature B cells"       
## [14,] "Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils"   
## [15,] "Exp BV605 Ly6C_asinh_aligned -- Monocytes"            
## [16,] "Exp BV605 Ly6C_asinh_aligned -- Other"                
## [17,] "Exp BV605 Ly6C_asinh_aligned -- T cells"              
## [18,] "Percent of sample -- Immature B cells"                
## [19,] "Percent of sample -- Immature neutrophils"            
## [20,] "Percent of sample -- Mature B cells"                  
## [21,] "Percent of sample -- Mature neutrophils"              
## [22,] "Percent of sample -- Monocytes"                       
## [23,] "Percent of sample -- Other"                           
## [24,] "Percent of sample -- T cells"

Specify which columns we want to plot.

        annot.cols <- c(group.col, batch.col)
        
        plot.cols <- names(sum.dat)[c(4:24)]
        plot.cols

##  [1] "Exp APCCy7 CD48_asinh_aligned -- Immature B cells"    
##  [2] "Exp APCCy7 CD48_asinh_aligned -- Immature neutrophils"
##  [3] "Exp APCCy7 CD48_asinh_aligned -- Mature B cells"      
##  [4] "Exp APCCy7 CD48_asinh_aligned -- Mature neutrophils"  
##  [5] "Exp APCCy7 CD48_asinh_aligned -- Monocytes"           
##  [6] "Exp APCCy7 CD48_asinh_aligned -- Other"               
##  [7] "Exp APCCy7 CD48_asinh_aligned -- T cells"             
##  [8] "Exp BV605 Ly6C_asinh_aligned -- Immature B cells"     
##  [9] "Exp BV605 Ly6C_asinh_aligned -- Immature neutrophils" 
## [10] "Exp BV605 Ly6C_asinh_aligned -- Mature B cells"       
## [11] "Exp BV605 Ly6C_asinh_aligned -- Mature neutrophils"   
## [12] "Exp BV605 Ly6C_asinh_aligned -- Monocytes"            
## [13] "Exp BV605 Ly6C_asinh_aligned -- Other"                
## [14] "Exp BV605 Ly6C_asinh_aligned -- T cells"              
## [15] "Percent of sample -- Immature B cells"                
## [16] "Percent of sample -- Immature neutrophils"            
## [17] "Percent of sample -- Mature B cells"                  
## [18] "Percent of sample -- Mature neutrophils"              
## [19] "Percent of sample -- Monocytes"                       
## [20] "Percent of sample -- Other"                           
## [21] "Percent of sample -- T cells"

Reorder the data such that sample appear in the specify group order.

    ### Reorder summary data and SAVE
        
        sum.dat <- do.reorder(sum.dat, group.col, grp.order)
        sum.dat[,c(1:3)]

##        Sample  Group  Batch
##        <fctr> <fctr> <char>
##  1: Mock_01_A   Mock      A
##  2: Mock_02_A   Mock      A
##  3: Mock_03_A   Mock      A
##  4: Mock_04_A   Mock      A
##  5: Mock_05_B   Mock      B
##  6: Mock_06_B   Mock      B
##  7: Mock_07_B   Mock      B
##  8: Mock_08_B   Mock      B
##  9:  WNV_01_A  Virus      A
## 10:  WNV_02_A  Virus      A
## 11:  WNV_03_A  Virus      A
## 12:  WNV_04_A  Virus      A
## 13:  WNV_05_B  Virus      B
## 14:  WNV_06_B  Virus      B
## 15:  WNV_07_B  Virus      B
## 16:  WNV_08_B  Virus      B

Save the summary data.

        fwrite(sum.dat, 'sum.dat.csv')

Violin/scatter plots

We can use the run.autograph function to create violin/scatter plots with embedded statistic – one per population/measurement type.

    ### Autographs

        for(i in plot.cols){
            
            measure <- gsub("\\ --.*", "", i)
            measure
            
            pop <- gsub("^[^--]*.-- ", "", i)
            pop
            
            make.autograph(sum.dat,
                           x.axis = group.col,
                           y.axis = i,
                           y.axis.label = measure, 
                           violin = FALSE, 
                           colour.by = batch.col,
                           
                           grp.order = grp.order,
                           my_comparisons = comparisons,
                           
                           Variance_test = variance.test,
                           Pairwise_test = pairwise.test,
                           
                           title = pop,
                           subtitle = measure,
                           filename = paste0(i, '.pdf'))
            
        }

Heatmaps

We can also create a global heatmap show the z-score of each population/measurement type against each sample.

    ### Create a fold change heatmap
        
        ## Z-score calculation
        sum.dat.z <- do.zscore(sum.dat, plot.cols, replace = TRUE)
        
        ## Group 
        t.first <- match(grp.order, sum.dat.z[[group.col]])
        t.first <- t.first -1
        t.first
        
        ## Make heatmap
        make.pheatmap(sum.dat.z, 
                      sample.col = sample.col, 
                      plot.cols = plot.cols,
                      is.fold = TRUE, 
                      plot.title = 'Z-score',
                      annot.cols = annot.cols,
                      dendrograms = 'column',
                      row.sep = t.first,
                      cutree_cols = 3)

9. Output session info

Create “Output-info” directory and save session data

For the final step of our setup, we want to record the session info our R session, and save this in a folder we’ll call “Output-info”.

    ### Session info and metadata
        
        setwd(OutputDirectory)
        dir.create("Output - info", showWarnings = FALSE)
        setwd("Output - info")

        sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message"))
        sessionInfo()

## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.5.2
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] Biobase_2.62.0       BiocGenerics_0.48.1  CytoNorm_2.0.1      
##  [4] remotes_2.5.0        viridis_0.6.5        viridisLite_0.4.2   
##  [7] uwot_0.2.2           umap_0.2.10.0        scattermore_1.2     
## [10] scales_1.3.0         Rtsne_0.17           rsvd_1.0.5          
## [13] rstudioapi_0.16.0    RColorBrewer_1.1-3   pheatmap_1.0.12     
## [16] patchwork_1.2.0      irlba_2.3.5.1        Matrix_1.7-0        
## [19] gtools_3.9.5         gridExtra_2.3        ggthemes_5.1.0      
## [22] ggpubr_0.6.0         ggpointdensity_0.1.0 FlowSOM_2.10.0      
## [25] igraph_2.0.3         flowCore_2.14.2      factoextra_1.0.7    
## [28] ggplot2_3.5.1        dendsort_0.3.4       data.table_1.15.4   
## [31] colorRamps_2.3.4     Spectre_1.2.0       
## 
## loaded via a namespace (and not attached):
##  [1] rlang_1.1.3                 magrittr_2.0.3             
##  [3] matrixStats_1.3.0           compiler_4.4.0             
##  [5] png_0.1-8                   vctrs_0.6.5                
##  [7] stringr_1.5.1               pkgconfig_2.0.3            
##  [9] fastmap_1.2.0               backports_1.4.1            
## [11] labeling_0.4.3              utf8_1.2.4                 
## [13] rmarkdown_2.27              purrr_1.0.2                
## [15] xfun_0.44                   cachem_1.1.0               
## [17] jsonlite_1.8.8              highr_0.10                 
## [19] tweenr_2.0.3                broom_1.0.6                
## [21] cluster_2.1.6               R6_2.5.1                   
## [23] bslib_0.7.0                 stringi_1.8.4              
## [25] reticulate_1.37.0           car_3.1-2                  
## [27] jquerylib_0.1.4             Rcpp_1.0.12                
## [29] knitr_1.46                  tidyselect_1.2.1           
## [31] abind_1.4-5                 yaml_2.3.8                 
## [33] curl_5.2.1                  lattice_0.22-6             
## [35] tibble_3.2.1                withr_3.0.0                
## [37] askpass_1.2.0               evaluate_0.23              
## [39] polyclip_1.10-6             ConsensusClusterPlus_1.66.0
## [41] pillar_1.9.0                carData_3.0-5              
## [43] stats4_4.4.0                generics_0.1.3             
## [45] S4Vectors_0.40.2            munsell_0.5.1              
## [47] glue_1.7.0                  tools_4.4.0                
## [49] ggnewscale_0.4.10           RSpectra_0.16-1            
## [51] ggsignif_0.6.4              XML_3.99-0.16.1            
## [53] grid_4.4.0                  tidyr_1.3.1                
## [55] RProtoBufLib_2.14.1         colorspace_2.1-0           
## [57] ggforce_0.4.2               cli_3.6.2                  
## [59] fansi_1.0.6                 cytolib_2.14.1             
## [61] dplyr_1.1.4                 gtable_0.3.5               
## [63] rstatix_0.7.2               sass_0.4.9                 
## [65] digest_0.6.35               ggrepel_0.9.5              
## [67] farver_2.1.2                htmltools_0.5.8.1          
## [69] lifecycle_1.0.4             openssl_2.2.0              
## [71] MASS_7.3-60.2

        sink()

We will also save the cellular and clustering columns.

        write(aligned.cellular.cols, "cellular.cols.txt")
        write(aligned.cluster.cols, "cluster.cols.txt")

Spectre - batch alignment and analysis of cytometry data using CytoNorm

Thomas Ashhurst

23/05/2024

Introduction

Citation and methods

Setup

Directories

Analysis script

Data files

1. Packages and directories

Load packages

Set directories

2. Import and prep data

Demo data

Import data

Review data

Merge data

Read in metadata

3. Data transformation

4. Add metadata and set some preferences

Add metadata

Setting preferences

Set downsampling targets for visualisation

5. Batch alignment

Setup

Initial clustering

Full alignment

Review alignment

6. Clustering and dimensionality reduction

7. Annotate clusters

8. Summary data and statistical analysis

Violin/scatter plots

Heatmaps

9. Output session info