<- Back to Spectre home page



Introduction


Overview

The batch alignment and analysis workflow builds on the ‘simple discovery’ workflow by adding a step to facilitate batch alignment. This workflow allows for the correction of technical variation or shifts in signal levels in samples stained and/or acquired across multiple batches. To do this, we have implemented the CytoNorm algorithm (Van Gassen 2020). CytoNorm uses reference control samples that are prepared and recorded along with each batch of samples to identify and correct technical variations between individual batches, while preserving biologically relevant differences. For more information on CytoNorm, see Van Gassen et al 2020, and for more information on our implementation in Spectre, see Ashhurst et al 2021.

The demo dataset used for this worked example are cells extracted from mock- or virally-infected mouse bone marrow, measured by flow cytometry. Expression level values in these datasets have been manipulated to simulate acquisition over two batches.

Reference controls for CytoNorm

An example of reference control samples are aliquots of peripheral blood mononuclear cells (PBMCs) that are derived from a single donor at one time point, and cryopreserved (i.e. multiple aliquots of a biologically identical sample). Each time a set of PBMC samples from the study cohort are thawed, stained, and recorded, a reference controls is also thawed, stained, and recorded. Differences in signal level between the reference controls allows CytoNorm to learn the differences in signal levels due to the batches, and correct them, while preserving biological differences between the individual samples. In our demo dataset, we are using bone marrow samples derived from separate mice. Though not derived from the same mouse, these are similar enough that they can be used successfully as reference controls.

Correction of batch effects with CytoNorm

Reference samples are clustered together using FlowSOM, where we attempt to captures cells from matching populations in each batch in a single metacluster (e.g. neutrophils from batch 1 and 2 are captured in metacluster 1, etc). This assumes a reasonably low level of batch effect, consisting of small shifts in the expression levels for one or more markers. Because the proportion of cells in each population of the reference samples are identical, FlowSOM can use quantile distributions for each marker on each metacluser to create a model which will adjust the data values, removing technical variation between batches. This model is then applied to all samples in each batch.

Requirements

The reference controls can only correct batch effects for markers which are actually expressed on the reference controls. For example, some activation markers may be expressed on samples from the ‘disease’ group, but won’t be present on reference controls derived from healthy donors. In this case, we would simply not attempt to perform alignment on that marker. Typically, stable phenotyping markers (e.g. CD4, CD8) would be expressed strongly enough in reference samples, and would be suitable for alignment.

Other approaches

If you have samples derived from multiple batches, but do not have reference controls processed with each batch, then other forms of batch alignment might be possible – see our data integration options for more information.


Citation and methods


Citation

If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2022). Cytometry Part A 101 (3), 237-253. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work.

Sample methods blurb

Here is a sample methods blurb for this workflow. You may need to adapt this text to reflect any changes made in your analysis.

Computational analysis of data was performed using the Spectre R package (Ashhurst et al., 2022), with instructions and source code provided at https://github.com/ImmuneDynamics/spectre. Samples were initially prepared in FlowJo, and the population of interest was exported as raw value CSV files. Arcsinh transformation was performed on the data in R using a co-factor of 15 to redistribute the data on a linear scale and compress low end values near zero. The dataset was then merged into a single data.table, with keywords denoting the sample, group, and other factors added to each row (cell). The CytoNorm algorithm was used to correct for batch effects in the data using matched reference samples processed in each batch. The FlowSOM algorithm (Van Gassen et al., 2015) was then run on the merged dataset to cluster the data, where every cell is assigned to a specific cluster and metacluster. Subsequently, the data was downsampled and analysed by the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP) (McInnes, Healy, Melville, 2018) for cellular visualisation.


Setup


Directories

Create a master folder with a meaningful name. Then inside that folder, insert the following:

  • One folder called ‘data’ – this will contain your data CSV or FCS files
  • One folder called ‘metadata’ – this will contain a CSV containg your sample metadata
  • One folder called ‘Spectre CytoNorm’ or similar – place this analysis script there

Example:

  BM analysis
    /data
        -- Contains data files, one CSV or FCS per sample
    /metadata
        -- Contains a CSV containing sample metadata (group, batch, etc)
    /Spectre CytoNorm
        -- CytoNorm workflow.R


Analysis script

You can download the CytoNorm script from this link – place this inside the Spectre CytoNorm folder.


Data files

If you would to use the demo data as a test run for the CytoNorm workflow, nothing to do at this step. Simply follow the relevant instructions further down this page to download the demo data (under 2. Import and prep data).

If you would like to use your own data, add your data and metadata files:

  • Place the sample CSV or FCS files in the data folder you created above.
  • Place the metadata CSV files in the metadata folder you created above.

Please see this page for detailed instructions on exporting data for Spectre and setting up a metadata file.



1. Packages and directories


Load packages

Load the Spectre and other required libraries

Running library(Spectre) will load the Spectre package. We can then use package.check() to see if the standard dependency packages are installed, and package.load() to load those packages.

    ### Load libraries

        library(Spectre)
        Spectre::package.check()    # Check that all required packages are installed
        Spectre::package.load()     # Load required packages

    ### Install CytoNorm package
        
        if(!require('remotes')) {install.packages('remotes')}
        library('remotes')
        remotes::install_github(repo = "saeyslab/CytoNorm")


Set directories

Set ‘PrimaryDirectory’

Initially, we will set the location of the script as PrimaryDirectory. We’ll use this as a sort of ‘home page’ for where our analysis is going to be performed – including where to find our input data, metadata, and where our output data will go.

    ### Set PrimaryDirectory
        
        dirname(rstudioapi::getActiveDocumentContext()$path)            # Finds the directory where this script is located
        setwd(dirname(rstudioapi::getActiveDocumentContext()$path))     # Sets the working directory to where the script is located
        getwd()
        PrimaryDirectory <- getwd()
        PrimaryDirectory

Set ‘InputDirectory’

Next we need to set the location of the ‘data’ folder – where our samples for analysis are stored. In this example they are stored in a sub-folder called data.

    ### Set 'input' directory
        
        setwd(PrimaryDirectory)
        dir.create('../data', showWarnings = FALSE)
        setwd("../data/")
        InputDirectory <- getwd()
        setwd(PrimaryDirectory)

Set ‘MetaDirectory’

We need to set the location of the ‘metadata’ folder. This is where we can store a CSV file that contains any relevant metadata that we want to embed in our samples. In this example, it is located in a sub-folder called metadata.

    ### Set 'metadata' directory
        
        setwd(PrimaryDirectory)
        dir.create('../metadata', showWarnings = FALSE)
        setwd("../metadata/")
        MetaDirectory <- getwd()
        setwd(PrimaryDirectory)

Create ‘OutputDirectory’

We need to create a folder where our output data can go once our analysis is finished. In this example we will call this Output_Spectre.

    ### Create output directory
        
        setwd(PrimaryDirectory)
        dir.create("Output_Spectre", showWarnings = FALSE)
        setwd("Output_Spectre")
        OutputDirectory <- getwd()
        setwd(PrimaryDirectory)



2. Import and prep data


Demo data

If you need the demo dataset, uncomment the following code in the analysis script (select all, CMD+SHIFT+C) and run to download. If you are using your own datasets, then skip this step.

This code will download the demo dataset files and metadata file, and place them in the data and metadata folders respectively.

# setwd(PrimaryDirectory)
# setwd("../")
# getwd()
# download.file(url = "https://github.com/ImmuneDynamics/data/blob/main/simBatches.zip?raw=TRUE", destfile = 'simBatches.zip', mode = 'wb')
# unzip(zipfile = 'simBatches.zip')
# for(i in list.files('simBatches/data', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('simBatches/', '', i))
# }
# for(i in list.files('simBatches/metadata', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('simBatches/', '', i))
# }
# unlink(c('simBatches/', 'simBatches.zip', '__MACOSX'), recursive = TRUE)


Import data

To begin, we will change our working directory to ‘InputDirectory’ and list all the files in that directory – CSV files in this example.

    ### Import data

        setwd(InputDirectory)
        list.files(InputDirectory, ".csv")
##  [1] "Mock_01_A.csv"  "Mock_02_A.csv"  "Mock_03_A.csv"  "Mock_04_A.csv" 
##  [5] "Mock_05_B.csv"  "Mock_06_B.csv"  "Mock_07_B.csv"  "Mock_08_B.csv" 
##  [9] "Virus_01_A.csv" "Virus_02_A.csv" "Virus_03_A.csv" "Virus_04_A.csv"
## [13] "Virus_05_B.csv" "Virus_06_B.csv" "Virus_07_B.csv" "Virus_08_B.csv"

We can then read in all of our samples (in this example, one CSV file per sample) into a list called data.list. Spectre uses the data.table framework to store data, which reads, writes, and performs operations on data very quickly.

        data.list <- Spectre::read.files(file.loc = InputDirectory,
                                         file.type = ".csv",
                                         do.embed.file.names = TRUE)

Next we will review the data files by running the do.list.summary() function.


Review data

    ### Check the data

        check <- do.list.summary(data.list)

        check$name.table # Review column names and their subsequent values
##              X1            X2         X3         X4          X5           X6
## 1  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 2  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 3  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 4  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 5  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 6  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 7  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 8  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 9  PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 10 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 11 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 12 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 13 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 14 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 15 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
## 16 PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48 BUV395 CD11b
##             X7         X8       X9    X10
## 1  BUV737 B220 BV605 Ly6C FileName FileNo
## 2  BUV737 B220 BV605 Ly6C FileName FileNo
## 3  BUV737 B220 BV605 Ly6C FileName FileNo
## 4  BUV737 B220 BV605 Ly6C FileName FileNo
## 5  BUV737 B220 BV605 Ly6C FileName FileNo
## 6  BUV737 B220 BV605 Ly6C FileName FileNo
## 7  BUV737 B220 BV605 Ly6C FileName FileNo
## 8  BUV737 B220 BV605 Ly6C FileName FileNo
## 9  BUV737 B220 BV605 Ly6C FileName FileNo
## 10 BUV737 B220 BV605 Ly6C FileName FileNo
## 11 BUV737 B220 BV605 Ly6C FileName FileNo
## 12 BUV737 B220 BV605 Ly6C FileName FileNo
## 13 BUV737 B220 BV605 Ly6C FileName FileNo
## 14 BUV737 B220 BV605 Ly6C FileName FileNo
## 15 BUV737 B220 BV605 Ly6C FileName FileNo
## 16 BUV737 B220 BV605 Ly6C FileName FileNo
        check$ncol.check # Review number of columns (features, markers) in each sample
##       [,1]
##  [1,] 10  
##  [2,] 10  
##  [3,] 10  
##  [4,] 10  
##  [5,] 10  
##  [6,] 10  
##  [7,] 10  
##  [8,] 10  
##  [9,] 10  
## [10,] 10  
## [11,] 10  
## [12,] 10  
## [13,] 10  
## [14,] 10  
## [15,] 10  
## [16,] 10
        check$nrow.check # Review number of rows (cells) in each sample
##       [,1] 
##  [1,] 10000
##  [2,] 10000
##  [3,] 10000
##  [4,] 10000
##  [5,] 10000
##  [6,] 10000
##  [7,] 10000
##  [8,] 10000
##  [9,] 10000
## [10,] 10000
## [11,] 10000
## [12,] 10000
## [13,] 10000
## [14,] 10000
## [15,] 10000
## [16,] 10000

You can review the first sample by running data.list[[1]].

        data.list[[1]]
##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:      51.6403      1239.360   -681.257   1752.270    2600.430
##     2:      58.8824      -140.921   -267.039   1188.640     392.962
##     3:      60.0077      2191.260   -362.088   1676.960    1441.300
##     4:     313.9580      2532.700   -438.388    478.109    1809.220
##     5:     -83.0391      1603.390  -1200.350    564.994    3946.200
##    ---                                                             
##  9996:    -121.9210      4626.660   7377.120   1600.950     197.503
##  9997:    5635.1700       807.513   -831.751   1750.210    2388.820
##  9998:     132.8540       185.348    780.195    611.077    1471.310
##  9999:     -78.8966      2564.800  10150.800   2161.430     264.904
## 10000:      17.2606       531.683   -801.073    350.229    1963.560
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo
##               <num>       <num>      <num>    <char>  <int>
##     1:      627.259  17763.6000   128.9340 Mock_01_A      1
##     2:      304.615  11834.8000   907.3650 Mock_01_A      1
##     3:     4740.710    185.9470  3915.8200 Mock_01_A      1
##     4:     9440.380   1056.2900  6221.6700 Mock_01_A      1
##     5:     1636.110   4579.8700   274.5270 Mock_01_A      1
##    ---                                                     
##  9996:    12732.300    -18.9127  3826.8000 Mock_01_A      1
##  9997:     1106.890   -394.7910    79.3191 Mock_01_A      1
##  9998:      823.702   6442.3800  -345.6880 Mock_01_A      1
##  9999:    12933.400   -106.9610  8881.1600 Mock_01_A      1
## 10000:      793.921   8161.8700   288.1360 Mock_01_A      1

Merge data

Merge data.tables

We can merge the data into a single data.table using do.merge.files(). By default, columns with matching names will be aligned in the new table, and any columns that are present in some samples, but not others, will be added and filled with ‘NA’ for any samples that didn’t have that column initially.

    ### Merge data

        cell.dat <- Spectre::do.merge.files(dat = data.list)
        cell.dat
##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <char>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16


Read in metadata

Read in sample metadata.

    ### Read in metadata  
       
        setwd(MetaDirectory)
        
        meta.dat <- fread("sample.details.csv")
        meta.dat
##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B



3. Data transformation


Before we perform clustering etc, we need to meaningfully transform the data. For more information on why this is necessary, please see this page.

    setwd(OutputDirectory)
    dir.create("Output 1 - transformed plots")
## Warning in dir.create("Output 1 - transformed plots"): 'Output 1 - transformed
## plots' already exists
    setwd("Output 1 - transformed plots")

First, check the column names of the dataset.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))
##       [,1]           
##  [1,] "PECy5-5 CD3e" 
##  [2,] "PECy7 CD16-32"
##  [3,] "DL800 Ly6G"   
##  [4,] "AF700 CD45"   
##  [5,] "APCCy7 CD48"  
##  [6,] "BUV395 CD11b" 
##  [7,] "BUV737 B220"  
##  [8,] "BV605 Ly6C"   
##  [9,] "FileName"     
## [10,] "FileNo"

In this example, the columns we want to apply arcsinh transformation to are the cellular columns – column 1 to column 8. We can specify those columns using the code below.

        to.asinh <- names(cell.dat)[c(1:8)]
        to.asinh
## [1] "PECy5-5 CD3e"  "PECy7 CD16-32" "DL800 Ly6G"    "AF700 CD45"   
## [5] "APCCy7 CD48"   "BUV395 CD11b"  "BUV737 B220"   "BV605 Ly6C"

Define the cofactor we will use for transformation. As a general recommendation, we suggest using cofactor = 15 for CyTOF data, and cofactor between 100 and 1000 for flow data (we suggest 500 as a starting point). For more detailed guidance on setting the co-factors, see this page.

        cofactor <- 500

You can also choose a column to use for plotting the transformed result – ideally something that is expressed on a variety of cell types in your dataset.

        plot.against <- "BV605 Ly6C_asinh"

Now we need to apply arcsinh transformation to the data in those columns, using a specific co-factor.

        cell.dat <- do.asinh(cell.dat, to.asinh, cofactor = cofactor)

We can then make some plots to see if the arcsinh transformation is appropriate. Check the plots and see if you are happy with the transformation. If happy, the proceed with analysis. Otherwise, go back to the merging of the data.list (to create cell.dat) and try with another co-factor.

        transformed.cols <- paste0(to.asinh, "_asinh")

        for(i in transformed.cols){
          make.colour.plot(do.subsample(cell.dat, 20000), i, plot.against)
        }



4. Add metadata and set some preferences


Add metadata

We also want to attach some sample metadata, to aid with our analysis.

    ### Add metadata to data.table

        meta.dat
##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

Here we will select just the columns we would like to add to our data. In this example we only want to include use first four columns: Filename, Sample, Group, and Batch. We will use Filename for matching between cell.dat and meta.dat, and the other three columns will be the information that gets added to cell.dat.

        sample.info <- meta.dat[,c(1:4)]
        sample.info
##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

Now we can add this information to cell.dat. Essentially, the file names are listed in the metadata table, and we can use that to add any listed metadata in the table to the corresponding files in data.list. We can review the data to ensure the metadata has been correctly embedded.

        cell.dat <- do.add.cols(cell.dat, "FileName", sample.info, "FileName", rmv.ext = TRUE)
## Removing '.csv' or '.fcs' extension
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
        cell.dat
##         PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##                <num>         <num>      <num>      <num>       <num>
##      1:      51.6403     1239.3600   -681.257   1752.270    2600.430
##      2:      58.8824     -140.9210   -267.039   1188.640     392.962
##      3:      60.0077     2191.2600   -362.088   1676.960    1441.300
##      4:     313.9580     2532.7000   -438.388    478.109    1809.220
##      5:     -83.0391     1603.3900  -1200.350    564.994    3946.200
##     ---                                                             
## 159996:     106.3880     1661.1200   -114.892   2138.700    8311.230
## 159997:    1750.2300       65.4275   -843.984  19655.900    7874.900
## 159998:    -159.2870     5363.3900  17389.300   2578.650    -362.093
## 159999:     -62.6836     3132.7100   1941.660   1548.130     577.625
## 160000:     650.9440     1991.4900    254.429   1677.730    2209.210
##         BUV395 CD11b BUV737 B220 BV605 Ly6C   FileName FileNo
##                <num>       <num>      <num>     <fctr>  <int>
##      1:      627.259   17763.600    128.934  Mock_01_A      1
##      2:      304.615   11834.800    907.365  Mock_01_A      1
##      3:     4740.710     185.947   3915.820  Mock_01_A      1
##      4:     9440.380    1056.290   6221.670  Mock_01_A      1
##      5:     1636.110    4579.870    274.527  Mock_01_A      1
##     ---                                                      
## 159996:      127.068     258.936   6161.070 Virus_08_B     16
## 159997:      122.179     550.596    132.445 Virus_08_B     16
## 159998:    12091.600     121.659   4257.980 Virus_08_B     16
## 159999:    12904.000     338.244   1030.080 Virus_08_B     16
## 160000:    22242.300     789.639  12467.000 Virus_08_B     16
##         PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh
##                      <num>               <num>            <num>
##      1:          0.1030979           1.6392988       -1.1159989
##      2:          0.1174943          -0.2782380       -0.5114826
##      3:          0.1197291           2.1835405       -0.6723598
##      4:          0.5926148           2.3251843       -0.7915047
##      5:         -0.1653241           1.8818839       -1.6097071
##     ---                                                        
## 159996:          0.2112023           1.9157040       -0.2278085
## 159997:          1.9658468           0.1304844       -1.2947039
## 159998:         -0.3134175           3.0680563        4.2423561
## 159999:         -0.1250411           2.5345014        2.0660180
## 160000:          1.0796017           2.0905763        0.4891206
##         AF700 CD45_asinh APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh
##                    <num>             <num>              <num>             <num>
##      1:        1.9669670         2.3510881          1.0504123         4.2636438
##      2:        1.6006640         0.7216390          0.5767234         3.8577846
##      3:        1.9247953         1.7806527          2.9452507         0.3638149
##      4:        0.8500738         1.9977594          3.6319912         1.4928782
##      5:        0.9703694         2.7630370          1.9011864         2.9109315
##     ---                                                                        
## 159996:        2.1598848         3.5048055          0.2514770         0.4971396
## 159997:        4.3648336         3.4509812          0.2419893         0.9511485
## 159998:        2.3428298        -0.6723679          3.8792326         0.2409789
## 159999:        1.8484549         0.9870066          3.9442068         0.6332984
## 160000:        1.9252352         2.1914959          4.4884166         1.2379493
##         BV605 Ly6C_asinh    Sample  Group  Batch
##                    <num>    <char> <char> <char>
##      1:        0.2550924 Mock_01_A   Mock      A
##      2:        1.3575720 Mock_01_A   Mock      A
##      3:        2.7553704 Mock_01_A   Mock      A
##      4:        3.2159434 Mock_01_A   Mock      A
##      5:        0.5246514 Mock_01_A   Mock      A
##     ---                                         
## 159996:        3.2061873  WNV_08_B  Virus      B
## 159997:        0.2618862  WNV_08_B  Virus      B
## 159998:        2.8385188  WNV_08_B  Virus      B
## 159999:        1.4702206  WNV_08_B  Virus      B
## 160000:        3.9097814  WNV_08_B  Virus      B


Setting preferences

Check the column names.

    ### Define cellular columns

        as.matrix(names(cell.dat))
##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"

Specify columns that represent transformed cellular features (in this case, the arcsinh transformed data, defined by “_asinh”). In this case, columns 11 to 18.

        cellular.cols <- names(cell.dat)[c(11:18)]
        as.matrix(cellular.cols)
##      [,1]                 
## [1,] "PECy5-5 CD3e_asinh" 
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"   
## [4,] "AF700 CD45_asinh"   
## [5,] "APCCy7 CD48_asinh"  
## [6,] "BUV395 CD11b_asinh" 
## [7,] "BUV737 B220_asinh"  
## [8,] "BV605 Ly6C_asinh"

Additionally, specify the columns that will be used to generate cluster and tSNE/UMAP results. Columns that are not specified here will still be analysed, but won’t contributed to the generation of clusters. There are a couple of strategies to take here: use all cellular columns for clustering to looks for all possible cell types/states, or use only stably expressed markers to cluster stable phenotypes, which can then be examined for changes in more dynamic markers. For more guidance, see this page.

    ### Define clustering columns    
        
        as.matrix(names(cell.dat))
##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"
        cluster.cols <- names(cell.dat)[c(11:18)]
        as.matrix(cluster.cols)
##      [,1]                 
## [1,] "PECy5-5 CD3e_asinh" 
## [2,] "PECy7 CD16-32_asinh"
## [3,] "DL800 Ly6G_asinh"   
## [4,] "AF700 CD45_asinh"   
## [5,] "APCCy7 CD48_asinh"  
## [6,] "BUV395 CD11b_asinh" 
## [7,] "BUV737 B220_asinh"  
## [8,] "BV605 Ly6C_asinh"

Here we can also specify sample, group, and batch columns.

    ### Define other key columns
        
        as.matrix(names(cell.dat))
##       [,1]                 
##  [1,] "PECy5-5 CD3e"       
##  [2,] "PECy7 CD16-32"      
##  [3,] "DL800 Ly6G"         
##  [4,] "AF700 CD45"         
##  [5,] "APCCy7 CD48"        
##  [6,] "BUV395 CD11b"       
##  [7,] "BUV737 B220"        
##  [8,] "BV605 Ly6C"         
##  [9,] "FileName"           
## [10,] "FileNo"             
## [11,] "PECy5-5 CD3e_asinh" 
## [12,] "PECy7 CD16-32_asinh"
## [13,] "DL800 Ly6G_asinh"   
## [14,] "AF700 CD45_asinh"   
## [15,] "APCCy7 CD48_asinh"  
## [16,] "BUV395 CD11b_asinh" 
## [17,] "BUV737 B220_asinh"  
## [18,] "BV605 Ly6C_asinh"   
## [19,] "Sample"             
## [20,] "Group"              
## [21,] "Batch"
        exp.name <- "BM experiment"
        
        sample.col <- "Sample"
        group.col <- "Group"
        batch.col <- "Batch"


Set downsampling targets for visualisation

Additionally, we want to specify the downsample targets for dimensionality reduction. This influences how many cells will be shown on a tSNE/UMAP plot, and we are specifying the number of cells per group to downsample to. Check for the number of cells (rows) in each group.

    ### Subsample targets per group

        data.frame(table(cell.dat[[group.col]])) # Check number of cells per sample.
##    Var1  Freq
## 1  Mock 80000
## 2 Virus 80000

You can then specify the number to downsample to in each group. These must be lower than the total number of cells in each group, and must be provided in the order that the group names appear in the dataset when running as.matrix(unique(cell.dat[[group.col]])).

In this example we want 10000 cells from ‘mock’ and 10000 cells from ‘Virus’ so that we have balanced numbers visualised between each group.

        as.matrix(unique(cell.dat[[group.col]]))
##      [,1]   
## [1,] "Mock" 
## [2,] "Virus"
        sub.targets <- c(10000, 10000) # target subsample numbers from each group
        sub.targets
## [1] 10000 10000



5. Batch alignment


Setup

Here we need to setup some preferences for batch alignment.

    ### Extract reference samples
        
        sample.info
##           FileName    Sample  Group  Batch
##             <char>    <char> <char> <char>
##  1:  Mock_01_A.csv Mock_01_A   Mock      A
##  2:  Mock_02_A.csv Mock_02_A   Mock      A
##  3:  Mock_03_A.csv Mock_03_A   Mock      A
##  4:  Mock_04_A.csv Mock_04_A   Mock      A
##  5: Virus_01_A.csv  WNV_01_A  Virus      A
##  6: Virus_02_A.csv  WNV_02_A  Virus      A
##  7: Virus_03_A.csv  WNV_03_A  Virus      A
##  8: Virus_04_A.csv  WNV_04_A  Virus      A
##  9:  Mock_05_B.csv Mock_05_B   Mock      B
## 10:  Mock_06_B.csv Mock_06_B   Mock      B
## 11:  Mock_07_B.csv Mock_07_B   Mock      B
## 12:  Mock_08_B.csv Mock_08_B   Mock      B
## 13: Virus_05_B.csv  WNV_05_B  Virus      B
## 14: Virus_06_B.csv  WNV_06_B  Virus      B
## 15: Virus_07_B.csv  WNV_07_B  Virus      B
## 16: Virus_08_B.csv  WNV_08_B  Virus      B

In this case we need to select which samples will serve as our ‘reference’ samples. These should be individual aliquots from a single biological samples that have been stain/processed in each batch (e.g. cryopreserved PBMCs from a single blood collection event from a single individual). In this example, we have selected one bone marrow sample from a mock infected mice in each batch.

        as.matrix(unique(cell.dat[[sample.col]]))
##       [,1]       
##  [1,] "Mock_01_A"
##  [2,] "Mock_02_A"
##  [3,] "Mock_03_A"
##  [4,] "Mock_04_A"
##  [5,] "Mock_05_B"
##  [6,] "Mock_06_B"
##  [7,] "Mock_07_B"
##  [8,] "Mock_08_B"
##  [9,] "WNV_01_A" 
## [10,] "WNV_02_A" 
## [11,] "WNV_03_A" 
## [12,] "WNV_04_A" 
## [13,] "WNV_05_B" 
## [14,] "WNV_06_B" 
## [15,] "WNV_07_B" 
## [16,] "WNV_08_B"

Here we are selecting the two samples: Mock_01_A, from batch A and Mock_05_B from batch B, which will be our reference samples.

        refs <- unique(cell.dat[[sample.col]])[c(1,5)]
        refs
## [1] "Mock_01_A" "Mock_05_B"

Now we can create a new dataset (ref.dat) containing only cells from the reference samples (in this example, Mock_01_A from batch A, and Mock_05_B from batch B)

        ref.dat <- do.filter(cell.dat, sample.col, refs)
        ref.dat
##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:     51.64030      1239.360  -681.2570   1752.270    2600.430
##     2:     58.88240      -140.921  -267.0390   1188.640     392.962
##     3:     60.00770      2191.260  -362.0880   1676.960    1441.300
##     4:    313.95800      2532.700  -438.3880    478.109    1809.220
##     5:    -83.03910      1603.390 -1200.3500    564.994    3946.200
##    ---                                                             
## 19996:    -30.65640       158.361 -1988.0000   5905.010    8831.280
## 19997:    148.32100      5819.820 15918.1000   2497.000     751.419
## 19998:    669.96300      3373.540   -33.8201   1144.100    1320.440
## 19999:    320.50700      4386.740  7465.1800   2107.970    -134.778
## 20000:     -9.93316      5929.620 11076.1000   1861.070     267.681
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>    <fctr>  <int>              <num>
##     1:     627.2590  17763.6000    128.934 Mock_01_A      1         0.10309786
##     2:     304.6150  11834.8000    907.365 Mock_01_A      1         0.11749428
##     3:    4740.7100    185.9470   3915.820 Mock_01_A      1         0.11972914
##     4:    9440.3800   1056.2900   6221.670 Mock_01_A      1         0.59261483
##     5:    1636.1100   4579.8700    274.527 Mock_01_A      1        -0.16532406
##    ---                                                                        
## 19996:     -77.0832   9662.6800    568.947 Mock_05_B      5        -0.06127445
## 19997:   13401.7000    -91.9106   2209.480 Mock_05_B      5         0.29245518
## 19998:    4762.0700    781.4270   4849.040 Mock_05_B      5         1.10256164
## 19999:   15413.4000    493.2230   1767.920 Mock_05_B      5         0.60367460
## 20000:   11084.1000     50.4877   1180.210 Mock_05_B      5        -0.01986501
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:           1.6392988      -1.11599889        1.9669670         2.3510881
##     2:          -0.2782380      -0.51148258        1.6006640         0.7216390
##     3:           2.1835405      -0.67235984        1.9247953         1.7806527
##     4:           2.3251843      -0.79150471        0.8500738         1.9977594
##     5:           1.8818839      -1.60970711        0.9703694         2.7630370
##    ---                                                                        
## 19996:           0.3116525      -2.08887520        3.1638831         3.5653947
## 19997:           3.1494039       4.15399776        2.3112610         1.1963364
## 19998:           2.6077040      -0.06758873        1.5655634         1.6983185
## 19999:           2.8681129       3.39766352        2.1457973        -0.2663940
## 20000:           3.1680273       3.79159306        2.0250211         0.5126149
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          1.0504123         4.2636438        0.2550924 Mock_01_A   Mock
##     2:          0.5767234         3.8577846        1.3575720 Mock_01_A   Mock
##     3:          2.9452507         0.3638149        2.7553704 Mock_01_A   Mock
##     4:          3.6319912         1.4928782        3.2159434 Mock_01_A   Mock
##     5:          1.9011864         2.9109315        0.5246514 Mock_01_A   Mock
##    ---                                                                       
## 19996:         -0.1535622         3.6552341        0.9755986 Mock_05_B   Mock
## 19997:          3.9820237        -0.1828014        2.1916151 Mock_05_B   Mock
## 19998:          2.9497215         1.2291302        2.9677227 Mock_05_B   Mock
## 19999:          4.1217946         0.8717569        1.9755202 Mock_05_B   Mock
## 20000:          3.7923143         0.1008046        1.5941070 Mock_05_B   Mock
##         Batch
##        <char>
##     1:      A
##     2:      A
##     3:      A
##     4:      A
##     5:      A
##    ---       
## 19996:      B
## 19997:      B
## 19998:      B
## 19999:      B
## 20000:      B


Initial clustering

First, we are going to align the data of only the two reference samples.

Setup a sub-directory for some pre-alignment plots.

    ### Initial clustering
        
        setwd(OutputDirectory)
        dir.create("Output 2 - alignment")
## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists
        setwd("Output 2 - alignment")

Run prep.cytonorm to initialise a cytonorm object, including clustering with FlowSOM. The number of metaclustes will be determined automatically, or you can specify a desired number of metaclusters by using the additional argument meta.k (e.g. meta.k = 20).

        cytnrm <- prep.cytonorm(dat = ref.dat, 
                                cellular.cols = cellular.cols, 
                                cluster.cols = cluster.cols, 
                                batch.col = batch.col, 
                                sample.col = sample.col)
## Loading required package: CytoNorm
## Loading required package: Biobase
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:igraph':
## 
##     normalize, path, union
## The following object is masked from 'package:flowCore':
## 
##     normalize
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Step 1/3. Mapping data
## Step 2/3. Merging data
## Step 3/3. Returning data
## Step 1/4 - Splitting files for use with original FlowSOM function
## Step 2/4 - Running FlowSOM
## Reading A.fcs
## Reading B.fcs
## Step 3/4 - Preparing FlowSOM object and results data.table
## Step 4/4 - FlowSOM preparation for alignment model complete
        cytnrm
## $fsom
## FlowSOM model trained on 20000 cells and 8 markers, 
##  using a 5x5 grid (25 clusters) and 10 metaclusters.
## 
## Markers used:  column PECy5-5 CD3e_asinh from dataset <PECy5-5 CD3e_asinh>, column PECy7 CD16-32_asinh from dataset <PECy7 CD16-32_asinh>, column DL800 Ly6G_asinh from dataset <DL800 Ly6G_asinh>, column AF700 CD45_asinh from dataset <AF700 CD45_asinh>, column APCCy7 CD48_asinh from dataset <APCCy7 CD48_asinh>, column BUV395 CD11b_asinh from dataset <BUV395 CD11b_asinh>, column BUV737 B220_asinh from dataset <BUV737 B220_asinh>, column BV605 Ly6C_asinh from dataset <BV605 Ly6C_asinh>
## 
## Metacluster cell count:
##  MC 1  MC 2  MC 3  MC 4  MC 5  MC 6  MC 7  MC 8  MC 9 MC 10 
##  5886   455    83  7267  2389   878   331  1708   717   286 
## 
##  168 cells (0.84%) are further than 4 MAD from their cluster center.
## $dt
##        PECy5-5 CD3e_asinh PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh
##                     <num>               <num>            <num>            <num>
##     1:       -0.073653787         -0.06836993      -0.71957141       -0.2601342
##     2:        0.196819022          1.04051566       0.03595166        0.5919558
##     3:       -0.148785651          2.90569115       3.45313573        1.2724129
##     4:        0.189247936          3.06170583       4.11895704        1.9266738
##     5:       -0.306349635          0.44284853       0.42922080        0.4019526
##    ---                                                                         
## 19996:        2.784854412          0.39633411      -0.98542446        3.4850731
## 19997:        0.007048962          3.26623678       3.83651400        2.1971622
## 19998:       -0.144241706          1.14419019       0.08433798        1.8859901
## 19999:        0.136786833          2.73393393       2.65572119        1.3384684
## 20000:        0.101986513          3.28912425       3.33585572        2.1958396
##        APCCy7 CD48_asinh BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh
##                    <num>              <num>             <num>            <num>
##     1:         0.6676006        0.455060869       0.191696376       -0.3804775
##     2:         0.7900302       -0.004049649       4.381132603       -0.2398050
##     3:         0.9693112        3.634680748       0.466979593        2.6836505
##     4:         0.4057303        4.441645622       0.637388706        3.1590574
##     5:        -0.5365349        1.400961280      -0.298591286       -0.1681118
##    ---                                                                        
## 19996:         1.9617901        0.664981723      -0.590357363        3.6845517
## 19997:         0.5892904        4.073036194      -0.006483534        2.1069219
## 19998:         3.1991692       -0.560110509       3.579796553       -0.1521181
## 19999:         0.7387957        4.001876354       0.324388862        1.7045259
## 20000:         1.4960654        4.071680546       0.229720220        1.9134706
##         File File_scattered Original_ID prep.fsom.cluster prep.fsom.metacluster
##        <num>          <num>       <num>             <num>                 <num>
##     1:     1      1.0370497        2369                17                     7
##     2:     1      0.8859173        5273                12                     1
##     3:     1      0.9769963        9290                15                     4
##     4:     1      1.1308356        1252                 9                     4
##     5:     1      0.9924445        8826                17                     7
##    ---                                                                         
## 19996:     2      1.9423792        6029                21                     9
## 19997:     2      2.0499919        6970                 4                     4
## 19998:     2      1.9680801        8165                11                     1
## 19999:     2      1.9309787        2143                15                     4
## 20000:     2      1.9343390        3821                 4                     4
## 
## $cellular.cols
## [1] "PECy5-5 CD3e_asinh"  "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"   
## [4] "AF700 CD45_asinh"    "APCCy7 CD48_asinh"   "BUV395 CD11b_asinh" 
## [7] "BUV737 B220_asinh"   "BV605 Ly6C_asinh"   
## 
## $cluster.cols
## [1] "PECy5-5 CD3e_asinh"  "PECy7 CD16-32_asinh" "DL800 Ly6G_asinh"   
## [4] "AF700 CD45_asinh"    "APCCy7 CD48_asinh"   "BUV395 CD11b_asinh" 
## [7] "BUV737 B220_asinh"   "BV605 Ly6C_asinh"   
## 
## $files
## [1] "A" "B"
## 
## $file.nums
## [1] 1 2

Once this is complete, we can subsample the dataset and plot it using UMAP.

        cytnrm.sub <- do.subsample(cytnrm$dt, 10000)
        cytnrm.sub <- run.umap(cytnrm.sub, use.cols = cluster.cols)
## [2024-05-23 15:43:15.585413]  starting umap
## [2024-05-23 15:43:15.60073]  creating graph of nearest neighbors
## [2024-05-23 15:43:27.204277]  creating initial embedding
## [2024-05-23 15:43:27.74285]  optimizing embedding
## [2024-05-23 15:43:39.423743]  done

Create a subfolder for plots.

    ### Initial clustering plots
        
        setwd(OutputDirectory)
        setwd("Output 2 - alignment")
        dir.create("1 - ref pre-alignment")
## Warning in dir.create("1 - ref pre-alignment"): '1 - ref pre-alignment' already
## exists
        setwd("1 - ref pre-alignment")

First, we will colour the cells by which batch they originated from. We can see batch effects evident by the offset between the red and green cells. In this case they are coloured by ‘File’.

        make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'File', 'factor')

Then, we need to examine the cellular expression across these cells.

        make.multi.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', cellular.cols)

## Check your working directory for a new .png called 'Multi plot.png'

Finally, we need to examine the metaclusters that have been generated, and confirm that they capture matching populations across batches (e.g. in this example, metacluster 1 captures B220+ B cells from both batches. If too many or too few metaclusters are generated, you can try running the prep.cytonorm function again, and specify a desired number of metaclusters using the meta.k argument.

        make.colour.plot(cytnrm.sub, 'UMAP_X', 'UMAP_Y', 'prep.fsom.metacluster', 'factor', add.label = TRUE)

You can see which file number corresponds to each batch by running cytnrm$files and cytnrm$file.nums.

        cytnrm$files
## [1] "A" "B"
        cytnrm$file.nums
## [1] 1 2

If the metaclusters are suitable, we will proceed to actually performing alignment. Because the train.cytonorm and run.cytonorm functions involve the writing and reading of files, we will set the directory beforehand.


Full alignment

Set a directory for the outputs

    ### Alignment
        
        setwd(OutputDirectory)
        dir.create("Output 2 - alignment")
## Warning in dir.create("Output 2 - alignment"): 'Output 2 - alignment' already
## exists
        setwd("Output 2 - alignment")

Firstly, we need to train the model that will perform quantile alignment for each marker on each metacluster.

        cytnrm <- train.cytonorm(model = cytnrm, align.cols = cellular.cols)
## Training alignment - setup
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Training alignment - file (batch) preparation
## Training alignment - file (batch) and metacluster splitting
##  -- running File '1'
##  -- running File '2'
## Training alignment - learning conversions
## Processing cluster 1
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 2
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 3
## Computing Quantiles
##   A (FileID 1)
## 
##   Less then 50 cells in A (34). No quantiles computed.
##   B (FileID 2)
## 
##   Less then 50 cells in B (49). No quantiles computed.
## Computing Splines
##   A
## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for A
## The identity function will be used.
##   B
## Warning in (function (files, labels, channels, transformList, nQ = 101, : Not enough cells for B
## The identity function will be used.
## Processing cluster 4
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 5
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 6
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 7
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 8
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 9
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Processing cluster 10
## Computing Quantiles
##   A (FileID 1)
## 
##   B (FileID 2)
## 
## Computing Splines
##   A
##   B
## Training alignment - cleanup
## Training alignment - training complete

Once complete, this model can be used to align the full dataset.

        cell.dat <- run.cytonorm(dat = cell.dat, model = cytnrm, batch.col = batch.col)
## Working directory is '/Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm'
## Alignment -- splitting files (batches)
## Alignment -- processing files (batches)
## -- Splitting A.fcs
## Warning in dir.create("tmp-target"): 'tmp-target' already exists
## Warning in FlowSOM::NewData(model$fsom, ff): 1258 cells (1.57%) seem far from
## their cluster centers.
## -- Splitting B.fcs
## Warning in dir.create("tmp-target"): 'tmp-target' already exists
## Warning in FlowSOM::NewData(model$fsom, ff): 1648 cells (2.06%) seem far from
## their cluster centers.
## Alignment -- aligning data
## -- Processing cluster 1
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom1.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom1.fcs (B)
## Normalizing B
## -- Processing cluster 2
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom2.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom2.fcs (B)
## Normalizing B
## -- Processing cluster 3
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom3.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom3.fcs (B)
## Normalizing B
## -- Processing cluster 4
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom4.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom4.fcs (B)
## Normalizing B
## -- Processing cluster 5
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom5.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom5.fcs (B)
## Normalizing B
## -- Processing cluster 6
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom6.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom6.fcs (B)
## Normalizing B
## -- Processing cluster 7
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom7.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom7.fcs (B)
## Normalizing B
## -- Processing cluster 8
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom8.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom8.fcs (B)
## Normalizing B
## -- Processing cluster 9
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom9.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom9.fcs (B)
## Normalizing B
## -- Processing cluster 10
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/A.fcs_fsom10.fcs (A)
## Normalizing A
##   /Users/thomasa/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/CytoNorm/tmp-target/EachCluster/B.fcs_fsom10.fcs (B)
## Normalizing B
## Alignment -- merging aligned data for each metacluster
## Alignment -- creating final data.table
## Alignment -- complete


Review alignment

To examine the results, we first need to specify the cellular names containing the aligned data (they will have ’_aligned’ appended to the end).

        aligned.cols <- paste0(cellular.cols, '_aligned')

Then we can set a new output directory.

    ### Plotting reference data
        
        setwd(OutputDirectory)
        setwd("Output 2 - alignment")
        dir.create("2 - ref aligned")
## Warning in dir.create("2 - ref aligned"): '2 - ref aligned' already exists
        setwd("2 - ref aligned")

First, we will examine just the reference samples, to see if the alignment looks suitable. We will run UMAP using the new aligned data.

        ref.sub <- do.filter(cell.dat, sample.col, refs)
        ref.sub
##        PECy5-5 CD3e PECy7 CD16-32 DL800 Ly6G AF700 CD45 APCCy7 CD48
##               <num>         <num>      <num>      <num>       <num>
##     1:     51.64030      1239.360  -681.2570   1752.270    2600.430
##     2:     58.88240      -140.921  -267.0390   1188.640     392.962
##     3:     60.00770      2191.260  -362.0880   1676.960    1441.300
##     4:    313.95800      2532.700  -438.3880    478.109    1809.220
##     5:    -83.03910      1603.390 -1200.3500    564.994    3946.200
##    ---                                                             
## 19996:    -30.65640       158.361 -1988.0000   5905.010    8831.280
## 19997:    148.32100      5819.820 15918.1000   2497.000     751.419
## 19998:    669.96300      3373.540   -33.8201   1144.100    1320.440
## 19999:    320.50700      4386.740  7465.1800   2107.970    -134.778
## 20000:     -9.93316      5929.620 11076.1000   1861.070     267.681
##        BUV395 CD11b BUV737 B220 BV605 Ly6C  FileName FileNo PECy5-5 CD3e_asinh
##               <num>       <num>      <num>    <fctr>  <int>              <num>
##     1:     627.2590  17763.6000    128.934 Mock_01_A      1         0.10309786
##     2:     304.6150  11834.8000    907.365 Mock_01_A      1         0.11749428
##     3:    4740.7100    185.9470   3915.820 Mock_01_A      1         0.11972914
##     4:    9440.3800   1056.2900   6221.670 Mock_01_A      1         0.59261483
##     5:    1636.1100   4579.8700    274.527 Mock_01_A      1        -0.16532406
##    ---                                                                        
## 19996:     -77.0832   9662.6800    568.947 Mock_05_B      5        -0.06127445
## 19997:   13401.7000    -91.9106   2209.480 Mock_05_B      5         0.29245518
## 19998:    4762.0700    781.4270   4849.040 Mock_05_B      5         1.10256164
## 19999:   15413.4000    493.2230   1767.920 Mock_05_B      5         0.60367460
## 20000:   11084.1000     50.4877   1180.210 Mock_05_B      5        -0.01986501
##        PECy7 CD16-32_asinh DL800 Ly6G_asinh AF700 CD45_asinh APCCy7 CD48_asinh
##                      <num>            <num>            <num>             <num>
##     1:           1.6392988      -1.11599889        1.9669670         2.3510881
##     2:          -0.2782380      -0.51148258        1.6006640         0.7216390
##     3:           2.1835405      -0.67235984        1.9247953         1.7806527
##     4:           2.3251843      -0.79150471        0.8500738         1.9977594
##     5:           1.8818839      -1.60970711        0.9703694         2.7630370
##    ---                                                                        
## 19996:           0.3116525      -2.08887520        3.1638831         3.5653947
## 19997:           3.1494039       4.15399776        2.3112610         1.1963364
## 19998:           2.6077040      -0.06758873        1.5655634         1.6983185
## 19999:           2.8681129       3.39766352        2.1457973        -0.2663940
## 20000:           3.1680273       3.79159306        2.0250211         0.5126149
##        BUV395 CD11b_asinh BUV737 B220_asinh BV605 Ly6C_asinh    Sample  Group
##                     <num>             <num>            <num>    <char> <char>
##     1:          1.0504123         4.2636438        0.2550924 Mock_01_A   Mock
##     2:          0.5767234         3.8577846        1.3575720 Mock_01_A   Mock
##     3:          2.9452507         0.3638149        2.7553704 Mock_01_A   Mock
##     4:          3.6319912         1.4928782        3.2159434 Mock_01_A   Mock
##     5:          1.9011864         2.9109315        0.5246514 Mock_01_A   Mock
##    ---                                                                       
## 19996:         -0.1535622         3.6552341        0.9755986 Mock_05_B   Mock
## 19997:          3.9820237        -0.1828014        2.1916151 Mock_05_B   Mock
## 19998:          2.9497215         1.2291302        2.9677227 Mock_05_B   Mock
## 19999:          4.1217946         0.8717569        1.9755202 Mock_05_B   Mock
## 20000:          3.7923143         0.1008046        1.5941070 Mock_05_B   Mock
##         Batch PECy5-5 CD3e_asinh_aligned PECy7 CD16-32_asinh_aligned
##        <char>                      <num>                       <num>
##     1:      A                 0.07550371                   1.6243541
##     2:      A                 0.09086002                  -0.4109158
##     3:      A                 0.19667186                   2.3359835
##     4:      A                 0.78801227                   2.4700482
##     5:      A                -0.20452863                   1.8813063
##    ---                                                              
## 19996:      B                -0.02770719                   0.3994900
## 19997:      B                 0.25879997                   3.0681801
## 19998:      B                 0.89016247                   2.4627149
## 19999:      B                 0.56146991                   2.7699685
## 20000:      B                -0.04108953                   3.0901837
##        DL800 Ly6G_asinh_aligned AF700 CD45_asinh_aligned
##                           <num>                    <num>
##     1:               -0.9853218                 2.260301
##     2:               -0.5260243                 1.783121
##     3:               -0.5778462                 1.962701
##     4:               -0.6863875                 1.103076
##     5:               -1.3787018                 1.213712
##    ---                                                  
## 19996:               -3.0485964                 2.892127
## 19997:                4.0582747                 2.050607
## 19998:               -0.1113533                 1.328649
## 19999:                3.2538199                 1.860001
## 20000:                3.6525168                 1.720933
##        APCCy7 CD48_asinh_aligned BUV395 CD11b_asinh_aligned
##                            <num>                      <num>
##     1:                 2.5360262                 0.45637840
##     2:                 1.1129812                 0.01138196
##     3:                 1.8322810                 2.99106073
##     4:                 2.0085123                 3.73671436
##     5:                 2.9657056                 1.33816469
##    ---                                                     
## 19996:                 3.3363132                 0.44203448
## 19997:                 1.0153860                 3.88138986
## 19998:                 1.5865602                 2.91923833
## 19999:                -0.4603302                 4.03672075
## 20000:                 0.2559606                 3.69683123
##        BUV737 B220_asinh_aligned BV605 Ly6C_asinh_aligned Alignment_MC_aligned
##                            <num>                    <num>                <num>
##     1:                4.10898304                0.1759336                    1
##     2:                3.71863341                1.2138813                    1
##     3:                0.49721161                2.6361940                    5
##     4:                1.55262542                3.0890400                    5
##     5:                2.83306122                0.4209961                    1
##    ---                                                                        
## 19996:                3.82389307                1.1142409                    1
## 19997:               -0.20893200                2.4836495                    4
## 19998:                1.05263329                3.0943832                    5
## 19999:                0.82943606                2.2702370                    4
## 20000:                0.07291784                1.8810067                    4
        ref.sub <- do.subsample(ref.sub, 20000)
        ref.sub <- run.umap(ref.sub, use.cols = aligned.cols)
## [2024-05-23 15:43:49.365568]  starting umap
## [2024-05-23 15:43:49.379667]  creating graph of nearest neighbors
## [2024-05-23 15:44:15.804667]  creating initial embedding
## [2024-05-23 15:44:17.118676]  optimizing embedding
## [2024-05-23 15:44:39.803111]  done

Plotting the data by batch reveals that the two batches are now well integrated with eachother.

        make.colour.plot(ref.sub, 'UMAP_X', 'UMAP_Y', batch.col, 'factor')