Spectre - simple discovery workflow

Introduction

Overview

Here we provide a worked example of a ‘simple’ discovery analysis workflow, where the entire process (data prep, clustering, dimensionality reduction, cluster annotation, plotting, summary data, and statistical analysis) is contained within a single script. The ‘simple’ workflow is most suitable for fast analysis of small datasets. For larger or more complex datasets, or datasets with multiple batches, we recommend the general discovery workflow, where the data preparation, batch alignment, clustering/dimensionality reduction, and quantitative analysis are separated into separate scripts. The demo dataset used for this worked example are cells extracted from mock- or virally-infected mouse brains, measured by flow cytometry.

Strategy

The ‘simple’ and ‘general’ discovery workflows are designed to facilitate the analysis of large and complex cytometry datasets using the Spectre R package. We’ve tested up to 30 million cells in a single analysis session so far. The workflow is designed to get around the cell number limitations of tSNE/UMAP. The analysis starts with clustering with FlowSOM – which is fast and scales well to large datasets. The clustered data is then downsampled, and dimensionality reduction is performed with tSNE/UMAP. This allows for visualisation of the data, and the clusters present in the dataset. Once the possible cell types in the datasets have been explored, the clusters can be labelled with the appropriate cellular identities. Finally, we can use the clusters/populations to generate summary statistics (expression levels, frequencies, total counts etc), which allows us to create graphs and heatmaps, facilitating statistical analysis.

Multiple samples

To analyse multiple samples, all the files must be imported into the one analysis session. This allows cells from each session to be clustered and analysed together, and allows us to examine the differential expression of markers, or the differences in cell proportions, between experimental groups.

Batch alignment

The ‘simple’ discovery workflow does not include any batch alignment steps. If batch correction needs to be applied, we recommend using the general discovery workflow.

Before you start

If you haven’t installed Spectre, please visit our Spectre installation page. If you are new to R and Spectre, we recommend trying out the R/RStudio and Spectre tutorials available on our getting st arted page, to familiarise yourself with R/RStudio first.

Citation and methods

Citation

If you use Spectre in your work, please consider citing Ashhurst TM, Marsh-Wakefield F, Putri GH et al. (2022). Cytometry Part A 101 (3), 237-253. To continue providing open-source tools such as Spectre, it helps us if we can demonstrate that our efforts are contributing to analysis efforts in the community. Please also consider citing the authors of the individual packages or tools (e.g. CytoNorm, FlowSOM, tSNE, UMAP, etc) that are critical elements of your analysis work. We have provided some generic text that you can use for your methods section with each protocol and on the ‘about’ page.

Sample methods blurb

Here is a sample methods blurb for this workflow. You may need to adapt this text to reflect any changes made in your analysis.

Computational analysis of data was performed using the Spectre R package (Ashhurst et al., 2022), with instructions and source code provided at https://github.com/ImmuneDynamics/spectre. Samples were initially prepared in FlowJo, and the population of interest was exported as raw value CSV files. Arcsinh transformation was performed on the data in R using a co-factor of 15 to redistribute the data on a linear scale and compress low end values near zero. The dataset was then merged into a single data.table, with keywords denoting the sample, group, and other factors added to each row (cell). The FlowSOM algorithm (Van Gassen et al., 2015) was then run on the merged dataset to cluster the data, where every cell is assigned to a specific cluster and metacluster. Subsequently, the data was downsampled and analysed by the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP) (McInnes, Healy, Melville, 2018) for cellular visualisation.

Setup

Directories

Create a master folder with a meaningful name. Then inside that folder, insert the following:

One folder called ‘data’ – this will contain your data CSV or FCS files
One folder called ‘metadata’ – this will contain a CSV containg your sample metadata
One folder called ‘Spectre simple discovery’ or similar – place this analysis script there

Example:

  CNS analysis
    /data
        -- Contains data files, one CSV or FCS per sample
    /metadata
        -- Contains a CSV containing sample metadata (group, batch, etc)
    /Spectre simple discovery
        -- Spectre simple.discovery.R

Analysis script

You can download the simple discovery script from this link – place this inside the Spectre simple discovery folder.

Data files

If you would to use the demo data as a test run for the simple discovery workflow, nothing to do at this step. Simply follow the relevant instructions further down this page to download the demo data (under 2. Import and prep data).

If you would like to use your own data, add your data and metadata files:

Place the sample CSV or FCS files in the data folder you created above.
Place the metadata CSV files in the metadata folder you created above.

Please see this page for detailed instructions on exporting data for Spectre and setting up a metadata file.

1. Load packages and set directories

#######################################################################################################
#### 1. Load packages, and set working directory
#######################################################################################################

Running library(Spectre) will load the Spectre package (also known as a ‘library’). We can then use package.check() to see if the standard dependency packages are installed, and package.load() to load those packages.

    ### Load libraries

        library(Spectre)
        Spectre::package.check()    # Check that all required packages are installed
        Spectre::package.load()     # Load required packages

Here we can set our ‘primary’ directory, which is going to be the location where the R script is saved. This path will be stored as PrimaryDirectory.

Note: if you aren’t sure how to navigate directories in R, check out our brief introduction to R tutorial.

    ### Set PrimaryDirectory
        
        dirname(rstudioapi::getActiveDocumentContext()$path)
        setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
        getwd()
        PrimaryDirectory <- getwd()
        PrimaryDirectory

We can then set our input directory, which will be the ‘data’ folder where we placed our files during setup. To do this we ask R to start at the location of the PrimaryDirectory, go up one level .., and then find the data folder.

    ### Set 'input' directory
        
        setwd(PrimaryDirectory)
        dir.create('../data', showWarnings = FALSE)
        setwd("../data/")
        InputDirectory <- getwd()
        setwd(PrimaryDirectory)

We need to set the location of the ‘metadata’ folder. This is where we can store a CSV file that contains any relevant metadata that we want to embed in our samples. In this example, it is located in a sub-folder called ‘metadata’.

    ### Set 'metadata' directory
        
        setwd(PrimaryDirectory)
        dir.create('../metadata', showWarnings = FALSE)
        setwd("../metadata/")
        MetaDirectory <- getwd()
        setwd(PrimaryDirectory)

We need to create a folder where our output data can go once our analysis is finished. In this example we will call this ‘Output_Spectre’.

    ### Create output directory
        
        setwd(PrimaryDirectory)
        dir.create("Output_Spectre", showWarnings = FALSE)
        setwd("Output_Spectre")
        OutputDirectory <- getwd()
        setwd(PrimaryDirectory)

2. Import and prep data

#######################################################################################################
#### 2. Import and prep data
#######################################################################################################

Demo data

If you need the demo dataset, uncomment the following code in the analysis script (select all, CMD+SHIFT+C) and run to download. If you are using your own datasets, then skip this step.

This code will download the demo dataset files and metadata file, and place them in the data and metadata folders respectively.

# setwd(PrimaryDirectory)
# setwd("../")
# getwd()
# download.file(url = "https://github.com/ImmuneDynamics/data/blob/main/msCNS.zip?raw=TRUE", destfile = 'msCNS.zip', mode = 'wb')
# unzip(zipfile = 'msCNS.zip')
# for(i in list.files('msCNS/data', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('msCNS/', '', i))
# }
# for(i in list.files('msCNS/metadata', full.names = TRUE)){
#   file.rename(from = i,  to = gsub('msCNS/', '', i))
# }
# unlink(c('msCNS/', 'msCNS.zip', '__MACOSX'), recursive = TRUE)

Import data

To begin, we will change our working directory to ‘InputDirectory’ and list all the CSV files in that directory – these should be the sample CSV files. We can then read in all of our samples (in this example, one CSV file per sample) into a list called ‘data.list’. Spectre uses the data.table framework to store data, which reads, writes, and performs operations on data very quickly.

    ### Import data

        setwd(InputDirectory)
        list.files(InputDirectory, ".csv")

##  [1] "CNS_Mock_01.csv"   "CNS_Mock_02.csv"   "CNS_Mock_03.csv"  
##  [4] "CNS_Mock_04.csv"   "CNS_Mock_05.csv"   "CNS_Mock_06.csv"  
##  [7] "CNS_WNV_D7_01.csv" "CNS_WNV_D7_02.csv" "CNS_WNV_D7_03.csv"
## [10] "CNS_WNV_D7_04.csv" "CNS_WNV_D7_05.csv" "CNS_WNV_D7_06.csv"

        data.list <- Spectre::read.files(file.loc = InputDirectory,
                                         file.type = ".csv",
                                         do.embed.file.names = TRUE)

By default, the read.files() function will generate some other variables, which you can review, by running the do.list.summary() function.

The ‘name.table’ variable is a table of all the column names for all of your samples (one row per sample, one column per column name). If all of the column names are matching, then this table should be a repeating pattern. If it has been jumbled, then some of your samples have columns that don’t appear in other samples. The ‘ncol.check’ and ‘nrow.check’ are simple tables indicating the number or columns and rows in each sample.

    ### Check the data

        check <- do.list.summary(data.list)

        check$name.table # Review column names and their subsequent values

##      X1  X2   X3   X4    X5   X6   X7   X8  X9      X10    X11
## 1  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 2  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 3  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 4  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 5  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 6  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 7  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 8  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 9  NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 10 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 11 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo
## 12 NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4 FileName FileNo

        check$ncol.check # Review number of columns (features, markers) in each sample

##       [,1]
##  [1,] 11  
##  [2,] 11  
##  [3,] 11  
##  [4,] 11  
##  [5,] 11  
##  [6,] 11  
##  [7,] 11  
##  [8,] 11  
##  [9,] 11  
## [10,] 11  
## [11,] 11  
## [12,] 11

        check$nrow.check # Review number of rows (cells) in each sample

##       [,1] 
##  [1,] 9937 
##  [2,] 15415
##  [3,] 14246
##  [4,] 17044
##  [5,] 5459 
##  [6,] 4891 
##  [7,] 17950
##  [8,] 16233
##  [9,] 15999
## [10,] 17131
## [11,] 15926
## [12,] 18773

You can review the first 6 rows of the first sample in your data using the following:

        data.list[[1]]

##            NK11        CD3     CD45       Ly6G     CD11b      B220      CD8a
##           <num>      <num>    <num>      <num>     <num>     <num>     <num>
##    1:   42.3719  40.098700  6885.08  -344.7830 14787.300  -40.2399  83.71750
##    2:   42.9586 119.014000  1780.29  -429.6650  5665.730   86.6673  34.72190
##    3:   59.2366 206.238000 10248.30 -1603.8400 19894.300  427.8310 285.88000
##    4:  364.9480  -0.233878  3740.04  -815.9800  9509.430  182.4200 333.60500
##    5:  440.2470  40.035200  9191.38    40.5055  5745.820 -211.6940 149.22000
##   ---                                                                       
## 9933:   11.2126  36.951600  2515.82  -647.4930  6172.070  221.9380 266.90000
## 9934:  239.9700 440.217000  7247.28 -1449.7200 15355.400  809.3040 456.59900
## 9935: -134.9650 111.350000  2472.85    81.5975  9657.160 -113.1320   3.79607
## 9936:   86.3333  28.286900  5745.27 -1284.0800 18303.100  353.5290 262.96300
## 9937:   10.1467 122.255000  1971.69  -215.7660   727.708  506.8580 113.14400
##            Ly6C      CD4    FileName FileNo
##           <num>    <num>      <char>  <int>
##    1:  958.7000  711.072 CNS_Mock_01      1
##    2:  448.2590  307.272 CNS_Mock_01      1
##    3: 1008.8300  707.094 CNS_Mock_01      1
##    4:  440.0710  249.784 CNS_Mock_01      1
##    5:   87.4815  867.570 CNS_Mock_01      1
##   ---                                      
## 9933:  141.4200  708.348 CNS_Mock_01      1
## 9934: 2093.6900 2119.270 CNS_Mock_01      1
## 9935: -114.1510  110.743 CNS_Mock_01      1
## 9936:  745.8080  537.750 CNS_Mock_01      1
## 9937:  244.2210 2334.800 CNS_Mock_01      1

Merge data.tables

Once the metadata has been added, we can then merge the data into a single data.table using do.merge.files(). By default, columns with matching names will be aligned in the new table, and any columns that are present in some samples, but not others, will be added and filled with ‘NA’ for any samples that didn’t have that column initially. Once the data has been merged, we can review the data:

    ### Merge data

        cell.dat <- Spectre::do.merge.files(dat = data.list)
        cell.dat

##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##             <num>      <num>    <num>      <num>    <num>     <num>     <num>
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---                                                                      
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo
##              <num>     <num>        <char>  <int>
##      1:   958.7000  711.0720   CNS_Mock_01      1
##      2:   448.2590  307.2720   CNS_Mock_01      1
##      3:  1008.8300  707.0940   CNS_Mock_01      1
##      4:   440.0710  249.7840   CNS_Mock_01      1
##      5:    87.4815  867.5700   CNS_Mock_01      1
##     ---                                          
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12

Read in metadata

    ### Read in metadata  
       
        setwd(MetaDirectory)
        
        meta.dat <- fread("sample.details.csv")
        meta.dat

##              Filename     Sample  Group  Batch Cells per sample
##                <char>     <char> <char> <char>            <num>
##  1:   CNS_Mock_01.csv 01_Mock_01   Mock      A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02   Mock      B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03   Mock      B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04   Mock      A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05   Mock      A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06   Mock      B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01    WNV      A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02    WNV      B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03    WNV      A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04    WNV      A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05    WNV      B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06    WNV      A          1830000

3. Data transformation

#######################################################################################################
#### 3. Data transformation
#######################################################################################################

Before we perform clustering etc, we need to meaningfully transform the data. For more information on why this is necessary, please see this page.

Note: If you have imported CSV (channel value) files exported from FlowJo, then no data transformations are required, and you can skip all of the arcsinh transformation steps and proceed straight to adding the metadata. More information on the FCS, CSV scale, and CSV channel value file types can be found here.

    setwd(OutputDirectory)
    dir.create("Output 1 - transformed plots")
    setwd("Output 1 - transformed plots")

First, check the column names of the dataset.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))

##       [,1]      
##  [1,] "NK11"    
##  [2,] "CD3"     
##  [3,] "CD45"    
##  [4,] "Ly6G"    
##  [5,] "CD11b"   
##  [6,] "B220"    
##  [7,] "CD8a"    
##  [8,] "Ly6C"    
##  [9,] "CD4"     
## [10,] "FileName"
## [11,] "FileNo"

The columns we want to apply arcsinh transformation to are the cellular columns – column 1 to column 9. We can specify those columns using the code below.

    ### Arcsinh transformation

        as.matrix(names(cell.dat))

##       [,1]      
##  [1,] "NK11"    
##  [2,] "CD3"     
##  [3,] "CD45"    
##  [4,] "Ly6G"    
##  [5,] "CD11b"   
##  [6,] "B220"    
##  [7,] "CD8a"    
##  [8,] "Ly6C"    
##  [9,] "CD4"     
## [10,] "FileName"
## [11,] "FileNo"

        to.asinh <- names(cell.dat)[c(1:9)]
        to.asinh

## [1] "NK11"  "CD3"   "CD45"  "Ly6G"  "CD11b" "B220"  "CD8a"  "Ly6C"  "CD4"

Define the cofactor we will use for transformation. As a general recommendation, we suggest using cofactor = 15 for CyTOF data, and cofactor between 100 and 1000 for flow data (we suggest 500 as a starting point). Here is a quick comparison figure showing how different co-factors compare to bi-exponential transformations performed on an LSR-II. For more detailed information on this choice, and for approaches where different cofactors for different columns might be required, see this page.

In this worked example we will use a cofactor of 500.

        cofactor <- 500

You can also choose a column to use for plotting the transformed result – ideally something that is expressed on a variety of cell types in your dataset.

        plot.against <- "Ly6C_asinh"

Now we need to apply arcsinh transformation to the data in those columns, using a specific co-factor.

        cell.dat <- do.asinh(cell.dat, to.asinh, cofactor = cofactor)
        transformed.cols <- paste0(to.asinh, "_asinh")

We can then make some plots to see if the arcsinh transformation is appropriate

        for(i in transformed.cols){
          make.colour.plot(do.subsample(cell.dat, 20000), i, plot.against)
        }

Check the plots and see if you are happy with the transformation. For more detailed guidance, see this page. If happy, the proceed with analysis. Otherwise, go back to the merging of the data.list (to create cell.dat) and try with another co-factor.

4. Add metadata

#######################################################################################################
#### 4. Add metadata and set some preferences
#######################################################################################################

We also want to read in and attach some sample metadata, to aid with our analysis.

    ### Add metadata to data.table

        meta.dat

##              Filename     Sample  Group  Batch Cells per sample
##                <char>     <char> <char> <char>            <num>
##  1:   CNS_Mock_01.csv 01_Mock_01   Mock      A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02   Mock      B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03   Mock      B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04   Mock      A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05   Mock      A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06   Mock      B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01    WNV      A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02    WNV      B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03    WNV      A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04    WNV      A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05    WNV      B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06    WNV      A          1830000

        sample.info <- meta.dat[,c(1:4)]
        sample.info

##              Filename     Sample  Group  Batch
##                <char>     <char> <char> <char>
##  1:   CNS_Mock_01.csv 01_Mock_01   Mock      A
##  2:   CNS_Mock_02.csv 02_Mock_02   Mock      B
##  3:   CNS_Mock_03.csv 03_Mock_03   Mock      B
##  4:   CNS_Mock_04.csv 04_Mock_04   Mock      A
##  5:   CNS_Mock_05.csv 05_Mock_05   Mock      A
##  6:   CNS_Mock_06.csv 06_Mock_06   Mock      B
##  7: CNS_WNV_D7_01.csv  07_WNV_01    WNV      A
##  8: CNS_WNV_D7_02.csv  08_WNV_02    WNV      B
##  9: CNS_WNV_D7_03.csv  09_WNV_03    WNV      A
## 10: CNS_WNV_D7_04.csv  10_WNV_04    WNV      A
## 11: CNS_WNV_D7_05.csv  11_WNV_05    WNV      B
## 12: CNS_WNV_D7_06.csv  12_WNV_06    WNV      A

Once we have the metadata read into R, we will select only the columns we want to add to our dataset. In this example we only want to include use first four columns (Filename, Sample, Group, and Batch). ‘Filename’ will be used to for matching between cell.dat and meta.dat, and the other three columns will be the information that gets added to cell.dat

        meta.dat

##              Filename     Sample  Group  Batch Cells per sample
##                <char>     <char> <char> <char>            <num>
##  1:   CNS_Mock_01.csv 01_Mock_01   Mock      A           420000
##  2:   CNS_Mock_02.csv 02_Mock_02   Mock      B           240000
##  3:   CNS_Mock_03.csv 03_Mock_03   Mock      B           256000
##  4:   CNS_Mock_04.csv 04_Mock_04   Mock      A           252000
##  5:   CNS_Mock_05.csv 05_Mock_05   Mock      A           345000
##  6:   CNS_Mock_06.csv 06_Mock_06   Mock      B           702000
##  7: CNS_WNV_D7_01.csv  07_WNV_01    WNV      A          5070000
##  8: CNS_WNV_D7_02.csv  08_WNV_02    WNV      B          2940000
##  9: CNS_WNV_D7_03.csv  09_WNV_03    WNV      A          2120000
## 10: CNS_WNV_D7_04.csv  10_WNV_04    WNV      A          4320000
## 11: CNS_WNV_D7_05.csv  11_WNV_05    WNV      B          4080000
## 12: CNS_WNV_D7_06.csv  12_WNV_06    WNV      A          1830000

        counts <- meta.dat[,c(2,5)]
        counts

##         Sample Cells per sample
##         <char>            <num>
##  1: 01_Mock_01           420000
##  2: 02_Mock_02           240000
##  3: 03_Mock_03           256000
##  4: 04_Mock_04           252000
##  5: 05_Mock_05           345000
##  6: 06_Mock_06           702000
##  7:  07_WNV_01          5070000
##  8:  08_WNV_02          2940000
##  9:  09_WNV_03          2120000
## 10:  10_WNV_04          4320000
## 11:  11_WNV_05          4080000
## 12:  12_WNV_06          1830000

Now we can add this information to cell.dat. Essentially, the file names are listed in the metadata table, and we can use that to add any listed metadata in the table to the corresponding files in data.list. We can thrn review the data to ensure the metadata has been correctly embedded.

        cell.dat <- do.add.cols(cell.dat, "FileName", sample.info, "Filename", rmv.ext = TRUE)
        cell.dat

##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##             <num>      <num>    <num>      <num>    <num>     <num>     <num>
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---                                                                      
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##              <num>     <num>        <fctr>  <int>       <num>        <num>
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---                                                                   
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##              <num>       <num>       <num>       <num>       <num>      <num>
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---                                                                      
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample  Group  Batch
##               <num>     <char> <char> <char>
##      1:  1.15078593 01_Mock_01   Mock      A
##      2:  0.58125620 01_Mock_01   Mock      A
##      3:  1.14620108 01_Mock_01   Mock      A
##      4:  0.48082540 01_Mock_01   Mock      A
##      5:  1.31850146 01_Mock_01   Mock      A
##     ---                                     
## 169000:  1.53218180  12_WNV_06    WNV      A
## 169001:  0.59596889  12_WNV_06    WNV      A
## 169002: -0.81400762  12_WNV_06    WNV      A
## 169003:  0.12304230  12_WNV_06    WNV      A
## 169004: -0.06366339  12_WNV_06    WNV      A

Check the column names and specify columns that represent cellular features (in this case, the arcsinh transformed data, defined by “markername_asinh”).

    ### Columns

        as.matrix(names(cell.dat))

##       [,1]         
##  [1,] "NK11"       
##  [2,] "CD3"        
##  [3,] "CD45"       
##  [4,] "Ly6G"       
##  [5,] "CD11b"      
##  [6,] "B220"       
##  [7,] "CD8a"       
##  [8,] "Ly6C"       
##  [9,] "CD4"        
## [10,] "FileName"   
## [11,] "FileNo"     
## [12,] "NK11_asinh" 
## [13,] "CD3_asinh"  
## [14,] "CD45_asinh" 
## [15,] "Ly6G_asinh" 
## [16,] "CD11b_asinh"
## [17,] "B220_asinh" 
## [18,] "CD8a_asinh" 
## [19,] "Ly6C_asinh" 
## [20,] "CD4_asinh"  
## [21,] "Sample"     
## [22,] "Group"      
## [23,] "Batch"

        cellular.cols <- names(cell.dat)[c(12:20)]
        as.matrix(cellular.cols)

##       [,1]         
##  [1,] "NK11_asinh" 
##  [2,] "CD3_asinh"  
##  [3,] "CD45_asinh" 
##  [4,] "Ly6G_asinh" 
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh" 
##  [7,] "CD8a_asinh" 
##  [8,] "Ly6C_asinh" 
##  [9,] "CD4_asinh"

Additionally, specify the columns that will be used to generate cluster and tSNE/UMAP results. Columns that are not specified here will still be analysed, but won’t contributed to the generation of clusters. There are a couple of strategies to take here: use all cellular columns for clustering to looks for all possible cell types/states, or use only stably expressed markers to cluster stable phenotypes, which can then be examined for changes in more dynamic markers. For more guidance, see our tutorials page.

        as.matrix(names(cell.dat))

##       [,1]         
##  [1,] "NK11"       
##  [2,] "CD3"        
##  [3,] "CD45"       
##  [4,] "Ly6G"       
##  [5,] "CD11b"      
##  [6,] "B220"       
##  [7,] "CD8a"       
##  [8,] "Ly6C"       
##  [9,] "CD4"        
## [10,] "FileName"   
## [11,] "FileNo"     
## [12,] "NK11_asinh" 
## [13,] "CD3_asinh"  
## [14,] "CD45_asinh" 
## [15,] "Ly6G_asinh" 
## [16,] "CD11b_asinh"
## [17,] "B220_asinh" 
## [18,] "CD8a_asinh" 
## [19,] "Ly6C_asinh" 
## [20,] "CD4_asinh"  
## [21,] "Sample"     
## [22,] "Group"      
## [23,] "Batch"

        cluster.cols <- names(cell.dat)[c(12:20)]
        as.matrix(cluster.cols)

##       [,1]         
##  [1,] "NK11_asinh" 
##  [2,] "CD3_asinh"  
##  [3,] "CD45_asinh" 
##  [4,] "Ly6G_asinh" 
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh" 
##  [7,] "CD8a_asinh" 
##  [8,] "Ly6C_asinh" 
##  [9,] "CD4_asinh"

Specify sample, group, and batch columns.

        exp.name <- "CNS experiment"
        sample.col <- "Sample"
        group.col <- "Group"
        batch.col <- "Batch"

Additionally, we want to specify the downsample targets for dimensionality reduction. This influences how many cells will be shown on a tSNE/UMAP plot, and we are specifying the number of cells per group to downsample to. Check for the number of cells (rows) in each group:

    ### Subsample targets per group

        data.frame(table(cell.dat[[group.col]])) # Check number of cells per sample.

##   Var1   Freq
## 1 Mock  66992
## 2  WNV 102012

You can then specify the number to downsample to in each group. These must be lower than the total number of cells in each group. In this example we want 2000 cells from ‘mock’ and 20,000 cells from ‘WNV’, to reflect the number of cells present in each group.

First, check the order that the groups appear in the dataset.

        unique(cell.dat[[group.col]])

## [1] "Mock" "WNV"

Now you can specify the targets (in the order of unique(cell.dat[[group.col]]) above).

        sub.targets <- c(2000, 20000) # target subsample numbers from each group
        sub.targets

## [1]  2000 20000

5. Clustering and DR

#######################################################################################################
#### 5. Clustering and dimensionality reduction
#######################################################################################################

    setwd(OutputDirectory)
    dir.create("Output 2 - clustering")
    setwd("Output 2 - clustering")

We can run clustering using the run.flowsom function. In this case we can define the number of desired metaclusters manually, with the meta.k argument (in this case we have chosen 8). This can be increased or decreased as required. Typically, overclustering is preferred, as multiple clusters that represent a single cellular population can always be annotated as such. Subsequently, we can write the clustered dataset to disk.

    ### Clustering

        cell.dat <- run.flowsom(cell.dat, cluster.cols, meta.k = 8)
        cell.dat

##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##             <num>      <num>    <num>      <num>    <num>     <num>     <num>
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---                                                                      
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##              <num>     <num>        <fctr>  <int>       <num>        <num>
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---                                                                   
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##              <num>       <num>       <num>       <num>       <num>      <num>
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---                                                                      
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample  Group  Batch FlowSOM_cluster
##               <num>     <char> <char> <char>           <num>
##      1:  1.15078593 01_Mock_01   Mock      A             162
##      2:  0.58125620 01_Mock_01   Mock      A             113
##      3:  1.14620108 01_Mock_01   Mock      A              75
##      4:  0.48082540 01_Mock_01   Mock      A             147
##      5:  1.31850146 01_Mock_01   Mock      A             144
##     ---                                                     
## 169000:  1.53218180  12_WNV_06    WNV      A             125
## 169001:  0.59596889  12_WNV_06    WNV      A              78
## 169002: -0.81400762  12_WNV_06    WNV      A             165
## 169003:  0.12304230  12_WNV_06    WNV      A             108
## 169004: -0.06366339  12_WNV_06    WNV      A              36
##         FlowSOM_metacluster
##                       <int>
##      1:                   7
##      2:                   7
##      3:                   7
##      4:                   7
##      5:                   7
##     ---                    
## 169000:                   4
## 169001:                   4
## 169002:                   4
## 169003:                   4
## 169004:                   4

We can then run dimensionality reduction on a subset of the data, allow us to visualise the data and resulting clusters. In this case we have used run.umap, though other options are available, including run.fitsne and run.tsne. As before, this subsampled dataset with DR coordinates is saved to disk.

    ### Dimensionality reduction

        cell.sub <- do.subsample(cell.dat, sub.targets, group.col)
        cell.sub

##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##            <num>     <num>    <num>     <num>    <num>     <num>     <num>
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---                                                                    
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##            <num>     <num>        <fctr>  <int>       <num>        <num>
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---                                                                  
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##             <num>      <num>       <num>       <num>       <num>      <num>
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---                                                                     
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample  Group  Batch FlowSOM_cluster FlowSOM_metacluster
##            <num>     <char> <char> <char>           <num>               <int>
##     1: 0.8946876 05_Mock_05   Mock      A             145                   7
##     2: 1.5259050 04_Mock_04   Mock      A             104                   7
##     3: 2.0639011 04_Mock_04   Mock      A              73                   7
##     4: 0.4413331 04_Mock_04   Mock      A             184                   7
##     5: 0.6671197 02_Mock_02   Mock      B             147                   7
##    ---                                                                       
## 21996: 0.1190861  10_WNV_04    WNV      A             151                   4
## 21997: 1.1692576  12_WNV_06    WNV      A              50                   4
## 21998: 0.5223904  07_WNV_01    WNV      A              44                   1
## 21999: 0.6267796  10_WNV_04    WNV      A              23                   4
## 22000: 2.2442111  07_WNV_01    WNV      A              13                   4

        cell.sub <- run.umap(cell.sub, cluster.cols)
        cell.sub

##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##            <num>     <num>    <num>     <num>    <num>     <num>     <num>
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---                                                                    
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##            <num>     <num>        <fctr>  <int>       <num>        <num>
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---                                                                  
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##             <num>      <num>       <num>       <num>       <num>      <num>
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---                                                                     
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample  Group  Batch FlowSOM_cluster FlowSOM_metacluster
##            <num>     <char> <char> <char>           <num>               <int>
##     1: 0.8946876 05_Mock_05   Mock      A             145                   7
##     2: 1.5259050 04_Mock_04   Mock      A             104                   7
##     3: 2.0639011 04_Mock_04   Mock      A              73                   7
##     4: 0.4413331 04_Mock_04   Mock      A             184                   7
##     5: 0.6671197 02_Mock_02   Mock      B             147                   7
##    ---                                                                       
## 21996: 0.1190861  10_WNV_04    WNV      A             151                   4
## 21997: 1.1692576  12_WNV_06    WNV      A              50                   4
## 21998: 0.5223904  07_WNV_01    WNV      A              44                   1
## 21999: 0.6267796  10_WNV_04    WNV      A              23                   4
## 22000: 2.2442111  07_WNV_01    WNV      A              13                   4
##              UMAP_X     UMAP_Y
##               <num>      <num>
##     1:  -2.00245842 -3.0362981
##     2:  -1.31822215 -3.8230994
##     3:  -1.48708923 -3.9165858
##     4:  -3.81201521 -3.2078957
##     5:  -2.58767721 -4.1725518
##    ---                        
## 21996:  -0.02283664  2.0631377
## 21997:   2.05799768  2.3655588
## 21998: -10.99510792 -5.2987869
## 21999:   5.09514622  0.9261395
## 22000:   6.26499282  3.0087131

We can visualise the DR data to asses which clusters represent cellular populations.

    ### DR plots

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", col.type = 'factor', add.label = TRUE)

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", cellular.cols)

We can also generate some multi plots to compare between experimental groups or batches.

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "FlowSOM_metacluster", group.col, col.type = 'factor')

We can also produce expression heatmaps to help guide our interpretation of cluster identities.

    ### Expression heatmap

        exp <- do.aggregate(cell.dat, cellular.cols, by = "FlowSOM_metacluster")
        make.pheatmap(exp, "FlowSOM_metacluster", cellular.cols)

6. Annotate clusters

#######################################################################################################
#### 6. Annotate clusters
#######################################################################################################

Review the cluster labels and marker expression patterns, so you can annotate the clusters. This annotation is optional, as all subsequent steps can be performed on the ‘clusters’ instead of the ‘populations’. Here we can create a list of population names, and then specify which clusters make up that population (e.g. CD4 T cells are contained within cluster ‘3’).

    setwd(OutputDirectory)
    dir.create("Output 3 - annotation")
    setwd("Output 3 - annotation")

    ### Annotate

        annots <- list("CD4 T cells" = c(3),
                       "CD8 T cells" = c(2),
                       "NK cells" = c(1),
                       "Neutrophils" = c(8),
                       "Infil Macrophages" = c(4),
                       "Microglia" = c(5,6,7)
        )

Once the annotation list is created, we can switch the list into a table format to annotate our data.

        annots <- do.list.switch(annots)
        names(annots) <- c("Values", "Population")
        setorderv(annots, 'Values')
        annots

##    Values        Population
##     <num>            <char>
## 1:      1          NK cells
## 2:      2       CD8 T cells
## 3:      3       CD4 T cells
## 4:      4 Infil Macrophages
## 5:      5         Microglia
## 6:      6         Microglia
## 7:      7         Microglia
## 8:      8       Neutrophils

Using the do.add.cols function, we can add the population names to the corresponding clusters.

    ### Add annotations

        cell.dat <- do.add.cols(cell.dat, "FlowSOM_metacluster", annots, "Values")
        cell.dat

##              NK11        CD3     CD45       Ly6G    CD11b      B220      CD8a
##             <num>      <num>    <num>      <num>    <num>     <num>     <num>
##      1:   42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399   83.7175
##      2:   42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673   34.7219
##      3:   59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310  285.8800
##      4:  364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200  333.6050
##      5:  440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940  149.2200
##     ---                                                                      
## 169000:  910.8890  72.856100 31466.20  -316.5570 28467.80   -7.7972 -271.8040
## 169001:  -10.2642  64.188700 45188.00  -540.5140 22734.00  202.4110 -936.4920
## 169002: -184.2910  -9.445650 11842.60   -97.9383 17237.00  123.4760 -219.9320
## 169003:  248.3860 229.986000 32288.20  -681.1630 19255.80 -656.0540 -201.5880
## 169004:  738.9810  95.470300 46185.10 -1004.6000 22957.80 -661.6280   72.3356
##               Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##              <num>     <num>        <fctr>  <int>       <num>        <num>
##      1:   958.7000  711.0720   CNS_Mock_01      1  0.08464269  0.080111681
##      2:   448.2590  307.2720   CNS_Mock_01      1  0.08581185  0.235835773
##      3:  1008.8300  707.0940   CNS_Mock_01      1  0.11819779  0.401593928
##      4:   440.0710  249.7840   CNS_Mock_01      1  0.67698633 -0.000467756
##      5:    87.4815  867.5700   CNS_Mock_01      1  0.79429776  0.079985087
##     ---                                                                   
## 169000: 12023.7000 1103.0500 CNS_WNV_D7_06     12  1.36096843  0.145201437
## 169001:  4188.3300  315.9400 CNS_WNV_D7_06     12 -0.02052696  0.128027364
## 169002:  8923.4000 -453.4640 CNS_WNV_D7_06     12 -0.36070893 -0.018890177
## 169003: 10365.7000   61.6765 CNS_WNV_D7_06     12  0.47832275  0.445126321
## 169004:  9704.4700  -31.8532 CNS_WNV_D7_06     12  1.18247624  0.189799003
##         CD45_asinh  Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##              <num>       <num>       <num>       <num>       <num>      <num>
##      1:   3.316967 -0.64409775    4.080349 -0.08039317  0.16666238  1.4060734
##      2:   1.982231 -0.77832998    3.122671  0.17247816  0.06938811  0.8062765
##      3:   3.714001 -1.88215184    4.376885  0.77554551  0.54445897  1.4515055
##      4:   2.709829 -1.26580137    3.639269  0.35719570  0.62559714  0.7940335
##      5:   3.605299  0.08092265    3.136655 -0.41166199  0.29417852  0.1740824
##     ---                                                                      
## 169000:   4.835271 -0.59701179    4.735145 -0.01559377 -0.51987213  3.8736061
## 169001:   5.197157 -0.93752450    4.510277  0.39450870 -1.38534526  2.8221404
## 169002:   3.858443 -0.19464519    4.233563  0.24450841 -0.42678897  3.5757554
## 169003:   4.861056 -1.11588765    4.344275 -1.08581190 -0.39298254  3.7253778
## 169004:   5.218981 -1.44774229    4.520071 -1.09255123  0.14417124  3.6595440
##           CD4_asinh     Sample  Group  Batch FlowSOM_cluster
##               <num>     <char> <char> <char>           <num>
##      1:  1.15078593 01_Mock_01   Mock      A             162
##      2:  0.58125620 01_Mock_01   Mock      A             113
##      3:  1.14620108 01_Mock_01   Mock      A              75
##      4:  0.48082540 01_Mock_01   Mock      A             147
##      5:  1.31850146 01_Mock_01   Mock      A             144
##     ---                                                     
## 169000:  1.53218180  12_WNV_06    WNV      A             125
## 169001:  0.59596889  12_WNV_06    WNV      A              78
## 169002: -0.81400762  12_WNV_06    WNV      A             165
## 169003:  0.12304230  12_WNV_06    WNV      A             108
## 169004: -0.06366339  12_WNV_06    WNV      A              36
##         FlowSOM_metacluster        Population
##                      <fctr>            <char>
##      1:                   7         Microglia
##      2:                   7         Microglia
##      3:                   7         Microglia
##      4:                   7         Microglia
##      5:                   7         Microglia
##     ---                                      
## 169000:                   4 Infil Macrophages
## 169001:                   4 Infil Macrophages
## 169002:                   4 Infil Macrophages
## 169003:                   4 Infil Macrophages
## 169004:                   4 Infil Macrophages

        cell.sub <- do.add.cols(cell.sub, "FlowSOM_metacluster", annots, "Values")
        cell.sub

##             NK11       CD3     CD45      Ly6G    CD11b      B220      CD8a
##            <num>     <num>    <num>     <num>    <num>     <num>     <num>
##     1:  106.0150 235.13900 13574.70  -179.318 31385.80 -169.4210 -394.6730
##     2:  132.7940  16.52790  9533.69  -673.468 22061.20  184.8820  282.5120
##     3:  236.6740 143.33000  7805.94  -605.170 11658.20  658.9580  107.7280
##     4:   14.8980 -10.63350  3628.48   468.071 12264.60 -178.1110 -194.3480
##     5:  194.5450   3.62867  3192.66  -515.083 11939.20  -33.5803   42.3089
##    ---                                                                    
## 21996:  -16.4986  11.00010 24706.30  -263.460  9151.59 -144.2120 -499.9370
## 21997:  119.8740  92.33010 43645.10 -1130.550 38278.10  -42.0887 -445.5530
## 21998: 2884.3600  88.01560 88998.60 -4851.610  4889.69 4999.7900  -58.7703
## 21999:  560.6400 151.51900 55440.80 -2633.550 23937.20  -95.7631  656.6650
## 22000:  118.9280 331.26200 52654.70 -3161.200 25113.80 -938.5550  155.8260
##             Ly6C       CD4      FileName FileNo  NK11_asinh    CD3_asinh
##            <num>     <num>        <fctr>  <int>       <num>        <num>
##     1:   380.647  509.4590   CNS_Mock_05      5  0.21047261  0.454470942
##     2:   608.245 1095.4700   CNS_Mock_04      4  0.26256084  0.033049783
##     3:   805.445 1937.4200   CNS_Mock_04      4  0.45724742  0.282872452
##     4:   126.416  227.9000   CNS_Mock_04      4  0.02979159 -0.021265397
##     5:   666.571  358.8580   CNS_Mock_02      2  0.37988669  0.007257276
##    ---                                                                  
## 21996: 22334.600   59.6839 CNS_WNV_D7_04     10 -0.03299121  0.021998426
## 21997:  4490.860  727.2510 CNS_WNV_D7_06     12  0.23750870  0.183626518
## 21998:  1175.860  273.2380 CNS_WNV_D7_01      7  2.45302657  0.175134535
## 21999: 13667.400  334.3160 CNS_WNV_D7_04     10  0.96458591  0.298581704
## 22000: 23457.800 2331.7400 CNS_WNV_D7_01      7  0.23566844  0.621694916
##        CD45_asinh Ly6G_asinh CD11b_asinh  B220_asinh  CD8a_asinh Ly6C_asinh
##             <num>      <num>       <num>       <num>       <num>      <num>
##     1:   3.994841 -0.3513617    4.832713 -0.33267180 -0.72432725  0.7021579
##     2:   3.641813 -1.1067473    4.480243  0.36181778  0.53860285  1.0264871
##     3:   3.442203 -1.0225758    3.842764  1.08932754  0.21382293  1.2547409
##     4:   2.679822  0.8354908    3.893427 -0.34908852 -0.37951948  0.2502130
##     5:   2.553225 -0.9025441    3.866560 -0.06711021  0.08451714  1.0984975
##    ---                                                                     
## 21996:   4.593455 -0.5051593    3.600967 -0.28456776 -0.88128449  4.4925567
## 21997:   5.162418 -1.5546568    5.031215 -0.08407830 -0.80224144  2.8914233
## 21998:   5.874923 -2.9682497    2.976027  2.99818116 -0.11727162  1.5907079
## 21999:   5.401630 -2.3635192    4.561837 -0.19037418  1.08655240  4.0016422
## 22000:   5.350072 -2.5434425    4.609811 -1.38728687  0.30681557  4.5416110
##        CD4_asinh     Sample  Group  Batch FlowSOM_cluster FlowSOM_metacluster
##            <num>     <char> <char> <char>           <num>              <fctr>
##     1: 0.8946876 05_Mock_05   Mock      A             145                   7
##     2: 1.5259050 04_Mock_04   Mock      A             104                   7
##     3: 2.0639011 04_Mock_04   Mock      A              73                   7
##     4: 0.4413331 04_Mock_04   Mock      A             184                   7
##     5: 0.6671197 02_Mock_02   Mock      B             147                   7
##    ---                                                                       
## 21996: 0.1190861  10_WNV_04    WNV      A             151                   4
## 21997: 1.1692576  12_WNV_06    WNV      A              50                   4
## 21998: 0.5223904  07_WNV_01    WNV      A              44                   1
## 21999: 0.6267796  10_WNV_04    WNV      A              23                   4
## 22000: 2.2442111  07_WNV_01    WNV      A              13                   4
##              UMAP_X     UMAP_Y        Population
##               <num>      <num>            <char>
##     1:  -2.00245842 -3.0362981         Microglia
##     2:  -1.31822215 -3.8230994         Microglia
##     3:  -1.48708923 -3.9165858         Microglia
##     4:  -3.81201521 -3.2078957         Microglia
##     5:  -2.58767721 -4.1725518         Microglia
##    ---                                          
## 21996:  -0.02283664  2.0631377 Infil Macrophages
## 21997:   2.05799768  2.3655588 Infil Macrophages
## 21998: -10.99510792 -5.2987869          NK cells
## 21999:   5.09514622  0.9261395 Infil Macrophages
## 22000:   6.26499282  3.0087131 Infil Macrophages

Subsequently, we can visualise the population labels on a UMAP plot.

        make.colour.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", col.type = 'factor', add.label = TRUE)

        make.multi.plot(cell.sub, "UMAP_X", "UMAP_Y", "Population", group.col, col.type = 'factor')

We can also generate an expression heatmap to summarise the expression levels of each marker on our populations.

    ### Expression heatmap
        
        rm(exp)
        exp <- do.aggregate(cell.dat, cellular.cols, by = "Population")
        make.pheatmap(exp, "Population", cellular.cols)

    ### Write data
        
        setwd(OutputDirectory)
        setwd("Output 3 - annotation")
        
        fwrite(cell.dat, "Annotated.data.csv")
        fwrite(cell.sub, "Annotated.data.DR.csv")

    ### Write FCS files
        
        setwd(OutputDirectory)
        setwd("Output 3 - annotation")
        dir.create('FCS files - all')
        setwd('FCS files - all')
        
        write.files(cell.dat,
                    file.prefix = exp.name,
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)

        setwd(OutputDirectory)
        setwd("Output 3 - annotation")
        dir.create('FCS files - DR')
        setwd('FCS files - DR')
        
        write.files(cell.sub,
                    file.prefix = paste0('DR-', exp.name),
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)

7. Summary data and stats

#######################################################################################################
#### 7. Summary data and statistical analysis
#######################################################################################################

Here we can create ‘summary’ data for our experiment. This involves calculating the percentage of each population in each sample, along with the corresponding cell counts if the information is available. In addition, we calculate the MFI for selected markers on each population in each sample.

First, set the working directory, and select which columns we will measure the MFI of. In this case, CD11b_asinh and Ly6C_asinh.

    setwd(OutputDirectory)
    dir.create("Output 4 - summary data")
    setwd("Output 4 - summary data")

    ### Setup
    
        variance.test <- 'kruskal.test'
        pairwise.test <- "wilcox.test"
    
        as.matrix(unique(cell.dat[[group.col]]))

##      [,1]  
## [1,] "Mock"
## [2,] "WNV"

        comparisons <- list(c("Mock", "WNV"))
        comparisons

## [[1]]
## [1] "Mock" "WNV"

        grp.order <- c("Mock", "WNV")
        grp.order

## [1] "Mock" "WNV"

We can also specify which columns we wish to measure MFI levels on.

    ### Select columns to measure MFI
    
        as.matrix(cellular.cols)

##       [,1]         
##  [1,] "NK11_asinh" 
##  [2,] "CD3_asinh"  
##  [3,] "CD45_asinh" 
##  [4,] "Ly6G_asinh" 
##  [5,] "CD11b_asinh"
##  [6,] "B220_asinh" 
##  [7,] "CD8a_asinh" 
##  [8,] "Ly6C_asinh" 
##  [9,] "CD4_asinh"

        dyn.cols <- cellular.cols[c(5,8)]
        dyn.cols

## [1] "CD11b_asinh" "Ly6C_asinh"

Create summary data

Use the new create.sumtable function to generate summary data – a data.table of samples (rows) vs measurements (columns).

    ### Create summary tables
    
        sum.dat <- create.sumtable(dat = cell.dat, 
                                   sample.col = sample.col,
                                   pop.col = "Population",
                                   use.cols = dyn.cols, 
                                   annot.cols = c(group.col, batch.col), 
                                   counts = counts)

Once the summary data has been generated, we can review it and select which columns to plot. In each case, the column names (i.e. name of each summary measure) are structured as ‘MEASURE TYPE – POPULATION’. This provides a useful structure, as we can use regular expression searches to split the name into just the MEASURE TYPE or POPULATION segment.

    ### Review summary data
        
        sum.dat

##         Sample  Group  Batch Cells per sample -- CD4 T cells
##         <fctr> <char> <char>                           <num>
##  1: 01_Mock_01   Mock      A                       10651.102
##  2: 02_Mock_02   Mock      B                        2319.818
##  3: 03_Mock_03   Mock      B                        2156.395
##  4: 04_Mock_04   Mock      A                         901.901
##  5: 05_Mock_05   Mock      A                        1832.753
##  6: 06_Mock_06   Mock      B                        8324.678
##  7:  07_WNV_01    WNV      A                      237823.955
##  8:  08_WNV_02    WNV      B                       67917.206
##  9:  09_WNV_03    WNV      A                       73409.588
## 10:  10_WNV_04    WNV      A                      131635.048
## 11:  11_WNV_05    WNV      B                      143207.334
## 12:  12_WNV_06    WNV      A                       68528.738
##     Cells per sample -- CD8 T cells Cells per sample -- Infil Macrophages
##                               <num>                                 <num>
##  1:                       2789.5743                             10186.173
##  2:                       1183.2631                              7068.440
##  3:                       1221.9570                              6702.794
##  4:                        487.9136                              5056.559
##  5:                        568.7855                             10680.528
##  6:                       2439.9918                             43919.853
##  7:                     262397.2145                           3135773.816
##  8:                      57956.0155                           2042587.322
##  9:                      53135.8210                           1483562.723
## 10:                     107174.1288                           2872771.000
## 11:                     123481.1001                           2674826.071
## 12:                      60047.9412                           1231957.599
##     Cells per sample -- Microglia Cells per sample -- Neutrophils
##                             <num>                           <num>
##  1:                      381368.6                       11919.090
##  2:                      221098.9                        6663.639
##  3:                      235747.9                        8158.360
##  4:                      238782.0                        5751.467
##  5:                      318456.7                       11438.908
##  6:                      587751.0                       52818.646
##  7:                      658676.3                      129927.577
##  8:                      372910.7                      148874.515
##  9:                      248983.1                       86262.891
## 10:                      507879.3                      366157.259
## 11:                      510832.6                       77111.641
## 12:                      236195.1                       44743.515
##     Cells per sample -- NK cells Exp CD11b_asinh -- CD4 T cells
##                            <num>                          <num>
##  1:                     3085.438                      1.2281976
##  2:                     1665.910                      1.2510496
##  3:                     2012.635                      1.0728421
##  4:                     1020.183                      1.0422156
##  5:                     2022.348                      0.7475088
##  6:                     6745.860                      0.9305047
##  7:                   645401.114                      2.2873443
##  8:                   249754.204                      1.9717031
##  9:                   174645.915                      2.2071369
## 10:                   334383.282                      2.0382993
## 11:                   550541.253                      2.1868276
## 12:                   188527.140                      2.2291687
##     Exp CD11b_asinh -- CD8 T cells Exp CD11b_asinh -- Infil Macrophages
##                              <num>                                <num>
##  1:                      1.1640880                             3.685172
##  2:                      1.0635289                             3.898546
##  3:                      1.1374143                             4.021177
##  4:                      0.9897486                             3.931021
##  5:                      0.9672789                             3.893223
##  6:                      0.9624663                             3.555143
##  7:                      2.2006729                             4.558658
##  8:                      2.1302117                             4.486104
##  9:                      2.1989388                             4.663166
## 10:                      2.0212625                             4.425780
## 11:                      2.1805143                             4.834350
## 12:                      2.1321808                             4.681589
##     Exp CD11b_asinh -- Microglia Exp CD11b_asinh -- Neutrophils
##                            <num>                          <num>
##  1:                     3.703173                       4.424943
##  2:                     3.683890                       4.343586
##  3:                     3.725262                       4.476029
##  4:                     3.784847                       4.278391
##  5:                     3.747063                       4.361385
##  6:                     3.723275                       4.501451
##  7:                     3.648656                       5.539180
##  8:                     3.545951                       5.769196
##  9:                     3.831846                       5.625011
## 10:                     3.691808                       5.792890
## 11:                     3.648319                       5.363509
## 12:                     3.793242                       5.200583
##     Exp CD11b_asinh -- NK cells Exp Ly6C_asinh -- CD4 T cells
##                           <num>                         <num>
##  1:                    1.962920                      4.255930
##  2:                    2.118059                      1.076549
##  3:                    2.097595                      2.584336
##  4:                    1.791303                      1.230352
##  5:                    1.999800                      3.626999
##  6:                    1.474988                      3.112294
##  7:                    2.955860                      3.252281
##  8:                    3.010919                      2.668624
##  9:                    2.942789                      2.524692
## 10:                    2.898820                      2.665296
## 11:                    2.851842                      2.930416
## 12:                    2.770628                      2.365051
##     Exp Ly6C_asinh -- CD8 T cells Exp Ly6C_asinh -- Infil Macrophages
##                             <num>                               <num>
##  1:                     1.0181309                            4.483754
##  2:                     1.3204695                            4.300729
##  3:                     0.9998138                            4.502121
##  4:                     0.4853867                            4.158759
##  5:                     1.3079999                            4.234948
##  6:                     0.9641838                            4.737304
##  7:                     3.1542268                            4.690279
##  8:                     2.6102675                            4.222498
##  9:                     2.4094844                            3.964888
## 10:                     2.6639529                            4.376003
## 11:                     2.7627420                            4.193174
## 12:                     2.2350847                            3.886658
##     Exp Ly6C_asinh -- Microglia Exp Ly6C_asinh -- Neutrophils
##                           <num>                         <num>
##  1:                   0.8904561                      3.961571
##  2:                   0.8527123                      4.118082
##  3:                   0.8981631                      4.140689
##  4:                   0.8523674                      4.022654
##  5:                   0.8461584                      4.107360
##  6:                   0.9143086                      4.015038
##  7:                   1.6620987                      3.142655
##  8:                   1.5369769                      2.832004
##  9:                   1.4791886                      2.431145
## 10:                   1.6295949                      2.946887
## 11:                   1.3897775                      2.708211
## 12:                   1.4062473                      2.205575
##     Exp Ly6C_asinh -- NK cells Percent of sample -- CD4 T cells
##                          <num>                            <num>
##  1:                  1.7119220                        2.5359767
##  2:                  1.1660899                        0.9665910
##  3:                  0.9614992                        0.8423417
##  4:                  1.0565807                        0.3578972
##  5:                  0.8642064                        0.5312328
##  6:                  1.9822096                        1.1858516
##  7:                  1.6749264                        4.6908078
##  8:                  1.3674728                        2.3101090
##  9:                  1.1757863                        3.4627164
## 10:                  1.4787423                        3.0471076
## 11:                  1.2242706                        3.5099837
## 12:                  1.1603644                        3.7447398
##     Percent of sample -- CD8 T cells Percent of sample -- Infil Macrophages
##                                <num>                                  <num>
##  1:                        0.6641844                               2.425279
##  2:                        0.4930263                               2.945183
##  3:                        0.4773270                               2.618279
##  4:                        0.1936165                               2.006571
##  5:                        0.1648654                               3.095805
##  6:                        0.3475772                               6.256389
##  7:                        5.1754875                              61.849582
##  8:                        1.9712930                              69.475759
##  9:                        2.5064067                              69.979374
## 10:                        2.4808826                              66.499329
## 11:                        3.0264976                              65.559463
## 12:                        3.2813083                              67.320087
##     Percent of sample -- Microglia Percent of sample -- Neutrophils
##                              <num>                            <num>
##  1:                       90.80205                         2.837879
##  2:                       92.12455                         2.776516
##  3:                       92.08901                         3.186859
##  4:                       94.75475                         2.282328
##  5:                       92.30628                         3.315626
##  6:                       83.72521                         7.524024
##  7:                       12.99164                         2.562674
##  8:                       12.68404                         5.063759
##  9:                       11.74448                         4.069004
## 10:                       11.75646                         8.475862
## 11:                       12.52041                         1.889991
## 12:                       12.90683                         2.445001
##     Percent of sample -- NK cells
##                             <num>
##  1:                     0.7346282
##  2:                     0.6941291
##  3:                     0.7861856
##  4:                     0.4048345
##  5:                     0.5861879
##  6:                     0.9609487
##  7:                    12.7298050
##  8:                     8.4950410
##  9:                     8.2380149
## 10:                     7.7403537
## 11:                    13.4936582
## 12:                    10.3020295

        as.matrix(names(sum.dat))

##       [,1]                                    
##  [1,] "Sample"                                
##  [2,] "Group"                                 
##  [3,] "Batch"                                 
##  [4,] "Cells per sample -- CD4 T cells"       
##  [5,] "Cells per sample -- CD8 T cells"       
##  [6,] "Cells per sample -- Infil Macrophages" 
##  [7,] "Cells per sample -- Microglia"         
##  [8,] "Cells per sample -- Neutrophils"       
##  [9,] "Cells per sample -- NK cells"          
## [10,] "Exp CD11b_asinh -- CD4 T cells"        
## [11,] "Exp CD11b_asinh -- CD8 T cells"        
## [12,] "Exp CD11b_asinh -- Infil Macrophages"  
## [13,] "Exp CD11b_asinh -- Microglia"          
## [14,] "Exp CD11b_asinh -- Neutrophils"        
## [15,] "Exp CD11b_asinh -- NK cells"           
## [16,] "Exp Ly6C_asinh -- CD4 T cells"         
## [17,] "Exp Ly6C_asinh -- CD8 T cells"         
## [18,] "Exp Ly6C_asinh -- Infil Macrophages"   
## [19,] "Exp Ly6C_asinh -- Microglia"           
## [20,] "Exp Ly6C_asinh -- Neutrophils"         
## [21,] "Exp Ly6C_asinh -- NK cells"            
## [22,] "Percent of sample -- CD4 T cells"      
## [23,] "Percent of sample -- CD8 T cells"      
## [24,] "Percent of sample -- Infil Macrophages"
## [25,] "Percent of sample -- Microglia"        
## [26,] "Percent of sample -- Neutrophils"      
## [27,] "Percent of sample -- NK cells"

        annot.cols <- c(group.col, batch.col)

Specify which columns we want to plot.

        plot.cols <- names(sum.dat)[c(4:27)]
        plot.cols

##  [1] "Cells per sample -- CD4 T cells"       
##  [2] "Cells per sample -- CD8 T cells"       
##  [3] "Cells per sample -- Infil Macrophages" 
##  [4] "Cells per sample -- Microglia"         
##  [5] "Cells per sample -- Neutrophils"       
##  [6] "Cells per sample -- NK cells"          
##  [7] "Exp CD11b_asinh -- CD4 T cells"        
##  [8] "Exp CD11b_asinh -- CD8 T cells"        
##  [9] "Exp CD11b_asinh -- Infil Macrophages"  
## [10] "Exp CD11b_asinh -- Microglia"          
## [11] "Exp CD11b_asinh -- Neutrophils"        
## [12] "Exp CD11b_asinh -- NK cells"           
## [13] "Exp Ly6C_asinh -- CD4 T cells"         
## [14] "Exp Ly6C_asinh -- CD8 T cells"         
## [15] "Exp Ly6C_asinh -- Infil Macrophages"   
## [16] "Exp Ly6C_asinh -- Microglia"           
## [17] "Exp Ly6C_asinh -- Neutrophils"         
## [18] "Exp Ly6C_asinh -- NK cells"            
## [19] "Percent of sample -- CD4 T cells"      
## [20] "Percent of sample -- CD8 T cells"      
## [21] "Percent of sample -- Infil Macrophages"
## [22] "Percent of sample -- Microglia"        
## [23] "Percent of sample -- Neutrophils"      
## [24] "Percent of sample -- NK cells"

Reorder the data such that sample appear in the specify group order.

    ### Reorder summary data and SAVE
        
        sum.dat <- do.reorder(sum.dat, group.col, grp.order)
        sum.dat[,c(1:3)]

##         Sample  Group  Batch
##         <fctr> <fctr> <char>
##  1: 01_Mock_01   Mock      A
##  2: 02_Mock_02   Mock      B
##  3: 03_Mock_03   Mock      B
##  4: 04_Mock_04   Mock      A
##  5: 05_Mock_05   Mock      A
##  6: 06_Mock_06   Mock      B
##  7:  07_WNV_01    WNV      A
##  8:  08_WNV_02    WNV      B
##  9:  09_WNV_03    WNV      A
## 10:  10_WNV_04    WNV      A
## 11:  11_WNV_05    WNV      B
## 12:  12_WNV_06    WNV      A

        fwrite(sum.dat, 'sum.dat.csv')

Graphs

We can use the run.autograph function to create violin/scatter plots with embedded statistic – one per population/measurement type.

    ### Autographs

        for(i in plot.cols){
            
            measure <- gsub("\\ --.*", "", i)
            measure
            
            pop <- gsub("^[^--]*.-- ", "", i)
            pop
            
            make.autograph(sum.dat,
                           x.axis = group.col,
                           y.axis = i,
                           y.axis.label = measure,
                           
                           grp.order = grp.order,
                           my_comparisons = comparisons,
                           
                           Variance_test = variance.test,
                           Pairwise_test = pairwise.test,
                           
                           title = pop,
                           subtitle = measure,
                           filename = paste0(i, '.pdf'))
            
        }

Heatmaps

We can also create a global heatmap show the z-score of each population/measurement type against each sample.

    ### Create a fold change heatmap
        
        ## Z-score calculation
        sum.dat.z <- do.zscore(sum.dat, plot.cols)
        
        ## Group 
        t.first <- match(grp.order, sum.dat.z[[group.col]])
        t.first <- t.first -1
        t.first

## [1] 0 6

        ## Make heatmap
        make.pheatmap(sum.dat.z, 
                      sample.col = sample.col, 
                      plot.cols = paste0(plot.cols, '_zscore'), 
                      is.fold = TRUE, 
                      plot.title = 'Z-score',
                      annot.cols = annot.cols,
                      dendrograms = 'column',
                      row.sep = t.first,
                      cutree_cols = 3)

8. Save session info

#######################################################################################################
#### 8. Output session info
#######################################################################################################

For the final step of our setup, we want to record the session info our R session, and save this in a folder we’ll call “Output-info”.

   ### Session info and metadata
        
        setwd(OutputDirectory)
        dir.create("Output - info", showWarnings = FALSE)
        setwd("Output - info")

        sink(file = "session_info.txt", append=TRUE, split=FALSE, type = c("output", "message"))
        sessionInfo()
        sink()

        write(cellular.cols, "cellular.cols.txt")
        write(cluster.cols, "cluster.cols.txt")