SESSION 1 - Intro to R (10:30 am)

Session #1: Getting started with cytometry analysis using Spectre and R

Tue 26-Oct, 10:30 am – 12:00 pm AEDT

Lead instructors: Givanna Putri, Felix Marsh-Wakefield

In this session, instructors will introduce attendees to methods of high-dimensional analysis with R/RStudio. Specifically we will use the R package Spectre, and will explore computational approaches such as clustering (e.g. using FlowSOM) and dimensionality reduction (e.g. using tSNE or UMAP).

Hands on practical 3

For this mini session, we will install packages from CRAN, Bioconductor, and GitHub.

Firstly, we will install the data.table package from CRAN by running the following code:

install.packages("data.table", repos="https://cran.csiro.au/")

##
## The downloaded binary packages are in
##  /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages

For Bioconductor, you first need to install BiocManager from CRAN:

install.packages("BiocManager", repos="https://cran.csiro.au/")

##
## The downloaded binary packages are in
##  /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages

Then use the install function from BiocManager to install the desired package. We will install the FlowSOM package, which will be used in a later session for clustering.

BiocManager::install("FlowSOM")

To install packages from GitHub, you first need to install the remotes package:

install.packages("remotes", repos="https://cran.csiro.au/")

##
## The downloaded binary packages are in
##  /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages

For this example, we will install the Spectre package, which will be used in subsequent sessions for analysing COVID-19 samples. To do this, you first need to locate the GitHub repository where the package is stored. For Spectre, it is: https://github.com/ImmuneDynamics/Spectre. Note that the repository you need to supply is the part of the URL that comes after github.com/, in the form owner/repository.

remotes::install_github("immunedynamics/Spectre")
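Once the installs above have finished, it can be useful to confirm which packages R can actually find. The following is a minimal sketch using base R's requireNamespace function; the list of package names is simply taken from the steps above, and the variable names (pkgs, is_installed, missing_pkgs) are made up for this example.

```r
# Packages installed in this mini session.
pkgs <- c("data.table", "BiocManager", "remotes", "FlowSOM", "Spectre")

# requireNamespace() returns TRUE if a package can be loaded and FALSE if not.
# It does not attach the package, so it is safe to use as a check.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# A named logical vector: TRUE means the package is available.
print(is_installed)

# Names of any packages that still need to be installed.
missing_pkgs <- names(is_installed)[!is_installed]
print(missing_pkgs)
```

If missing_pkgs is empty, everything needed for the later sessions should be in place.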

Hands on practical 4

For this mini session, we will perform some basic data manipulation using R and the Spectre package.

1. Setting working directory

What is a working directory? It is the default location on your computer that your R script works from, e.g. when loading or exporting data. Use the setwd function to set your working directory:

setwd("/Users/givanna/Documents/")

If you want to create a directory within your current working directory, use the dir.create function:

dir.create("ASI_workshop_session1")

We will set our working directory to the new directory we just created:

setwd("ASI_workshop_session1/")

Your working directory should now be /Users/givanna/Documents/ASI_workshop_session1. To check, run the following function:

getwd()

## [1] "/Users/givanna/Documents/ASI_workshop_session1"
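The same three functions can be tried out without touching your own folders. The sketch below works inside R's temporary directory; the folder name wd_demo and the variable names are made up for this example.

```r
# Remember where we started so we can return afterwards.
old_wd <- getwd()

# tempdir() is a scratch directory that R creates for each session.
setwd(tempdir())

# "wd_demo" is a hypothetical folder name used only for this example.
dir.create("wd_demo", showWarnings = FALSE)

# A relative path like this one is resolved against the working directory.
setwd("wd_demo")

# The working directory now ends in "wd_demo".
current_wd <- getwd()
print(basename(current_wd))

# Go back to where we started.
setwd(old_wd)
```

Note that setwd("wd_demo") only works because wd_demo sits inside the current working directory; from anywhere else you would need the full path.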

TIP

If you press Ctrl + Space, you can get RStudio to autocomplete a path or a function.

2. Loading Spectre package

This step assumes you have successfully installed the Spectre package. If not, please refer to the previous hands-on mini session. Run the following library function to load the Spectre package.

library(Spectre)

3. Read in some data

Here, we will try to read in a demo dataset stored in CSV files. You can download the dataset from Spectre's GitHub repository: https://github.com/ImmuneDynamics/Spectre/tree/master/workflows/Classic%20workflows/data

Download the files and save them in your working directory.

The following function is handy to see what files we have in the current working directory:

list.files()

##  [1] "ASI workshop_Day1_R_intro.Rmd"  "ASI_workshop_session1"
##  [3] "ASI-workshop_Day1_R_intro.html" "CNS_Mock_01.csv"
##  [5] "CNS_Mock_02.csv"                "CNS_Mock_03.csv"
##  [7] "CNS_Mock_04.csv"                "CNS_Mock_05.csv"
##  [9] "CNS_Mock_06.csv"                "CNS_WNV_D7_01.csv"
## [11] "CNS_WNV_D7_02.csv"              "CNS_WNV_D7_03.csv"
## [13] "CNS_WNV_D7_04.csv"              "CNS_WNV_D7_05.csv"
## [15] "CNS_WNV_D7_06.csv"              "Session 1.Rmd"
## [17] "Session-1.html"                 "Session-1.Rmd"

Now, we will use the read.files function in Spectre to read in the CSV files and store them in a variable called dat.

dat <- read.files(file.loc = getwd(), file.type = ".csv")

## Loading required package: data.table

TIP

To read FCS files, change file.type to ".fcs".

The read.files function reads each file in the file.loc directory as a data.table and stores them in a list:

class(dat)

## [1] "list"

names(dat)

##  [1] "CNS_Mock_01"   "CNS_Mock_02"   "CNS_Mock_03"   "CNS_Mock_04"
##  [5] "CNS_Mock_05"   "CNS_Mock_06"   "CNS_WNV_D7_01" "CNS_WNV_D7_02"
##  [9] "CNS_WNV_D7_03" "CNS_WNV_D7_04" "CNS_WNV_D7_05" "CNS_WNV_D7_06"
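To get a feel for what "a list with one table per file" means, here is a rough base R sketch of the same idea. It is not Spectre's implementation: it uses read.csv rather than data.table, and the folder, file names and values are all made up for this example.

```r
# Create two tiny CSV files in a scratch folder (made-up values).
demo_dir <- file.path(tempdir(), "read_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(CD3 = c(1.5, 2.5), CD4 = c(10, 20)),
          file.path(demo_dir, "sample_A.csv"), row.names = FALSE)
write.csv(data.frame(CD3 = 3.5, CD4 = 30),
          file.path(demo_dir, "sample_B.csv"), row.names = FALSE)

# Find every file in the folder whose name ends in .csv
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own table; the result is a list of tables.
demo_list <- lapply(csv_paths, read.csv)

# Name each list element after its file, without the extension.
names(demo_list) <- tools::file_path_sans_ext(basename(csv_paths))

print(class(demo_list))
print(names(demo_list))
```

Each element can then be accessed by name, e.g. demo_list$sample_A.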

Our samples are split into 12 files. The read.files function reads in each file and stores them in a list. We can merge all of them into one big data.table for ease of processing using the do.merge.files function:

dat <- do.merge.files(dat)

Check the class of the dat variable:

class(dat)

## [1] "data.table" "data.frame"

It should say data.table and data.frame, which are essentially table-like structures.
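Conceptually, merging means stacking the tables on top of each other while recording which file each row came from. The base R sketch below illustrates that idea on two made-up tables. It is an assumption about the general approach, not the code inside do.merge.files, and it only adds a FileName column (no FileNo).

```r
# Two small tables standing in for two samples (made-up values).
demo_list <- list(
  sample_A = data.frame(CD3 = c(1.5, 2.5), CD4 = c(10, 20)),
  sample_B = data.frame(CD3 = 3.5, CD4 = 30)
)

# Add a FileName column to each table so we know where each row came from.
labelled <- Map(function(tbl, nm) cbind(tbl, FileName = nm),
                demo_list, names(demo_list))

# Stack the labelled tables on top of each other.
merged <- do.call(rbind, labelled)
rownames(merged) <- NULL

print(merged)
print(nrow(merged))
```

The merged table has one row per cell across all samples, which is why the FileName column is needed to tell the samples apart later.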

4. Investigate the data

Let’s look at the content of the data. First, let’s show the first 10 rows in the data:

head(dat, n = 10)

##         NK11        CD3     CD45       Ly6G    CD11b      B220     CD8a
##  1:  42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399  83.7175
##  2:  42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673  34.7219
##  3:  59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310 285.8800
##  4: 364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200 333.6050
##  5: 440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940 149.2200
##  6: 151.5890 124.525000  4256.17  -596.1300 12200.80   94.0770 109.3110
##  7: -25.0254 114.917000  4206.72  -787.1680 12227.70  183.8690 195.4950
##  8:  91.3912  15.326300  3673.56  -246.3870  7040.34  384.5510 117.1330
##  9:  56.4442 -38.378600  9129.14  -422.0990 23569.60  435.3240 196.7540
## 10: 206.7270 216.563000  2970.59   -42.0700  7393.80  191.8660  23.6561
##          Ly6C      CD4    FileName FileNo
##  1:  958.7000  711.072 CNS_Mock_01      1
##  2:  448.2590  307.272 CNS_Mock_01      1
##  3: 1008.8300  707.094 CNS_Mock_01      1
##  4:  440.0710  249.784 CNS_Mock_01      1
##  5:   87.4815  867.570 CNS_Mock_01      1
##  6:  417.4010  352.982 CNS_Mock_01      1
##  7:  245.7640  436.935 CNS_Mock_01      1
##  8:  504.8190  438.310 CNS_Mock_01      1
##  9:  389.0720 1714.390 CNS_Mock_01      1
## 10:   97.2920  689.716 CNS_Mock_01      1

NOTE: The n parameter denotes the number of rows to show.

TIP

You can also see the entire data content by clicking the “table” icon in the Environment tab in the top-right panel.

A handy function to see what markers/columns we have in the data:

names(dat)

##  [1] "NK11"     "CD3"      "CD45"     "Ly6G"     "CD11b"    "B220"
##  [7] "CD8a"     "Ly6C"     "CD4"      "FileName" "FileNo"

We have a fair few markers as well as columns denoting the file each cell comes from (FileName and FileNo). Let’s say we don’t want to see the FileName and FileNo columns in our dataset. How do we do that? First we store the column names we want to keep in a vector called markers_cols:

markers_cols <- names(dat)[1:9]
markers_cols

## [1] "NK11"  "CD3"   "CD45"  "Ly6G"  "CD11b" "B220"  "CD8a"  "Ly6C"  "CD4"

Then we simply subset the data and store it in another variable:

dat_marker_only <- dat[, ..markers_cols]

Let’s inspect it:

head(dat_marker_only)

##        NK11        CD3     CD45       Ly6G    CD11b      B220     CD8a
## 1:  42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399  83.7175
## 2:  42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673  34.7219
## 3:  59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310 285.8800
## 4: 364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200 333.6050
## 5: 440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940 149.2200
## 6: 151.5890 124.525000  4256.17  -596.1300 12200.80   94.0770 109.3110
##         Ly6C     CD4
## 1:  958.7000 711.072
## 2:  448.2590 307.272
## 3: 1008.8300 707.094
## 4:  440.0710 249.784
## 5:   87.4815 867.570
## 6:  417.4010 352.982
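The two dots in front of markers_cols are data.table syntax: they tell data.table to look for markers_cols outside the table and use its contents as column names. The sketch below shows this on a tiny made-up table, next to an equivalent form using with = FALSE. The table and variable names are invented for this example.

```r
library(data.table)

# A tiny table with made-up values (not the workshop data).
toy <- data.table(CD3 = c(1.5, 2.5), CD4 = c(10, 20), FileName = c("A", "B"))
keep_cols <- c("CD3", "CD4")

# ..keep_cols: use the contents of the keep_cols variable as column names.
by_prefix <- toy[, ..keep_cols]

# An equivalent way to write the same selection.
by_with <- toy[, keep_cols, with = FALSE]

print(names(by_prefix))
print(names(by_with))
```

Both forms return a new data.table containing only the CD3 and CD4 columns; the original toy table is left unchanged.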

Let’s pretend we have forgotten which samples/files we had and want to retrieve that information:

samples <- dat$FileName

The above command stores the sample each cell comes from, but we’re only interested in which samples we have, so we use the unique function to see all the distinct samples:

unique(samples)

##  [1] "CNS_Mock_01"   "CNS_Mock_02"   "CNS_Mock_03"   "CNS_Mock_04"
##  [5] "CNS_Mock_05"   "CNS_Mock_06"   "CNS_WNV_D7_01" "CNS_WNV_D7_02"
##  [9] "CNS_WNV_D7_03" "CNS_WNV_D7_04" "CNS_WNV_D7_05" "CNS_WNV_D7_06"
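If you also want to know how many samples there are, or how many cells each sample contributes, base R's length and table functions can help. The sketch below uses a short made-up vector of sample labels rather than the workshop data.

```r
# Made-up sample labels, one per cell (not the workshop data).
samples_demo <- c("Mock_01", "Mock_01", "WNV_01", "Mock_01", "WNV_01")

# How many distinct samples are there?
n_samples <- length(unique(samples_demo))
print(n_samples)

# How many cells does each sample contribute?
cells_per_sample <- table(samples_demo)
print(cells_per_sample)
```

Here n_samples is 2, and the table reports 3 cells for Mock_01 and 2 cells for WNV_01.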

Say we’re interested in knowing the mean of CD3 and NK11 expression:

cd3_mean <- mean(dat$CD3)
nk11_mean <- mean(dat$NK11)

cd3_mean

## [1] 318.6597

nk11_mean

## [1] 333.1011

TIP

You can refer to the environment panel to see their values!

Say we want to sum them up and store them:

cd3_plus_nk11 <- cd3_mean + nk11_mean
cd3_plus_nk11

## [1] 651.7608

Say you want to add the expression of Ly6G and CD11b for each cell and append the value as a new column. First, we store it in a variable:

ly6g_plus_cd11b <- dat$Ly6G + dat$CD11b

Then append it as a column:

dat$ly6g_plus_cd11b <- ly6g_plus_cd11b

Let’s inspect just the Ly6G, CD11b and new sum columns:

head(dat[, c("Ly6G", "CD11b", "ly6g_plus_cd11b")])

##          Ly6G    CD11b ly6g_plus_cd11b
## 1:  -344.7830 14787.30       14442.517
## 2:  -429.6650  5665.73        5236.065
## 3: -1603.8400 19894.30       18290.460
## 4:  -815.9800  9509.43        8693.450
## 5:    40.5055  5745.82        5786.325
## 6:  -596.1300 12200.80       11604.670
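The addition above is element-wise: the first Ly6G value is added to the first CD11b value, the second to the second, and so on. The small sketch below repeats the calculation on the first two rows shown in the output above, using a plain data.frame with a made-up name (toy).

```r
# First two rows of Ly6G and CD11b from the output above.
toy <- data.frame(Ly6G = c(-344.783, -429.665),
                  CD11b = c(14787.30, 5665.73))

# Element-wise addition, stored directly as a new column.
toy$ly6g_plus_cd11b <- toy$Ly6G + toy$CD11b

print(toy)
```

The new column holds 14442.517 and 5236.065, matching the first two rows of the workshop output.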

Say we want to isolate cells with Ly6G > 0 and B220 > 0:

dat_subset <- dat[dat$Ly6G > 0 & dat$B220 > 0,]
head(dat_subset)

##        NK11       CD3     CD45      Ly6G    CD11b    B220      CD8a       Ly6C
## 1: -58.7602 224.69600  4845.85  281.1300 10309.40 384.199   12.9359   277.5870
## 2: -40.5028 125.56200 30759.30 7375.7400 21262.20 142.793  103.3740 15781.8000
## 3: 250.6860 210.81500  8046.83  326.3870 14379.60 412.146   29.7603   203.6220
## 4:  79.6307 520.37600  1221.33  893.9670  5292.79 899.481  210.5850  1888.4400
## 5:  78.7477 352.02300  2397.58  599.0960  6634.97 315.061 -103.1890   339.0870
## 6: -47.4769  -9.92212  2567.36   54.9505  6874.72 100.574   72.5257   -54.5288
##          CD4    FileName FileNo ly6g_plus_cd11b
## 1:  828.3490 CNS_Mock_01      1       10590.530
## 2:  111.9670 CNS_Mock_01      1       28637.940
## 3: 1074.2200 CNS_Mock_01      1       14705.987
## 4: 2146.3700 CNS_Mock_01      1        6186.757
## 5: 1503.0800 CNS_Mock_01      1        7234.066
## 6:   99.6476 CNS_Mock_01      1        6929.671

The new subset won’t have any cells with zero or negative Ly6G or B220 values. We can confirm this by comparing the minimum values before and after subsetting:

min(dat_subset$Ly6G)

## [1] 0.0697798

min(dat$Ly6G)

## [1] -88512.1

min(dat_subset$B220)

## [1] 0.0164425

min(dat$B220)

## [1] -75712.5
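To see how the filter decides which rows to keep, it helps to look at the logical vector it builds. The sketch below uses four made-up cells in a plain data.frame; the & operator keeps a row only when both conditions are TRUE.

```r
# Four made-up cells (not the workshop data).
toy <- data.frame(Ly6G = c(-5, 3, 7, 2),
                  B220 = c(4, -1, 6, 0.5))

# One TRUE/FALSE per row: TRUE only if both markers are above zero.
keep <- toy$Ly6G > 0 & toy$B220 > 0
print(keep)

# Keep only the rows where keep is TRUE.
toy_subset <- toy[keep, ]
print(nrow(toy_subset))

# Every remaining row satisfies both conditions.
print(all(toy_subset$Ly6G > 0 & toy_subset$B220 > 0))
```

Only the last two rows pass, because the first has a negative Ly6G value and the second has a negative B220 value.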

5. Save the subsetted data as CSV and FCS files

We will use Spectre’s write.files function.

TIP

If you forget what the write.files function looks like, you can ask RStudio! Or if you don’t even remember what the function name is, look it up using the “Packages” tab in the bottom-right panel.

?write.files

NOTE: This function relies on the flowCore package to write out FCS files. If you don’t have it installed, you can install it from Bioconductor and then load it.

BiocManager::install("flowCore")

## Bioconductor version 3.13 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)

## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
##   re-install: 'flowCore'

## Old packages: 'deldir', 'rlang', 'Seurat'

library("flowCore")

Then run the write.files function:

write.files(dat = dat_subset,
            file.prefix = "demo_data_subset",
            write.csv = TRUE,
            write.fcs = TRUE)
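After exporting, it is good practice to check that the files were written and can be read back. The sketch below shows the idea using base R's write.csv as a stand-in, since the exact file names produced by write.files are not shown here. The folder, file name and values are all made up for this example.

```r
# Made-up values standing in for dat_subset (not the workshop data).
toy_subset <- data.frame(Ly6G = c(3.2, 7.1), B220 = c(6.4, 0.5))

# Write to a scratch folder so nothing in your project is overwritten.
out_dir <- file.path(tempdir(), "write_demo")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "toy_subset.csv")
write.csv(toy_subset, out_file, row.names = FALSE)

# Confirm the file exists in the output folder.
print(list.files(out_dir))

# Read it back and compare it with the original table.
read_back <- read.csv(out_file)
print(all.equal(toy_subset, read_back))
```

For the real export, running list.files() in your working directory after write.files should show the newly written files.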

ASI Masterclass - Workshop Session 1

Thomas Ashhurst, Givanna Putri, Felix Marsh-Wakefield, Jennifer Habel, Wuji Zhang

26/10/2021