Session #1: Getting started with cytometry analysis using Spectre and R
Tue 26-Oct, 10:30 am – 12:00 pm AEDT
Lead instructors: Givanna Putri, Felix Marsh-Wakefield
In this session, instructors will introduce attendees to methods of high-dimensional analysis with R/RStudio. Specifically we will use the R package Spectre, and will explore computational approaches such as clustering (e.g. using FlowSOM) and dimensionality reduction (e.g. using tSNE or UMAP).
For this mini session, we will install packages from CRAN, Bioconductor, and GitHub.
Firstly, we will install the data.table package from CRAN by running the following code:
install.packages("data.table", repos="https://cran.csiro.au/")
##
## The downloaded binary packages are in
## /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages
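If you rerun this script often, you may not want to reinstall a package every time. The following is a minimal sketch of one way to skip the install when a package is already available. The helper name install_if_missing is hypothetical (it is not part of R or Spectre); it only relies on requireNamespace and install.packages.

```r
# Hypothetical helper: install a CRAN package only when it is missing.
# Returns FALSE (invisibly) if nothing had to be installed, TRUE otherwise.
install_if_missing <- function(pkg, repos = "https://cran.csiro.au/") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg, repos = repos)
  invisible(TRUE)
}

# "utils" ships with R, so this call does not download anything.
install_if_missing("utils")
```

In the workshop you would call it as install_if_missing("data.table") instead.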
For Bioconductor, you first need to install BiocManager from CRAN:
install.packages("BiocManager", repos="https://cran.csiro.au/")
##
## The downloaded binary packages are in
## /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages
Then use the install function from BiocManager to install the desired package. We will install the FlowSOM package, which will be used in a later session for clustering.
BiocManager::install("FlowSOM")
To install packages from GitHub, you first need to install the remotes package:
install.packages("remotes", repos="https://cran.csiro.au/")
##
## The downloaded binary packages are in
## /var/folders/j2/7y5xp8610kgf0jbx202by2200000gn/T//RtmpB8uMsH/downloaded_packages
For this example, we will install the Spectre package, which will be used in subsequent sessions for analysing COVID-19 samples. To do this, you first need to locate the GitHub repository where the package is stored. For Spectre, it is: https://github.com/ImmuneDynamics/Spectre. Note that the repository you need to supply is the owner/repository part of the URL, i.e. whatever comes after github.com/.
remotes::install_github("immunedynamics/Spectre")
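As a small illustration of the "whatever comes after github.com" rule, the sketch below strips the leading part of the URL with base R's sub function. The variable names repo_url and repo are only for illustration.

```r
# Full URL of the repository, as shown in the browser.
repo_url <- "https://github.com/ImmuneDynamics/Spectre"

# Remove the "https://github.com/" prefix to get the owner/repository string.
repo <- sub("^https?://github\\.com/", "", repo_url)
repo
## [1] "ImmuneDynamics/Spectre"

# This string is what you would pass on, e.g.:
# remotes::install_github(repo)
```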
For this mini session, we will perform some basic data manipulation using R and the Spectre package.
What is a working directory? It is the default location on your computer that your R script works from, e.g. when loading or exporting data. Use the setwd function to set your working directory:
setwd("/Users/givanna/Documents/")
If you want to create a directory within your current working directory, use the dir.create function:
dir.create("ASI_workshop_session1")
We will set our working directory to the new directory we just created:
setwd("ASI_workshop_session1/")
Your working directory should now be /Users/givanna/Documents/ASI_workshop_session1. To check, run the following function:
getwd()
## [1] "/Users/givanna/Documents/ASI_workshop_session1"
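The path /Users/givanna/Documents/ only exists on the instructor's computer. If you want to try the same steps without touching your own folders, here is a minimal sketch that does everything inside R's temporary directory and then switches back. Using tempdir() here is an assumption made for illustration, not part of the workshop setup.

```r
# Remember where we started so we can return afterwards.
old_wd <- getwd()

# Build the path of the new directory inside R's temporary directory.
new_dir <- file.path(tempdir(), "ASI_workshop_session1")

# Create the directory only if it does not exist yet (avoids a warning).
if (!dir.exists(new_dir)) {
  dir.create(new_dir)
}

# Move into the new directory and check where we are.
setwd(new_dir)
current <- basename(getwd())
current
## [1] "ASI_workshop_session1"

# Go back to the original working directory.
setwd(old_wd)
```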
TIP
If you press ctrl + space, you can get RStudio to autocomplete a path or a function.
This step assumes you have successfully installed the Spectre package. If not, please refer to the previous hands-on mini session. Run the following library function to load the Spectre package.
library(Spectre)
Here, we will try to read in a demo dataset stored in CSV files. You can download the dataset from Spectre's GitHub repository: https://github.com/ImmuneDynamics/Spectre/tree/master/workflows/Classic%20workflows/data
Download the files and save them in your working directory.
The following function is handy to see what files we have in the current working directory:
list.files()
## [1] "ASI workshop_Day1_R_intro.Rmd" "ASI_workshop_session1"
## [3] "ASI-workshop_Day1_R_intro.html" "CNS_Mock_01.csv"
## [5] "CNS_Mock_02.csv" "CNS_Mock_03.csv"
## [7] "CNS_Mock_04.csv" "CNS_Mock_05.csv"
## [9] "CNS_Mock_06.csv" "CNS_WNV_D7_01.csv"
## [11] "CNS_WNV_D7_02.csv" "CNS_WNV_D7_03.csv"
## [13] "CNS_WNV_D7_04.csv" "CNS_WNV_D7_05.csv"
## [15] "CNS_WNV_D7_06.csv" "Session 1.Rmd"
## [17] "Session-1.html" "Session-1.Rmd"
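As the output above shows, a working directory often contains more than just the data files. list.files accepts a pattern argument (a regular expression) to narrow the listing. Below is a minimal sketch using a few empty placeholder files in a temporary directory; the file names are made up for illustration.

```r
# Create a scratch directory with a mix of file types.
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir,
  c("CNS_Mock_01.csv", "CNS_Mock_02.csv", "notes.Rmd")
)))

# Keep only the files whose names end in ".csv".
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files
## [1] "CNS_Mock_01.csv" "CNS_Mock_02.csv"
```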
Now, we will use the read.files function in Spectre to read in the CSV files and store them in the variable dat.
dat <- read.files(file.loc = getwd(), file.type = ".csv")
## Loading required package: data.table
TIP
To read FCS files, change file.type to .fcs
The function read.files reads each file in the file.loc directory as a data.table and stores them in a list:
class(dat)
## [1] "list"
names(dat)
## [1] "CNS_Mock_01" "CNS_Mock_02" "CNS_Mock_03" "CNS_Mock_04"
## [5] "CNS_Mock_05" "CNS_Mock_06" "CNS_WNV_D7_01" "CNS_WNV_D7_02"
## [9] "CNS_WNV_D7_03" "CNS_WNV_D7_04" "CNS_WNV_D7_05" "CNS_WNV_D7_06"
Our samples are split into 12 files. The read.files function reads in each file and stores them in a list. We can merge all of them into one big data.table for ease of processing using the do.merge.files function:
dat <- do.merge.files(dat)
Check the class of the variable dat:
class(dat)
## [1] "data.table" "data.frame"
It should say data.table and data.frame, i.e. dat is now a single table-like structure.
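To get a feel for what "read each file into a list, then merge" means, here is a rough sketch using only the data.table package. This is not Spectre's implementation of read.files or do.merge.files; it is an approximation on two tiny made-up files, and it only adds a FileName column (no FileNo).

```r
library(data.table)

# Write two small made-up CSV files to a scratch directory.
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(CD3 = c(1.5, 2.5), CD4 = c(10, 20)),
       file.path(demo_dir, "sample_A.csv"))
fwrite(data.table(CD3 = 3.5, CD4 = 30),
       file.path(demo_dir, "sample_B.csv"))

# Read every CSV file into a list, one data.table per file.
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
dat_list <- lapply(csv_paths, fread)

# Name each list element after its file (without the extension).
names(dat_list) <- tools::file_path_sans_ext(basename(csv_paths))

# Stack the tables; idcol records which list element each row came from.
merged <- rbindlist(dat_list, idcol = "FileName")
merged
```

The merged table has one row per cell and a FileName column telling you which file that cell was read from, which is the same idea as the FileName column you will see in dat.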
Let’s look at the content of the data. First, let’s show the first 10 rows in the data:
head(dat, n = 10)
## NK11 CD3 CD45 Ly6G CD11b B220 CD8a
## 1: 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399 83.7175
## 2: 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673 34.7219
## 3: 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310 285.8800
## 4: 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200 333.6050
## 5: 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940 149.2200
## 6: 151.5890 124.525000 4256.17 -596.1300 12200.80 94.0770 109.3110
## 7: -25.0254 114.917000 4206.72 -787.1680 12227.70 183.8690 195.4950
## 8: 91.3912 15.326300 3673.56 -246.3870 7040.34 384.5510 117.1330
## 9: 56.4442 -38.378600 9129.14 -422.0990 23569.60 435.3240 196.7540
## 10: 206.7270 216.563000 2970.59 -42.0700 7393.80 191.8660 23.6561
## Ly6C CD4 FileName FileNo
## 1: 958.7000 711.072 CNS_Mock_01 1
## 2: 448.2590 307.272 CNS_Mock_01 1
## 3: 1008.8300 707.094 CNS_Mock_01 1
## 4: 440.0710 249.784 CNS_Mock_01 1
## 5: 87.4815 867.570 CNS_Mock_01 1
## 6: 417.4010 352.982 CNS_Mock_01 1
## 7: 245.7640 436.935 CNS_Mock_01 1
## 8: 504.8190 438.310 CNS_Mock_01 1
## 9: 389.0720 1714.390 CNS_Mock_01 1
## 10: 97.2920 689.716 CNS_Mock_01 1
NOTE: The n parameter denotes the number of rows to show.
TIP
You can also see the entire data content by clicking the “table” icon in the Environment tab in the top-right panel.
A handy function to see what markers/columns we have in the data:
names(dat)
## [1] "NK11" "CD3" "CD45" "Ly6G" "CD11b" "B220"
## [7] "CD8a" "Ly6C" "CD4" "FileName" "FileNo"
We have a fair few markers, as well as columns denoting the file each cell comes from (FileName and FileNo). Let’s say we don’t want to see the FileName and FileNo columns in our dataset. How do we do that? First, we store the names of the columns we want to keep in a vector called markers_cols:
markers_cols <- names(dat)[1:9]
markers_cols
## [1] "NK11" "CD3" "CD45" "Ly6G" "CD11b" "B220" "CD8a" "Ly6C" "CD4"
Then we simply subset the data and store it in another variable:
dat_marker_only <- dat[, ..markers_cols]
Let’s inspect it:
head(dat_marker_only)
## NK11 CD3 CD45 Ly6G CD11b B220 CD8a
## 1: 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399 83.7175
## 2: 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673 34.7219
## 3: 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310 285.8800
## 4: 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200 333.6050
## 5: 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940 149.2200
## 6: 151.5890 124.525000 4256.17 -596.1300 12200.80 94.0770 109.3110
## Ly6C CD4
## 1: 958.7000 711.072
## 2: 448.2590 307.272
## 3: 1008.8300 707.094
## 4: 440.0710 249.784
## 5: 87.4815 867.570
## 6: 417.4010 352.982
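The `..` prefix tells data.table to look up markers_cols as a variable holding column names rather than as a column called markers_cols. The sketch below, on a tiny made-up table, shows the same selection written two ways, and picks the columns by name rather than by position so it keeps working if the column order changes. The table toy and its values are invented for illustration.

```r
library(data.table)

# A tiny made-up table with two marker columns and two file columns.
toy <- data.table(
  CD3 = c(10, 20),
  CD4 = c(1, 2),
  FileName = c("a", "b"),
  FileNo = c(1L, 2L)
)

# Choose the columns to keep by excluding the ones we do not want.
keep_cols <- setdiff(names(toy), c("FileName", "FileNo"))

# Two equivalent ways to select columns whose names are stored in a variable.
by_prefix <- toy[, ..keep_cols]
by_with <- toy[, keep_cols, with = FALSE]

names(by_prefix)
## [1] "CD3" "CD4"
```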
Let’s pretend we have forgotten how many samples/files we had and want to retrieve that information:
samples <- dat$FileName
The above command stores the sample each cell comes from, with one entry per cell. We’re only interested in which samples we have, so we use the unique function to see all the distinct samples:
unique(samples)
## [1] "CNS_Mock_01" "CNS_Mock_02" "CNS_Mock_03" "CNS_Mock_04"
## [5] "CNS_Mock_05" "CNS_Mock_06" "CNS_WNV_D7_01" "CNS_WNV_D7_02"
## [9] "CNS_WNV_D7_03" "CNS_WNV_D7_04" "CNS_WNV_D7_05" "CNS_WNV_D7_06"
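If you also want to know how many samples there are, or how many cells each sample contributes, base R's length and table functions can help. Here is a minimal sketch on a short made-up vector that stands in for dat$FileName.

```r
# Made-up stand-in for dat$FileName: one entry per cell.
file_name <- c("CNS_Mock_01", "CNS_Mock_01", "CNS_Mock_02",
               "CNS_WNV_D7_01", "CNS_WNV_D7_01", "CNS_WNV_D7_01")

# Number of distinct samples.
n_samples <- length(unique(file_name))
n_samples
## [1] 3

# Number of cells per sample.
cells_per_sample <- table(file_name)
cells_per_sample
```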
Say we’re interested in knowing the mean CD3 and NK11 expression:
cd3_mean <- mean(dat$CD3)
nk11_mean <- mean(dat$NK11)
cd3_mean
## [1] 318.6597
nk11_mean
## [1] 333.1011
TIP
You can refer to the environment panel to see their values!
Say we want to sum them up and store them:
cd3_plus_nk11 <- cd3_mean + nk11_mean
cd3_plus_nk11
## [1] 651.7608
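When you need the mean of several markers, calling mean once per marker gets repetitive. One alternative is base R's colMeans, sketched here on a small made-up data.frame so the arithmetic is easy to check by hand.

```r
# Made-up expression values for three cells.
toy <- data.frame(CD3 = c(10, 20, 30), NK11 = c(1, 2, 6))

# Mean of each selected column, returned as a named numeric vector.
marker_means <- colMeans(toy[, c("CD3", "NK11")])
marker_means
##  CD3 NK11 
##   20    3 

# Sum of the two means.
sum(marker_means)
## [1] 23
```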
Say you want to add the expression of Ly6G and CD11b for each cell and append the value as a new column. First, we store it in a variable:
ly6g_plus_cd11b <- dat$Ly6G + dat$CD11b
Then append it as a column:
dat$ly6g_plus_cd11b <- ly6g_plus_cd11b
Let’s inspect just the Ly6G, CD11b and the new sum column:
head(dat[, c("Ly6G", "CD11b", "ly6g_plus_cd11b")])
## Ly6G CD11b ly6g_plus_cd11b
## 1: -344.7830 14787.30 14442.517
## 2: -429.6650 5665.73 5236.065
## 3: -1603.8400 19894.30 18290.460
## 4: -815.9800 9509.43 8693.450
## 5: 40.5055 5745.82 5786.325
## 6: -596.1300 12200.80 11604.670
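Because dat is a data.table, the two steps above (compute, then append) can also be done in one go with data.table's := operator, which adds the column in place. A minimal sketch on a made-up two-row table:

```r
library(data.table)

# Made-up expression values for two cells.
toy <- data.table(Ly6G = c(-5, 10), CD11b = c(100, 200))

# Add a new column computed from two existing ones, modifying toy in place.
toy[, ly6g_plus_cd11b := Ly6G + CD11b]

toy$ly6g_plus_cd11b
## [1]  95 210
```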
Say we want to isolate cells with Ly6G > 0 and B220 > 0:
dat_subset <- dat[dat$Ly6G > 0 & dat$B220 > 0,]
head(dat_subset)
## NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C
## 1: -58.7602 224.69600 4845.85 281.1300 10309.40 384.199 12.9359 277.5870
## 2: -40.5028 125.56200 30759.30 7375.7400 21262.20 142.793 103.3740 15781.8000
## 3: 250.6860 210.81500 8046.83 326.3870 14379.60 412.146 29.7603 203.6220
## 4: 79.6307 520.37600 1221.33 893.9670 5292.79 899.481 210.5850 1888.4400
## 5: 78.7477 352.02300 2397.58 599.0960 6634.97 315.061 -103.1890 339.0870
## 6: -47.4769 -9.92212 2567.36 54.9505 6874.72 100.574 72.5257 -54.5288
## CD4 FileName FileNo ly6g_plus_cd11b
## 1: 828.3490 CNS_Mock_01 1 10590.530
## 2: 111.9670 CNS_Mock_01 1 28637.940
## 3: 1074.2200 CNS_Mock_01 1 14705.987
## 4: 2146.3700 CNS_Mock_01 1 6186.757
## 5: 1503.0800 CNS_Mock_01 1 7234.066
## 6: 99.6476 CNS_Mock_01 1 6929.671
The new subset won’t have any cells with negative Ly6G or B220:
min(dat_subset$Ly6G)
## [1] 0.0697798
min(dat$Ly6G)
## [1] -88512.1
min(dat_subset$B220)
## [1] 0.0164425
min(dat$B220)
## [1] -75712.5
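It can help to look at the filter itself before applying it. The condition inside the square brackets is just a logical vector with one TRUE or FALSE per cell, so you can count how many cells pass. A minimal sketch on a made-up four-cell data.frame:

```r
# Made-up expression values for four cells.
toy <- data.frame(Ly6G = c(-5, 10, 3, 8), B220 = c(4, -2, 7, 1))

# TRUE for cells where both markers are above zero.
keep <- toy$Ly6G > 0 & toy$B220 > 0
keep
## [1] FALSE FALSE  TRUE  TRUE

# Number of cells that pass the filter.
sum(keep)
## [1] 2

# Keep only those cells.
toy_subset <- toy[keep, ]
```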
To export the data, we will use Spectre’s write.files function.
TIP
If you forget what the write.files function looks like, you can ask RStudio! Or if you don’t even remember what the function name is, look it up using the “Packages” tab in the bottom-right panel.
?write.files
NOTE: This function relies on the flowCore package to write out FCS files. If you don’t have it installed, you can install it using Bioconductor and then load it.
BiocManager::install("flowCore")
## Bioconductor version 3.13 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'flowCore'
## Old packages: 'deldir', 'rlang', 'Seurat'
library("flowCore")
Then run the write.files function:
write.files(dat = dat_subset,
            file.prefix = "demo_data_subset",
            write.csv = TRUE,
            write.fcs = TRUE)