There are two options for using Spectre: installing the R package directly (using the pak or remotes package), or using the pre-built Docker container.
In Spectre v1.1 and above we have removed the package dependencies rgeos and rgdal, as these are no longer available on CRAN. The package should install fine without these dependencies, but some spatial functions may not work properly. If required, you can download the archived packages, unzip them, and place them in your R library location.
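If you do need rgeos or rgdal, another option is to install them directly from the CRAN archive. The sketch below is illustrative only: the archive file names and version numbers are assumptions (check https://cran.r-project.org/src/contrib/Archive/ for the current archived releases), and building from source also requires the GEOS and GDAL system libraries to be installed.
# Illustrative only: install the archived spatial packages from source.
# Check the CRAN archive for the exact file names/versions before running.
install.packages(
  "https://cran.r-project.org/src/contrib/Archive/rgeos/rgeos_0.6-4.tar.gz",
  repos = NULL, type = "source"
)
install.packages(
  "https://cran.r-project.org/src/contrib/Archive/rgdal/rgdal_1.6-7.tar.gz",
  repos = NULL, type = "source"
)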
Download and install the latest version of R (from CRAN) and RStudio.
There are two ways to install Spectre: using the pak package or the remotes package. Try installing with pak first; if that does not work, use the remotes package. Instructions for both are available below.
Important: If you are unfamiliar with running R code using RStudio, please visit our Basics guide section first.
Run the following R code to install Spectre using the pak R package.
# Installs the package 'pak' if required
if (!requireNamespace("pak", quietly = TRUE)) {
install.packages("pak")
}
# Install the Spectre package
pak::pkg_install("immunedynamics/Spectre")
When running the first command to install the pak package for the first time, you might be asked to select a CRAN mirror. Pick the mirror closest to where you are, enter its number, and press Enter.
When running the second command, pak will show a list of dependency packages it needs to install before it can install Spectre. The list may look like the following:
→ Will install 52 packages.
→ Will update 1 package.
→ Will download 49 CRAN packages (68.97 MB), cached: 4 (0 B).
+ ALL 1.48.0 👷🏼♂️
+ car 3.1-3 ⬇ (1.52 MB)
+ carData 3.0-5 ⬇ (1.83 MB)
+ colorRamps 2.3.4 ⬇ (28.21 kB)
+ ConsensusClusterPlus 1.70.0
+ corrplot 0.95 ⬇ (3.83 MB)
+ crosstalk 1.2.1 ⬇ (407.91 kB)
+ dendextend 1.19.0 ⬇ (5.06 MB)
+ dendsort 0.3.4 ⬇ (1.17 MB)
+ Deriv 4.2.0 ⬇ (153.06 kB)
+ doBy 4.7.0 ⬇ (4.91 MB)
+ DT 0.33 ⬇ (1.79 MB)
+ ellipse 0.5.0 ⬇ (218.52 kB)
+ emmeans 1.11.2 ⬇ (2.23 MB)
+ estimability 1.5.1 ⬇ (49.22 kB)
+ FactoMineR 2.11 🔧 ⬇ (3.82 MB)
+ flashClust 1.01-2 🔧 ⬇ (25.21 kB)
+ FlowSOM 2.14.0
+ FNN 1.1.4.1 🔧 ⬇ (130.43 kB)
+ ggforce 0.5.0 🔧 ⬇ (2.67 MB)
+ ggnewscale 0.5.2 ⬇ (355.68 kB)
+ ggpointdensity 0.2.0 🔧 ⬇ (4.37 MB)
+ ggpubr 0.6.1 ⬇ (2.13 MB)
+ ggrepel 0.9.6 🔧 ⬇ (624.15 kB)
+ ggsci 3.2.0 ⬇ (2.39 MB)
+ ggsignif 0.6.4 ⬇ (603.29 kB)
+ ggthemes 5.1.0 ⬇ (448.64 kB)
+ httpuv 1.6.16 🔧 ⬇ (2.78 MB)
+ later 1.4.2 🔧 ⬇ (786.21 kB)
+ lazyeval 0.2.2 🔧 ⬇ (162.22 kB)
+ leaps 3.2 🔧 ⬇ (104.19 kB)
+ MatrixModels 0.5-4 ⬇ (409.70 kB)
+ microbenchmark 1.5.0 🔧 ⬇ (72.58 kB)
+ modelr 0.1.11 ⬇ (203.70 kB)
+ multcompView 0.1-10 ⬇ (114.76 kB)
+ mvtnorm 1.3-3 🔧 ⬇ (1.00 MB)
+ pbkrtest 0.5.4 ⬇ (221.05 kB)
+ pheatmap 1.0.13 ⬇ (78.22 kB)
+ polyclip 1.10-7 🔧 ⬇ (440.78 kB)
+ polynom 1.4-1 ⬇ (402.59 kB)
+ promises 1.3.3 🔧 ⬇ (1.86 MB)
+ quantreg 6.1 🔧 ⬇ (1.55 MB)
+ RcppAnnoy 0.0.22 🔧 ⬇ (1.34 MB)
+ reshape2 1.4.4 🔧 ⬇ (332.09 kB)
+ rstatix 0.7.2 ⬇ (615.55 kB)
+ scattermore 1.2 🔧 ⬇ (389.60 kB)
+ scatterplot3d 0.3-44 ⬇ (348.86 kB)
+ SparseM 1.84-2 🔧 ⬇ (942.60 kB)
+ Spectre 2.0.0-00 → 1.2.0 👷🏾♀️🔧 (GitHub: 9bcdb3b)
+ systemfonts 1.2.3 🔧 ⬇ (7.23 MB)
+ tweenr 2.0.3 🔧 ⬇ (974.49 kB)
+ uwot 0.2.3 🔧 ⬇ (3.90 MB)
+ XML 3.99-0.18 🔧 ⬇ (1.94 MB)
? Do you want to continue (Y/n)
To continue, type Y and press Enter.
You should see a progress bar and messages showing the installation progress. Wait until it finishes.
If you have trouble installing with the pak package above, you can use the remotes package to do the installation instead.
# Installs the package 'remotes' if required
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
# Install the Spectre package
remotes::install_github("immunedynamics/Spectre")
R might ask you to update some packages. In general we recommend updating packages – to do this, enter 1. However, you might wish to delay this if you are in the middle of an analysis project.
Some packages are available in two versions: a precompiled binary and a newer source version that requires compilation. RStudio will ask whether you want to install the version that requires compilation.
If you enter Yes, it will download the source version, compile it, and then install it. However, this may require extra software (such as build tools) to be installed on your computer.
It is also fine to enter No, in which case the precompiled version will be installed. This is often easier.
However, some packages must be compiled on your computer to run, like so:
In this case, we recommend entering Yes and letting the computer do the compiling.
If the installation was successful, you should see something similar to the following:
To check whether Spectre has been installed successfully, run the following command:
if (requireNamespace("Spectre", quietly = TRUE)) {
message("Spectre is installed")
} else {
message("Spectre is NOT installed")
}
## Spectre is installed
If you get the message “Spectre is installed”, then it has successfully been installed.
Spectre is available as a Docker container, thanks to the efforts of Dr. Givanna Putri. Docker allows software to be delivered as a “complete unit” in the form of a container: a standalone computing environment that comes pre-installed with the prerequisite libraries the software requires, and pre-configured for user convenience. Spectre’s Docker image will match the most recent version of Spectre (denoted ‘master’, referring to the master branch on GitHub). To download this version, simply follow the instructions below. If you wish to load a specific version of Spectre, you can specify a version (e.g. v0.5.3, v0.5.4, etc.) instead of using ‘master’. You can see a list of versions available via Docker on this page.
Go to the following address: https://www.docker.com/products/docker-desktop
Download the correct version of Docker Desktop.
Go to your downloads folder and open the ‘.dmg’ file.
When the following window opens, drag and drop the ‘Docker.app’ icon into the ‘Applications’ shortcut.
You should now be able to find ‘Docker’ in your applications folder.
Open the Docker app from the applications folder or the bar at the bottom of the screen.
Once you turn it on, you should see the following icon in the bar at the top of the screen.
Additionally the program itself should open. Wait a few moments while the ‘Docker Engine’ starts.
The icon in the lower left will turn green when ready.
Click the gears icon to open ‘Preferences’. Make sure you untick the options under General to match the screenshot below. This ensures Docker doesn’t start when you boot your computer, and doesn’t share your usage data with Docker by default. Leave the automatic update check ticked if you prefer Docker to check for updates regularly.
Press ‘Apply & Restart’ when done, then close the app until you would like to use it.
Open the Docker app.
Wait a few moments while the ‘Docker Engine’ starts.
The icon in the lower left will turn green when ready.
Open ‘Terminal’ (on Mac, can be found in the applications folder, or the bar at the bottom of your screen).
Copy the following code into the terminal and press ENTER.
docker run --rm -e PASSWORD=spectre -p 8787:8787 -v ~:/home/rstudio/spectre_dir --name=spectre immunedynamics/spectre:master
By default, the RStudio session that Docker launches will be able to see everything in your Home directory and below. This is achieved in the code above using the ‘~’ symbol. If you have data elsewhere (on a server etc) you can navigate to that location using Finder and drag + drop the target directory/folder into terminal:
In terminal, enter the first segment below, drag in the target directory (so its path appears after ‘-v’), add the second segment, and then hit ENTER:
docker run --rm -e PASSWORD=spectre -p 8787:8787 -v
:/home/rstudio/spectre_dir --name=spectre immunedynamics/spectre:master
Open your preferred web browser (Chrome, Safari, etc.) and go to the following address: http://localhost:8787
Sign in with the username ‘rstudio’ and the password ‘spectre’.
Open your desired analysis script.
Perform your analysis.
Close the container when finished.
Here we provide a brief, high-level introduction to using R, RStudio, and Spectre. Additional educational material on using R and RStudio is available on many sites, including the RStudio education site or this R Spatial page.
To interact with the R programming language, we recommend using RStudio.
Open RStudio, and you should see something similar to the following:
There are two important types of text commonly found in R scripts:
Comments: any line in R code that starts with a # is considered a comment. Comments are not executed as R code, but rather are used as notes to the user.
This is a comment:
## Run the following line to find your current working directory
Executable code: a line or segment of code that can be run and will return some form of result. In the example below, the getwd() function will return the location of the current working directory.
This is the code:
getwd()
When the code is run, the output may look something like this:
[1] "/Users/Tom/Desktop"
To get started, create a new .R file and save it.
For this demo we will use the ‘iris’ dataset, which consists of measurements of 150 flowers. Each row represents one flower, and each column represents a different measurement of that flower.
To run code in RStudio, we can either enter code into the script and selectively run parts of it (preferred), or enter code directly into the console and run it there. For each of the code blocks below, copy the code into your new script, press save, and then highlight the code and press CMD/CTRL + Return to execute it.
Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return. The first command we will run loads the ‘iris’ dataset and saves it as the object ‘dat’. The lines starting with ‘#’ are only comments, and will not execute as commands (even if you select them and press CMD/CTRL + Return).
## Part 1: read the dataset
# Use the 'iris' dataset (150 flowers, one per row), where each column is a different measurement
dat <- iris
After executing, you should see a new object in the workspace (top right). This will be called ‘dat’, containing 150 observations and 5 variables.
dat
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
Next we will review the dimensions of ‘dat’ (how many rows and columns) and preview data from the first 6 rows of dat.
Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return. You should now see the following in the console. Lines starting with ‘>’ denote the commands that were executed; lines without ‘>’ are the output. As you can see below, the request to show the dimensions of our dataset using dim(dat) has returned 150 rows and 5 columns.
# Determine the number of rows and columns in the dataset
dim(dat)
## [1] 150 5
Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return. You should now see the following in the console. Lines starting with ‘>’ denote the commands that were executed; lines without ‘>’ are the output. The request to preview the first 6 rows of our data using head(dat) has shown us the contents of the first 6 rows.
# Examine the first few lines of dataset
head(dat)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Next, we will plot some of the dataset. Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return. After executing, you should see the following under ‘Plots’.
## Part 2: plot the dataset
# Plot iris dataset (all plots)
plot(dat)
To be a little more specific, let’s try plotting one column of the dataset against another. Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return. Now we should see a plot of the sepal width vs length.
# Plot iris dataset (chosen X and Y parameters)
plot(x = dat$Sepal.Length, y = dat$Sepal.Width)
Now, let’s save the dataset as a .csv file. A .csv file is kind of like an .xlsx file, without the bells and whistles: data is saved in a table format, using commas to separate the columns. When the file is read by Excel or RStudio, it is displayed as a table. Run the following lines to determine the current working directory (where you will read files from and write files to).
Copy the following into your script, save, then highlight the code and press CMD/CTRL + Return.
## Part 3: save the dataset
# Determine the current working directory
getwd()
This will return the location of your current working directory. In my case:
[1] "/Users/thomasa"
Let’s aim to save the CSV to our desktop. To do this we need to change the ‘working directory’ to the desktop (on a Mac, it would look something like “/Users/Tom/Desktop”). When we set a working directory, we are telling R a) where to look for files when we ask it to read them, and b) where to create files when we ask it to write them.
To set the working directory, type setwd() into the script but don’t run it yet.
setwd()
Finding a specific directory (absolute path)
If I now select this line (or highlight the code) and run it, R will set the working directory:
setwd("/Users/thomasa/Desktop/")
The following should be returned
> setwd("/Users/thomasa/Desktop/")
Now I can check the working directory has been set correctly by running the following:
getwd()
If everything has gone correctly, the following (or equivalent) should be returned:
[1] "/Users/thomasa/Desktop/"
Now we will write the dataset to a .csv file (which will be saved in the working directory). We will use the function ‘write.csv’. The input variables here are what dataset we want to write (x = dat) and what we want to call the file (file = “iris_dataset.csv”).
Execute the following, and check the folder (set as your working directory) to see that the new file has been created.
# Write a .csv file of the dataset
write.csv(x = dat, file = "iris_dataset.csv")
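To confirm the file was written correctly, you can read it back in and preview the first few rows (this simply re-reads the ‘iris_dataset.csv’ file we just created in the working directory):
# Read the .csv file back in and preview the first few rows
dat.check <- read.csv("iris_dataset.csv")
head(dat.check)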
To interact with the Spectre package in R, we will use RStudio.
To use Spectre, we first need to load the ‘Spectre’ package, as well as other relevant packages. To do this, follow the instructions below.
## Load the Spectre packages from library
library('Spectre')
If successful:
>
If unsuccessful:
> Error in library("Spectre") : there is no package called ‘Spectre’
Rather than having to load each individual package one-by-one (library(‘plyr’), library(‘data.table’), etc.), we have created two functions to simplify this process:
package.check() will check if all the required packages are installed
package.load() will load all the required packages
## Check if the other required packages are installed
Spectre::package.check()
## Load the required packages
Spectre::package.load()
As each package is loaded, you will see the following:
> Loading required package: PACKAGENAME
So far you should have the following code in your script:
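That is, roughly the following (a recap of the commands shown above):
## Load the Spectre packages from library
library('Spectre')

## Check if the other required packages are installed
Spectre::package.check()

## Load the required packages
Spectre::package.load()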
Normally this would be the location of the files you would like to analyse. For now, you can just set this to your desktop or similar. If you aren’t sure how to search for directories, please have a look at our basic R tutorial.
## Set working directory
setwd("/Users/thomasa/Desktop")
## Check that it has been set correctly
getwd()
## Save the working directory as an object called 'PrimaryDirectory'
PrimaryDirectory <- getwd()
Now we will create a directory where we can save the data and plots we will generate shortly.
## Create an output directory
dir.create("Spectre-demo-output")
## Go to that directory and save it as an object called 'OutputDirectory'
setwd("Spectre-demo-output")
getwd()
OutputDirectory <- getwd()
## Finally, set the current working directory to 'PrimaryDirectory'
setwd(PrimaryDirectory)
Normally, we load some CSV or FCS files from disk, or a server, into R for analysis. In this tutorial we will skip this step and use an included demo dataset, Spectre::demo.clustered. This is a dataset of cells from 12 CNS (central nervous system) samples: 6x from mock-infected mice and 6x from West Nile virus (WNV)-infected mice. In the code below we keep the first 19 columns and subsample the data down to 10,000 cells in total. The dataset is structured as a large data.table (a table) where each column is a cellular marker or feature (e.g. CD4, CD3, etc.) and each row is a cell.
Assign the included demo.clustered dataset to a new object we will call cell.dat.
## Assign the demo dataset to a new object called 'cell.dat'
cell.dat <- Spectre::demo.clustered
## Keep only the first 19 columns (the file name, raw markers, and asinh-transformed markers)
cell.dat <- cell.dat[,1:19]
## Subsample the data down to 10,000 cells in total
cell.dat <- Spectre::do.subsample(dat = cell.dat, targets = 10000)
## Loading required package: data.table
You can review the structure of cell.dat by using str(). You can see that cell.dat is both a ‘data.table’ and a ‘data.frame’.
str(cell.dat)
## Classes 'data.table' and 'data.frame': 10000 obs. of 19 variables:
## $ FileName : chr "CNS_Mock_05.csv" "CNS_Mock_04.csv" "CNS_WNV_D7_02.csv" "CNS_WNV_D7_01.csv" ...
## $ NK11 : num 106 133 -178 889 237 ...
## $ CD3 : num 235.1 16.5 1603.4 441.9 143.3 ...
## $ CD45 : num 13575 9534 131586 99332 7806 ...
## $ Ly6G : num -179 -673 -6522 -3229 -605 ...
## $ CD11b : num 31386 22061 1163 40168 11658 ...
## $ B220 : num -169 185 893 -623 659 ...
## $ CD8a : num -394.7 282.5 -40.3 -764.2 107.7 ...
## $ Ly6C : num 381 608 2727 64542 805 ...
## $ CD4 : num 509.46 1095.47 -5.04 2345.11 1937.42 ...
## $ NK11_asinh : num 0.106 0.132 -0.177 0.8 0.235 ...
## $ CD3_asinh : num 0.233 0.0165 1.2508 0.4287 0.1428 ...
## $ CD45_asinh : num 3.3 2.95 5.57 5.29 2.75 ...
## $ Ly6G_asinh : num -0.178 -0.631 -2.574 -1.888 -0.573 ...
## $ CD11b_asinh: num 4.14 3.787 0.992 4.386 3.151 ...
## $ B220_asinh : num -0.169 0.184 0.804 -0.588 0.619 ...
## $ CD8a_asinh : num -0.3851 0.2789 -0.0403 -0.7044 0.1075 ...
## $ Ly6C_asinh : num 0.372 0.576 1.728 4.861 0.737 ...
## $ CD4_asinh : num 0.48966 0.9473 -0.00504 1.58812 1.41529 ...
## - attr(*, ".internal.selfref")=<externalptr>
You can review the dimensionality of cell.dat by using dim(). The first entry returned is the number of rows, and the second is the number of columns.
dim(cell.dat)
## [1] 10000 19
You can review the first 6 rows (out of the 10,000 rows) of cell.dat by using head(). Each column is a marker or cellular feature, and each row is a cell.
head(cell.dat)
Now let’s set some preferences.
## Look at the names of the columns in the dataset, and take note of the number of each column
as.matrix(names(cell.dat))
## [,1]
## [1,] "FileName"
## [2,] "NK11"
## [3,] "CD3"
## [4,] "CD45"
## [5,] "Ly6G"
## [6,] "CD11b"
## [7,] "B220"
## [8,] "CD8a"
## [9,] "Ly6C"
## [10,] "CD4"
## [11,] "NK11_asinh"
## [12,] "CD3_asinh"
## [13,] "CD45_asinh"
## [14,] "Ly6G_asinh"
## [15,] "CD11b_asinh"
## [16,] "B220_asinh"
## [17,] "CD8a_asinh"
## [18,] "Ly6C_asinh"
## [19,] "CD4_asinh"
Now we can refer to the columns we want to use for clustering by number, rather than having to write out each column name. To do this, we put the column numbers in a vector inside the square brackets below (e.g. c(5,6,8) for columns 5, 6, and 8). In this case we use columns 11 to 19, the asinh-transformed markers; you can replace these with whichever column numbers you prefer.
## Save the column names that you wish to use for clustering as an object called 'cluster.cols'.
cluster.cols <- names(cell.dat)[c(11:19)]
We can check to make sure the names have been saved by running ‘cluster.cols’.
as.matrix(cluster.cols)
## [,1]
## [1,] "NK11_asinh"
## [2,] "CD3_asinh"
## [3,] "CD45_asinh"
## [4,] "Ly6G_asinh"
## [5,] "CD11b_asinh"
## [6,] "B220_asinh"
## [7,] "CD8a_asinh"
## [8,] "Ly6C_asinh"
## [9,] "CD4_asinh"
Now we can perform our clustering and dimensionality reduction. First we are going to cluster the data using FlowSOM.
We can use the function ‘run.flowsom’ to run FlowSOM on our ‘cell.dat’ dataset. For more information on performing clustering in Spectre, see this page. There are two key arguments we need to provide to the function. The first is ‘dat’, or the dataset to be used. The second is ‘use.cols’, which is the columns to be used for clustering. In this case, we want to set dat to cell.dat, and use.cols to cluster.cols (which we just created).
## Run FlowSOM
cell.dat <- Spectre::run.flowsom(dat = cell.dat, use.cols = cluster.cols)
As the clustering is running, you will see the following red button show up on your RStudio window. That means that RStudio is in the middle of processing something, and it won’t respond to other commands while it is working.
While FlowSOM runs, you will progressively see the three following updates:
Creating SOM
Mapping data to SOM
Creating MST
Once FlowSOM has finished (and the red button has gone away) you can check the data to ensure the FlowSOM columns have been added correctly. At the end of what’s returned, you should see the FlowSOM metaclusters and clusters added to the dataset.
# Check cell.dat to ensure FlowSOM data correctly attached -- by looking at the last two columns
cell.dat
## FileName NK11 CD3 CD45 Ly6G CD11b
## <char> <num> <num> <num> <num> <num>
## 1: CNS_Mock_05.csv 106.0150 235.1390 13574.70 -179.318 31385.80
## 2: CNS_Mock_04.csv 132.7940 16.5279 9533.69 -673.468 22061.20
## 3: CNS_WNV_D7_02.csv -178.4230 1603.4400 131586.00 -6522.110 1162.98
## 4: CNS_WNV_D7_01.csv 888.5810 441.9300 99331.80 -3228.530 40167.60
## 5: CNS_Mock_04.csv 236.6740 143.3300 7805.94 -605.170 11658.20
## ---
## 9996: CNS_Mock_05.csv 88.8495 -16.0294 9019.21 -553.415 15242.10
## 9997: CNS_WNV_D7_06.csv 161.5100 -100.0990 27818.10 -2519.940 23801.70
## 9998: CNS_Mock_04.csv 178.9050 42.6089 3812.18 -1143.190 9142.55
## 9999: CNS_WNV_D7_02.csv 412.7230 106.0890 40488.70 -1155.170 23287.20
## 10000: CNS_Mock_04.csv 141.8560 -53.2798 3731.28 -432.442 9666.66
## B220 CD8a Ly6C CD4 NK11_asinh CD3_asinh
## <num> <num> <num> <num> <num> <num>
## 1: -169.4210 -394.6730 380.647 509.45900 0.10581741 0.23302438
## 2: 184.8820 282.5120 608.245 1095.47000 0.13240678 0.01652715
## 3: 893.3750 -40.3012 2726.790 -5.03532 -0.17748963 1.25080512
## 4: -623.0090 -764.1700 64542.500 2345.11000 0.80035513 0.42867938
## 5: 658.9580 107.7280 805.445 1937.42000 0.23451837 0.14284373
## ---
## 9996: 19.2891 266.9110 592.160 682.64800 0.08873301 -0.01602871
## 9997: 231.2610 -57.8610 4093.780 311.10100 0.16081594 -0.09993259
## 9998: 148.6150 379.1590 332.479 695.47900 0.17796412 0.04259602
## 9999: -411.0680 -129.9800 15257.500 459.92300 0.40182226 0.10589100
## 10000: 617.3400 100.3510 286.046 859.43200 0.14138449 -0.05325462
## CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh
## <num> <num> <num> <num> <num> <num>
## 1: 3.302709 -0.1783707 4.1397564 -0.1686208 -0.38508479 0.3720071
## 2: 2.950718 -0.6307953 3.7874809 0.1838446 0.27888287 0.5758821
## 3: 5.572822 -2.5741711 0.9920561 0.8039345 -0.04029030 1.7283160
## 4: 5.291638 -1.8883390 4.3863628 -0.5884546 -0.70444465 4.8605311
## 5: 2.752110 -0.5732531 3.1509913 0.6187197 0.10752071 0.7369144
## ---
## 9996: 2.895563 -0.5284706 3.4182829 0.0192879 0.26383930 0.5620905
## 9997: 4.019157 -1.6546113 3.8633452 0.2292477 -0.05782876 2.1172101
## 9998: 2.048123 -0.9790900 2.9090641 0.1480733 0.37061613 0.3266396
## 9999: 4.394323 -0.9869543 3.8415117 -0.4002920 -0.12961676 3.4192906
## 10000: 2.027390 -0.4199859 2.9644947 0.5836368 0.10018333 0.2822822
## CD4_asinh FlowSOM_cluster FlowSOM_metacluster
## <num> <num> <int>
## 1: 0.489656167 146 3
## 2: 0.947296284 147 3
## 3: -0.005035299 155 6
## 4: 1.588118198 1 1
## 5: 1.415293384 134 3
## ---
## 9996: 0.638393323 147 3
## 9997: 0.306289484 91 1
## 9998: 0.648958885 136 3
## 9999: 0.445081804 74 1
## 10000: 0.778407337 178 3
Now we can perform dimensionality reduction on our data for visualisation. For this we are going to use UMAP. For more information on dimensionality reduction and cytometry data, please see this page. There are two key arguments we need to provide to the function. The first is ‘dat’, the dataset to be used. The second is ‘use.cols’, the columns to be used for the dimensionality reduction. In this case, we want to set dat to cell.dat, and use.cols to cluster.cols (which we created earlier). UMAP might take 1-2 minutes to finish running, and you may see progress messages like those below.
## Run UMAP
cell.dat <- Spectre::run.umap(dat = cell.dat, use.cols = cluster.cols)
## 11:28:00 UMAP embedding parameters a = 1.577 b = 0.8951
## 11:28:00 Converting dataframe to numerical matrix
## 11:28:00 Read 10000 rows and found 9 numeric columns
## 11:28:00 Using Annoy for neighbor search, n_neighbors = 15
## 11:28:00 Building Annoy index with metric = euclidean, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:28:01 Writing NN index file to temp file /var/folders/7q/lqx0fzgn1zqfmlc5tx1c0n0r0004z0/T//RtmpaRtPes/file14e733fd040f3
## 11:28:01 Searching Annoy index using 13 threads, search_k = 1500
## 11:28:01 Annoy recall = 100%
## 11:28:01 Commencing smooth kNN distance calibration using 13 threads with target n_neighbors = 15
## 11:28:01 Initializing from normalized Laplacian + noise (using irlba)
## 11:28:01 Commencing optimization for 200 epochs, with 200182 positive edges using 13 threads
## 11:28:01 Using rng type: pcg
## 11:28:01 Optimization finished
Once UMAP has finished (and the red button has gone away) you can check the data to ensure the UMAP columns have been added correctly.
## Check cell.dat to ensure the two new UMAP columns have been correctly attached.
cell.dat
## FileName NK11 CD3 CD45 Ly6G CD11b
## <char> <num> <num> <num> <num> <num>
## 1: CNS_Mock_05.csv 106.0150 235.1390 13574.70 -179.318 31385.80
## 2: CNS_Mock_04.csv 132.7940 16.5279 9533.69 -673.468 22061.20
## 3: CNS_WNV_D7_02.csv -178.4230 1603.4400 131586.00 -6522.110 1162.98
## 4: CNS_WNV_D7_01.csv 888.5810 441.9300 99331.80 -3228.530 40167.60
## 5: CNS_Mock_04.csv 236.6740 143.3300 7805.94 -605.170 11658.20
## ---
## 9996: CNS_Mock_05.csv 88.8495 -16.0294 9019.21 -553.415 15242.10
## 9997: CNS_WNV_D7_06.csv 161.5100 -100.0990 27818.10 -2519.940 23801.70
## 9998: CNS_Mock_04.csv 178.9050 42.6089 3812.18 -1143.190 9142.55
## 9999: CNS_WNV_D7_02.csv 412.7230 106.0890 40488.70 -1155.170 23287.20
## 10000: CNS_Mock_04.csv 141.8560 -53.2798 3731.28 -432.442 9666.66
## B220 CD8a Ly6C CD4 NK11_asinh CD3_asinh
## <num> <num> <num> <num> <num> <num>
## 1: -169.4210 -394.6730 380.647 509.45900 0.10581741 0.23302438
## 2: 184.8820 282.5120 608.245 1095.47000 0.13240678 0.01652715
## 3: 893.3750 -40.3012 2726.790 -5.03532 -0.17748963 1.25080512
## 4: -623.0090 -764.1700 64542.500 2345.11000 0.80035513 0.42867938
## 5: 658.9580 107.7280 805.445 1937.42000 0.23451837 0.14284373
## ---
## 9996: 19.2891 266.9110 592.160 682.64800 0.08873301 -0.01602871
## 9997: 231.2610 -57.8610 4093.780 311.10100 0.16081594 -0.09993259
## 9998: 148.6150 379.1590 332.479 695.47900 0.17796412 0.04259602
## 9999: -411.0680 -129.9800 15257.500 459.92300 0.40182226 0.10589100
## 10000: 617.3400 100.3510 286.046 859.43200 0.14138449 -0.05325462
## CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh
## <num> <num> <num> <num> <num> <num>
## 1: 3.302709 -0.1783707 4.1397564 -0.1686208 -0.38508479 0.3720071
## 2: 2.950718 -0.6307953 3.7874809 0.1838446 0.27888287 0.5758821
## 3: 5.572822 -2.5741711 0.9920561 0.8039345 -0.04029030 1.7283160
## 4: 5.291638 -1.8883390 4.3863628 -0.5884546 -0.70444465 4.8605311
## 5: 2.752110 -0.5732531 3.1509913 0.6187197 0.10752071 0.7369144
## ---
## 9996: 2.895563 -0.5284706 3.4182829 0.0192879 0.26383930 0.5620905
## 9997: 4.019157 -1.6546113 3.8633452 0.2292477 -0.05782876 2.1172101
## 9998: 2.048123 -0.9790900 2.9090641 0.1480733 0.37061613 0.3266396
## 9999: 4.394323 -0.9869543 3.8415117 -0.4002920 -0.12961676 3.4192906
## 10000: 2.027390 -0.4199859 2.9644947 0.5836368 0.10018333 0.2822822
## CD4_asinh FlowSOM_cluster FlowSOM_metacluster UMAP_X UMAP_Y
## <num> <num> <int> <num> <num>
## 1: 0.489656167 146 3 3.201487 2.7689815
## 2: 0.947296284 147 3 2.368746 2.2072628
## 3: -0.005035299 155 6 -1.037489 -3.9368465
## 4: 1.588118198 1 1 -6.735621 0.8203820
## 5: 1.415293384 134 3 2.497291 1.1667060
## ---
## 9996: 0.638393323 147 3 3.076637 2.0855365
## 9997: 0.306289484 91 1 -2.264018 -0.6218252
## 9998: 0.648958885 136 3 3.731015 -0.1667122
## 9999: 0.445081804 74 1 -3.604877 0.6061344
## 10000: 0.778407337 178 3 4.686792 0.4920845
Now that we have run FlowSOM and UMAP, we want to do a quick visual check to make sure everything looks correct. To do this we are going to create a ‘factor’ plot – a dot plot with our two UMAP columns as the X and Y axis, and the FlowSOM_metacluster as the colour. We are going to add the labels of each cluster to the plot, and we will tell the function not to save the image to disk. Running this command should generate a plot in the viewer window in RStudio.
## Make a 'factor' plot
Spectre::make.colour.plot(dat = cell.dat,
x.axis = "UMAP_X",
y.axis = "UMAP_Y",
col.axis = "FlowSOM_metacluster",
col.type = 'factor',
add.label = TRUE,
save.to.disk = FALSE)
Now that we have added the cluster and UMAP information to our data, we should save the files and capture our progress.
First, let’s set our working directory to ‘OutputDirectory’, so the data goes to the right place.
## Set working directory to OutputDirectory
setwd(OutputDirectory)
getwd()
## [1] "/Users/putri.g/Documents/GitHub/ImmuneDynamics.github.io/spectre/protocols/Install"
## Save CSV files
Spectre::write.files(dat = cell.dat,
file.prefix = "Sample_CSV_file",
write.csv = TRUE,
write.fcs = FALSE)
To further explore this data in FlowJo, let’s also save some FCS files.
## Save FCS files
Spectre::write.files(dat = cell.dat,
file.prefix = "Sample_FCS_file",
write.csv = FALSE,
write.fcs = TRUE)
Now we should create some informative plots.
First we will make another factor plot of the FlowSOM metaclusters, but this time we will set ‘save.to.disk’ to TRUE. Once this has been run, check your working directory for the image.
## Make a 'factor' plot coloured by cluster
Spectre::make.colour.plot(dat = cell.dat,
x.axis = "UMAP_X",
y.axis = "UMAP_Y",
col.axis = "FlowSOM_metacluster",
col.type = 'factor',
add.label = TRUE)
Next we’ll make a colour plot showing the expression of a specific marker.
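For example, the following would colour the UMAP by CD4 expression. Here ‘CD4_asinh’ is purely an illustrative choice (any of the columns in cluster.cols could be used), and we rely on the function’s default continuous colour scale rather than setting col.type = 'factor'.
## Make a colour plot coloured by the expression of a single marker
Spectre::make.colour.plot(dat = cell.dat,
                          x.axis = "UMAP_X",
                          y.axis = "UMAP_Y",
                          col.axis = "CD4_asinh")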
This page explains the initial data analysis and preparation steps for different forms of cytometry data, and how to export data from FlowJo in preparation for analysis with Spectre. The overall objective is to export a population of interest (e.g. leukocytes), usually following the removal of doublets, debris, dead cells, and any irrelevant cells.
Population of interest (POI) gating
Gate to your ‘population of interest’ (POI). Typically this will be ‘live cells’, or potentially live CD45+ cells (live leukocytes).
Select the POI you wish to export, and then click ‘Select Equivalent Nodes’ in the ‘Edit’ space.
… this will select the POI gate in each sample.
Right click on any of the POI populations (as long as all have been selected) and select ‘Export / Concatenate Populations’.
Exporting data as an FCS or CSV file
Spectre can import data as FCS or CSV files.
Choose a format for export:
By default we suggest exporting CSV (scale value) files
You can also export the data as FCS files
If you wish, you can export the ‘channel values’ as a CSV file instead. The channel-values are pre-transformed, which removes the requirement for an arcsinh transformation in R. See this page for more information.
Under ‘Parameters’, select ‘Custom set of parameters’, and select the parameters you wish to export. In this case, select FSC-A and SSC-A, as well as the compensated (Comp-…) parameters.
Make sure to select the compensated (‘Comp-…’) parameters.
Choose a location for the export.
Result:
To make the analysis a little easier, we usually create a file that contains relevant metadata for each file (e.g. sample name, group name, batch, etc). This allows us to add that sample information to each cell (row) in the data.table in R, making it easy to navigate, filter, and plot the data by any factor (group, batch, etc). If you have cell counts for your files, these can be added here as well.
For most of our workflows, within the folder you are using for your analysis there will be:
The R script
A ‘data’ folder, and
A ‘metadata’ folder
Using Microsoft Excel (or similar), create a new file and save it as a CSV file (e.g. ‘sample.details.csv’) in the metadata folder.
On a Mac, select the files, right click and select ‘copy’ (or press CMD + C).
In the ‘sample.details.csv’ file, name the first column ‘Filename’ (A1), then in A2 right click and select ‘paste’ (or press CMD + V). This will paste the filenames into the CSV file.
On Windows: select the files (CTRL + A to select all, CTRL + C to copy), then paste into Excel. Use find and replace to remove the full file path (see this video for a demonstration).
You can then add as much information relevant to each file as you like. Sample, Group, and Batch are ‘required’ for most of the Spectre workflows (they aren’t actually required, but it makes it easier to use the default scripts).
“Sample” is a recommended column, as this can be a more simplified name for each sample
“Group” is extremely useful for most analyses
“Batch” is helpful if you have prepared, stained, or run samples in multiple batches. If all your samples are from one batch, just enter ‘1’ or ‘A’ (or some other batch name) into each row under ‘batch’.
“Cells per sample” is a useful column to add if you intend to generate absolute counts of each population per sample during the generation of summary data, but is not required otherwise.
If you would like to add other information (time point, infection, treatment, etc) then feel free to.
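As a rough sketch of how this metadata is used later: assuming a ‘sample.details.csv’ file with ‘Filename’, ‘Sample’, ‘Group’, and ‘Batch’ columns saved in the ‘metadata’ folder, and cellular data held in a data.table with a ‘FileName’ column (as in the demo above), the sample information can be merged onto every cell by filename. Spectre’s workflows include helper functions for this step, so treat the following purely as an illustration:
## Illustration only: read the sample metadata and attach it to each cell by filename
library(data.table)
meta.dat <- fread("metadata/sample.details.csv")
cell.dat <- merge(cell.dat, meta.dat, by.x = "FileName", by.y = "Filename", sort = FALSE)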
You are now ready to get started with Spectre. Check out our workflows on the Spectre Home page. The Simple Discovery Workflow is a great place to get started.
When you are ready to start analysis, check out our structured workflows and tutorials on the following pages:
If Spectre or any of the dependencies was not installed successfully, you may see some error messages like the following.
Alternatively, if Spectre was successfully installed but, when running package.check(), some of the dependencies were not installed correctly, you will see something like the following:
Checking dependency packages...
-- Biobase is required but not installed. Please install from BioConductor.
Check out 'https://immunedynamics.github.io/spectre/getting-started/' for help with installation
In this case, try installing the offending packages independently. Note whether the packages need to be installed from CRAN or Bioconductor. Alternatively, you can report an issue on GitHub, ask for help on our discussion board, or email us.
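For example, to install the Biobase package mentioned in the message above you would use Bioconductor’s installer, whereas a missing CRAN package can be installed with install.packages(). A quick sketch (the package names are just examples):
# Install a missing Bioconductor dependency (e.g. Biobase)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Biobase")

# Install a missing CRAN dependency (e.g. data.table)
install.packages("data.table")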