Spectre is an R package and computational toolkit for end-to-end integration, exploration, and analysis of high-dimensional cytometry or imaging data from different batches or experiments. It streamlines raw data pre-processing, batch alignment, data integration, clustering, dimensionality reduction, visualisation, and population labelling, as well as quantitative and statistical analysis. Its analysis workflows follow a simple, clear, and modular design that can be used by both data scientists and laboratory scientists.
Spectre v1 uses a simple table structure to store data, based on the data.table framework. Here, each row is a cell and each column is a marker or gene (as is the convention for cytometry data). For example, this demo dataset consists of 169004 cells, with 9 cellular markers, from 12 samples:
library('Spectre')
cell.dat <- Spectre::demo.start
cell.dat
## FileName NK11 CD3 CD45 Ly6G CD11b
## 1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30
## 2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73
## 3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30
## 4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43
## 5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82
## ---
## 169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80
## B220 CD8a Ly6C CD4
## 1: -40.2399 83.7175 958.7000 711.0720
## 2: 86.6673 34.7219 448.2590 307.2720
## 3: 427.8310 285.8800 1008.8300 707.0940
## 4: 182.4200 333.6050 440.0710 249.7840
## 5: -211.6940 149.2200 87.4815 867.5700
## ---
## 169000: -7.7972 -271.8040 12023.7000 1103.0500
## 169001: 202.4110 -936.4920 4188.3300 315.9400
## 169002: 123.4760 -219.9320 8923.4000 -453.4640
## 169003: -656.0540 -201.5880 10365.7000 61.6765
## 169004: -661.6280 72.3356 9704.4700 -31.8532
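As a rough illustration of this layout, the sketch below builds a small toy table with data.table and inspects it. The object name `toy.dat` and all values are invented for illustration and are not part of the demo dataset.

```r
library(data.table)

# Toy table: one row per cell, one column per marker (values are invented)
toy.dat <- data.table(
  FileName = c("Sample_A.csv", "Sample_A.csv", "Sample_B.csv"),
  CD3      = c(40.1, 119.0, 72.9),
  CD4      = c(711.1, 307.3, 1103.1)
)

# Number of cells (rows)
n.cells <- nrow(toy.dat)

# Marker columns are every column except the file name
marker.cols <- setdiff(names(toy.dat), "FileName")

# Number of cells contributed by each sample
cells.per.sample <- toy.dat[, .N, by = FileName]
```

Because every cell is simply a row, standard data.table operations (filtering, grouping, adding columns) apply directly.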
When data processing (e.g. asinh transformation) or analysis (e.g. clustering, dimensionality reduction) is performed, new columns are simply added to the table.
For example, to asinh transform the cellular columns:
cols <- names(cell.dat)[c(2:10)]
cols
## [1] "NK11" "CD3" "CD45" "Ly6G" "CD11b" "B220" "CD8a" "Ly6C" "CD4"
cell.dat <- do.asinh(cell.dat, cols)
cell.dat
## FileName NK11 CD3 CD45 Ly6G CD11b
## 1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30
## 2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73
## 3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30
## 4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43
## 5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82
## ---
## 169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80
## B220 CD8a Ly6C CD4 NK11_asinh CD3_asinh
## 1: -40.2399 83.7175 958.7000 711.0720 2.833658 2.77891776
## 2: 86.6673 34.7219 448.2590 307.2720 2.847316 3.86339136
## 3: 427.8310 285.8800 1008.8300 707.0940 3.167025 4.41288702
## 4: 182.4200 333.6050 440.0710 249.7840 4.983511 -0.04675856
## 5: -211.6940 149.2200 87.4815 867.5700 5.171077 2.77734511
## ---
## 169000: -7.7972 -271.8040 12023.7000 1103.0500 5.898138 3.37337092
## 169001: 202.4110 -936.4920 4188.3300 315.9400 -1.467020 3.24704993
## 169002: 123.4760 -219.9320 8923.4000 -453.4640 -4.300409 -1.39292433
## 169003: -656.0540 -201.5880 10365.7000 61.6765 4.598795 4.52184585
## 169004: -661.6280 72.3356 9704.4700 -31.8532 5.688993 3.64320948
## CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh
## 1: 7.920821 -4.926677 8.685233 -2.782406 3.512048 5.949294
## 2: 6.568243 -5.146749 7.725900 3.546617 2.636224 5.189112
## 3: 8.318576 -6.463868 8.981898 5.142472 4.739358 6.000262
## 4: 7.310561 -5.788108 8.243749 4.290209 4.893723 5.170678
## 5: 8.209731 2.788935 7.739937 -4.438991 4.089412 3.555952
## ---
## 169000: 9.440379 -4.841275 9.340238 -1.227289 -4.688875 8.478344
## 169001: 9.802296 -5.376251 9.115326 4.394162 -5.925857 7.423767
## 169002: 8.463168 -3.668698 8.838523 3.900166 -4.477157 8.180142
## 169003: 9.466166 -5.607524 8.949277 -5.569967 -4.390089 8.329967
## 169004: 9.824122 -5.996060 9.125122 -5.578427 3.366218 8.264051
## CD4_asinh
## 1: 5.650495
## 2: 4.811509
## 3: 5.644885
## 4: 4.604406
## 5: 5.849414
## ---
## 169000: 6.089549
## 169001: 4.839324
## 169002: -5.200656
## 169003: 3.207251
## 169004: -2.550951
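The idea of appending transformed columns can be sketched in plain data.table, without relying on Spectre itself. This is not the do.asinh implementation; it is a minimal sketch that assumes the transformation is asinh(x / cofactor). The cofactor of 5 is an assumption, chosen because it reproduces the first NK11_asinh and CD3_asinh values shown above, and `toy.dat` is a hypothetical name.

```r
library(data.table)

# Toy table using the first two rows of the demo output shown above
toy.dat <- data.table(
  FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv"),
  NK11     = c(42.3719, 42.9586),
  CD3      = c(40.0987, 119.014)
)

cols     <- c("NK11", "CD3")
new.cols <- paste0(cols, "_asinh")
cofactor <- 5  # assumption: consistent with the values shown above

# Append one "<marker>_asinh" column per marker; the raw columns are left intact
toy.dat[, (new.cols) := lapply(.SD, function(x) asinh(x / cofactor)),
        .SDcols = cols]
```

The raw and transformed values then sit side by side in the same table, so later steps can be pointed at either set of columns by name.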
Then to add sample annotations, clusters, and dimensionality reduction coordinates:
## FileName NK11 CD3 CD45 Ly6G CD11b
## 1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30
## 2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73
## 3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30
## 4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43
## 5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82
## ---
## 169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80
## B220 CD8a Ly6C CD4 NK11_asinh CD3_asinh
## 1: -40.2399 83.7175 958.7000 711.0720 0.04235923 0.040087962
## 2: 86.6673 34.7219 448.2590 307.2720 0.04294540 0.118734817
## 3: 427.8310 285.8800 1008.8300 707.0940 0.05920201 0.204803270
## 4: 182.4200 333.6050 440.0710 249.7840 0.35729716 -0.000233878
## 5: -211.6940 149.2200 87.4815 867.5700 0.42713953 0.040024513
## ---
## 169000: -7.7972 -271.8040 12023.7000 1103.0500 0.81693878 0.072791800
## 169001: 202.4110 -936.4920 4188.3300 315.9400 -0.01026402 0.064144703
## 169002: 123.4760 -219.9320 8923.4000 -453.4640 -0.18326344 -0.009445510
## 169003: -656.0540 -201.5880 10365.7000 61.6765 0.24590035 0.228005328
## 169004: -661.6280 72.3356 9704.4700 -31.8532 0.68430866 0.095325863
## CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh
## 1: 2.627736 -0.33829345 3.388057 -0.040229048 0.08362002 0.8518665
## 2: 1.340828 -0.41743573 2.435282 0.086559169 0.03471493 0.4344615
## 3: 3.022631 -1.25101677 3.684212 0.415750122 0.28212257 0.8876036
## 4: 2.029655 -0.74509796 2.948184 0.181423123 0.32770787 0.4269784
## 5: 2.914359 0.04049443 2.449108 -0.210143906 0.14867171 0.0873703
## ---
## 169000: 4.142314 -0.31149515 4.042229 -0.007797121 -0.26856390 3.1817517
## 169001: 4.504101 -0.51715205 3.817492 0.201053740 -0.83574631 2.1394053
## 169002: 3.166628 -0.09778240 3.541046 0.123164374 -0.21819650 2.8849492
## 169003: 4.168089 -0.63716643 3.651633 -0.616293228 -0.20024703 3.0339681
## 169004: 4.525922 -0.88462254 3.827279 -0.620947819 0.07227267 2.9683779
## CD4_asinh Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
## 1: 0.66171351 01_Mock_01 Mock A 23 2
## 2: 0.30263135 01_Mock_01 Mock A 55 2
## 3: 0.65846851 01_Mock_01 Mock A 64 2
## 4: 0.24725691 01_Mock_01 Mock A 53 2
## 5: 0.78456678 01_Mock_01 Mock A 110 4
## ---
## 169000: 0.95239703 12_WNV_06 WNV A 72 3
## 169001: 0.31090687 12_WNV_06 WNV A 46 3
## 169002: -0.43920651 12_WNV_06 WNV A 133 3
## 169003: 0.06163746 12_WNV_06 WNV A 133 3
## 169004: -0.03184782 12_WNV_06 WNV A 103 3
## Population UMAP_X UMAP_Y
## 1: Microglia -2.3603757 6.201213
## 2: Microglia 2.7505242 7.119595
## 3: Microglia -2.9486033 4.012670
## 4: Microglia 0.6482904 6.481466
## 5: NK cells -2.3941295 6.975885
## ---
## 169000: Infil Macrophages -2.9640724 -5.058265
## 169001: Infil Macrophages -1.2644785 -3.555824
## 169002: Infil Macrophages -2.3592682 -2.429467
## 169003: Infil Macrophages -1.9531062 -4.049705
## 169004: Infil Macrophages -0.7404098 -4.686928
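The code that produced this table is not shown here. As a minimal sketch of the underlying idea, per-sample annotations can be attached to every cell with a plain data.table join on FileName. The object names `cell.toy` and `sample.meta` and the `Toy_cluster` column are hypothetical; the sample, group, and batch labels are taken from the output above.

```r
library(data.table)

# Toy per-cell table (three cells from two files)
cell.toy <- data.table(
  FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
  CD45     = c(6885.08, 1780.29, 46185.10)
)

# Per-sample metadata, one row per file
sample.meta <- data.table(
  FileName = c("CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
  Sample   = c("01_Mock_01", "12_WNV_06"),
  Group    = c("Mock", "WNV"),
  Batch    = c("A", "A")
)

# Update join: copy the matching Sample/Group/Batch onto every cell, by reference
cell.toy[sample.meta, on = "FileName",
         c("Sample", "Group", "Batch") := .(i.Sample, i.Group, i.Batch)]

# Per-cell results are appended the same way (placeholder labels, not real clustering)
cell.toy[, Toy_cluster := c(1L, 1L, 2L)]
```

Cluster labels and dimensionality reduction coordinates have one value per cell, so they can be appended as ordinary columns without a join.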
This simple structure is very easy to interact with and manage for high-dimensional cytometry data. It means that various plotting functions are also easy to apply. For example, to make a dimensionality reduction plot:
make.colour.plot(Spectre::demo.clustered, 'UMAP_X', 'UMAP_Y', 'FlowSOM_metacluster', add.label = TRUE)
The simple data structure in Spectre v1 is one of the features most valued by our users, as its simplicity makes interaction straightforward. This works well for datasets where the number of feature columns (i.e. ‘markers’ etc) is in the tens to hundreds. However, when managing single-cell sequencing data, the number of cellular features can reach into the thousands or tens of thousands. Additionally, storing RNA sequencing data as a sparse matrix is important for reducing memory consumption, which is not currently possible with data.table. Moreover, multi-omic data will include columns/features of different data types, adding to the complexity. The objects used by popular single-cell analysis tools (such as Seurat or SingleCellExperiment objects) provide more structure, but at the cost of added complexity.
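To make the memory argument concrete, the sketch below compares a simulated count matrix stored in sparse and dense form using the Matrix package. The dimensions, the 5% density, and the simulated counts are illustrative assumptions, not measurements from real sequencing data.

```r
library(Matrix)

set.seed(42)
n.cells <- 2000
n.genes <- 1000

# Simulated count matrix in which about 5% of entries are non-zero
sparse.counts <- rsparsematrix(n.cells, n.genes, density = 0.05,
                               rand.x = function(n) rpois(n, 3) + 1)

# The same values stored densely, with every zero held explicitly
dense.counts <- as.matrix(sparse.counts)

sparse.size <- as.numeric(object.size(sparse.counts))
dense.size  <- as.numeric(object.size(dense.counts))

# Ratio of sparse to dense storage; small when most entries are zero
sparse.size / dense.size
```

The saving grows with the proportion of zeros, which is why sparse storage matters more for sequencing data than for cytometry data.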
In Spectre v2, we are introducing the ‘Spectre object’. The objective is to prioritise simplicity, staying true to the intent of the simple table-oriented data.table structure used in v1, while incorporating a low level of organisation to facilitate the management of any type of single-cell data. To implement this, we have created a simple list structure, which essentially splits the table into different groups of columns.
For a Spectre object named ‘dat’, running dat would return the following components:
dat@meta: a data.table containing row (cell) metadata, e.g. sample names, group names, batch names, etc.
dat@data: a list containing cellular data, e.g. raw data, transformed/scaled data. This can be any form of single-cell data, including cytometry or single-cell sequencing. Sequencing data is stored as a sparse matrix, rather than a data.table.
dat@analysis: a list containing any kind of derived analysis, e.g. clusters, dimensionality reduction coordinates, cluster annotations, etc.
As a result, the functions work in largely the same way as in v1, but they can now be directed to a specific dataset within the object (e.g. run.umap(cell.dat, 'asinh', cols)).
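The three-component layout, and the idea of pointing a function at a named dataset, can be sketched with a small S4 class. This is not the actual Spectre v2 definition: the class name `ToyObject`, the helper `toy.col.means`, the slot types, and all values are hypothetical and serve only to show the shape of the design.

```r
library(data.table)

# Hypothetical class mirroring the meta / data / analysis components;
# the meta slot is typed "ANY" here so it can hold a data.table
setClass("ToyObject",
         slots = c(meta = "ANY", data = "list", analysis = "list"))

dat <- new("ToyObject",
  meta = data.table(Sample = c("01_Mock_01", "12_WNV_06"),
                    Group  = c("Mock", "WNV")),
  data = list(
    raw   = data.table(CD3 = c(40.1, 95.5), CD4 = c(711.1, -31.9)),
    asinh = data.table(CD3 = asinh(c(40.1, 95.5) / 5),
                       CD4 = asinh(c(711.1, -31.9) / 5))
  ),
  analysis = list()
)

# Hypothetical helper: reads one named dataset from @data and
# stores its result under a new name in @analysis
toy.col.means <- function(obj, dataset, cols) {
  res <- obj@data[[dataset]][, lapply(.SD, mean), .SDcols = cols]
  obj@analysis[[paste0(dataset, "_means")]] <- res
  obj
}

dat <- toy.col.means(dat, "asinh", c("CD3", "CD4"))
names(dat@analysis)
```

Each dataset remains an ordinary table, so the v1 style of working with columns by name carries over; the only addition is the dataset name passed as the second argument.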
Single-cell data, including multi-omic data, can be handled with broadly the same structure: