Spectre v2 object

Introduction

Spectre is an R package and computational toolkit that enables comprehensive end-to-end integration, exploration, and analysis of high-dimensional cytometry or imaging data from different batches or experiments. Spectre streamlines raw data pre-processing, batch alignment, data integration, clustering, dimensionality reduction, visualisation, and population labelling, as well as quantitative and statistical analysis. Its analysis workflows follow a simple, clear, and modular design that can be used by both data scientists and laboratory scientists.

Spectre v1: data.table

Spectre v1 uses a simple table structure to store data, based on the data.table framework. Here cells = rows, and markers/genes = columns (as is the convention for cytometry data). For example, this demo dataset consists of 169004 cells, with 9 cellular markers, from 12 samples:

library('Spectre')

cell.dat <- Spectre::demo.start
cell.dat

##                  FileName      NK11        CD3     CD45       Ly6G    CD11b
##      1:   CNS_Mock_01.csv   42.3719  40.098700  6885.08  -344.7830 14787.30
##      2:   CNS_Mock_01.csv   42.9586 119.014000  1780.29  -429.6650  5665.73
##      3:   CNS_Mock_01.csv   59.2366 206.238000 10248.30 -1603.8400 19894.30
##      4:   CNS_Mock_01.csv  364.9480  -0.233878  3740.04  -815.9800  9509.43
##      5:   CNS_Mock_01.csv  440.2470  40.035200  9191.38    40.5055  5745.82
##     ---                                                                    
## 169000: CNS_WNV_D7_06.csv  910.8890  72.856100 31466.20  -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv  -10.2642  64.188700 45188.00  -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910  -9.445650 11842.60   -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv  248.3860 229.986000 32288.20  -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv  738.9810  95.470300 46185.10 -1004.6000 22957.80
##              B220      CD8a       Ly6C       CD4
##      1:  -40.2399   83.7175   958.7000  711.0720
##      2:   86.6673   34.7219   448.2590  307.2720
##      3:  427.8310  285.8800  1008.8300  707.0940
##      4:  182.4200  333.6050   440.0710  249.7840
##      5: -211.6940  149.2200    87.4815  867.5700
##     ---                                         
## 169000:   -7.7972 -271.8040 12023.7000 1103.0500
## 169001:  202.4110 -936.4920  4188.3300  315.9400
## 169002:  123.4760 -219.9320  8923.4000 -453.4640
## 169003: -656.0540 -201.5880 10365.7000   61.6765
## 169004: -661.6280   72.3356  9704.4700  -31.8532

When data processing (e.g. asinh transformation) or analysis (e.g. clustering, dimensionality reduction) is performed, new columns are simply added to the table.

For example, to asinh transform the cellular columns:

cols <- names(cell.dat)[c(2:10)]
cols

## [1] "NK11"  "CD3"   "CD45"  "Ly6G"  "CD11b" "B220"  "CD8a"  "Ly6C"  "CD4"

cell.dat <- do.asinh(cell.dat, cols)
cell.dat

##                  FileName      NK11        CD3     CD45       Ly6G    CD11b
##      1:   CNS_Mock_01.csv   42.3719  40.098700  6885.08  -344.7830 14787.30
##      2:   CNS_Mock_01.csv   42.9586 119.014000  1780.29  -429.6650  5665.73
##      3:   CNS_Mock_01.csv   59.2366 206.238000 10248.30 -1603.8400 19894.30
##      4:   CNS_Mock_01.csv  364.9480  -0.233878  3740.04  -815.9800  9509.43
##      5:   CNS_Mock_01.csv  440.2470  40.035200  9191.38    40.5055  5745.82
##     ---                                                                    
## 169000: CNS_WNV_D7_06.csv  910.8890  72.856100 31466.20  -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv  -10.2642  64.188700 45188.00  -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910  -9.445650 11842.60   -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv  248.3860 229.986000 32288.20  -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv  738.9810  95.470300 46185.10 -1004.6000 22957.80
##              B220      CD8a       Ly6C       CD4 NK11_asinh   CD3_asinh
##      1:  -40.2399   83.7175   958.7000  711.0720   2.833658  2.77891776
##      2:   86.6673   34.7219   448.2590  307.2720   2.847316  3.86339136
##      3:  427.8310  285.8800  1008.8300  707.0940   3.167025  4.41288702
##      4:  182.4200  333.6050   440.0710  249.7840   4.983511 -0.04675856
##      5: -211.6940  149.2200    87.4815  867.5700   5.171077  2.77734511
##     ---                                                                
## 169000:   -7.7972 -271.8040 12023.7000 1103.0500   5.898138  3.37337092
## 169001:  202.4110 -936.4920  4188.3300  315.9400  -1.467020  3.24704993
## 169002:  123.4760 -219.9320  8923.4000 -453.4640  -4.300409 -1.39292433
## 169003: -656.0540 -201.5880 10365.7000   61.6765   4.598795  4.52184585
## 169004: -661.6280   72.3356  9704.4700  -31.8532   5.688993  3.64320948
##         CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh
##      1:   7.920821  -4.926677    8.685233  -2.782406   3.512048   5.949294
##      2:   6.568243  -5.146749    7.725900   3.546617   2.636224   5.189112
##      3:   8.318576  -6.463868    8.981898   5.142472   4.739358   6.000262
##      4:   7.310561  -5.788108    8.243749   4.290209   4.893723   5.170678
##      5:   8.209731   2.788935    7.739937  -4.438991   4.089412   3.555952
##     ---                                                                   
## 169000:   9.440379  -4.841275    9.340238  -1.227289  -4.688875   8.478344
## 169001:   9.802296  -5.376251    9.115326   4.394162  -5.925857   7.423767
## 169002:   8.463168  -3.668698    8.838523   3.900166  -4.477157   8.180142
## 169003:   9.466166  -5.607524    8.949277  -5.569967  -4.390089   8.329967
## 169004:   9.824122  -5.996060    9.125122  -5.578427   3.366218   8.264051
##         CD4_asinh
##      1:  5.650495
##      2:  4.811509
##      3:  5.644885
##      4:  4.604406
##      5:  5.849414
##     ---          
## 169000:  6.089549
## 169001:  4.839324
## 169002: -5.200656
## 169003:  3.207251
## 169004: -2.550951
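
As a rough illustration of what "adding new columns" means here, the following sketch reproduces the first few NK11 and CD3 values by hand on a toy table. It is not the implementation of do.asinh: the table toy and the variable cofactor are assumptions made for this example, and the cofactor of 5 was chosen only because it reproduces the values printed above.

```r
library(data.table)

# Toy table holding the first three cells shown above (two markers only)
toy <- data.table(
  FileName = rep("CNS_Mock_01.csv", 3),
  NK11     = c(42.3719, 42.9586, 59.2366),
  CD3      = c(40.0987, 119.014, 206.238)
)

cols     <- c("NK11", "CD3")
cofactor <- 5                       # assumption: matches the printed output above
new.cols <- paste0(cols, "_asinh")

# Append the transformed columns; the original columns are left untouched
toy[, (new.cols) := lapply(.SD, function(x) asinh(x / cofactor)), .SDcols = cols]
toy

# A different cofactor gives a different scale for the same raw value
asinh(42.3719 / 1000)
```

The first row gives NK11_asinh of about 2.8337 and CD3_asinh of about 2.7789, in line with the output above. The last line evaluates to about 0.0424, which is consistent with the NK11_asinh value in the annotated table further below, suggesting that table was produced with a larger cofactor.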

Then, to add sample annotations, clusters, and dimensionality reduction coordinates:

##                  FileName      NK11        CD3     CD45       Ly6G    CD11b
##      1:   CNS_Mock_01.csv   42.3719  40.098700  6885.08  -344.7830 14787.30
##      2:   CNS_Mock_01.csv   42.9586 119.014000  1780.29  -429.6650  5665.73
##      3:   CNS_Mock_01.csv   59.2366 206.238000 10248.30 -1603.8400 19894.30
##      4:   CNS_Mock_01.csv  364.9480  -0.233878  3740.04  -815.9800  9509.43
##      5:   CNS_Mock_01.csv  440.2470  40.035200  9191.38    40.5055  5745.82
##     ---                                                                    
## 169000: CNS_WNV_D7_06.csv  910.8890  72.856100 31466.20  -316.5570 28467.80
## 169001: CNS_WNV_D7_06.csv  -10.2642  64.188700 45188.00  -540.5140 22734.00
## 169002: CNS_WNV_D7_06.csv -184.2910  -9.445650 11842.60   -97.9383 17237.00
## 169003: CNS_WNV_D7_06.csv  248.3860 229.986000 32288.20  -681.1630 19255.80
## 169004: CNS_WNV_D7_06.csv  738.9810  95.470300 46185.10 -1004.6000 22957.80
##              B220      CD8a       Ly6C       CD4  NK11_asinh    CD3_asinh
##      1:  -40.2399   83.7175   958.7000  711.0720  0.04235923  0.040087962
##      2:   86.6673   34.7219   448.2590  307.2720  0.04294540  0.118734817
##      3:  427.8310  285.8800  1008.8300  707.0940  0.05920201  0.204803270
##      4:  182.4200  333.6050   440.0710  249.7840  0.35729716 -0.000233878
##      5: -211.6940  149.2200    87.4815  867.5700  0.42713953  0.040024513
##     ---                                                                  
## 169000:   -7.7972 -271.8040 12023.7000 1103.0500  0.81693878  0.072791800
## 169001:  202.4110 -936.4920  4188.3300  315.9400 -0.01026402  0.064144703
## 169002:  123.4760 -219.9320  8923.4000 -453.4640 -0.18326344 -0.009445510
## 169003: -656.0540 -201.5880 10365.7000   61.6765  0.24590035  0.228005328
## 169004: -661.6280   72.3356  9704.4700  -31.8532  0.68430866  0.095325863
##         CD45_asinh  Ly6G_asinh CD11b_asinh   B220_asinh  CD8a_asinh Ly6C_asinh
##      1:   2.627736 -0.33829345    3.388057 -0.040229048  0.08362002  0.8518665
##      2:   1.340828 -0.41743573    2.435282  0.086559169  0.03471493  0.4344615
##      3:   3.022631 -1.25101677    3.684212  0.415750122  0.28212257  0.8876036
##      4:   2.029655 -0.74509796    2.948184  0.181423123  0.32770787  0.4269784
##      5:   2.914359  0.04049443    2.449108 -0.210143906  0.14867171  0.0873703
##     ---                                                                       
## 169000:   4.142314 -0.31149515    4.042229 -0.007797121 -0.26856390  3.1817517
## 169001:   4.504101 -0.51715205    3.817492  0.201053740 -0.83574631  2.1394053
## 169002:   3.166628 -0.09778240    3.541046  0.123164374 -0.21819650  2.8849492
## 169003:   4.168089 -0.63716643    3.651633 -0.616293228 -0.20024703  3.0339681
## 169004:   4.525922 -0.88462254    3.827279 -0.620947819  0.07227267  2.9683779
##           CD4_asinh     Sample Group Batch FlowSOM_cluster FlowSOM_metacluster
##      1:  0.66171351 01_Mock_01  Mock     A              23                   2
##      2:  0.30263135 01_Mock_01  Mock     A              55                   2
##      3:  0.65846851 01_Mock_01  Mock     A              64                   2
##      4:  0.24725691 01_Mock_01  Mock     A              53                   2
##      5:  0.78456678 01_Mock_01  Mock     A             110                   4
##     ---                                                                       
## 169000:  0.95239703  12_WNV_06   WNV     A              72                   3
## 169001:  0.31090687  12_WNV_06   WNV     A              46                   3
## 169002: -0.43920651  12_WNV_06   WNV     A             133                   3
## 169003:  0.06163746  12_WNV_06   WNV     A             133                   3
## 169004: -0.03184782  12_WNV_06   WNV     A             103                   3
##                Population     UMAP_X    UMAP_Y
##      1:         Microglia -2.3603757  6.201213
##      2:         Microglia  2.7505242  7.119595
##      3:         Microglia -2.9486033  4.012670
##      4:         Microglia  0.6482904  6.481466
##      5:          NK cells -2.3941295  6.975885
##     ---                                       
## 169000: Infil Macrophages -2.9640724 -5.058265
## 169001: Infil Macrophages -1.2644785 -3.555824
## 169002: Infil Macrophages -2.3592682 -2.429467
## 169003: Infil Macrophages -1.9531062 -4.049705
## 169004: Infil Macrophages -0.7404098 -4.686928
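
The table above is shown without the code that produced it. As a minimal sketch of the idea only, the example below attaches per-sample annotations to a toy cell table with a plain data.table join on FileName, then appends a cluster column. The tables cell.toy and meta.toy are hypothetical, and the cluster labels are typed in by hand rather than computed; this is not the Spectre workflow itself.

```r
library(data.table)

# Toy cell table: two cells from one file, one cell from another
cell.toy <- data.table(
  FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
  NK11     = c(42.3719, 42.9586, 738.9810)
)

# Hypothetical per-sample annotation table, one row per file
meta.toy <- data.table(
  FileName = c("CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
  Sample   = c("01_Mock_01", "12_WNV_06"),
  Group    = c("Mock", "WNV"),
  Batch    = c("A", "A")
)

# Update join: add the annotation columns to every matching cell row,
# keeping the original row order of cell.toy
cell.toy[meta.toy, on = "FileName",
         `:=`(Sample = i.Sample, Group = i.Group, Batch = i.Batch)]

# Analysis results are appended the same way, as ordinary columns
# (labels entered by hand here; in practice they come from a clustering step)
cell.toy[, FlowSOM_metacluster := c(2L, 2L, 3L)]
cell.toy
```

Because every result is just another column, the table stays one row per cell throughout.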

This simple structure is very easy to interact with and manage for high-dimensional cytometry data. It means that various plotting functions are also easy to apply. For example, to make a dimensionality reduction plot:

make.colour.plot(Spectre::demo.clustered, 'UMAP_X', 'UMAP_Y', 'FlowSOM_metacluster', add.label = TRUE)

Spectre v2: ‘Spectre’ object

The simple data structure in Spectre v1 is one of the features most valued by our users, as its simplicity makes interaction straightforward. This works well for datasets where the feature columns (i.e. ‘markers’ etc) number in the tens to hundreds. However, when managing single-cell sequencing data, the number of cellular features reaches into the thousands or tens of thousands. Additionally, storing RNA sequencing data as a sparse matrix is important to save on memory consumption, which is not currently possible with data.table. Moreover, multi-omic data will include columns/features of different data types, adding to the complexity. The objects in popular single-cell analysis tools (such as Seurat or SingleCellExperiment objects) provide more structure, but at the cost of added complexity.
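
To give a feel for the memory argument, the sketch below compares a dense matrix with its sparse equivalent using the Matrix package. The matrix dimensions and the 5% fill rate are arbitrary assumptions chosen for illustration, not properties of any real dataset.

```r
library(Matrix)

set.seed(42)
n.cells <- 2000
n.genes <- 1000

# Simulated count matrix in which roughly 95% of the entries are zero
dense <- matrix(0, nrow = n.cells, ncol = n.genes)
nz <- sample(length(dense), size = 0.05 * length(dense))
dense[nz] <- rpois(length(nz), lambda = 3) + 1

# Same values, stored in compressed sparse column form
sparse <- Matrix(dense, sparse = TRUE)

# Compare the in-memory footprint of the two representations
print(object.size(dense), units = "Mb")
print(object.size(sparse), units = "Mb")
```

The dense matrix stores every zero explicitly, whereas the sparse form stores only the non-zero entries and their positions, so the gap widens as the data become sparser.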

In Spectre v2, we are introducing the ‘Spectre object’. The objective is to prioritise simplicity, staying true to the simple table-oriented design of the data.table structure used in v1, while incorporating a low level of organisation to facilitate the management of any type of single-cell data. To implement this, we have created a simple list structure, which essentially splits the table into different groups of columns.

Cytometry data

For a dataset such as ‘cell.dat’ held in a Spectre object (here called dat), the object contains the following components:

dat@meta is a data.table containing row (cell) metadata, e.g. sample names, group names, batch names, etc.

dat@data is a list containing cellular data, e.g. raw data and transformed/scaled data. This can be any form of single-cell data, including cytometry or single-cell sequencing. Sequencing data is stored as a sparse matrix, rather than a data.table.

dat@analysis is a list containing any kind of derived analysis, e.g. clusters, dimensionality reduction coordinates, cluster annotations, etc.

As a result, the functions work in largely the same way, but they can now be directed to a specific dataset within the object, e.g. run.umap(cell.dat, 'asinh', cols).
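
The three-part layout can be mimicked with a small S4 class, sketched below. This is not Spectre's own class definition: the class name ToySpectre, the helper toy.col.means, and the use of plain data.frames in place of data.tables are all assumptions made to keep the example self-contained.

```r
library(methods)

# Hypothetical stand-in for the object layout described above
setClass("ToySpectre",
         slots = c(meta = "data.frame", data = "list", analysis = "list"))

dat <- new("ToySpectre",
  # One row per cell: sample-level annotations
  meta = data.frame(Sample = c("01_Mock_01", "01_Mock_01", "12_WNV_06"),
                    Group  = c("Mock", "Mock", "WNV")),
  # Named list of per-cell datasets, starting with the raw values
  data = list(raw = data.frame(NK11 = c(42.3719, 42.9586, 738.9810),
                               CD3  = c(40.0987, 119.014, 95.4703))),
  analysis = list()
)

# A transformed copy is stored as a second named dataset
dat@data$asinh <- asinh(dat@data$raw / 5)

# Derived results go into the analysis list
dat@analysis$clusters <- data.frame(FlowSOM_metacluster = c(2, 2, 3))

# Hypothetical helper that is pointed at one dataset by name
toy.col.means <- function(dat, dataset, cols) {
  colMeans(dat@data[[dataset]][, cols, drop = FALSE])
}

toy.col.means(dat, "asinh", c("NK11", "CD3"))
```

Each component keeps one row per cell, so the pieces can still be read side by side as if they were column groups of a single table.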

Single-cell and multi-omic data

Single-cell data, including multi-omic data, can be handled with broadly the same structure: