Notes on the design of the ‘Spectre’ object.
Spectre v1
Spectre v1 uses a simple table structure to store data, based on the data.table framework. Cells are stored as rows and markers/genes as columns (the usual convention for cytometry data).
For example, this demo dataset consists of 169004 cells, with 9 cellular markers, from 12 samples:
cell.dat <- Spectre::demo.start
cell.dat
FileName NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4
1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399 83.7175 958.7000 711.0720
2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673 34.7219 448.2590 307.2720
3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310 285.8800 1008.8300 707.0940
4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200 333.6050 440.0710 249.7840
5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940 149.2200 87.4815 867.5700
---
169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80 -7.7972 -271.8040 12023.7000 1103.0500
169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00 202.4110 -936.4920 4188.3300 315.9400
169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00 123.4760 -219.9320 8923.4000 -453.4640
169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80 -656.0540 -201.5880 10365.7000 61.6765
169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80 -661.6280 72.3356 9704.4700 -31.8532
When data processing (e.g. asinh transformation) or analysis (e.g. clustering, dimensionality reduction) is performed, the results are simply added to the table as new columns.
For example, to asinh-transform the cellular columns:
library(Spectre)
cols <- names(cell.dat)[2:10]          # the nine marker columns, NK11 to CD4
cell.dat <- do.asinh(cell.dat, cols)   # appends an '_asinh' column for each marker
FileName NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4
1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399 83.7175 958.7000 711.0720
2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673 34.7219 448.2590 307.2720
3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310 285.8800 1008.8300 707.0940
4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200 333.6050 440.0710 249.7840
5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940 149.2200 87.4815 867.5700
---
169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80 -7.7972 -271.8040 12023.7000 1103.0500
169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00 202.4110 -936.4920 4188.3300 315.9400
169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00 123.4760 -219.9320 8923.4000 -453.4640
169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80 -656.0540 -201.5880 10365.7000 61.6765
169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80 -661.6280 72.3356 9704.4700 -31.8532
NK11_asinh CD3_asinh CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh CD4_asinh
1: 0.04235923 0.040087962 2.627736 -0.33829345 3.388057 -0.040229048 0.08362002 0.8518665 0.66171351
2: 0.04294540 0.118734817 1.340828 -0.41743573 2.435282 0.086559169 0.03471493 0.4344615 0.30263135
3: 0.05920201 0.204803270 3.022631 -1.25101677 3.684212 0.415750122 0.28212257 0.8876036 0.65846851
4: 0.35729716 -0.000233878 2.029655 -0.74509796 2.948184 0.181423123 0.32770787 0.4269784 0.24725691
5: 0.42713953 0.040024513 2.914359 0.04049443 2.449108 -0.210143906 0.14867171 0.0873703 0.78456678
---
169000: 0.81693878 0.072791800 4.142314 -0.31149515 4.042229 -0.007797121 -0.26856390 3.1817517 0.95239703
169001: -0.01026402 0.064144703 4.504101 -0.51715205 3.817492 0.201053740 -0.83574631 2.1394053 0.31090687
169002: -0.18326344 -0.009445510 3.166628 -0.09778240 3.541046 0.123164374 -0.21819650 2.8849492 -0.43920651
169003: 0.24590035 0.228005328 4.168089 -0.63716643 3.651633 -0.616293228 -0.20024703 3.0339681 0.06163746
169004: 0.68430866 0.095325863 4.525922 -0.88462254 3.827279 -0.620947819 0.07227267 2.9683779 -0.03184782
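The printed values are consistent with a cofactor of 1000: for the first cell, asinh(6885.08 / 1000) ≈ 2.6277, which is the CD45_asinh value shown. The "append a column" pattern itself can be sketched directly in data.table. The helper add_asinh_cols below is hypothetical and is not the source of do.asinh; the cofactor of 1000 is inferred from the output above.

```r
library(data.table)

# Hypothetical helper (not Spectre's do.asinh source): asinh-transform the
# chosen columns and append the results as new '<marker>_asinh' columns.
# The default cofactor of 1000 is inferred from the demo output.
add_asinh_cols <- function(dat, cols, cofactor = 1000) {
  dat <- copy(dat)  # avoid modifying the caller's table by reference
  new.cols <- paste0(cols, "_asinh")
  dat[, (new.cols) := lapply(.SD, function(x) asinh(x / cofactor)), .SDcols = cols]
  dat[]
}

# Two cells with values copied from the demo output above
toy <- data.table(FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv"),
                  CD45     = c(6885.08, 1780.29),
                  CD4      = c(711.0720, 307.2720))
toy <- add_asinh_cols(toy, c("CD45", "CD4"))
print(toy)
```

Because data.table's := adds columns in place, the sketch calls copy() first so that the input table is left unchanged.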
After adding sample annotations, clusters, and dimensionality reduction coordinates, the table looks like this:
FileName NK11 CD3 CD45 Ly6G CD11b B220 CD8a Ly6C CD4
1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399 83.7175 958.7000 711.0720
2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673 34.7219 448.2590 307.2720
3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310 285.8800 1008.8300 707.0940
4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200 333.6050 440.0710 249.7840
5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940 149.2200 87.4815 867.5700
---
169000: CNS_WNV_D7_06.csv 910.8890 72.856100 31466.20 -316.5570 28467.80 -7.7972 -271.8040 12023.7000 1103.0500
169001: CNS_WNV_D7_06.csv -10.2642 64.188700 45188.00 -540.5140 22734.00 202.4110 -936.4920 4188.3300 315.9400
169002: CNS_WNV_D7_06.csv -184.2910 -9.445650 11842.60 -97.9383 17237.00 123.4760 -219.9320 8923.4000 -453.4640
169003: CNS_WNV_D7_06.csv 248.3860 229.986000 32288.20 -681.1630 19255.80 -656.0540 -201.5880 10365.7000 61.6765
169004: CNS_WNV_D7_06.csv 738.9810 95.470300 46185.10 -1004.6000 22957.80 -661.6280 72.3356 9704.4700 -31.8532
NK11_asinh CD3_asinh CD45_asinh Ly6G_asinh CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh CD4_asinh
1: 0.04235923 0.040087962 2.627736 -0.33829345 3.388057 -0.040229048 0.08362002 0.8518665 0.66171351
2: 0.04294540 0.118734817 1.340828 -0.41743573 2.435282 0.086559169 0.03471493 0.4344615 0.30263135
3: 0.05920201 0.204803270 3.022631 -1.25101677 3.684212 0.415750122 0.28212257 0.8876036 0.65846851
4: 0.35729716 -0.000233878 2.029655 -0.74509796 2.948184 0.181423123 0.32770787 0.4269784 0.24725691
5: 0.42713953 0.040024513 2.914359 0.04049443 2.449108 -0.210143906 0.14867171 0.0873703 0.78456678
---
169000: 0.81693878 0.072791800 4.142314 -0.31149515 4.042229 -0.007797121 -0.26856390 3.1817517 0.95239703
169001: -0.01026402 0.064144703 4.504101 -0.51715205 3.817492 0.201053740 -0.83574631 2.1394053 0.31090687
169002: -0.18326344 -0.009445510 3.166628 -0.09778240 3.541046 0.123164374 -0.21819650 2.8849492 -0.43920651
169003: 0.24590035 0.228005328 4.168089 -0.63716643 3.651633 -0.616293228 -0.20024703 3.0339681 0.06163746
169004: 0.68430866 0.095325863 4.525922 -0.88462254 3.827279 -0.620947819 0.07227267 2.9683779 -0.03184782
Sample Group Batch FlowSOM_cluster FlowSOM_metacluster Population UMAP_X UMAP_Y
1: 01_Mock_01 Mock A 23 2 Microglia -2.3603757 6.201213
2: 01_Mock_01 Mock A 55 2 Microglia 2.7505242 7.119595
3: 01_Mock_01 Mock A 64 2 Microglia -2.9486033 4.012670
4: 01_Mock_01 Mock A 53 2 Microglia 0.6482904 6.481466
5: 01_Mock_01 Mock A 110 4 NK cells -2.3941295 6.975885
---
169000: 12_WNV_06 WNV A 72 3 Infil Macrophages -2.9640724 -5.058265
169001: 12_WNV_06 WNV A 46 3 Infil Macrophages -1.2644785 -3.555824
169002: 12_WNV_06 WNV A 133 3 Infil Macrophages -2.3592682 -2.429467
169003: 12_WNV_06 WNV A 133 3 Infil Macrophages -1.9531062 -4.049705
169004: 12_WNV_06 WNV A 103 3 Infil Macrophages -0.7404098 -4.686928
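The commands used to add these columns are not shown here. One way to do it with plain data.table is sketched below, though this is not necessarily how the demo table was built. Per-sample annotations are joined onto every cell by file name, and per-cell analysis results are appended as further columns. The small metadata table and the cluster values are illustrative placeholders, with values copied from the output above.

```r
library(data.table)

# Three cells from two files (values copied from the demo output above)
cells <- data.table(FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
                    CD45     = c(6885.08, 1780.29, 46185.10))

# One row per sample: the annotations to spread across that sample's cells
sample.info <- data.table(FileName = c("CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
                          Sample   = c("01_Mock_01", "12_WNV_06"),
                          Group    = c("Mock", "WNV"),
                          Batch    = c("A", "A"))

# Update join: adds Sample/Group/Batch in place, keeping the original cell order
cells[sample.info, on = "FileName",
      `:=`(Sample = i.Sample, Group = i.Group, Batch = i.Batch)]

# Per-cell analysis output (cluster labels, UMAP coordinates, ...) is simply
# appended as further columns; these values are placeholders
cells[, FlowSOM_metacluster := c(2L, 2L, 3L)]
print(cells)
```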
This simple structure makes high-dimensional cytometry data easy to interact with and manage, and it makes plotting functions straightforward to apply. For example, to make a dimensionality reduction plot coloured by FlowSOM metacluster:
make.colour.plot(cell.dat, 'UMAP_X', 'UMAP_Y', 'FlowSOM_metacluster')
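Because every column can be referenced by name, general-purpose plotting tools can be pointed at the same table. The example below is an illustration only and is not how make.colour.plot is implemented. It draws a comparable scatter plot with ggplot2, using a few coordinates copied from the output above.

```r
library(data.table)
library(ggplot2)

# Five cells with UMAP coordinates and metacluster labels from the demo output
cells <- data.table(UMAP_X = c(-2.3603757, 2.7505242, -2.9640724, -1.2644785, -0.7404098),
                    UMAP_Y = c(6.201213, 7.119595, -5.058265, -3.555824, -4.686928),
                    FlowSOM_metacluster = c(2, 2, 3, 3, 3))

# Cluster IDs are labels, not quantities, so colour them as a factor
p <- ggplot(cells, aes(x = UMAP_X, y = UMAP_Y, colour = factor(FlowSOM_metacluster))) +
  geom_point(size = 0.5) +
  labs(colour = "FlowSOM_metacluster") +
  theme_classic()
print(p)
```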
Spectre v2
The simple data structure of Spectre v1 is one of the features our users value most, because it makes interacting with the data straightforward. This works well for datasets where the number of feature columns (e.g. markers) is in the tens to hundreds. However, single-cell sequencing data can contain thousands to tens of thousands of cellular features. Additionally, storing RNA sequencing data as a sparse matrix is important for reducing memory consumption, which is not currently possible with data.table. Multi-omic data also includes features of different data types, adding further complexity. The objects used by popular single-cell analysis tools (such as Seurat or SingleCellExperiment objects) provide more structure, but at the cost of added complexity.
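The memory argument can be made concrete with a small comparison. Single-cell RNA counts are mostly zeros, and a sparse matrix stores only the non-zero entries plus their indices. The sketch below uses the Matrix package on simulated data. The dimensions and the 5% density are arbitrary illustrative choices, the values are random rather than realistic counts, and the results are not measurements of a real dataset.

```r
library(Matrix)

set.seed(1)
# Simulated cells x genes matrix in which ~5% of entries are non-zero
sparse <- rsparsematrix(nrow = 1000, ncol = 5000, density = 0.05)  # a dgCMatrix
dense  <- as.matrix(sparse)  # the same values stored as an ordinary dense matrix

sparse.mb <- as.numeric(object.size(sparse)) / 1024^2
dense.mb  <- as.numeric(object.size(dense))  / 1024^2
cat(sprintf("dense: %.1f MB, sparse: %.1f MB\n", dense.mb, sparse.mb))
```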
In Spectre v2, we are introducing the ‘Spectre object’. The objective is to prioritise simplicity and stay true to the simple, table-oriented data.table structure used in v1, while adding a small amount of organisation to make it easier to manage any type of single-cell data. To implement this, we have created a simple list structure that essentially splits the table into different groups of columns.
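The core idea is to split one wide table into named groups of columns whose rows stay aligned, and it can be sketched in a few lines. The helper split_by_columns and the grouping below are hypothetical and are shown for illustration only.

```r
library(data.table)

# Hypothetical helper: turn one wide table into a named list of sub-tables,
# one per group of columns. Rows (cells) stay in the same order in every group.
split_by_columns <- function(dt, groups) {
  lapply(groups, function(cols) dt[, cols, with = FALSE])
}

wide <- data.table(FileName   = c("CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
                   CD45       = c(6885.08, 46185.10),
                   CD45_asinh = asinh(c(6885.08, 46185.10) / 1000),
                   UMAP_X     = c(-2.3603757, -0.7404098),
                   UMAP_Y     = c(6.201213, -4.686928))

parts <- split_by_columns(wide, list(
  meta  = "FileName",
  raw   = "CD45",
  asinh = "CD45_asinh",
  UMAP  = c("UMAP_X", "UMAP_Y")
))
str(parts, max.level = 1)
```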
For a dataset ‘dat’, running dat returns the structure of the object. It is a simple list with three slots: meta, data, and analysis.
dat@meta: contains row metadata, e.g. sample names, group names, batch names, etc.
dat@data: a list containing cellular data, e.g. raw data and transformed/scaled data. This can be any form of single-cell data, including cytometry or single-cell sequencing data.
dat@analysis: a list containing any kind of derived analysis, e.g. clusters, dimensionality reduction coordinates, cluster annotations, etc. In the image below they are all stored in one table, but in the updated version they are stored in a similar manner to the data above (e.g. dat@analysis$FlowSOM, dat@analysis$UMAP).
As a result, the functions work in largely the same way, but they can now be directed to a specific dataset (e.g. run.umap(cell.dat, 'asinh', cols), etc.).
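The @ accessors suggest an S4 class. Assuming that, a minimal sketch of such a container is shown below, together with an analysis function that is directed at one named dataset. The class name SpectreSketch, its validity rule, and the helper run_pca_on are all hypothetical and are not the actual Spectre v2 definitions. Base R's prcomp stands in for run.umap so that the example runs without extra packages.

```r
library(methods)
library(data.table)

# Hypothetical three-slot container mirroring dat@meta, dat@data and dat@analysis
setClass("SpectreSketch",
         slots = c(meta = "ANY", data = "list", analysis = "list"),
         validity = function(object) {
           if (!is.data.frame(object@meta)) return("'meta' must be a data.frame or data.table")
           n <- nrow(object@meta)
           tables <- c(object@data, object@analysis)
           if (!all(vapply(tables, NROW, integer(1)) == n))
             return("every table in 'data' and 'analysis' needs one row per cell")
           TRUE
         })

# Hypothetical analysis function directed at one named dataset (e.g. 'asinh');
# PCA via prcomp stands in for UMAP, and the result is stored in dat@analysis$PCA
run_pca_on <- function(dat, dataset, cols, n.pc = 2) {
  x <- as.matrix(dat@data[[dataset]][, cols, with = FALSE])
  pcs <- prcomp(x)$x[, seq_len(n.pc), drop = FALSE]
  dat@analysis$PCA <- as.data.table(pcs)
  validObject(dat)  # slot assignment alone does not re-run the validity check
  dat
}

# Usage with three cells (values copied from the demo output)
meta <- data.table(FileName = c("CNS_Mock_01.csv", "CNS_Mock_01.csv", "CNS_WNV_D7_06.csv"),
                   Group    = c("Mock", "Mock", "WNV"))
raw  <- data.table(CD45 = c(6885.08, 1780.29, 46185.10),
                   CD4  = c(711.072, 307.272, -31.8532))
dat <- new("SpectreSketch",
           meta     = meta,
           data     = list(raw = raw, asinh = raw[, lapply(.SD, function(v) asinh(v / 1000))]),
           analysis = list())
dat <- run_pca_on(dat, "asinh", c("CD45", "CD4"))
print(dat@analysis$PCA)
```

In this sketch, the validity function enforces the one property that keeps the three slots usable together: every table must have one row per cell, in the same order as dat@meta.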