Introduction

SingleCellExperiment, or SCE for short, is a data structure designed for storing single-cell RNA sequencing (scRNAseq) experiments. It extends Bioconductor’s SummarizedExperiment class. Many, if not almost all, scRNAseq-related packages in Bioconductor either support or use SCE to store data.

This short tutorial shows how to convert the data.table object used by Spectre into an SCE object, and vice versa.

Converting SCE to data.table

Spectre includes a function that converts an SCE object into a data.table. Currently, the function is only available in the development branch. It will be made available in the master branch in our next release.

To use it, first install Spectre from the development branch (note that this may take some time):

remotes::install_github("immunedynamics/Spectre", branch="development")

Let’s create an example SCE object, adapted from the SCE vignette:

library(SingleCellExperiment)
pretend_counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
pretend_cell_labels <- sample(letters, ncol(pretend_counts), replace=TRUE)
pretend_gene_lengths <- sample(10000, nrow(pretend_counts))

pretend_cell_id <- paste0("cell_", seq_along(pretend_cell_labels))
pretend_gene_id <- paste0("gene_", seq_along(pretend_gene_lengths))

sce <- SingleCellExperiment(list(counts=pretend_counts),
    colData=DataFrame(label=pretend_cell_labels, cell_id=pretend_cell_id),
    rowData=DataFrame(length=pretend_gene_lengths, gene_id=pretend_gene_id),
    metadata=list(study="PretendStudy")
)
rownames(sce) <- pretend_gene_id
colnames(sce) <- pretend_cell_id

# View the SCE object
sce
## class: SingleCellExperiment 
## dim: 10 10 
## metadata(1): study
## assays(1): counts
## rownames(10): gene_1 gene_2 ... gene_9 gene_10
## rowData names(2): length gene_id
## colnames(10): cell_1 cell_2 ... cell_9 cell_10
## colData names(2): label cell_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Now to convert this into a data.table:

dt_sce <- Spectre::create.dt(sce)

A few things to note:

  1. Cells in SCE are stored as columns while genes are stored as rows. This is reversed in the converted data.table object!
  2. Your SCE must have colnames and rownames filled in. The former are the unique IDs assigned to the cells, while the latter are the unique IDs assigned to the genes (for example, Ensembl IDs). The unique cell ID can be the cell barcode from 10x if you use a 10x kit. If your raw data do not come with an ID that uniquely identifies each cell, you can create one yourself and attach it to the SCE object using colnames(sce) <- paste0("cell_", seq_len(ncol(sce))). Note that this uses ncol, not nrow, because the cells are the columns of the SCE.
  3. The SCE object will be converted into a list. More on this below.
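
As a minimal sketch of the second point, the snippet below builds a small SCE without any names and then fills in the missing IDs before checking that they are unique. The object name sce_check and the "cell_" / "gene_" prefixes are made up for illustration; use whatever IDs suit your data.

```r
library(SingleCellExperiment)

# Toy SCE with 4 genes (rows) and 5 cells (columns), created without names
toy_counts <- matrix(rpois(20, lambda = 10), nrow = 4, ncol = 5)
sce_check <- SingleCellExperiment(list(counts = toy_counts))

# Fill in IDs only if they are missing: cells are columns, genes are rows
if (is.null(colnames(sce_check))) {
    colnames(sce_check) <- paste0("cell_", seq_len(ncol(sce_check)))
}
if (is.null(rownames(sce_check))) {
    rownames(sce_check) <- paste0("gene_", seq_len(nrow(sce_check)))
}

# Both sets of IDs should be unique before the object is converted
stopifnot(anyDuplicated(colnames(sce_check)) == 0)
stopifnot(anyDuplicated(rownames(sce_check)) == 0)
```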

Let’s look at the list converted from the SCE object:

names(dt_sce)
## [1] "data.table" "geneNames"  "cellNames"  "meta.data"  "assays"    
## [6] "dim.reds"

The data.table element contains the data.table object that Spectre’s functions operate on:

head(dt_sce$data.table)
##    cellNames label cell_id gene_1_counts gene_2_counts gene_3_counts
## 1:    cell_1     u  cell_1             9             9             9
## 2:    cell_2     l  cell_2            12             6             9
## 3:    cell_3     y  cell_3             7            11            15
## 4:    cell_4     k  cell_4            15            10             9
## 5:    cell_5     a  cell_5            11            11             7
## 6:    cell_6     n  cell_6            15            13            17
##    gene_4_counts gene_5_counts gene_6_counts gene_7_counts gene_8_counts
## 1:             8            12             6            12            10
## 2:             7            10             5             6             9
## 3:             8            15            13             7             7
## 4:             8            12            10             9             7
## 5:             2             7             8             6            13
## 6:            10             6             4            14             8
##    gene_9_counts gene_10_counts
## 1:            14              7
## 2:            12              5
## 3:             6              4
## 4:            13             12
## 5:            17             11
## 6:            13              7

geneNames and cellNames are the unique IDs of your genes and cells. They are the same values you used for the SCE’s rownames and colnames.

meta.data contains the metadata for the cells, i.e., whatever you have in the colData of the SCE object.

assays lists the names of all the assays in the SCE object. Assays in SCE act like slots for your data, and each assay can store an alternative version of it. Raw gene counts are generally stored in the counts assay, while the log-transformed version is stored in the logcounts assay. See the SCE vignette for more details. The assays element in our list stores only the names; the data for each assay are in the data.table element, in columns whose names combine the gene name with the assay name as a suffix. In this example, the gene_1 values from the counts assay are stored in the gene_1_counts column.

The dim.reds element stores the names of all the low-dimensional representations (e.g., the UMAP or tSNE representation of your cells) in the SCE object. Like assays, the data themselves are embedded in the data.table element. See the SCE vignette for more details.
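
To make the column naming concrete, here is a rough sketch that pulls the columns of one assay out of the converted list and turns them back into a genes-by-cells matrix. The list called converted is a hand-built stand-in that only mimics the structure printed above; it is not produced by Spectre, and the helper variable names are hypothetical.

```r
library(data.table)

# Hand-built stand-in for the converted list (illustration only)
converted <- list(
    data.table = data.table(
        cellNames = c("cell_1", "cell_2"),
        gene_1_counts = c(9L, 12L),
        gene_2_counts = c(9L, 6L)
    ),
    geneNames = c("gene_1", "gene_2"),
    cellNames = c("cell_1", "cell_2"),
    assays = "counts"
)

# Column names follow the <gene>_<assay> pattern seen in the output above
assay_name <- "counts"
count_cols <- paste0(converted$geneNames, "_", assay_name)
counts_dt <- converted$data.table[, count_cols, with = FALSE]

# Transpose back to the SCE orientation: genes as rows, cells as columns
counts_mat <- t(as.matrix(counts_dt))
dimnames(counts_mat) <- list(converted$geneNames, converted$cellNames)
counts_mat
```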

Converting data.table to SCE

We haven’t built a function in Spectre to do this yet, but it is possible to convert a data.table to SCE manually. Let’s start by loading the sample data included in Spectre:

dt <- Spectre::demo.clustered
head(dt)
##           FileName     NK11        CD3     CD45       Ly6G    CD11b      B220
## 1: CNS_Mock_01.csv  42.3719  40.098700  6885.08  -344.7830 14787.30  -40.2399
## 2: CNS_Mock_01.csv  42.9586 119.014000  1780.29  -429.6650  5665.73   86.6673
## 3: CNS_Mock_01.csv  59.2366 206.238000 10248.30 -1603.8400 19894.30  427.8310
## 4: CNS_Mock_01.csv 364.9480  -0.233878  3740.04  -815.9800  9509.43  182.4200
## 5: CNS_Mock_01.csv 440.2470  40.035200  9191.38    40.5055  5745.82 -211.6940
## 6: CNS_Mock_01.csv 151.5890 124.525000  4256.17  -596.1300 12200.80   94.0770
##        CD8a      Ly6C     CD4 NK11_asinh    CD3_asinh CD45_asinh  Ly6G_asinh
## 1:  83.7175  958.7000 711.072 0.04235923  0.040087962   2.627736 -0.33829345
## 2:  34.7219  448.2590 307.272 0.04294540  0.118734817   1.340828 -0.41743573
## 3: 285.8800 1008.8300 707.094 0.05920201  0.204803270   3.022631 -1.25101677
## 4: 333.6050  440.0710 249.784 0.35729716 -0.000233878   2.029655 -0.74509796
## 5: 149.2200   87.4815 867.570 0.42713953  0.040024513   2.914359  0.04049443
## 6: 109.3110  417.4010 352.982 0.15101436  0.124205401   2.155040 -0.56550357
##    CD11b_asinh  B220_asinh CD8a_asinh Ly6C_asinh CD4_asinh     Sample Group
## 1:    3.388057 -0.04022905 0.08362002  0.8518665 0.6617135 01_Mock_01  Mock
## 2:    2.435282  0.08655917 0.03471493  0.4344615 0.3026313 01_Mock_01  Mock
## 3:    3.684212  0.41575012 0.28212257  0.8876036 0.6584685 01_Mock_01  Mock
## 4:    2.948184  0.18142312 0.32770787  0.4269784 0.2472569 01_Mock_01  Mock
## 5:    2.449108 -0.21014391 0.14867171  0.0873703 0.7845668 01_Mock_01  Mock
## 6:    3.196324  0.09393878 0.10909447  0.4061429 0.3460348 01_Mock_01  Mock
##    Batch FlowSOM_cluster FlowSOM_metacluster Population     UMAP_X   UMAP_Y
## 1:     A              23                   2  Microglia -2.3603757 6.201213
## 2:     A              55                   2  Microglia  2.7505242 7.119595
## 3:     A              64                   2  Microglia -2.9486033 4.012670
## 4:     A              53                   2  Microglia  0.6482904 6.481466
## 5:     A             110                   4   NK cells -2.3941295 6.975885
## 6:     A              24                   2  Microglia -0.4012698 6.679605

We can’t simply pass the whole data.table to the SingleCellExperiment function and expect it to create an SCE object.

First, let’s isolate the marker expression and transpose it such that the cells are the columns and the markers are the rows. We have to transpose as most functions that use SCE will treat the columns as the cells and the rows as the markers (or genes). We will store the raw and asinh data in separate assays.

markers <- c("NK11", "CD3", "CD45", "Ly6G", 
             "CD11b", "B220", "CD8a", "Ly6C", "CD4")
sce_counts <- t(dt[, markers, with=FALSE])
sce_counts_asinh <- t(dt[, paste0(markers, "_asinh"), with=FALSE])
rownames(sce_counts_asinh) <- markers

Note that I have to rename the rows of the transposed asinh matrix, as they have to match the row names of the raw data when we load the asinh data into SCE.
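
The transposition and renaming step can be checked on a tiny example before running it on the full data. The sketch below uses a made-up three-cell table (toy, toy_markers and the values in it are purely illustrative) and confirms that both matrices end up with markers as rows and identical row names.

```r
library(data.table)

# Toy table: 3 cells (rows), 2 markers, each with a raw and an asinh column
toy <- data.table(
    CD3 = c(40.1, 119.0, 206.2),
    CD4 = c(711.1, 307.3, 707.1),
    CD3_asinh = c(0.04, 0.12, 0.20),
    CD4_asinh = c(0.66, 0.30, 0.66)
)
toy_markers <- c("CD3", "CD4")

# Transpose so that markers become rows and cells become columns
toy_raw <- t(toy[, toy_markers, with = FALSE])
toy_asinh <- t(toy[, paste0(toy_markers, "_asinh"), with = FALSE])

# Strip the "_asinh" suffix by reusing the marker names
rownames(toy_asinh) <- toy_markers

# Both matrices must now agree in shape and row names
stopifnot(identical(dim(toy_raw), dim(toy_asinh)))
stopifnot(identical(rownames(toy_raw), rownames(toy_asinh)))
```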

Next, we will give our cells a unique ID, and store any metadata or information (e.g. clusters and cell type annotation) about the cells in a DataFrame object.

cell_id <- paste0("cell_", seq_len(nrow(dt)))
cell_meta <- DataFrame(
    dt[, c(
        "FileName", "Sample", "Group", "Batch", 
        "FlowSOM_cluster", "FlowSOM_metacluster", "Population"
        )
       ],
    row.names=cell_id
)

Note here, I use the cell ID as the row names of the DataFrame.

In this sample data, we don’t have additional information about our markers, so I’ll make some up as an example:

marker_meta <- DataFrame(
    data.table(Panel=rep("Panel_1", 9), Type=rep("Marker", 9)), 
    row.names=c("NK11", "CD3", "CD45", "Ly6G", 
                "CD11b", "B220", "CD8a", "Ly6C", "CD4")
)

And lastly, we will extract the UMAP coordinates:

# Convert to a plain matrix, since data.table does not support row names
umap_dt <- as.matrix(dt[, c("UMAP_X", "UMAP_Y")])
rownames(umap_dt) <- cell_id

Note here, I use the cell ID as the row names of umap_dt. This is mandatory for adding it to the SCE later.

We will then create the SCE object:

library(SingleCellExperiment)
sce <- SingleCellExperiment(
    assays=list(
        counts=sce_counts,
        asinhcounts=sce_counts_asinh
    ),
    colData=cell_meta,
    rowData=marker_meta
)
colnames(sce) <- cell_id
rownames(sce) <- markers
reducedDims(sce) <- list(UMAP=umap_dt)

Some things worth noting:

  1. We assign colnames and rownames separately, after the SCE object has been created, because the constructor has no dedicated arguments for specifying them.
  2. The UMAP can only be assigned after the colnames, as SCE requires the row names of a low-dimensional representation to be the same as the SCE’s colnames.
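
Once the object is built, each piece can be read back with the standard accessors. The following is a small self-contained sketch on a made-up three-cell SCE; toy_sce, the values, and the asinh cofactor of 1000 are assumptions for illustration only. On the real object the same calls apply with sce in place of toy_sce.

```r
library(SingleCellExperiment)

toy_ids <- paste0("cell_", 1:3)
toy_markers <- c("CD3", "CD4")

# Markers as rows, cells as columns (values are made up)
toy_raw <- matrix(c(40, 711, 119, 307, 206, 707), nrow = 2,
                  dimnames = list(toy_markers, toy_ids))

toy_sce <- SingleCellExperiment(
    assays = list(
        counts = toy_raw,
        asinhcounts = asinh(toy_raw / 1000)  # cofactor assumed for illustration
    ),
    colData = DataFrame(
        Population = c("Microglia", "Microglia", "NK cells"),
        row.names = toy_ids
    )
)

# Row names of the embedding must match the SCE's colnames
toy_umap <- matrix(c(-2.4, 2.8, -2.9, 6.2, 7.1, 4.0), ncol = 2,
                   dimnames = list(toy_ids, c("UMAP_X", "UMAP_Y")))
reducedDims(toy_sce) <- list(UMAP = toy_umap)

assayNames(toy_sce)                   # names of the stored assays
dim(assay(toy_sce, "asinhcounts"))    # markers x cells
head(reducedDim(toy_sce, "UMAP"))     # UMAP coordinates, one row per cell
table(toy_sce$Population)             # cell metadata stored in colData
```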

Inspect the SCE object:

sce
## class: SingleCellExperiment 
## dim: 9 169004 
## metadata(0):
## assays(2): counts asinhcounts
## rownames(9): NK11 CD3 ... Ly6C CD4
## rowData names(2): Panel Type
## colnames(169004): cell_1 cell_2 ... cell_169003 cell_169004
## colData names(7): FileName Sample ... FlowSOM_metacluster Population
## reducedDimNames(1): UMAP
## mainExpName: NULL
## altExpNames(0):

And you are good to go!