SingleCellExperiment, or SCE for short, is a data structure designed specifically for storing single-cell RNA sequencing (scRNAseq) experiments. It is an extension of the SummarizedExperiment class. Many (if not almost all) scRNAseq-related packages in Bioconductor either support or use SCE to store data.
This short tutorial shows you how to convert the data.table object used by Spectre into an SCE object, and vice versa.
To do this, we have built a function into Spectre that converts an SCE object into a data.table. Currently, the function is only available in the development branch. It will be made available in the master branch in our next release.
To use it, you first need to install Spectre from the development branch (note, this may take some time):
remotes::install_github("immunedynamics/Spectre", branch="development")
Let’s create an example SCE object, adapted from the SCE vignette:
library(SingleCellExperiment)
pretend_counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
pretend_cell_labels <- sample(letters, ncol(pretend_counts), replace=TRUE)
pretend_gene_lengths <- sample(10000, nrow(pretend_counts))
pretend_cell_id <- paste0("cell_", c(1: length(pretend_cell_labels)))
pretend_gene_id <- paste0("gene_", c(1: length(pretend_gene_lengths)))
sce <- SingleCellExperiment(list(counts=pretend_counts),
colData=DataFrame(label=pretend_cell_labels, cell_id=pretend_cell_id),
rowData=DataFrame(length=pretend_gene_lengths, gene_id=pretend_gene_id),
metadata=list(study="PretendStudy")
)
rownames(sce) <- pretend_gene_id
colnames(sce) <- pretend_cell_id
# View the SCE object
sce
## class: SingleCellExperiment
## dim: 10 10
## metadata(1): study
## assays(1): counts
## rownames(10): gene_1 gene_2 ... gene_9 gene_10
## rowData names(2): length gene_id
## colnames(10): cell_1 cell_2 ... cell_9 cell_10
## colData names(2): label cell_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Now to convert this into a data.table:
dt_sce <- Spectre::create.dt(sce)
A few things to note:
The SCE object needs to have both its colnames and rownames filled. The former represents the unique ID assigned to each cell, while the latter represents the unique ID assigned to each gene (this can be something like the Ensembl ID). The unique ID for a cell can be the cell barcode from 10x if you use a 10x kit. If your raw data do not come with any ID that uniquely identifies the cells, you can create one yourself and attach it to the SCE object. Because cells are the columns of an SCE object, the number of IDs must equal ncol(sce):
colnames(sce) <- paste0("cell_", seq_len(ncol(sce)))
Let’s look at the list converted from the SCE object:
names(dt_sce)
## [1] "data.table" "geneNames" "cellNames" "meta.data" "assays"
## [6] "dim.reds"
The data.table element contains the data.table object that Spectre’s functions operate on:
head(dt_sce$data.table)
## cellNames label cell_id gene_1_counts gene_2_counts gene_3_counts
## 1: cell_1 u cell_1 9 9 9
## 2: cell_2 l cell_2 12 6 9
## 3: cell_3 y cell_3 7 11 15
## 4: cell_4 k cell_4 15 10 9
## 5: cell_5 a cell_5 11 11 7
## 6: cell_6 n cell_6 15 13 17
## gene_4_counts gene_5_counts gene_6_counts gene_7_counts gene_8_counts
## 1: 8 12 6 12 10
## 2: 7 10 5 6 9
## 3: 8 15 13 7 7
## 4: 8 12 10 9 7
## 5: 2 7 8 6 13
## 6: 10 6 4 14 8
## gene_9_counts gene_10_counts
## 1: 14 7
## 2: 12 5
## 3: 6 4
## 4: 13 12
## 5: 17 11
## 6: 13 7
geneNames and cellNames are the unique IDs for your genes and cells. These are the same values you used for the SCE’s rownames and colnames.
meta.data contains the metadata for the cells, i.e., whatever you have in the colData of the SCE object.
assays lists the names of all the assays in the SCE object. Assays in SCE act like slots for your data. Each assay can store an alternative version of your data. Raw gene counts are generally stored in the counts assay, while the log-transformed version is stored in the logcounts assay. See the SCE vignette for more details. The assays element in our list only stores the names of the assays in your SCE object. The data for each of these assays are in the data.table element, in the columns whose names end with the assay name. For this example, the gene_1 values from the counts assay are stored in the gene_1_counts column.
The dim.reds element stores the names of all the low-dimensional representations (e.g., the UMAP or tSNE representation of your cells) in the SCE object. As with assays, the data themselves are embedded in the data.table element. See the SCE vignette for more details.
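Because each assay's values live in columns named with the gene ID followed by the assay name, one way to pull a single assay back out is to select columns by that suffix. The following is only a minimal sketch in base R: `toy` is a made-up stand-in for `dt_sce$data.table` (a plain data.frame with invented values), and the `logcounts` columns are hypothetical, added only to show that the suffix match does not pick them up.

```r
# Toy stand-in for dt_sce$data.table (values are made up for illustration)
toy <- data.frame(
  cellNames = c("cell_1", "cell_2", "cell_3"),
  label = c("a", "b", "c"),
  gene_1_counts = c(9, 12, 7),
  gene_2_counts = c(9, 6, 11),
  gene_1_logcounts = log2(c(9, 12, 7) + 1),
  gene_2_logcounts = log2(c(9, 6, 11) + 1)
)

assay_name <- "counts"
suffix <- paste0("_", assay_name, "$")

# Columns that belong to the assay end in "_<assay name>"
assay_cols <- grep(suffix, names(toy), value = TRUE)

# Recover the gene IDs by stripping the suffix
gene_ids <- sub(suffix, "", assay_cols)

# Build a genes-by-cells matrix, the orientation SCE expects
counts_mat <- t(as.matrix(toy[, assay_cols]))
dimnames(counts_mat) <- list(gene_ids, toy$cellNames)
counts_mat
```

With a real data.table, the column selection would be written as `toy[, assay_cols, with = FALSE]`; the rest of the sketch stays the same.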
We haven’t built a function in Spectre for the reverse conversion, but it is possible to convert a data.table to SCE by hand. Let’s start by loading the sample data we have in Spectre:
dt <- Spectre::demo.clustered
head(dt)
## FileName NK11 CD3 CD45 Ly6G CD11b B220
## 1: CNS_Mock_01.csv 42.3719 40.098700 6885.08 -344.7830 14787.30 -40.2399
## 2: CNS_Mock_01.csv 42.9586 119.014000 1780.29 -429.6650 5665.73 86.6673
## 3: CNS_Mock_01.csv 59.2366 206.238000 10248.30 -1603.8400 19894.30 427.8310
## 4: CNS_Mock_01.csv 364.9480 -0.233878 3740.04 -815.9800 9509.43 182.4200
## 5: CNS_Mock_01.csv 440.2470 40.035200 9191.38 40.5055 5745.82 -211.6940
## 6: CNS_Mock_01.csv 151.5890 124.525000 4256.17 -596.1300 12200.80 94.0770
## CD8a Ly6C CD4 NK11_asinh CD3_asinh CD45_asinh Ly6G_asinh
## 1: 83.7175 958.7000 711.072 0.04235923 0.040087962 2.627736 -0.33829345
## 2: 34.7219 448.2590 307.272 0.04294540 0.118734817 1.340828 -0.41743573
## 3: 285.8800 1008.8300 707.094 0.05920201 0.204803270 3.022631 -1.25101677
## 4: 333.6050 440.0710 249.784 0.35729716 -0.000233878 2.029655 -0.74509796
## 5: 149.2200 87.4815 867.570 0.42713953 0.040024513 2.914359 0.04049443
## 6: 109.3110 417.4010 352.982 0.15101436 0.124205401 2.155040 -0.56550357
## CD11b_asinh B220_asinh CD8a_asinh Ly6C_asinh CD4_asinh Sample Group
## 1: 3.388057 -0.04022905 0.08362002 0.8518665 0.6617135 01_Mock_01 Mock
## 2: 2.435282 0.08655917 0.03471493 0.4344615 0.3026313 01_Mock_01 Mock
## 3: 3.684212 0.41575012 0.28212257 0.8876036 0.6584685 01_Mock_01 Mock
## 4: 2.948184 0.18142312 0.32770787 0.4269784 0.2472569 01_Mock_01 Mock
## 5: 2.449108 -0.21014391 0.14867171 0.0873703 0.7845668 01_Mock_01 Mock
## 6: 3.196324 0.09393878 0.10909447 0.4061429 0.3460348 01_Mock_01 Mock
## Batch FlowSOM_cluster FlowSOM_metacluster Population UMAP_X UMAP_Y
## 1: A 23 2 Microglia -2.3603757 6.201213
## 2: A 55 2 Microglia 2.7505242 7.119595
## 3: A 64 2 Microglia -2.9486033 4.012670
## 4: A 53 2 Microglia 0.6482904 6.481466
## 5: A 110 4 NK cells -2.3941295 6.975885
## 6: A 24 2 Microglia -0.4012698 6.679605
We can’t just pass the whole data.table to the SingleCellExperiment function and expect it to magically create an SCE object.
First, let’s isolate the marker expression and transpose it such that the cells are the columns and the markers are the rows. We have to transpose as most functions that use SCE will treat the columns as the cells and the rows as the markers (or genes). We will store the raw and asinh data in separate assays.
markers <- c("NK11", "CD3", "CD45", "Ly6G",
"CD11b", "B220", "CD8a", "Ly6C", "CD4")
sce_counts <- t(dt[, markers, with=FALSE])
sce_counts_asinh <- t(dt[, paste0(markers, "_asinh"), with=FALSE])
rownames(sce_counts_asinh) <- markers
Note here, I have to rename the rows of the transposed asinh matrix, as they have to match the row names of the raw data when we load the asinh data into SCE.
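As a small illustration of why the renaming matters, the sketch below builds two toy matrices and checks that their row names line up before they would be handed to SCE as assays. It uses base R only; the values and the cofactor of 1000 are made up for the example. Stripping the suffix with `sub()` is an alternative to overwriting the row names directly, and it keeps the names tied to the columns they came from.

```r
markers <- c("NK11", "CD3")

# Toy raw values (made up), plus an asinh-transformed copy of each marker
toy <- data.frame(
  NK11 = c(42.4, 43.0, 59.2),
  CD3 = c(40.1, 119.0, 206.2)
)
toy$NK11_asinh <- asinh(toy$NK11 / 1000)
toy$CD3_asinh <- asinh(toy$CD3 / 1000)

# Transpose so that markers are rows and cells are columns
raw_mat <- t(as.matrix(toy[, markers]))
asinh_mat <- t(as.matrix(toy[, paste0(markers, "_asinh")]))

# Strip the suffix so that both matrices share the same row names
rownames(asinh_mat) <- sub("_asinh$", "", rownames(asinh_mat))

# Both assays must agree on row names (and order) and on dimensions
stopifnot(
  identical(rownames(raw_mat), rownames(asinh_mat)),
  identical(dim(raw_mat), dim(asinh_mat))
)
```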
Next, we will give our cells a unique ID, and store any metadata or information (e.g. clusters and cell type annotation) about the cells in a DataFrame object.
cell_id <- paste0("cell_", c(1:nrow(dt)))
cell_meta <- DataFrame(
dt[, c(
"FileName", "Sample", "Group", "Batch",
"FlowSOM_cluster", "FlowSOM_metacluster", "Population"
)
],
row.names=cell_id
)
Note here, I use the cell ID as the row names of the DataFrame.
In this sample data, we don’t have additional information about our markers, so I’ll make some up just as an example:
marker_meta <- DataFrame(
data.table(Panel=rep("Panel_1", 9), Type=rep("Marker", 9)),
row.names=c("NK11", "CD3", "CD45", "Ly6G",
"CD11b", "B220", "CD8a", "Ly6C", "CD4")
)
And lastly, we will extract the UMAP coordinates:
umap_dt <- dt[, c("UMAP_X", "UMAP_Y")]
rownames(umap_dt) <- cell_id
Note here, I use the cell ID as the row names of umap_dt. This is mandatory for adding it to the SCE object later.
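Since data.table is not designed around row names, an alternative worth considering is to hold the coordinates in a plain numeric matrix, where row names are a standard attribute. The sketch below is only an illustration under that assumption, using base R and made-up coordinates; `umap_mat` is a hypothetical name and is not used elsewhere in this tutorial.

```r
# Toy UMAP coordinates (made up), one row per cell
toy <- data.frame(
  UMAP_X = c(-2.36, 2.75, -2.95),
  UMAP_Y = c(6.20, 7.12, 4.01)
)
cell_id <- paste0("cell_", seq_len(nrow(toy)))

# A plain matrix keeps row names as part of its dimnames
umap_mat <- as.matrix(toy[, c("UMAP_X", "UMAP_Y")])
rownames(umap_mat) <- cell_id

# The row names are what must match the SCE's colnames later on
stopifnot(identical(rownames(umap_mat), cell_id))
umap_mat
```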
We will then create the SCE object:
library(SingleCellExperiment)
sce <- SingleCellExperiment(
assays=list(
counts=sce_counts,
asinhcounts=sce_counts_asinh
),
colData=cell_meta,
rowData=marker_meta
)
colnames(sce) <- cell_id
rownames(sce) <- markers
reducedDims(sce) <- list(UMAP=umap_dt)
Some things worth noting:
Assigning colnames and rownames to the SCE object has to be done separately from (and after!) the actual SCE object creation, as the function does not allow us to specify colnames or rownames.
The low-dimensional representation is added after setting colnames, as SCE requires the row names of the low-dimensional representation data to be the same as the SCE’s colnames.
Inspect the SCE object:
sce
## class: SingleCellExperiment
## dim: 9 169004
## metadata(0):
## assays(2): counts asinhcounts
## rownames(9): NK11 CD3 ... Ly6C CD4
## rowData names(2): Panel Type
## colnames(169004): cell_1 cell_2 ... cell_169003 cell_169004
## colData names(7): FileName Sample ... FlowSOM_metacluster Population
## reducedDimNames(1): UMAP
## mainExpName: NULL
## altExpNames(0):
And you are good to go!