CITE-seq data with CiteFuse and MuData
Danila Bredikhin
European Molecular Biology Laboratory, Heidelberg, Germanydanila.bredikhin@embl.de
Ilia Kats
German Cancer Research Center, Heidelberg, Germanyi.kats@dkfz-heidelberg.de
2023-10-12
Source:vignettes/Blood-CITE-seq.Rmd
Blood-CITE-seq.Rmd
Introduction
CITE-seq data provide RNA and surface protein counts for the same cells. This tutorial shows how MuData can be integrated into with Bioconductor workflows to analyse CITE-seq data.
Installation
The most recent dev build can be installed from GitHub:
Stable version of MuData
will be available in future
bioconductor versions.
Loading data
We will use CITE-seq data available within CiteFuse
Bioconductor package.
data("CITEseq_example", package = "CiteFuse")
lapply(CITEseq_example, dim)
#> $RNA
#> [1] 19521 500
#>
#> $ADT
#> [1] 49 500
#>
#> $HTO
#> [1] 4 500
This dataset contains three matrices — one with RNA
counts, one with antibody-derived tags (ADT
) counts and one
with hashtag oligonucleotide (HTO
) counts.
Processing count matrices
While CITE-seq analysis workflows such as CiteFuse should be consulted for more details, below we exemplify simple data transformations in order to demonstrate how their output can be saved to an H5MU file later on.
Following the CiteFuse tutorial, we start with creating a
SingleCellExperiment
object with the three matrices:
sce_citeseq <- preprocessing(CITEseq_example)
sce_citeseq
#> class: SingleCellExperiment
#> dim: 19521 500
#> metadata(0):
#> assays(1): counts
#> rownames(19521): hg19_AL627309.1 hg19_AL669831.5 ... hg19_MT-ND6
#> hg19_MT-CYB
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#> GCTGCGAGTTGTGGCC
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): ADT HTO
We will add a new assay with normalised RNA counts:
sce_citeseq <- scater::logNormCounts(sce_citeseq)
sce_citeseq # new assay: logcounts
#> class: SingleCellExperiment
#> dim: 19521 500
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(19521): hg19_AL627309.1 hg19_AL669831.5 ... hg19_MT-ND6
#> hg19_MT-CYB
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#> GCTGCGAGTTGTGGCC
#> colData names(1): sizeFactor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): ADT HTO
To the ADT modality, we will add an assay with normalised counts:
sce_citeseq <- CiteFuse::normaliseExprs(
sce_citeseq, altExp_name = "ADT", transform = "log"
)
altExp(sce_citeseq, "ADT") # new assay: logcounts
#> class: SummarizedExperiment
#> dim: 49 500
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(49): B220 (CD45R) B7-H1 (PD-L1) ... TCRb TCRg
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#> GCTGCGAGTTGTGGCC
#> colData names(0):
We will also generate reduced dimensions:
sce_citeseq <- scater::runPCA(
sce_citeseq, exprs_values = "logcounts", ncomponents = 20
)
scater::plotReducedDim(sce_citeseq, dimred = "PCA",
by_exprs_values = "logcounts", colour_by = "CD27")
Making a MultiAssayExperiment object
An appropriate structure for multimodal datasets is MultiAssayExperiment
.
We will make a respective MultiAssayExperiment object from
sce_citeseq
:
experiments <- list(
ADT = altExp(sce_citeseq, "ADT"),
HTO = altExp(sce_citeseq, "HTO")
)
# Drop other modalities from sce_citeseq
altExp(sce_citeseq) <- NULL
experiments[["RNA"]] <- sce_citeseq
mae <- MultiAssayExperiment(experiments)
Writing to H5MU
We can write the contents of the MultiAssayExperiment object into an H5MU file:
writeH5MU(mae, "citefuse_example.h5mu")
We can check that all the modalities were written to the file:
h5 <- rhdf5::H5Fopen("citefuse_example.h5mu")
h5ls(H5Gopen(h5, "mod"), recursive = FALSE)
#> group name otype dclass dim
#> 0 / ADT H5I_GROUP
#> 1 / HTO H5I_GROUP
#> 2 / RNA H5I_GROUP
… both assays for ADT — raw counts are stored in X
and
normalised counts are in the corresponding layer:
h5ls(H5Gopen(h5, "mod/ADT"), FALSE)
#> group name otype dclass dim
#> 0 / X H5I_GROUP
#> 1 / layers H5I_GROUP
#> 2 / obs H5I_GROUP
#> 3 / var H5I_GROUP
h5ls(H5Gopen(h5, "mod/ADT/layers"), FALSE)
#> group name otype dclass dim
#> 0 / logcounts H5I_DATASET FLOAT 49 x 500
… as well as reduced dimensions (PCA):
References
mudata (Python) documentation
muon documentation and web page
Kim HJ, Lin Y, Geddes TA, Yang P, Yang JYH (2020). “CiteFuse enables multi-modal analysis of CITE-seq data.” Bioinformatics, 36(14), 4137–4143. https://doi.org/10.1093/bioinformatics/btaa282.
Ramos M, Schiffer L, Re A, Azhar R, Basunia A, Cabrera CR, Chan T, Chapman P, Davis S, Gomez-Cabrero D, Culhane AC, Haibe-Kains B, Hansen K, Kodali H, Louis MS, Mer AS, Reister M, Morgan M, Carey V, Waldron L (2017). “Software For The Integration Of Multi-Omics Experiments In Bioconductor.” Cancer Research, 77(21); e39-42.
Session Info
sessionInfo()
#> R Under development (unstable) (2023-10-10 r85312)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scater_1.29.4 ggplot2_3.4.3
#> [3] scuttle_1.11.3 CiteFuse_1.13.0
#> [5] MultiAssayExperiment_1.27.5 SingleCellExperiment_1.23.0
#> [7] SummarizedExperiment_1.31.1 Biobase_2.61.0
#> [9] GenomicRanges_1.53.2 GenomeInfoDb_1.37.6
#> [11] IRanges_2.35.2 MatrixGenerics_1.13.1
#> [13] matrixStats_1.0.0 MuData_0.99.9
#> [15] rhdf5_2.45.1 S4Vectors_0.39.2
#> [17] BiocGenerics_0.47.0 Matrix_1.6-1.1
#> [19] BiocStyle_2.29.2
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 tensorA_0.36.2
#> [3] jsonlite_1.8.7 magrittr_2.0.3
#> [5] ggbeeswarm_0.7.2 farver_2.1.1
#> [7] rmarkdown_2.25 fs_1.6.3
#> [9] zlibbioc_1.47.0 ragg_1.2.6
#> [11] vctrs_0.6.3 memoise_2.0.1
#> [13] DelayedMatrixStats_1.23.4 RCurl_1.98-1.12
#> [15] htmltools_0.5.6.1 S4Arrays_1.1.6
#> [17] BiocNeighbors_1.19.0 Rhdf5lib_1.23.2
#> [19] SparseArray_1.1.12 sass_0.4.7
#> [21] bslib_0.5.1 htmlwidgets_1.6.2
#> [23] desc_1.4.2 plyr_1.8.9
#> [25] plotly_4.10.2 cachem_1.0.8
#> [27] igraph_1.5.1 lifecycle_1.0.3
#> [29] pkgconfig_2.0.3 rsvd_1.0.5
#> [31] R6_2.5.1 fastmap_1.1.1
#> [33] GenomeInfoDbData_1.2.10 digest_0.6.33
#> [35] colorspace_2.1-0 rprojroot_2.0.3
#> [37] dqrng_0.3.1 irlba_2.3.5.1
#> [39] textshaping_0.3.7 beachmat_2.17.16
#> [41] labeling_0.4.3 randomForest_4.7-1.1
#> [43] fansi_1.0.5 httr_1.4.7
#> [45] polyclip_1.10-6 abind_1.4-5
#> [47] compiler_4.4.0 withr_2.5.1
#> [49] BiocParallel_1.35.4 viridis_0.6.4
#> [51] ggforce_0.4.1 MASS_7.3-60.1
#> [53] bayesm_3.1-6 DelayedArray_0.27.10
#> [55] bluster_1.11.4 tools_4.4.0
#> [57] vipor_0.4.5 beeswarm_0.4.0
#> [59] glue_1.6.2 dbscan_1.1-11
#> [61] nlme_3.1-163 rhdf5filters_1.13.5
#> [63] grid_4.4.0 Rtsne_0.16
#> [65] cluster_2.1.4 reshape2_1.4.4
#> [67] generics_0.1.3 gtable_0.3.4
#> [69] tidyr_1.3.0 data.table_1.14.8
#> [71] metapod_1.9.0 BiocSingular_1.17.1
#> [73] tidygraph_1.2.3 ScaledMatrix_1.9.1
#> [75] utf8_1.2.3 XVector_0.41.1
#> [77] ggrepel_0.9.3 pillar_1.9.0
#> [79] stringr_1.5.0 limma_3.57.9
#> [81] robustbase_0.99-0 splines_4.4.0
#> [83] dplyr_1.1.3 tweenr_2.0.2
#> [85] lattice_0.21-9 survival_3.5-7
#> [87] compositions_2.0-6 tidyselect_1.2.0
#> [89] locfit_1.5-9.8 knitr_1.44
#> [91] gridExtra_2.3 bookdown_0.35
#> [93] edgeR_3.99.0 xfun_0.40
#> [95] graphlayouts_1.0.1 mixtools_2.0.0
#> [97] statmod_1.5.0 DEoptimR_1.1-3
#> [99] pheatmap_1.0.12 stringi_1.7.12
#> [101] lazyeval_0.2.2 yaml_2.3.7
#> [103] evaluate_0.22 codetools_0.2-19
#> [105] kernlab_0.9-32 ggraph_2.1.0
#> [107] tibble_3.2.1 BiocManager_1.30.22
#> [109] cli_3.6.1 uwot_0.1.16
#> [111] systemfonts_1.0.5 segmented_1.6-4
#> [113] munsell_0.5.0 jquerylib_0.1.4
#> [115] Rcpp_1.0.11 parallel_4.4.0
#> [117] pkgdown_2.0.7 scran_1.29.3
#> [119] sparseMatrixStats_1.13.4 bitops_1.0-7
#> [121] viridisLite_0.4.2 scales_1.2.1
#> [123] ggridges_0.5.4 purrr_1.0.2
#> [125] crayon_1.5.2 rlang_1.1.1
#> [127] cowplot_1.1.1