TY - JOUR
T1 - SimBu
T2 - Bias-Aware simulation of bulk RNA-seq data with variable cell-Type composition
AU - Dietrich, Alexander
AU - Sturm, Gregor
AU - Merotto, Lorenzo
AU - Marini, Federico
AU - Finotello, Francesca
AU - List, Markus
N1 - Publisher Copyright:
© 2022 The Author(s). Published by Oxford University Press. All rights reserved.
PY - 2022/9/1
Y1 - 2022/9/1
N2 - Motivation: As complex tissues are typically composed of various cell types, deconvolution tools have been developed to computationally infer their cellular composition from bulk RNA sequencing (RNA-seq) data. To comprehensively assess deconvolution performance, gold-standard datasets are indispensable. Gold-standard, experimental techniques like flow cytometry or immunohistochemistry are resource-intensive and cannot be systematically applied to the numerous cell types and tissues profiled with high-Throughput transcriptomics. The simulation of 'pseudo-bulk' data, generated by aggregating single-cell RNA-seq expression profiles in pre-defined proportions, offers a scalable and cost-effective alternative. This makes it feasible to create in silico gold standards that allow fine-grained control of cell-Type fractions not conceivable in an experimental setup. However, at present, no simulation software for generating pseudo-bulk RNA-seq data exists. Results: We developed SimBu, an R package capable of simulating pseudo-bulk samples based on various simulation scenarios, designed to test specific features of deconvolution methods. A unique feature of SimBu is the modeling of cell-Type-specific mRNA bias using experimentally derived or data-driven scaling factors. Here, we show that SimBu can generate realistic pseudo-bulk data, recapitulating the biological and statistical features of real RNA-seq data. Finally, we illustrate the impact of mRNA bias on the evaluation of deconvolution tools and provide recommendations for the selection of suitable methods for estimating mRNA content. SimBu is a user-friendly and flexible tool for simulating realistic pseudo-bulk RNA-seq datasets serving as in silico gold-standard for assessing cell-Type deconvolution methods.
AB - Motivation: As complex tissues are typically composed of various cell types, deconvolution tools have been developed to computationally infer their cellular composition from bulk RNA sequencing (RNA-seq) data. To comprehensively assess deconvolution performance, gold-standard datasets are indispensable. Gold-standard, experimental techniques like flow cytometry or immunohistochemistry are resource-intensive and cannot be systematically applied to the numerous cell types and tissues profiled with high-Throughput transcriptomics. The simulation of 'pseudo-bulk' data, generated by aggregating single-cell RNA-seq expression profiles in pre-defined proportions, offers a scalable and cost-effective alternative. This makes it feasible to create in silico gold standards that allow fine-grained control of cell-Type fractions not conceivable in an experimental setup. However, at present, no simulation software for generating pseudo-bulk RNA-seq data exists. Results: We developed SimBu, an R package capable of simulating pseudo-bulk samples based on various simulation scenarios, designed to test specific features of deconvolution methods. A unique feature of SimBu is the modeling of cell-Type-specific mRNA bias using experimentally derived or data-driven scaling factors. Here, we show that SimBu can generate realistic pseudo-bulk data, recapitulating the biological and statistical features of real RNA-seq data. Finally, we illustrate the impact of mRNA bias on the evaluation of deconvolution tools and provide recommendations for the selection of suitable methods for estimating mRNA content. SimBu is a user-friendly and flexible tool for simulating realistic pseudo-bulk RNA-seq datasets serving as in silico gold-standard for assessing cell-Type deconvolution methods.
UR - http://www.scopus.com/inward/record.url?scp=85146627413&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac499
DO - 10.1093/bioinformatics/btac499
M3 - Article
AN - SCOPUS:85146627413
SN - 1367-4803
VL - 38
SP - II141-II147
JO - Bioinformatics
JF - Bioinformatics
ER -