TY - JOUR
T1 - CaSTLe - Classification of single cells by transfer learning
T2 - Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments
AU - Lieberman, Yuval
AU - Rokach, Lior
AU - Shay, Tal
N1 - Funding Information:
This work was supported by Israel Science Foundation Grant 500/15, Broad-Israel Science Foundation Grant 1644/15 and the National Institutes of Health Grants R24 AI072073 to TS and YL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2018 Lieberman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2018/10/1
Y1 - 2018/10/1
N2 - Single-cell RNA sequencing (scRNA-seq) is an emerging technology for profiling the gene expression of thousands of cells at the single cell resolution. Currently, the labeling of cells in an scRNA-seq dataset is performed by manually characterizing clusters of cells or by fluorescence- activated cell sorting (FACS). Both methods have inherent drawbacks: The first depends on the clustering algorithm used and the knowledge and arbitrary decisions of the annotator, and the second involves an experimental step in addition to the sequencing and cannot be incorporated into the higher throughput scRNA-seq methods. We therefore suggest a different approach for cell labeling, namely, classifying cells from scRNA-seq datasets by using a model transferred from different (previously labeled) datasets. This approach can complement existing methods, and-in some cases-even replace them. Such a transferlearning framework requires selecting informative features and training a classifier. The specific implementation for the framework that we propose, designated ''CaSTLe-classification of single cells by transfer learning,'' is based on a robust feature engineering workflow and an XGBoost classification model built on these features. Evaluation of CaSTLe against two benchmark feature-selection and classification methods showed that it outperformed the benchmark methods in most cases and yielded satisfactory classification accuracy in a consistent manner. CaSTLe has the additional advantage of being parallelizable and well suited to large datasets. We showed that it was possible to classify cell types using transfer learning, even when the databases contained a very small number of genes, and our study thus indicates the potential applicability of this approach for analysis of scRNA-seq datasets.
AB - Single-cell RNA sequencing (scRNA-seq) is an emerging technology for profiling the gene expression of thousands of cells at the single cell resolution. Currently, the labeling of cells in an scRNA-seq dataset is performed by manually characterizing clusters of cells or by fluorescence- activated cell sorting (FACS). Both methods have inherent drawbacks: The first depends on the clustering algorithm used and the knowledge and arbitrary decisions of the annotator, and the second involves an experimental step in addition to the sequencing and cannot be incorporated into the higher throughput scRNA-seq methods. We therefore suggest a different approach for cell labeling, namely, classifying cells from scRNA-seq datasets by using a model transferred from different (previously labeled) datasets. This approach can complement existing methods, and-in some cases-even replace them. Such a transferlearning framework requires selecting informative features and training a classifier. The specific implementation for the framework that we propose, designated ''CaSTLe-classification of single cells by transfer learning,'' is based on a robust feature engineering workflow and an XGBoost classification model built on these features. Evaluation of CaSTLe against two benchmark feature-selection and classification methods showed that it outperformed the benchmark methods in most cases and yielded satisfactory classification accuracy in a consistent manner. CaSTLe has the additional advantage of being parallelizable and well suited to large datasets. We showed that it was possible to classify cell types using transfer learning, even when the databases contained a very small number of genes, and our study thus indicates the potential applicability of this approach for analysis of scRNA-seq datasets.
UR - http://www.scopus.com/inward/record.url?scp=85054772412&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0205499
DO - 10.1371/journal.pone.0205499
M3 - Article
AN - SCOPUS:85054772412
SN - 1932-6203
VL - 13
JO - PLoS ONE
JF - PLoS ONE
IS - 10
M1 - e0205499
ER -