Abstract
Background: Numerous publications attempt to predict cancer survival outcome from gene expression data using machine-learning methods. A direct comparison of these works is challenging for the following reasons: (1) inconsistent measures used to evaluate the performance of different models, and (2) incomplete specification of critical stages in the process of knowledge discovery. There is a need for a platform that would allow researchers to replicate previous works and to test the impact of changes in the knowledge discovery process on the accuracy of the induced models. Results: We developed the PCM-SABRE platform, which supports the entire knowledge discovery process for cancer outcome analysis. PCM-SABRE was developed using KNIME. By using PCM-SABRE to reproduce the results of previously published works on breast cancer survival, we define a baseline for evaluating future attempts to predict cancer outcome with machine learning. We used PCM-SABRE to replicate previous work that describe predictive models of breast cancer recurrence, and tested the performance of all possible combinations of feature selection methods and data mining algorithms that was used in either of the works. We reconstructed the work of Chou et al. observing similar trends - superior performance of Probabilistic Neural Network (PNN) and logistic regression (LR) algorithms and inconclusive impact of feature pre-selection with the decision tree algorithm on subsequent analysis. Conclusions: PCM-SABRE is a software tool that provides an intuitive environment for rapid development of predictive models in cancer precision medicine.
Original language | English |
---|---|
Article number | 40 |
Journal | BMC Bioinformatics |
Volume | 18 |
Issue number | 1 |
DOIs | |
State | Published - 17 Jan 2017 |
Keywords
- Breast cancer
- Data mining
- Reproducible research
ASJC Scopus subject areas
- Structural Biology
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Applied Mathematics