TY - GEN
T1 - Comparison of three classifiers for breast cancer outcome prediction
AU - Eyal, Noa
AU - Last, Mark
AU - Rubin, Eitan
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/1/19
Y1 - 2015/1/19
AB - Predicting cancer outcome is a challenging task, and researchers are interested in predicting the relapse-free survival of breast cancer patients from gene expression data. Data mining methods offer more advanced approaches for dealing with survival data. The main objective in cancer treatment is to improve overall survival or, at the very least, the time to relapse ("relapse-free survival"). In this work, we compare the performance of three popular interpretable classifiers (decision tree, probabilistic neural network (PNN), and Naïve Bayes) on the task of classifying breast cancer patients into recurrence risk groups (low or high risk of recurrence within 5 or 10 years). For 5-year recurrence risk prediction, the PNN classifier reached the highest prediction accuracy (Acc = 76.88% ± 1.09%, AUC = 77.41%). For 10-year recurrence risk prediction, the decision tree and PNN classifiers reached similar prediction accuracies (70.40% ± 1.36% and 70.50% ± 1.13%, respectively). However, the PNN classifier achieved this accuracy using only the 10 features with the highest information gain, whereas the decision tree classifier needed 100 features to achieve comparable accuracy, and its AUC was significantly lower (66.4% vs. 77.1%).
KW - Breast cancer
KW - Decision tree
KW - Microarray
KW - Naïve Bayes
KW - Probabilistic neural network
KW - Survival analysis
UR - http://www.scopus.com/inward/record.url?scp=85071169288&partnerID=8YFLogxK
U2 - 10.1145/2797143.2797157
DO - 10.1145/2797143.2797157
M3 - Conference contribution
T3 - ACM International Conference Proceeding Series
SP - 13:1-13:6
BT - Proceedings of the 16th International Conference on Engineering Applications of Neural Networks, EANN 2015
PB - Association for Computing Machinery
T2 - 16th International Conference on Engineering Applications of Neural Networks, EANN 2015
Y2 - 25 September 2015 through 28 September 2015
ER -