TY - GEN
T1 - MetaTPOT
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
AU - Laadan, Doron
AU - Vainshtein, Roman
AU - Curiel, Yarden
AU - Katz, Gilad
AU - Rokach, Lior
N1 - Funding Information:
This work was supported by the Defense Advanced Research Projects Agency (DARPA) Data-Driven Discovery of Models (D3M) Program.
Publisher Copyright:
© 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
AB - Automatic machine learning (AutoML) aims to automate the different aspects of the data science process and, by extension, allow non-experts to utilize "off the shelf" machine learning solutions. One of the more popular AutoML methods is the Tree-based Pipeline Optimization Tool (TPOT), which uses genetic programming (GP) to efficiently explore the vast space of ML pipelines and produce a working ML solution. However, TPOT's GP process comes with substantial time and computational costs. In this study, we explore TPOT's GP process and propose MetaTPOT, an enhanced variant that uses a meta-learning-based approach to predict the performance of TPOT's pipeline candidates. MetaTPOT leverages domain knowledge in the form of pipeline pre-ranking to improve TPOT's speed and performance. Evaluation on 65 classification datasets shows that our approach often improves the outcome of the genetic process while substantially reducing its running time and computational cost.
KW - automl
KW - genetic programming (gp)
KW - meta-learning
KW - tpot
UR - http://www.scopus.com/inward/record.url?scp=85095864416&partnerID=8YFLogxK
U2 - 10.1145/3340531.3412147
DO - 10.1145/3340531.3412147
M3 - Conference contribution
AN - SCOPUS:85095864416
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2097
EP - 2100
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 19 October 2020 through 23 October 2020
ER -