Abstract
Ensemble learning – the application of multiple learning models to the same task – is a common technique in many domains. While employing multiple models makes it possible to reach higher classification accuracy, the process can be time-consuming and costly, and it can make scaling more difficult. Given that each model may have different capabilities and costs, assigning the most cost-effective set of learners to each sample is challenging. We propose SPIREL, a novel method for cost-effective classification. Our method enables users to directly associate costs with correct and incorrect label assignments, computing resources, and run-time, and then dynamically establishes a classification policy. For each analyzed sample, SPIREL dynamically assigns a different set of learning models, as well as its own classification threshold. Extensive evaluation on two large malware datasets – a domain in which the application of multiple analysis tools is common – demonstrates that SPIREL is highly cost-effective, enabling us to reduce running time by ∼80% while decreasing the accuracy and F1-score by only 0.5%. We also show that our approach is both highly transferable across different datasets and adaptable to changes in individual learning model performance.
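The trade-off described in the abstract, where querying more models raises confidence but also raises cost, can be illustrated with a small sketch. This is not SPIREL itself: the abstract describes a dynamically established policy, whereas the code below uses a fixed, hand-written early-stopping rule as a stand-in. The names `Detector`, `classify_with_budget`, and `reward`, and all cost values and thresholds, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Detector:
    """A hypothetical learning model with an associated run-time cost."""
    name: str
    cost: float                                  # assumed cost of one query
    score: Callable[[Sequence[float]], float]    # returns P(malicious) in [0, 1]


def classify_with_budget(
    sample: Sequence[float],
    detectors: List[Detector],
    threshold: float = 0.9,
) -> Tuple[int, float, List[str]]:
    """Query detectors in order and stop once the mean score is confident.

    A fixed stopping rule stands in for a learned policy here.
    Returns (label, total cost spent, names of detectors used).
    """
    scores: List[float] = []
    used: List[str] = []
    spent = 0.0
    for det in detectors:
        scores.append(det.score(sample))
        used.append(det.name)
        spent += det.cost
        mean = sum(scores) / len(scores)
        # Stop early when the ensemble is confident in either direction.
        if mean >= threshold or mean <= 1.0 - threshold:
            break
    mean = sum(scores) / len(scores)
    label = 1 if mean >= 0.5 else 0
    return label, spent, used


def reward(predicted: int, actual: int, spent: float,
           r_correct: float = 1.0, c_wrong: float = 10.0) -> float:
    """User-defined payoff: gain for a correct label, penalty for a wrong
    one, minus the cost of the detectors that were run (assumed values)."""
    outcome = r_correct if predicted == actual else -c_wrong
    return outcome - spent


# Toy detectors that simply read a precomputed score from the sample.
fast = Detector("fast", cost=1.0, score=lambda x: x[0])
slow = Detector("slow", cost=5.0, score=lambda x: x[1])

easy = classify_with_budget([0.95, 0.99], [fast, slow])   # stops after "fast"
hard = classify_with_budget([0.60, 0.20], [fast, slow])   # needs both
```

In this toy setup the easy sample is labeled after one cheap query, while the ambiguous one pays for both detectors. A learned policy would replace the fixed ordering and threshold with per-sample decisions driven by the reward.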
Original language | English |
---|---|
Pages (from-to) | 133-148 |
Number of pages | 16 |
Journal | Information Fusion |
Volume | 77 |
State | Published - 1 Jan 2022 |
Keywords
- Android package
- Malware detection
- Portable executable
- Reinforcement learning
- Transfer learning
ASJC Scopus subject areas
- Software
- Signal Processing
- Information Systems
- Hardware and Architecture