Abstract
This paper examines how dataset characteristics affect the results of Arabic document classification algorithms that use TF-IDF representations. The experiments compared different stemmers, different category sets and different training set sizes. They found that these dataset characteristics produced widely differing results, in one case reaching a remarkable 99% recall (accuracy). Using a standard dataset would eliminate this variability and allow researchers to draw comparable conclusions from published results.
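For readers unfamiliar with the representation, the sketch below shows one common form of TF-IDF weighting, tf(t, d) · log(N / df(t)), together with cosine similarity between the resulting vectors. It is a minimal illustration under stated assumptions, not the paper's implementation. It assumes documents have already been tokenized and stemmed (the paper compares several stemmers, none of which is reproduced here). It uses raw term counts for tf, and the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted as tf * log(N / df).

    Assumes each document is a list of already-stemmed tokens (hypothetical input format).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus of pre-stemmed tokens (illustrative only, not data from the paper).
docs = [["كتب", "درس", "كتب"], ["درس", "علم"]]
vecs = tfidf_vectors(docs)
print(vecs[0])
print(cosine(vecs[0], vecs[1]))
```

A term that occurs in every training document receives an IDF of log(1) = 0, so it contributes nothing to similarity. This is one reason the choice of stemmer and category set, both of which change the term distribution, can shift classification results.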
Original language | English |
---|---|
Pages (from-to) | 31-38 |
Number of pages | 8 |
Journal | International Journal of Computer Applications |
Volume | 101 |
Issue number | 7 |
State | Published - 18 Sep 2014 |