On the Impact of Dataset Characteristics on Arabic Document Classification

Diab Abuaiadah, Jihad El Sana, Walid Abusalah

Research output: Contribution to journal › Article › peer-review

Abstract

This paper describes the impact of dataset characteristics on the results of Arabic document classification algorithms that use TF-IDF representations. The experiments compared different stemmers, categories, and training set sizes, and found that dataset characteristics produced widely differing results, in one case attaining a remarkable 99% recall (accuracy). The use of a standard dataset would eliminate this variability and enable researchers to draw comparable conclusions from published results.
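
For readers unfamiliar with the two quantities the abstract names, the following is a minimal sketch of a TF-IDF weighting and a per-category recall measure. It is not the authors' implementation: the weighting variant (relative term frequency times log inverse document frequency), the function names `tf_idf` and `recall`, and the toy token lists are all assumptions made for illustration, and the abstract does not state which variant or classifier the paper used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: (term count / document length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def recall(true_labels, predicted_labels, category):
    """Fraction of documents that truly belong to `category` and were predicted as such.

    Assumes at least one document truly belongs to `category`.
    """
    relevant = [p for t, p in zip(true_labels, predicted_labels) if t == category]
    return sum(p == category for p in relevant) / len(relevant)


# Hypothetical, already-stemmed toy documents (placeholders, not data from the paper).
toy_docs = [
    ["sport", "match", "goal"],
    ["sport", "economy"],
    ["economy", "market", "bank"],
]
vectors = tf_idf(toy_docs)

# Hypothetical true and predicted labels, only to exercise recall().
toy_recall = recall(["a", "a", "b"], ["a", "b", "b"], "a")
```

In this sketch a term that appears in fewer documents ("match") receives a higher weight than one that appears in more ("sport"), which is why the choice of stemmer, category set, and training set size can shift the resulting vectors and, in turn, the measured recall.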
Original language: English
Pages (from-to): 31-38
Number of pages: 8
Journal: International Journal of Computer Applications
Volume: 101
Issue number: 7
State: Published - 18 Sep 2014
