On the classification of a small imbalanced cytogenetic image database

Boaz Lerner, Josepha Yeshaya, Lev Koushnir

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Solving a multiclass classification task using a small, imbalanced database of high-dimensional patterns is difficult because of the curse of dimensionality and because training is biased toward the majority classes. Such a problem arose in diagnosing genetic abnormalities by classifying a small database of fluorescence in situ hybridization (FISH) signals whose types occur with different frequencies. Using the cytogenetic domain, we propose and experimentally study two solutions to the problem. The first is hierarchical decomposition of the classification task, in which each level of the hierarchy tackles a simpler problem represented by approximately balanced classes. The second is balancing the data by up-sampling the minority classes, combined with dimensionality reduction. Implemented with either the naive Bayesian classifier or the multilayer perceptron neural network, both solutions mitigated the problem and improved accuracy. In addition, the experiments suggest that coping with the smallness of the data is more beneficial than dealing with its imbalance.
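To make the second solution (balancing by up-sampling, then classifying with a naive Bayesian classifier) concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how up-sampling is performed or which class-conditional density the classifier uses, so this sketch assumes plain random duplication of minority-class samples with replacement and one Gaussian per class and feature. The dimensionality-reduction step is omitted, and the names `upsample_minority` and `GaussianNaiveBayes` are hypothetical.

```python
import math
import random
from collections import defaultdict


def upsample_minority(X, y, seed=0):
    """Duplicate minority-class samples at random (with replacement) until
    every class has as many samples as the largest class.

    Assumption: simple random over-sampling; the paper's scheme may differ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Keep every original sample of this class.
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
        # Draw extra copies of existing rows until the class reaches `target`.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


class GaussianNaiveBayes:
    """Naive Bayes with an assumed Gaussian density per (class, feature)."""

    def fit(self, X, y, var_floor=1e-6):
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        n = len(y)
        self.params = {}
        for label, rows in by_class.items():
            d = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            # Floor the variance so a constant feature cannot divide by zero.
            variances = [
                max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), var_floor)
                for j in range(d)
            ]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, x, label):
        # Unnormalized log posterior: log prior plus the sum of per-feature
        # Gaussian log-likelihoods (features treated as independent).
        log_prior, means, variances = self.params[label]
        score = log_prior
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return score

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(x, c)) for x in X]


if __name__ == "__main__":
    # Toy two-feature data: class "A" is the majority, class "B" the minority.
    X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.3, 0.1], [5.0, 5.1], [4.8, 5.0]]
    y = ["A", "A", "A", "A", "B", "B"]
    Xb, yb = upsample_minority(X, y, seed=1)
    print(yb.count("A"), yb.count("B"))  # 4 4
    model = GaussianNaiveBayes().fit(Xb, yb)
    print(model.predict([[0.1, 0.1], [4.9, 5.0]]))  # ['A', 'B']
```

After up-sampling, the class priors estimated in `fit` become equal, which is exactly how balancing removes the classifier's bias toward the majority class. Duplicating samples adds no new information, however, which is consistent with the abstract's observation that the smallness of the data matters more than its imbalance.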

Original language: English
Pages (from-to): 204-215
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 4
Issue number: 2
State: Published - 1 Apr 2007

Keywords

  • Classification
  • Dimensionality reduction
  • Genetic diagnosis
  • Imbalanced data
  • Multilayer perceptron (MLP)
  • Naive Bayesian classifier (NBC)
  • Small sample size

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
