TY - JOUR
T1 - Bayesian fluorescence in situ hybridisation signal classification
AU - Lerner, Boaz
N1 - Funding Information:
This work was supported in part by the Paul Ivanier Center for Robotics and Production Management, Ben-Gurion University, Beer-Sheva, Israel. The author thanks Roy Malka for his assistance during the preparation of this manuscript and to Seema Dhanjal for FISH image acquisition and labelling.
PY - 2004/3/1
Y1 - 2004/3/1
N2 - Previous research has indicated the significance of accurate classification of fluorescence in situ hybridisation (FISH) signals for the detection of genetic abnormalities. Based on well-discriminating features and a trainable neural network (NN) classifier, a previous system enabled highly-accurate classification of valid signals and artefacts of two fluorophores. However, since this system employed several features that are considered independent, the naive Bayesian classifier (NBC) is suggested here as an alternative to the NN. The NBC independence assumption permits the decomposition of the high-dimensional likelihood of the model for the data into a product of one-dimensional probability densities. The naive independence assumption together with the Bayesian methodology allow the NBC to predict a posteriori probabilities of class membership using estimated class-conditional densities in a close and simple form. Since the probability densities are the only parameters of the NBC, the misclassification rate of the model is determined exclusively by the quality of density estimation. Densities are evaluated by three methods: single Gaussian estimation (SGE; parametric method), Gaussian mixture model assuming spherical covariance matrices (GMM; semi-parametric method) and kernel density estimation (KDE; non-parametric method). For low-dimensional densities, the GMM generally outperforms the KDE that tends to overfit the training set at the cost of reduced generalisation capability. But, it is the GMM that loses some accuracy when modelling higher-dimensional densities due to the violation of the assumption of spherical covariance matrices when dependent features are added to the set. Compared with these two methods, the SGE and NN provide inferior and superior performance, respectively. However, the NBC avoids the intensive training and optimisation required for the NN, demanding extensive resources and experimentation. Therefore, when supporting these two classifiers, the system enables a trade-off between the NN performance and NBC simplicity of implementation.
AB - Previous research has indicated the significance of accurate classification of fluorescence in situ hybridisation (FISH) signals for the detection of genetic abnormalities. Based on well-discriminating features and a trainable neural network (NN) classifier, a previous system enabled highly-accurate classification of valid signals and artefacts of two fluorophores. However, since this system employed several features that are considered independent, the naive Bayesian classifier (NBC) is suggested here as an alternative to the NN. The NBC independence assumption permits the decomposition of the high-dimensional likelihood of the model for the data into a product of one-dimensional probability densities. The naive independence assumption together with the Bayesian methodology allow the NBC to predict a posteriori probabilities of class membership using estimated class-conditional densities in a close and simple form. Since the probability densities are the only parameters of the NBC, the misclassification rate of the model is determined exclusively by the quality of density estimation. Densities are evaluated by three methods: single Gaussian estimation (SGE; parametric method), Gaussian mixture model assuming spherical covariance matrices (GMM; semi-parametric method) and kernel density estimation (KDE; non-parametric method). For low-dimensional densities, the GMM generally outperforms the KDE that tends to overfit the training set at the cost of reduced generalisation capability. But, it is the GMM that loses some accuracy when modelling higher-dimensional densities due to the violation of the assumption of spherical covariance matrices when dependent features are added to the set. Compared with these two methods, the SGE and NN provide inferior and superior performance, respectively. However, the NBC avoids the intensive training and optimisation required for the NN, demanding extensive resources and experimentation. Therefore, when supporting these two classifiers, the system enables a trade-off between the NN performance and NBC simplicity of implementation.
KW - Bayesian network
KW - Density estimation
KW - Fluorescence in situ hybridisation (FISH)
KW - Gaussian mixture model
KW - Naive Bayesian classifier
KW - Signal classification
UR - http://www.scopus.com/inward/record.url?scp=1842714926&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2003.11.005
DO - 10.1016/j.artmed.2003.11.005
M3 - Article
C2 - 15081077
AN - SCOPUS:1842714926
SN - 0933-3657
VL - 30
SP - 301
EP - 316
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
IS - 3
ER -