TY - GEN
T1 - A New Approach for Tuned Clustering Analysis
AU - Ben Ishay, Roni
AU - Herman, Maya
AU - Yosefi, Chaim
N1 - Publisher Copyright:
© 2018, Springer International Publishing AG, part of Springer Nature.
PY - 2018/1/1
Y1 - 2018/1/1
AB - In this work, we present a new data mining (DM) approach, called tuned clustering analysis, which integrates clustering with iterative cluster tuning. Clusters that contain borderline results are often dismissed or ignored during the analysis stage. As a result, hidden insights that these clusters may represent can remain unrevealed. This may harm the overall DM quality; in particular, important hidden insights may be missed. Our approach offers an iterative process that assists the data miner in making appropriate analysis decisions and avoiding the dismissal of possible insights. The idea is to apply an iterative DM process: clustering, analyzing, and then either presenting new insights or tuning and re-clustering the clusters that have borderline values. The clusters with borderline values are selected and used to build a new sub-database. The sub-database is then split on the attribute with the highest entropy value. The tuning iterations continue until new insights are found or until cluster quality falls below a certain threshold. We demonstrated tuned clustering analysis on real Echo (echocardiography) heart measurements using the km-Impute clustering algorithm. The initial clustering produced high-quality clusters but revealed no new medical insights. We therefore applied cluster tuning and found new medical insights, such as the influence of gender and age on cardiac function and on clinical modifications related to resilience to diastolic disorder. Our approach thus revealed new medical insights recovered from borderline-value clusters. By contrast, traditional analysis methods may miss or ignore these potential insights.
KW - Clustering
KW - Clustering analysis
KW - Data mining
KW - Imputation
KW - Medical data mining
KW - Missing values
UR - http://www.scopus.com/inward/record.url?scp=85050531683&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-96136-1_34
DO - 10.1007/978-3-319-96136-1_34
M3 - Conference contribution
AN - SCOPUS:85050531683
SN - 9783319961354
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 436
EP - 452
BT - Machine Learning and Data Mining in Pattern Recognition - 14th International Conference, MLDM 2018, Proceedings
A2 - Perner, Petra
PB - Springer Verlag
T2 - 14th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2018
Y2 - 15 July 2018 through 19 July 2018
ER -