Data mining in software metrics databases

Scott Dick, Aleksandra Meeks, Mark Last, Horst Bunke, Abraham Kandel

Research output: Contribution to journal › Article › peer-review

47 Scopus citations


We investigate the use of data mining for the analysis of software metrics databases and some of the issues in this application domain. Software metrics are collected at various phases of the software development process in order to monitor and control the quality of a software product. However, software quality control is complicated by the complex relationship between these metrics and the attributes of a software development process. Data mining has been proposed as a potential technology for supporting and enhancing our understanding of software metrics and their relationship to software quality. In this paper, we use fuzzy clustering to investigate three datasets of software metrics, along with the larger issue of whether supervised or unsupervised learning is more appropriate for software engineering problems. While our findings generally confirm the known linear relationship between metrics and change rates, some interesting behaviors are noted. In addition, our results partly contradict earlier studies that only used correlation analysis to investigate these datasets. These results illustrate how intelligent technologies can augment traditional statistical inference in software quality control.

Original language: English
Pages (from-to): 81-110
Number of pages: 30
Journal: Fuzzy Sets and Systems
Issue number: 1
State: Published - 1 Jul 2004


  • Artificial intelligence
  • Data mining
  • Fuzzy clustering
  • Machine learning
  • Software reliability
  • Software testing

ASJC Scopus subject areas

  • Logic
  • Artificial Intelligence


