Abstract
Potential computer system users or buyers usually employ a computer performance evaluation technique only if they believe its results provide valuable information. System Performance Evaluation Cooperative (SPEC) measures are perceived to provide such information and are therefore the ones most commonly used. SPEC measures are designed to evaluate the performance of engineering and scientific workstations, personal vector computers, and even minicomputers and superminicomputers. Along with the Transaction Processing Council (TPC) measures for database I/O performance, they have become de facto industry standards. However, do SPEC's evaluation outcomes actually provide added information value? In this article, we examine these measures by considering their structure, advantages, and disadvantages. We use two criteria in our examination: Are the programs used in the SPEC suite properly blended to reflect a representative mix of different applications? And are they properly synthesized so that the aggregate measures correctly rank computers by performance? Our analysis has determined the following: Many programs in the SPEC suites are superfluous; the benchmark size can be reduced by more than 50 percent. The way the measure is calculated may cause distortion; substituting the harmonic mean for the geometric mean used by SPEC roughly preserves the measure while giving better consistency. SPEC measures reflect the performance of the CPU rather than the entire system; therefore, they might be inaccurate in ranking an entire system. To remedy these problems, we propose a revised methodology for obtaining SPEC measures.
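The aggregation point raised in the abstract can be illustrated concretely. The sketch below is not taken from the article; it simply contrasts the geometric mean that SPEC uses with the harmonic mean proposed as a substitute, applied to a hypothetical set of per-benchmark performance ratios (the benchmark names and values are invented for illustration).

```python
from math import prod

# Hypothetical per-benchmark ratios (reference time / measured time),
# in the spirit of SPEC's per-program ratios; names and values are illustrative only.
ratios = {"bench_a": 12.0, "bench_b": 15.0, "bench_c": 9.0, "bench_d": 30.0}

def geometric_mean(values):
    """Aggregate the way SPEC does: the n-th root of the product of the ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))

def harmonic_mean(values):
    """Alternative aggregation: n divided by the sum of reciprocals.
    It weights slower benchmarks more heavily, so it tracks total elapsed
    time more closely than the geometric mean does."""
    values = list(values)
    return len(values) / sum(1.0 / v for v in values)

print("geometric mean:", round(geometric_mean(ratios.values()), 2))
print("harmonic mean: ", round(harmonic_mean(ratios.values()), 2))
```

For ratios that do not vary wildly across programs, the two means stay close, which is consistent with the abstract's claim that the substitution "roughly preserves the measure" while changing how outlier benchmarks influence the ranking.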
| Original language | English |
| --- | --- |
| Pages | 33-42 |
| Number of pages | 10 |
| Volume | 28 |
| No | 8 |
| Specialist publication | Computer |
| DOIs | |
| State | Published - 1 Jan 1995 |
ASJC Scopus subject areas
- General Computer Science