TY - JOUR
T1 - Cell Broadband Engine processor performance optimization
T2 - Tracing tools implementation and use
AU - Biberstein, Marina
AU - Dori-Hacohen, Shiri
AU - Harel, Yuval
AU - Heilper, Andrei
AU - Mendelson, Bilha
AU - Shvadron, Uzi
AU - Treister, Eran
AU - Turek, Javier
AU - Chang, Moon S.
PY - 2009/1/1
Y1 - 2009/1/1
N2 - Optimizing performance on multicore processors is a daunting task M. S. Chang because of the increased importance of such factors as thread communication, memory contention, and memory access latency. This paper presents two tools that programmers and performance analysts can use to understand application performance on the Cell Broadband Engine® (Cell/B.E.) processor: the Performance Debugging Tool (PDT) and the Trace Analyzer (TA). PDT traces user-space events, augmenting them with scheduling data from the operating system; those traces are then read, analyzed, and presented visually by the TA. This paper describes the implementation issues arising from the fact that a common lowoverhead clock shared by all cores, essential for analysis and visualization, is not available on the Cell/B.E. processor. The TA employs an offline analysis to align the collected events to a common time based only on thread-local timestamps, event order, and context switch information. We also discuss the overhead of tracing and its impact on execution and performance analysis. We illustrate the use of the PDT and TA by analyzing several significant Cell/B.E. processor workloads, including native code and higher-level abstractions offered by the Data Communication and Synchronization services. We show how trace analysis can help identify performance issues in these workloads and how it can be used by programmers to spot performance antipatterns (common programming practices leading to suboptimal performance).
AB - Optimizing performance on multicore processors is a daunting task M. S. Chang because of the increased importance of such factors as thread communication, memory contention, and memory access latency. This paper presents two tools that programmers and performance analysts can use to understand application performance on the Cell Broadband Engine® (Cell/B.E.) processor: the Performance Debugging Tool (PDT) and the Trace Analyzer (TA). PDT traces user-space events, augmenting them with scheduling data from the operating system; those traces are then read, analyzed, and presented visually by the TA. This paper describes the implementation issues arising from the fact that a common lowoverhead clock shared by all cores, essential for analysis and visualization, is not available on the Cell/B.E. processor. The TA employs an offline analysis to align the collected events to a common time based only on thread-local timestamps, event order, and context switch information. We also discuss the overhead of tracing and its impact on execution and performance analysis. We illustrate the use of the PDT and TA by analyzing several significant Cell/B.E. processor workloads, including native code and higher-level abstractions offered by the Data Communication and Synchronization services. We show how trace analysis can help identify performance issues in these workloads and how it can be used by programmers to spot performance antipatterns (common programming practices leading to suboptimal performance).
UR - http://www.scopus.com/inward/record.url?scp=77955075803&partnerID=8YFLogxK
U2 - 10.1147/JRD.2009.5429073
DO - 10.1147/JRD.2009.5429073
M3 - Article
AN - SCOPUS:77955075803
SN - 0018-8646
VL - 53
JO - IBM Journal of Research and Development
JF - IBM Journal of Research and Development
IS - 5
ER -