Trace Based Performance Characterization and Optimization

Date

2007-06-20

Abstract

Processor speeds have increased dramatically in the recent past, but improvement in memory access latencies has not kept pace. As a result, programs that do not make efficient use of the processor caches tend to become increasingly memory-bound and do not experience speedups with increasing processor frequency. In this thesis, we present tools to characterize and optimize the memory access patterns of software programs. Our tools use the program's memory access trace as the primary input for analysis. Our efforts encompass two broad areas: performance analysis and performance optimization. With performance analysis, our focus is on automating the analysis process as far as possible and on presenting the user with a rich set of metrics, for both single-threaded and multi-threaded programs. With performance optimization, we go one step further and perform automatic transformations based on observed program behavior. We make the following contributions in this thesis. First, we explore different tracing strategies: software tracing with dynamic binary instrumentation, hardware-based tracing that exploits support found in contemporary microprocessors, and a hybrid scheme that leverages hardware support with certain software modifications. Second, we present a range of performance analysis and optimization tools based on these trace inputs and additional auxiliary instrumentation. Our first tool, METRIC, characterizes the memory performance of single-threaded programs. Our second tool, ccSIM, extends METRIC to characterize the coherence behavior of multi-threaded OpenMP benchmarks. Our third tool extends ccSIM to work with hardware-generated and hybrid trace inputs. These three tools represent our performance analysis efforts. We also explore automated performance optimization with our remaining tools. Our fourth tool uses hardware-generated traces for automatic page placement in cache-coherent non-uniform memory architectures (ccNUMA). Finally, our fifth tool explores a novel trace-driven, instruction-level software data prefetching strategy. Overall, we demonstrate that memory traces represent a rich source of information about a program's behavior and can be used effectively for a wide range of performance analysis and optimization strategies.

Keywords

performance characterization, performance optimization, hardware tracing, binary rewriting, memory traces, Itanium, performance monitoring unit, prefetching, ccNUMA, page placement, coherence

Degree

PhD

Discipline

Computer Science

Collections