Trace Based Performance Characterization and Optimization

Title: Trace Based Performance Characterization and Optimization
Author: Marathe, Jaydeep Prakash
Advisors: Vincent Freeh, Committee Member
Yan Solihin, Committee Member
Tao Xie, Committee Member
Frank Mueller, Committee Chair
Abstract: Processor speeds have increased dramatically in the recent past, but improvement in memory access latencies has not kept pace. As a result, programs that do not make efficient use of the processor caches tend to become increasingly memory-bound and do not experience speedups with increasing processor frequency. In this thesis, we present tools to characterize and optimize the memory access patterns of software programs. Our tools use the program's memory access trace as a primary input for analysis. Our efforts encompass two broad areas: performance analysis and performance optimization. With performance analysis, our focus is on automating the analysis process as far as possible and on presenting the user with a rich set of metrics, both for single-threaded and multi-threaded programs. With performance optimization, we go one step further and perform automatic transformations based on observed program behavior. We make the following contributions in this thesis. First, we explore different tracing strategies: software tracing with dynamic binary instrumentation, hardware-based tracing exploiting support found in contemporary microprocessors, and a hybrid scheme that leverages hardware support with certain software modifications. Second, we present a range of performance analysis and optimization tools based on these trace inputs and additional auxiliary instrumentation. Our first tool, METRIC, characterizes the memory performance of single-threaded programs. Our second tool, ccSIM, extends METRIC to characterize the coherence behavior of multithreaded OpenMP benchmarks. Our third tool extends ccSIM to work with hardware-generated and hybrid trace inputs. These three tools represent our performance analysis efforts. We also explore automated performance optimization with our remaining tools. Our fourth tool uses hardware-generated traces for automatic page placement in cache-coherent non-uniform memory architectures (ccNUMA). Finally, our fifth tool explores a novel trace-driven instruction-level software data prefetching strategy. Overall, we demonstrate that memory traces represent a rich source of information about a program's behavior and can be effectively used for a wide range of performance analysis and optimization strategies.
Date: 2007-06-20
Degree: PhD
Discipline: Computer Science
URI: http://www.lib.ncsu.edu/resolver/1840.16/5777


Files in this item

File: etd.pdf
Size: 1.295Mb
Format: PDF
