Enhancing dependence-based prefetching for better timeliness, coverage, and practicality


Title: Enhancing dependence-based prefetching for better timeliness, coverage, and practicality
Author: Lim, Chungsoo
Advisors: Eric Rotenberg, Committee Member
Vincent W. Freeh, Committee Member
Gregory T. Byrd, Committee Chair
Yan Solihin, Committee Member
Abstract: This dissertation proposes an architecture that efficiently prefetches for loads whose effective addresses depend on previously loaded values (dependence-based prefetching). For timely prefetches, the memory access patterns of producing loads are learned dynamically. These patterns (such as strides) are used to prefetch well ahead of the consumer load. Different prefetching algorithms are used for different patterns, and they are combined on top of the dependence-based prefetching scheme. The proposed prefetcher is placed near the processor core and targets L1 cache misses, because removing L1 cache misses has greater performance potential than removing L2 cache misses.

For higher coverage, dependence-based prefetching is extended by augmenting the dependence-relation identification mechanism to include not only direct relations (y = x) but also linear relations (y = ax + b) between producer (x) and consumer (y) loads. With these additional relations, higher performance, measured in instructions per cycle (IPC), can be obtained.

We also show that the space overhead for storing the patterns can be reduced by leveraging chain prefetching and focusing on frequently missed loads. We specifically examine how to capture pointers in linked data structures (LDS) with a pure hardware implementation. We find that the space requirement can be reduced, compared to previous work, if we selectively record patterns. Still, to make the prefetching scheme generally applicable, a large table is required for storing pointers. We therefore go one step further and eliminate the additional storage needed for pointers. We propose a mechanism that uses a portion of the L2 cache to store the pointers. With this mechanism, the impractically large dedicated on-chip storage for pointers, which is sometimes entirely wasted silicon, can be removed.
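The timeliness and coverage ideas above can be illustrated with a small software model. This is a minimal sketch under assumptions of our own, not the dissertation's hardware design. A per-PC stride entry (`StrideEntry`) tracks a producer load. A `LinearRelation` maps the value x loaded by the producer to the consumer's address y = a*x + b; reading x as "the loaded value" and y as "the consumer address" is our interpretation. `prefetch_addresses` issues a producer prefetch `distance` strides ahead and then, once that value is known, the dependent consumer prefetch. All names, the "same stride twice" confidence rule, and the dictionary standing in for memory are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class StrideEntry:
    """Stride state for one producer load (hypothetical per-PC table entry)."""
    last_addr: int = 0
    stride: int = 0
    confident: bool = False

    def update(self, addr: int) -> None:
        new_stride = addr - self.last_addr
        # Assumed rule: confident once the same non-zero stride repeats.
        self.confident = new_stride != 0 and new_stride == self.stride
        self.stride = new_stride
        self.last_addr = addr


@dataclass
class LinearRelation:
    """Consumer address y = a*x + b, with x the value the producer loaded (assumed)."""
    a: int
    b: int

    def predict(self, x: int) -> int:
        return self.a * x + self.b


def learn_relation(samples: Sequence[Tuple[int, int]]) -> Optional[LinearRelation]:
    """Fit integer a, b from the first two (x, y) samples, then check the rest."""
    if len(samples) < 2:
        return None
    (x0, y0), (x1, y1) = samples[0], samples[1]
    if x1 == x0 or (y1 - y0) % (x1 - x0) != 0:
        return None  # no integer linear relation fits these two samples
    a = (y1 - y0) // (x1 - x0)
    rel = LinearRelation(a, y0 - a * x0)
    return rel if all(rel.predict(x) == y for x, y in samples) else None


def prefetch_addresses(entry: StrideEntry, rel: Optional[LinearRelation],
                       memory: Dict[int, int], distance: int) -> List[int]:
    """Producer prefetch `distance` strides ahead, followed by its dependent consumer."""
    if not entry.confident:
        return []
    producer_addr = entry.last_addr + distance * entry.stride
    out = [producer_addr]
    # In hardware the value would return from the producer prefetch;
    # here a dict stands in for memory.
    value = memory.get(producer_addr)
    if value is not None and rel is not None:
        out.append(rel.predict(value))
    return out


if __name__ == "__main__":
    # Hypothetical workload: int32 indices at 0x1000 select 8-byte records at 0x8000.
    idx = [3, 7, 1, 5, 2, 9]
    memory = {0x1000 + 4 * i: v for i, v in enumerate(idx)}
    entry = StrideEntry()
    for addr in (0x1000, 0x1004, 0x1008):
        entry.update(addr)
    rel = learn_relation([(3, 0x8018), (7, 0x8038), (1, 0x8008)])
    print([hex(a) for a in prefetch_addresses(entry, rel, memory, distance=2)])
    # → ['0x1010', '0x8010']
```

With `a = 1, b = 0` the relation reduces to the direct case y = x. Allowing general `a` and `b` lets the sketch also cover cases such as an index array whose values are scaled into a record array.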
We show that, for benchmarks that benefit from prefetching, storing the prefetch table in a partition of the L2 cache outperforms using the whole L2 cache conventionally.
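The L2-partitioning idea can be pictured with a toy set-associative cache in which a fixed number of ways per set are reserved for prefetch-table entries. This is a hypothetical model chosen for illustration, not the mechanism the dissertation evaluates. Way partitioning, LRU replacement within each partition, one pointer entry per reserved way, and the names `PartitionedL2`, `table_ways`, `record_pointer`, and `lookup_pointer` are all our assumptions. The model shows the trade-off behind the result above: reserved ways give the pointer table room without dedicated storage, at the cost of data capacity in every set.

```python
from collections import OrderedDict


class PartitionedL2:
    """Toy set-associative cache with `table_ways` ways per set reserved for pointers.

    Hypothetical model: the remaining ways cache ordinary data lines, each
    partition keeps its own LRU order (an OrderedDict, oldest entry first),
    and each reserved way holds one pointer entry for simplicity.
    """

    def __init__(self, num_sets: int, ways: int, table_ways: int, line_size: int = 64):
        if not 0 <= table_ways < ways:
            raise ValueError("need at least one data way")
        self.num_sets = num_sets
        self.line_size = line_size
        self.data_ways = ways - table_ways
        self.table_ways = table_ways
        self.data = [OrderedDict() for _ in range(num_sets)]
        self.table = [OrderedDict() for _ in range(num_sets)]

    def _set_and_tag(self, addr: int):
        line = addr // self.line_size
        return line % self.num_sets, line // self.num_sets

    @staticmethod
    def _fill(lru: OrderedDict, key, value, capacity: int) -> None:
        if key in lru:
            lru.move_to_end(key)
        elif len(lru) >= capacity:
            lru.popitem(last=False)  # evict the least recently used entry
        lru[key] = value

    def access_data(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line into the data ways."""
        s, tag = self._set_and_tag(addr)
        if tag in self.data[s]:
            self.data[s].move_to_end(tag)
            return True
        self._fill(self.data[s], tag, None, self.data_ways)
        return False

    def record_pointer(self, producer_addr: int, pointer: int) -> None:
        """Store a pointer entry, keyed by the producer's full address."""
        if self.table_ways == 0:
            return
        s, _ = self._set_and_tag(producer_addr)
        self._fill(self.table[s], producer_addr, pointer, self.table_ways)

    def lookup_pointer(self, producer_addr: int):
        """Return the recorded pointer, or None if the entry is absent."""
        s, _ = self._set_and_tag(producer_addr)
        entry = self.table[s]
        if producer_addr not in entry:
            return None
        entry.move_to_end(producer_addr)
        return entry[producer_addr]
```

Setting `table_ways=0` turns the model back into a conventional cache, which is the baseline such a partition would be compared against. Raising it trades data hits for pointer-table capacity.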
Date: 2008-12-22
Degree: PhD
Discipline: Computer Engineering
URI: http://www.lib.ncsu.edu/resolver/1840.16/3687

Files in this item

etd.pdf (3.307 MB, PDF)
