NCSU Institutional Repository >
NC State Theses and Dissertations >
Please use this identifier to cite or link to this item:
|Title: ||Enhancing dependence-based prefetching for better timeliness, coverage, and practicality|
|Authors: ||Lim, Chungsoo|
|Advisors: ||Eric Rotenberg, Committee Member|
Vincent W. Freeh, Committee Member
Gregory T. Byrd, Committee Chair
Yan Solihin, Committee Member
|Keywords: ||memory wall|
|Issue Date: ||22-Dec-2008|
|Discipline: ||Computer Engineering|
|Abstract: ||This dissertation proposes an architecture that efficiently prefetches for loads whose effective addresses are dependent on previously-loaded values (dependence-based prefetching). For timely prefetches, the memory access patterns of producing loads are dynamically learned. These patterns (such as strides) are used to prefetch well ahead of the consumer load. Different prefetching algorithms are used for different patterns, and different algorithms are combined on top of dependence-based prefetching scheme. The proposed prefetcher is placed near the processor core and targets L1 cache misses, because removing L1 cache misses has greater performance potential than removing L2 cache misses.
For higher coverage, dependence-based prefetching is extended by augmenting the dependence relation identification mechanism, to include not only direct relations (y = x) but also linear relations (y = ax + b) between producer (x) and consumer (y) loads. With these additional relations, higher performance, measured in instructions per cycle (IPC), can be obtained.
We also show that the space overhead for storing the patterns can be reduced by leveraging chain prefetching and focusing on frequently missed loads. We specifically examine how to capture pointers in linked data structures (LDS) with pure hardware implementation. We find that the space requirement can be reduced, compared to previous work, if we selectively record patterns. Still, to make the prefetching scheme generally applicable, a large table is required for storing pointers. So we take one step further in order to eliminate the additional storage need for pointers. We propose a mechanism that utilizes a portion of the L2 cache for storing the pointers. With this mechanism, impractically huge on-chip storage for pointers, which is sometimes a total waste of silicon, can be removed. We show that storing the prefetch table in a partition of the L2 cache outperforms using the L2 cache conventionally for benchmarks that benefit from prefetching.|
|Appears in Collections:||Dissertations|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.