Dynamic Page Migration on ccNUMA Platforms Guided by Hardware Tracing
Date
2008-08-14
Abstract
Cache-coherent non-uniform memory architectures (ccNUMA) are becoming increasingly common, not only in large-scale high-performance platforms but also in multi-core architectures. Under ccNUMA, data placement can significantly influence overall application performance, because memory references resolved locally to a processor/core incur lower latency than remote ones.
This work develops a novel hardware-assisted dynamic page migration scheme based on automated tracing of the memory references made by application threads. The framework leverages the performance monitoring capabilities of contemporary x86 microprocessors to efficiently extract an approximate trace of memory accesses. This information, together with multi-level hop latencies, is used to decide page affinity, i.e., the node to which a page is bound. Once affinities are determined, page migration is initiated using Linux kernel mechanisms. All of this automation is performed in user space and is transparent to the main application.
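The affinity decision described above can be illustrated with a minimal sketch. It is not the thesis implementation: the four-node hop-cost matrix, the 4 KiB page size, and the function names `page_affinity` and `pages_to_migrate` are assumptions made for illustration, and reading the quoted 1.3x and 1.6x penalties as one-hop and two-hop costs is likewise an assumption. Given sampled (address, accessing node) pairs, the sketch picks for each page the node that minimizes the latency-weighted access cost.

```python
from collections import Counter

# Hypothetical relative access costs between nodes of a 4-node system:
# 1.0 = local, 1.3 = one hop, 1.6 = two hops. Illustrative values only,
# not a measured topology.
HOP_COST = [
    [1.0, 1.3, 1.3, 1.6],
    [1.3, 1.0, 1.6, 1.3],
    [1.3, 1.6, 1.0, 1.3],
    [1.6, 1.3, 1.3, 1.0],
]

PAGE_SIZE = 4096  # assumed page size in bytes


def page_affinity(samples, hop_cost=HOP_COST):
    """Map each sampled page to the node with the lowest weighted cost.

    samples: iterable of (virtual_address, accessing_node) pairs, e.g. an
    approximate trace obtained from hardware performance monitoring.
    """
    per_page = {}
    for addr, node in samples:
        # Count accesses per page, broken down by accessing node.
        per_page.setdefault(addr // PAGE_SIZE, Counter())[node] += 1

    affinity = {}
    for page, counts in per_page.items():
        def cost(home, counts=counts):
            # Total cost if the page were placed on node `home`.
            return sum(n * hop_cost[src][home] for src, n in counts.items())

        affinity[page] = min(range(len(hop_cost)), key=cost)
    return affinity


def pages_to_migrate(affinity, current_home):
    """Return only the pages whose preferred node differs from their current one."""
    return {p: n for p, n in affinity.items() if current_home.get(p) != n}
```

On Linux, a user-space tool could hand the resulting page-to-node mapping to a call such as `move_pages(2)`; the abstract does not state which kernel mechanism the thesis actually uses.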
Experiments show that this method, although it relies on lossy tracing and is constrained by system configuration limits on the trace hardware, can efficiently and effectively improve local data availability at run time. It yields an average wall-clock execution time saving of over 14% on AMD Opterons with a 1.3x/1.6x access penalty to non-local memory, with minimal page migration overhead owing to advances in modern memory interconnect technologies. To the best of our knowledge, this is the first such experimental study on a popular platform, namely the combination of x86 processors and the Linux operating system.
Keywords
Dynamic Page Migration, PMU, PEBS, ccNUMA, ISA, Perfmon2, Microarchitecture
Degree
MS
Discipline
Computer Science