Analyzing Memory Performance Bottlenecks in OpenMP Programs on SMP Architectures using ccSIM

dc.contributor.advisor: Dr. Frank Mueller, Committee Chair [en_US]
dc.contributor.advisor: Dr. Gregory Byrd, Committee Member [en_US]
dc.contributor.advisor: Dr. Purushothaman Iyer, Committee Member [en_US]
dc.contributor.author: Nagarajan, Anita [en_US]
dc.date.accessioned: 2010-04-02T18:00:13Z
dc.date.available: 2010-04-02T18:00:13Z
dc.date.issued: 2003-08-14 [en_US]
dc.degree.discipline: Computer Science [en_US]
dc.degree.level: thesis [en_US]
dc.degree.name: MS [en_US]
dc.description.abstract: As computing demands increase, performance analysis of application behavior has become a widely researched topic. Obtaining optimal application performance requires an understanding of the interaction between hardware and software. Program performance is quantified in terms of various metrics, and detailed information is needed to determine potential bottlenecks during execution. Once the exact causes of performance problems are isolated, optimizations to overcome them can be proposed. In SMP systems, data sharing can increase program latency because memory coherence must be maintained. The main contribution of this thesis is ccSIM, a cache-coherent multilevel memory hierarchy simulator for shared-memory multiprocessor systems, fed by traces obtained through on-the-fly dynamic binary rewriting of OpenMP programs. Interleaved parallel trace execution is simulated for the different processors, and results are studied for several OpenMP benchmarks. The coherence-related metrics obtained from ccSIM are validated against hardware performance counters to verify simulation accuracy. Both cumulative and per-reference statistics are provided, which support a detailed analysis of performance and help isolate bottlenecks in the memory hierarchy. For coherence events, the simulation results agree well with the hardware counters of a Power3 SMP node. The exact source-code locations of invalidations, and of the coherence misses these invalidations cause, are derived. This information, together with the classification of invalidations, helps in proposing optimizations or code transformations that could yield better performance for a particular application on the architecture of interest. [en_US]
dc.identifier.other: etd-08052003-180232 [en_US]
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/1039
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: shared memory multiprocessors [en_US]
dc.subject: OpenMP [en_US]
dc.subject: cache coherence [en_US]
dc.title: Analyzing Memory Performance Bottlenecks in OpenMP Programs on SMP Architectures using ccSIM [en_US]

Files

Original bundle

Name: etd.pdf
Size: 796.73 KB
Format: Adobe Portable Document Format
