Software Thread Integration for Instruction Level Parallelism

dc.contributor.advisor | Dr. Eric Rotenberg, Committee Member | en_US
dc.contributor.advisor | Dr. Thomas M. Conte, Committee Member | en_US
dc.contributor.advisor | Dr. Vincent W. Freeh, Committee Member | en_US
dc.contributor.advisor | Dr. Alexander G. Dean, Committee Chair | en_US
dc.contributor.author | So, Won | en_US
dc.date.accessioned | 2010-04-02T19:22:46Z
dc.date.available | 2010-04-02T19:22:46Z
dc.date.issued | 2007-07-05 | en_US
dc.degree.discipline | Computer Engineering | en_US
dc.degree.level | dissertation | en_US
dc.degree.name | PhD | en_US
dc.description.abstract | Multimedia applications require significantly higher performance than earlier embedded-system workloads. They have driven digital signal processor (DSP) makers to adopt high-performance architectures such as VLIW (Very Long Instruction Word) or EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction-level parallelism (ILP) in applications, achieved performance remains a fraction of what the hardware could deliver. The limit is the difficulty of finding enough independent instructions to keep all of the processor's functional units busy. This dissertation proposes Software Thread Integration (STI) for Instruction Level Parallelism. STI is a software technique for interleaving multiple threads of control into a single, implicitly multithreaded one. We use STI to improve performance on ILP processors by merging parallel procedures into one. This increases the compiler's scope and hence allows it to create a more efficient instruction schedule. STI is essentially procedure jamming combined with intraprocedural code motion transformations that allow arbitrary alignment of instructions or code regions. This alignment lets code be moved to make better use of available execution resources and to improve the execution schedule. Parallel procedures are identified in one of two ways: by programmer annotations in conventional procedural languages, or by graph analysis of programs written in coarse-grain dataflow stream programming languages. We use procedure cloning and integration to improve program run-time performance by integrating parallel procedures via STI. This defines a new way of converting thread-level parallelism into instruction-level parallelism. With filter integration, we apply STI to streaming applications, exploiting the explicit coarse-grain dataflow information expressed by stream programming languages.
During integration of threads, various STI code transformations are applied to maximize ILP and to reconcile control-flow differences between two threads. Different transformations are applied selectively according to the control structure and ILP characteristics of the code, driven by interactions with software pipelining. This approach combines ILP-improving code transformations with instruction scheduling techniques so that they complement each other. The code transformations include code motion as well as loop transformations such as loop jamming, unrolling, splitting, and peeling. We propose a methodology for efficiently finding the best integration scenario among all possibilities. We quantitatively estimate the performance impact of integration, allowing integration scenarios to be compared and ranked via profitability analysis. The estimated profitability is verified and corrected by an iterative compilation approach, compensating for possible estimation inaccuracy. Our modeling methods, combined with limited compilation, quickly find the best integration scenario without requiring exhaustive integration. The proposed methods are automated by the STI for ILP Tool Chain targeting Texas Instruments C6x VLIW DSPs. This work contributes to the definition of an alternative development path for DSP applications. We seek to provide efficient compilation of C or C-like languages, augmented with a small amount of high-level dataflow information, for popular and practical VLIW DSP platforms. The goal is to reduce the need for extensive manual optimization and tuning of C and assembly code. | en_US
dc.identifier.other | etd-03182007-151015 | en_US
dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/5930
dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US
dc.subject | StreamIt | en_US
dc.subject | VLIW | en_US
dc.subject | DSP | en_US
dc.subject | digital signal processor | en_US
dc.subject | stream programming | en_US
dc.subject | Itanium | en_US
dc.subject | software thread integration | en_US
dc.subject | very long instruction word | en_US
dc.subject | TI C6000 | en_US
dc.subject | instruction level parallelism | en_US
dc.subject | thread level parallelism | en_US
dc.title | Software Thread Integration for Instruction Level Parallelism | en_US
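
The abstract describes STI as procedure jamming: two independent procedures are merged so that the compiler can schedule their instructions together. A minimal C sketch of that idea follows. It uses two hypothetical procedures, `scale()` and `sum()`, as stand-ins for parallel threads. The function `scale_sum_integrated()` is a hand-written integrated clone for illustration, not output of the STI for ILP Tool Chain. The loops are jammed over their common trip count, and the leftover iterations of the longer loop run afterward. This remainder handling is a simple form of the loop splitting the abstract mentions. The sketch assumes the two arrays do not overlap, which is what makes the two bodies independent.

```c
/* Hypothetical "thread" A: scale a vector in place. */
void scale(int *a, int n, int k) {
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* Hypothetical "thread" B: sum a vector. */
long sum(const int *b, int n) {
    long s = 0;
    for (int j = 0; j < n; j++)
        s += b[j];
    return s;
}

/* Hand-integrated clone of scale() + sum(). Assumes a[] and b[] do not
   overlap, so the two loop bodies are independent and may be interleaved. */
long scale_sum_integrated(int *a, int na, int k, const int *b, int nb) {
    int common = na < nb ? na : nb;
    long s = 0;
    int i;

    /* Jammed loop: both bodies in one loop, giving the scheduler
       independent work from two threads to pack into each cycle. */
    for (i = 0; i < common; i++) {
        a[i] *= k;  /* from scale() */
        s += b[i];  /* from sum()   */
    }

    /* Split-off remainder: at most one of these two loops executes. */
    for (; i < na; i++)
        a[i] *= k;
    for (; i < nb; i++)
        s += b[i];

    return s;
}
```

Suppose one copy of the inputs is processed by calling `scale()` and then `sum()`, and another copy by `scale_sum_integrated()`. Both copies should end with identical arrays, and both calls should return the same sum. On a VLIW target, the hoped-for gain comes from the compiler scheduling both bodies of the jammed loop into the same cycles.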
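
The abstract's profitability analysis estimates the performance of each integration scenario and ranks the scenarios. This step can be sketched in the same spirit. The model below is an assumption for illustration, not the dissertation's estimator. It summarizes each loop by a trip count and a count of operations per functional-unit class. A loop's cost is its trip count times a resource-bound initiation interval (II), and a scenario's cost is the sum over its loops. The model assumes every operation is pinned to one of the four C6x unit classes (.L, .S, .M, .D), each of which the core provides twice. It ignores recurrences, prologue and epilogue code, and memory stalls. These omissions are one reason such estimates need the iterative-compilation correction the abstract describes.

```c
#include <stddef.h>

/* Unit classes modeled here. Assumption: each op is pre-assigned to one
   class; real C6x instructions can often run on more than one unit. */
enum { UNIT_L, UNIT_S, UNIT_M, UNIT_D, NUM_UNIT_CLASSES };

/* C6x cores have two units of each class, one per datapath cluster. */
static const int units_per_class[NUM_UNIT_CLASSES] = {2, 2, 2, 2};

typedef struct {
    long trip_count;              /* iterations executed */
    int ops[NUM_UNIT_CLASSES];    /* ops per iteration, per unit class */
} LoopModel;

typedef struct {
    const LoopModel *loops;       /* loops executed by this scenario */
    size_t n;
} Scenario;

/* Resource-bound II: max over classes of ceil(ops / units), at least 1. */
int resource_ii(const LoopModel *loop) {
    int ii = 1;
    for (int r = 0; r < NUM_UNIT_CLASSES; r++) {
        int need = (loop->ops[r] + units_per_class[r] - 1) / units_per_class[r];
        if (need > ii)
            ii = need;
    }
    return ii;
}

/* Estimated cycles = sum of trip_count * II over the scenario's loops. */
long estimate_cycles(const LoopModel *loops, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += loops[i].trip_count * resource_ii(&loops[i]);
    return total;
}

/* Index of the cheapest scenario by estimated cycles (requires count > 0). */
size_t best_scenario(const Scenario *s, size_t count) {
    size_t best = 0;
    long best_cycles = estimate_cycles(s[0].loops, s[0].n);
    for (size_t i = 1; i < count; i++) {
        long c = estimate_cycles(s[i].loops, s[i].n);
        if (c < best_cycles) {
            best = i;
            best_cycles = c;
        }
    }
    return best;
}
```

Consider two loops of 100 iterations that each need one operation per unit class. Run back to back, they cost 200 estimated cycles. Their jammed version needs two operations per class, which still fits in one cycle per iteration, so it costs 100 cycles and `best_scenario()` prefers it.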

Files

Original bundle

Name: etd.pdf
Size: 1.51 MB
Format: Adobe Portable Document Format
