Software Thread Integration for Instruction Level Parallelism


dc.contributor.advisor Dr. Eric Rotenberg, Committee Member en_US
dc.contributor.advisor Dr. Thomas M. Conte, Committee Member en_US
dc.contributor.advisor Dr. Vincent W. Freeh, Committee Member en_US
dc.contributor.advisor Dr. Alexander G. Dean, Committee Chair en_US
dc.contributor.author So, Won en_US
dc.date.accessioned 2010-04-02T19:22:46Z
dc.date.available 2010-04-02T19:22:46Z
dc.date.issued 2007-07-05 en_US
dc.identifier.other etd-03182007-151015 en_US
dc.identifier.uri http://www.lib.ncsu.edu/resolver/1840.16/5930
dc.description.abstract Multimedia applications require a significantly higher level of performance than previous workloads of embedded systems. They have driven digital signal processor (DSP) makers to adopt high-performance architectures like VLIW (Very-Long Instruction Word) or EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction-level parallelism (ILP) in the application, the speed is a fraction of what it could be, limited by the difficulty of finding enough independent instructions to keep all of the processor's functional units busy. This dissertation proposes Software Thread Integration (STI) for Instruction Level Parallelism. STI is a software technique for interleaving multiple threads of control into a single implicitly multithreaded one. We use STI to improve the performance on ILP processors by merging parallel procedures into one, increasing the compiler's scope and hence allowing it to create a more efficient instruction schedule. STI is essentially procedure jamming with intraprocedural code motion transformations which allow arbitrary alignment of instructions or code regions. This alignment enables code to be moved to use available execution resources better and improve the execution schedule. Parallel procedures are identified by the programmer with either annotations in conventional procedural languages or graph analysis for stream coarse-grain dataflow programming languages. We use the method of procedure cloning and integration for improving program run-time performance by integrating parallel procedures via STI. This defines a new way of converting parallelism at the thread level to the instruction level. With filter integration we apply STI for streaming applications, exploiting explicit coarse-grain dataflow information expressed by stream programming languages. 
During integration of threads, various STI code transformations are applied in order to maximize the ILP and reconcile control flow differences between two threads. Different transformations are selectively applied according to the control structure and the ILP characteristics of the code, driven by interactions with software pipelining. This approach effectively combines ILP-improving code transformations with instruction scheduling techniques so that they complement each other. Code transformations involve code motion as well as loop transformations such as loop jamming, unrolling, splitting, and peeling. We propose a methodology for efficiently finding the best integration scenario among all possibilities. We quantitatively estimate the performance impact of integration, allowing various integration scenarios to be compared and ranked via profitability analysis. The estimated profitability is verified and corrected by an iterative compilation approach, compensating for possible estimation inaccuracy. Our modeling methods combined with limited compilation quickly find the best integration scenario without requiring exhaustive integration. The proposed methods are automated by the STI for ILP Tool Chain targeting Texas Instruments C6x VLIW DSPs. This work contributes to the definition of an alternative development path for DSP applications. We seek to provide efficient compilation of C or C-like languages with a small amount of additional high-level dataflow information targeting popular and practical VLIW DSP platforms, reducing the need for extensive manual C and assembly code optimization and tuning. en_US
dc.rights I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. en_US
dc.subject StreamIt en_US
dc.subject VLIW en_US
dc.subject DSP en_US
dc.subject digital signal processor en_US
dc.subject stream programming en_US
dc.subject Itanium en_US
dc.subject software thread integration en_US
dc.subject very long instruction word en_US
dc.subject TI C6000 en_US
dc.subject instruction level parallelism en_US
dc.subject thread level parallelism en_US
dc.title Software Thread Integration for Instruction Level Parallelism en_US
dc.degree.name PhD en_US
dc.degree.level dissertation en_US
dc.degree.discipline Computer Engineering en_US
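
Illustrative note (not part of the original record): the abstract describes STI as procedure jamming combined with loop transformations such as jamming and peeling. The C fragment below is a minimal sketch of that idea, assuming two independent procedures whose arrays do not alias. The names scale, energy, and scale_energy_integrated are hypothetical and do not come from the dissertation or its tool chain. The loop bodies are jammed over the common trip count so that a VLIW scheduler sees independent instructions from both threads in one loop body, and the remaining iterations are handled in clean-up loops.

```c
#include <stddef.h>

/* Thread A (hypothetical): scale every sample by a gain. */
void scale(int *a, size_t na, int gain) {
    for (size_t i = 0; i < na; i++)
        a[i] *= gain;
}

/* Thread B (hypothetical): sum of squares of a second, unrelated buffer. */
long energy(const int *b, size_t nb) {
    long acc = 0;
    for (size_t i = 0; i < nb; i++)
        acc += (long)b[i] * b[i];
    return acc;
}

/* Integrated procedure: the two loop bodies are jammed over the common
 * trip count, giving the compiler independent work from both threads
 * in a single loop body. Assumes a and b do not overlap. */
long scale_energy_integrated(int *a, size_t na, int gain,
                             const int *b, size_t nb) {
    long acc = 0;
    size_t common = na < nb ? na : nb;
    size_t i;

    /* Jammed loop: one iteration of A and one of B per pass. */
    for (i = 0; i < common; i++) {
        a[i] *= gain;
        acc += (long)b[i] * b[i];
    }

    /* Clean-up loops for whichever thread has iterations left;
     * at most one of these executes any iterations. */
    for (size_t j = i; j < na; j++)
        a[j] *= gain;
    for (size_t j = i; j < nb; j++)
        acc += (long)b[j] * b[j];

    return acc;
}
```

Calling scale_energy_integrated(a, na, gain, b, nb) is intended to produce the same results as calling scale(a, na, gain) followed by energy(b, nb). Whether the integrated form is actually faster depends on the target and compiler, which is the question the profitability analysis in the abstract addresses.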


Files in this item

Files Size Format View
etd.pdf 1.514Mb PDF View/Open
