Software Thread Integration for Instruction Level Parallelism
dc.contributor.advisor | Dr. Eric Rotenberg, Committee Member | en_US |
dc.contributor.advisor | Dr. Thomas M. Conte, Committee Member | en_US |
dc.contributor.advisor | Dr. Vincent W. Freeh, Committee Member | en_US |
dc.contributor.advisor | Dr. Alexander G. Dean, Committee Chair | en_US |
dc.contributor.author | So, Won | en_US |
dc.date.accessioned | 2010-04-02T19:22:46Z | |
dc.date.available | 2010-04-02T19:22:46Z | |
dc.date.issued | 2007-07-05 | en_US |
dc.degree.discipline | Computer Engineering | en_US |
dc.degree.level | dissertation | en_US |
dc.degree.name | PhD | en_US |
dc.description.abstract | Multimedia applications require a significantly higher level of performance than previous workloads of embedded systems. They have driven digital signal processor (DSP) makers to adopt high-performance architectures like VLIW (Very-Long Instruction Word) or EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction-level parallelism (ILP) in the application, the speed is a fraction of what it could be, limited by the difficulty of finding enough independent instructions to keep all of the processor's functional units busy. This dissertation proposes Software Thread Integration (STI) for Instruction Level Parallelism. STI is a software technique for interleaving multiple threads of control into a single implicitly multithreaded one. We use STI to improve the performance on ILP processors by merging parallel procedures into one, increasing the compiler's scope and hence allowing it to create a more efficient instruction schedule. STI is essentially procedure jamming with intraprocedural code motion transformations which allow arbitrary alignment of instructions or code regions. This alignment enables code to be moved to use available execution resources better and improve the execution schedule. Parallel procedures are identified by the programmer with either annotations in conventional procedural languages or graph analysis for stream coarse-grain dataflow programming languages. We use the method of procedure cloning and integration for improving program run-time performance by integrating parallel procedures via STI. This defines a new way of converting parallelism at the thread level to the instruction level. With filter integration we apply STI for streaming applications, exploiting explicit coarse-grain dataflow information expressed by stream programming languages. 
During integration of threads, various STI code transformations are applied in order to maximize the ILP and reconcile control flow differences between two threads. Different transformations are selectively applied according to the control structure and the ILP characteristics of the code, driven by interactions with software pipelining. This approach effectively combines ILP-improving code transformations with instruction scheduling techniques so that they complement each other. Code transformations involve code motion as well as loop transformations such as loop jamming, unrolling, splitting, and peeling. We propose a methodology for efficiently finding the best integration scenario among all possibilities. We quantitatively estimate the performance impact of integration, allowing various integration scenarios to be compared and ranked via profitability analysis. The estimated profitability is verified and corrected by an iterative compilation approach, compensating for possible estimation inaccuracy. Our modeling methods combined with limited compilation quickly find the best integration scenario without requiring exhaustive integration. The proposed methods are automated by the STI for ILP Tool Chain targeting Texas Instruments C6x VLIW DSPs. This work contributes to the definition of an alternative development path for DSP applications. We seek to provide efficient compilation of C or C-like languages with a small amount of additional high-level dataflow information targeting popular and practical VLIW DSP platforms, reducing the need for extensive manual C and assembly code optimization and tuning. | en_US |
dc.identifier.other | etd-03182007-151015 | en_US |
dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/5930 | |
dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US |
dc.subject | StreamIt | en_US |
dc.subject | VLIW | en_US |
dc.subject | DSP | en_US |
dc.subject | digital signal processor | en_US |
dc.subject | stream programming | en_US |
dc.subject | Itanium | en_US |
dc.subject | software thread integration | en_US |
dc.subject | very long instruction word | en_US |
dc.subject | TI C6000 | en_US |
dc.subject | instruction level parallelism | en_US |
dc.subject | thread level parallelism | en_US |
dc.title | Software Thread Integration for Instruction Level Parallelism | en_US |
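The abstract describes STI as procedure jamming combined with loop transformations such as jamming, splitting, and peeling. The C sketch below illustrates that idea by hand on two independent loops; it is not taken from the dissertation or its tool chain. The procedure names (`scale`, `prefix_sum`, `scale_prefix_sum`) and the choice of workloads are hypothetical, and the sketch assumes the two arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Thread A (hypothetical): scale every sample by a constant gain.
 * Iterations are independent of each other. */
void scale(int *a, size_t na, int gain) {
    for (size_t i = 0; i < na; i++)
        a[i] *= gain;
}

/* Thread B (hypothetical): in-place running sum.
 * Each iteration depends on the previous one, so on its own this loop
 * offers the scheduler little instruction-level parallelism. */
void prefix_sum(int *b, size_t nb) {
    for (size_t i = 1; i < nb; i++)
        b[i] += b[i - 1];
}

/* Hand-integrated version of the two procedures above.
 * Assumption: a and b do not overlap. */
void scale_prefix_sum(int *a, size_t na, int gain, int *b, size_t nb) {
    assert(a != b); /* partial check of the non-overlap assumption */

    size_t common = na < nb ? na : nb;
    size_t i;

    /* Loop peeling: A starts at index 0 but B starts at index 1, so the
     * first iteration of A is peeled off to align the two loop bodies. */
    if (na > 0)
        a[0] *= gain;

    /* Loop jamming: one body now holds work from both threads, giving the
     * compiler independent instructions from A to schedule alongside the
     * dependent chain of B. */
    for (i = 1; i < common; i++) {
        a[i] *= gain;
        b[i] += b[i - 1];
    }

    /* Loop splitting: clean-up loops cover whichever thread has
     * iterations left when the trip counts differ. At most one of
     * these two loops executes any iterations. */
    for (; i < na; i++)
        a[i] *= gain;
    for (; i < nb; i++)
        b[i] += b[i - 1];
}
```

Calling `scale_prefix_sum(a, na, gain, b, nb)` is intended to leave `a` and `b` in the same state as calling `scale(a, na, gain)` followed by `prefix_sum(b, nb)`. Whether the jammed loop actually yields a better schedule depends on the target and compiler, which is the question the abstract's profitability analysis addresses.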