Please use this identifier to cite or link to this item: http://www.lib.ncsu.edu/resolver/1840.16/5930

Title: Software Thread Integration for Instruction Level Parallelism
Authors: So, Won
Advisors: Dr. Eric Rotenberg, Committee Member
Dr. Thomas M. Conte, Committee Member
Dr. Vincent W. Freeh, Committee Member
Dr. Alexander G. Dean, Committee Chair
Keywords: StreamIt
VLIW
DSP
digital signal processor
stream programming
Itanium
software thread integration
very long instruction word
TI C6000
instruction level parallelism
thread level parallelism
Issue Date: 5-Jul-2007
Degree: PhD
Discipline: Computer Engineering
Abstract: Multimedia applications require a significantly higher level of performance than earlier embedded-system workloads. They have driven digital signal processor (DSP) makers to adopt high-performance architectures such as VLIW (Very Long Instruction Word) or EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction-level parallelism (ILP) in these applications, achieved performance is often only a fraction of what the hardware could deliver, because it is difficult to find enough independent instructions to keep all of the processor's functional units busy. This dissertation proposes Software Thread Integration (STI) for Instruction Level Parallelism. STI is a software technique for interleaving multiple threads of control into a single implicitly multithreaded one. We use STI to improve performance on ILP processors by merging parallel procedures into one, which increases the compiler's scope and hence allows it to create a more efficient instruction schedule. STI is essentially procedure jamming combined with intraprocedural code motion transformations that allow arbitrary alignment of instructions or code regions. This alignment lets code be moved to make better use of available execution resources and so improve the execution schedule. Parallel procedures are identified either by programmer annotations in conventional procedural languages or by graph analysis of programs written in coarse-grain dataflow stream programming languages. We improve program run-time performance through procedure cloning and integration, integrating parallel procedures via STI. This defines a new way of converting thread-level parallelism into instruction-level parallelism. With filter integration, we apply STI to streaming applications, exploiting the explicit coarse-grain dataflow information expressed by stream programming languages.
During thread integration, various STI code transformations are applied to maximize ILP and to reconcile control-flow differences between the two threads. Transformations are applied selectively according to the control structure and ILP characteristics of the code, guided by their interactions with software pipelining. This approach combines ILP-improving code transformations with instruction scheduling techniques so that the two complement each other. The transformations include code motion as well as loop transformations such as loop jamming, unrolling, splitting, and peeling. We propose a methodology for efficiently finding the best integration scenario among all possibilities. We quantitatively estimate the performance impact of integration, allowing integration scenarios to be compared and ranked via profitability analysis. The estimated profitability is then verified and corrected through iterative compilation, compensating for possible estimation inaccuracy. Our modeling methods, combined with limited compilation, quickly find the best integration scenario without requiring exhaustive integration. The proposed methods are automated by the STI for ILP Tool Chain targeting Texas Instruments C6x VLIW DSPs. This work contributes to the definition of an alternative development path for DSP applications: efficient compilation of C or C-like languages, augmented with a small amount of high-level dataflow information, targeting popular and practical VLIW DSP platforms, which reduces the need for extensive manual C and assembly code optimization and tuning.
Appears in Collections: Dissertations

Files in This Item:

File      Description  Size     Format
etd.pdf                1.55 MB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.