NCSU Institutional Repository >
NC State Theses and Dissertations >
Please use this identifier to cite or link to this item:
|Title: ||Software Thread Integration for Converting TLP to ILP on VLIW/EPIC Architectures|
|Authors: ||So, Won|
|Advisors: ||Eric Rotenberg, Committee Member|
Tom Conte, Committee Member
Alexander G. Dean, Committee Chair
|Keywords: ||Smart RTOS|
Software Thread Integration
|Issue Date: ||14-Jan-2003|
|Discipline: ||Computer Engineering|
|Abstract: ||Multimedia applications are pervasive in modern systems. They generally require a significantly higher level of performance than previous workloads of embedded systems. They have driven digital signal processor makers to adopt high-performance architectures like VLIW (Very-Long Instruction Word) or EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction level parallelism (ILP) in the application, typical utilization levels for compiler-generated VLIW/EPIC code range from one-eighth to one-half because a single instruction stream has limited ILP.
Software Thread Integration (STI) is a software technique which interleaves multiple threads at the machine instruction level. Integration of threads increases the number of independent instructions, allowing the compiler to generate a more efficient instruction schedule and hence faster runtime performance. We have developed techniques to use STI for converting thread level parallelism (TLP) to ILP on VLIW/EPIC architectures. By focusing on the abundant parallelism at the procedure level in the multimedia applications, we integrate parallel procedure calls, which can be seen as threads, by gathering work in the application. We rely on the programmer to identify parallel procedures, rather than rely on compiler identification. Our methods extend whole-program optimization by expanding the scope of the compiler through software thread integration and procedure cloning. It is effectively a superset of loop jamming as it allows a larger variety of threads to be jammed together.
This thesis proposes a methodology to integrate multiple threads in multimedia applications and introduce the concept of a 'Smart RTOS' as an execution model for utilizing integrated threads efficiently in embedded systems. We demonstrate our technique by integrating three procedures from a JPEG application at C source code level, compiling with four compilers for the Itanium EPIC architecture and measuring the performance with the on-chip performance measurement units. Experimental results show procedure speedup of up to 18% and program speedup up to 11%. Detailed performance analysis demonstrates the primary bottleneck to be the Itanium's 16K instruction cache, which has limited room for the code expansion by STI.|
|Appears in Collections:||Theses|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.