Browsing by Author "Alexander G. Dean, Committee Chair"
Now showing 1 - 4 of 4
- Data Allocation with Real-Time Scheduling (DARTS) (2006-12-21) Ghattas, Rony; Ralph C. Smith, Committee Member; Alexander G. Dean, Committee Chair; Thomas M. Conte, Committee Member; Eric Rotenberg, Committee Member
The problem of utilizing memory and energy efficiently is common to all computing platforms. Many studies have investigated methods to circumvent this problem. Nevertheless, most of these studies do not scale well to real-time embedded systems, where resources may be limited and assumptions that hold for general computing platforms no longer apply. First, memory has always been a bottleneck of system performance. It is well known that processors have been improving at a rate of about 60% per year, while memory latencies have been improving at less than 10% per year, leading to a growing gap between processor cycle time and memory access time. To compensate for this speed mismatch, it is common to use a memory hierarchy with a fast cache that dynamically allocates frequently used data objects close to the processor. Many embedded systems, however, cannot afford to use a cache, for reasons presented later. Those systems opt for a cacheless design, which is particularly popular for real-time embedded applications: data is allocated at compile time, making memory access latencies deterministic and predictable. Nevertheless, the burden of allocating data to memory now falls on the programmer/compiler. Second, the proliferation of portable and battery-operated devices has made efficient use of the available energy budget a vital design constraint, particularly since energy storage technology is also improving at a rather slow pace. Techniques like dynamic voltage scaling (DVS) and dynamic frequency scaling (DFS) have been proposed to deal with these problems.
Still, the applicability of those techniques to resource-constrained real-time systems has not been investigated. In this work, we propose techniques to deal with both of the above problems. Our main contribution, the data allocation with real-time scheduling (DARTS) framework, solves the data allocation and scheduling problems in cacheless systems with the goals of optimizing memory utilization, energy efficiency, and overall system performance. DARTS is a synergistic optimal approach to allocating data objects and scheduling real-time tasks for embedded systems. It optimally allocates data objects to memory through an integer linear programming (ILP) formulation, which minimizes the system's worst-case execution times (WCETs), resulting in more scheduling slack. This additional slack is used by our preemption threshold scheduler (PTS) to reduce stack memory requirements while maintaining all hard real-time constraints. The memory reduction of PTS allows these steps to be repeated: the data objects now require less memory, so more can fit into faster memory, further reducing WCETs and yielding more slack time. The increased slack can be used by PTS to reduce preemptions further, until a fixed point is reached. Using a combination of synthetic and real workloads, we show that the DARTS framework leads to optimal memory utilization and increased energy efficiency. In addition to this main contribution, we also present several techniques to optimize a system's memory utilization in the absence of a memory hierarchy using PTS, which we enhance and improve. Furthermore, advanced energy-saving techniques like DFS and DVS are investigated, and the tradeoffs in their use are presented and analyzed.
- Providing Static Timing Analysis Support for an ARM7 Processor Platform (2008-05-07) Kang, Sang Yeol; Alexander G. Dean, Committee Chair; Douglas S. Reeves, Committee Member; James M. Tuck, Committee Member
Scratchpad memory provides faster access but smaller capacity than other memories in embedded systems. It exposes a visibly heterogeneous memory hierarchy rather than abstracting it away as cache memory does. Unlike with a cache, program code and data can be allocated to scratchpad memory as desired, which enables performance optimization in real-time embedded systems. Static timing analysis supports these optimizations by providing fine-grained timing information about the application program. Based on the worst-case execution time (WCET) and best-case execution time (BCET) estimated by static timing analysis, techniques using scratchpad memory can be enhanced. This study provides a method of static timing analysis for an ARM processor platform (ARM7TDMI). Basic analysis relies on well-known program analysis graphs such as control flow graphs, call graphs, depth-first search trees, and post-dominance trees. During this basic analysis, loops and unstructured code, which make static timing analysis more difficult, are also identified. Control dependence analysis is a convenient basis for estimating the WCET and BCET, since it represents the hierarchical control structure of a program; the WCET and BCET are estimated by traversing the control dependence graph. To confirm the feasibility of this study, a real target system and its development tool chain are built and an existing application is ported. In addition, the static timing analysis framework of this study is implemented in a tool named ARMSAT. Experiments are performed in all of these environments.
The experimental results show that the actual execution times are bounded by the calculated analytical WCET and BCET bounds, although a few factors interfere with computing the analytical execution times.
- Software Thread Integration for Converting TLP to ILP on VLIW/EPIC Architectures (2003-01-14) So, Won; Eric Rotenberg, Committee Member; Tom Conte, Committee Member; Alexander G. Dean, Committee Chair
Multimedia applications are pervasive in modern systems and generally require a significantly higher level of performance than previous embedded workloads. They have driven digital signal processor makers to adopt high-performance architectures like VLIW (Very Long Instruction Word) and EPIC (Explicitly Parallel Instruction Computing). Despite many efforts to exploit instruction-level parallelism (ILP) in applications, typical utilization levels for compiler-generated VLIW/EPIC code range from one-eighth to one-half because a single instruction stream has limited ILP. Software Thread Integration (STI) is a software technique that interleaves multiple threads at the machine instruction level. Integration increases the number of independent instructions, allowing the compiler to generate a more efficient instruction schedule and hence faster run-time performance. We have developed techniques that use STI to convert thread-level parallelism (TLP) to ILP on VLIW/EPIC architectures. By focusing on the abundant procedure-level parallelism in multimedia applications, we integrate parallel procedure calls, which can be seen as threads, by gathering work in the application. We rely on the programmer to identify parallel procedures rather than on compiler identification. Our methods extend whole-program optimization by expanding the scope of the compiler through software thread integration and procedure cloning. The approach is effectively a superset of loop jamming, as it allows a larger variety of threads to be jammed together.
This thesis proposes a methodology for integrating multiple threads in multimedia applications and introduces the concept of a 'Smart RTOS' as an execution model for using integrated threads efficiently in embedded systems. We demonstrate the technique by integrating three procedures from a JPEG application at the C source code level, compiling with four compilers for the Itanium EPIC architecture, and measuring performance with the on-chip performance measurement units. Experimental results show procedure speedups of up to 18% and program speedups of up to 11%. Detailed performance analysis identifies the primary bottleneck as the Itanium's 16 KB instruction cache, which has limited room for the code expansion STI causes.
- Using Software Thread Integration with TinyOS (2008-02-11) Purvis, Zane Dustin; Alexander G. Dean, Committee Chair; Suleyman Sair, Committee Member; Eric Rotenberg, Committee Member; Thomas M. Conte, Committee Member
