Browsing by Author "Jun Xu, Committee Member"
Now showing 1 - 4 of 4
- Asymmetric Task Scheduling on Simultaneous Multithreading Processors (2005-06-26)
  Smith, Daniel M; Vincent W. Freeh, Committee Chair; Jun Xu, Committee Member; Frank Mueller, Committee Member
  The performance of a simultaneous multithreaded CPU is greatly impacted by the behavioral characteristics of the threads competing for resources during concurrent execution. Most of the research aimed at improving SMT performance, or characterizing beneficial workload mixes, has targeted a multi-process parallel computation environment. Even in cases where the thread mix was heterogeneous, the CPU contexts were still viewed as two semi-independent resources, both of which were unbiased in their task selection. We investigate an alternative method for operating system designers to utilize an SMT CPU. By confining user processes to a single context of the CPU, and allowing kernel tasks to utilize the other context when necessary, we are able to, in many cases, provide better application performance than either an equivalent uniprocessor system, or an SMT system that is being treated as an SMP. In addition to operating in this special mode, an operating system may also choose to alternate between it and a conventional multiprocessing configuration, depending on which provides better performance. A modification to the Linux 2.6 kernel to achieve this desired behavior is presented, as well as test results of SPEC benchmarks which show where our modification improves performance. We also demonstrate how our modifications are sufficiently transparent to allow conditional mode selection at runtime.
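  The confinement policy described in this abstract can be sketched as a simple affinity rule: user tasks are restricted to one SMT context, while kernel tasks may use both, and the policy itself is switchable at runtime. This is a minimal illustrative model only; the function names, mode flag, and bitmask encoding below are assumptions for exposition and are not taken from the actual Linux 2.6 patch.

  ```c
  /* Hypothetical sketch of asymmetric SMT scheduling: on a 2-context SMT
   * CPU, user tasks are pinned to context 0, kernel tasks may run on
   * either context, and the mode can be toggled back to conventional
   * SMP at runtime. All names here are illustrative. */
  #include <assert.h>
  #include <stdio.h>

  enum task_kind { USER_TASK, KERNEL_TASK };
  enum sched_mode { MODE_SMP, MODE_ASYMMETRIC };  /* selectable at runtime */

  /* Bitmask of SMT contexts a task is allowed to run on. */
  unsigned allowed_contexts(enum task_kind kind, enum sched_mode mode)
  {
      if (mode == MODE_SMP)
          return 0x3;          /* both contexts: conventional SMP view */
      if (kind == USER_TASK)
          return 0x1;          /* user code confined to context 0 */
      return 0x3;              /* kernel tasks may spill onto context 1 */
  }

  int main(void)
  {
      assert(allowed_contexts(USER_TASK, MODE_ASYMMETRIC) == 0x1);
      assert(allowed_contexts(KERNEL_TASK, MODE_ASYMMETRIC) == 0x3);
      assert(allowed_contexts(USER_TASK, MODE_SMP) == 0x3);
      printf("ok\n");
      return 0;
  }
  ```

  Because the policy reduces to an affinity mask, switching between the asymmetric mode and the SMP mode at runtime needs no change to the tasks themselves, which is consistent with the transparency claim in the abstract.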
- Exploring Energy-Time Tradeoff in High Performance Computing (2005-05-16)
  Pan, Feng; Vincent Freeh, Committee Chair; Jun Xu, Committee Member; Eric Rotenberg, Committee Member
  Recently, energy has become an important issue in high-performance computing. For example, low power/energy supercomputers, such as Green Destiny, have been built; the idea is to increase the energy efficiency of nodes. However, these clusters tend to save energy at the expense of performance. Our approach is instead to use high-performance cluster nodes with frequency-scalable AMD-64 processors; energy can be saved by scaling down the CPU. Our cluster provides a different balance of power and performance than low-power machines such as Green Destiny. In particular, its performance is on par with a Pentium 4-equipped cluster. This thesis investigates the energy consumption and execution time of a wide range of applications, both serial and parallel, on a power-scalable cluster. We study via direct measurement both intra-node and inter-node effects of memory and communication bottlenecks, respectively. Additionally, we present a framework for executing a single application in several frequency-voltage settings. The basic idea is to first divide programs into phases and then execute a series of experiments, with each phase assigned a prescribed frequency. Our results show that a power-scalable cluster has the potential to save energy by scaling the processor down to lower energy levels. Furthermore, we found that for some programs, it is possible to both consume less energy and execute in less time by increasing the number of nodes and reducing the frequency-voltage setting of the nodes. Additionally, we found that our phase-detection heuristic can find assignments of frequency to phases that are superior to any fixed-frequency solution.
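  The intuition behind the energy-time tradeoff can be captured in a toy model: CPU-bound work slows down as frequency drops, but memory stalls do not, while dynamic power falls roughly with f·V² (and V scales with f). For a memory-bound phase, scaling down therefore saves much energy for a small slowdown. The model and all numbers below are illustrative assumptions, not measurements from the thesis.

  ```c
  /* Toy energy-time model for frequency scaling on a memory-bound phase.
   * Illustrative only: units are normalized, and the cubic power law
   * (P ~ f * V^2 with V tracking f) is a standard approximation. */
  #include <assert.h>
  #include <stdio.h>

  /* Execution time: CPU-bound work scales with 1/f, memory stalls do not. */
  double exec_time(double cpu_work, double mem_time, double f)
  {
      return cpu_work / f + mem_time;
  }

  /* Dynamic power approximated as f * V^2, with V proportional to f. */
  double power(double f) { return f * f * f; }

  double energy(double cpu_work, double mem_time, double f)
  {
      return power(f) * exec_time(cpu_work, mem_time, f);
  }

  int main(void)
  {
      /* Memory-bound phase: most time is spent in stalls. */
      double t_hi = exec_time(1.0, 4.0, 1.0);  /* full frequency */
      double t_lo = exec_time(1.0, 4.0, 0.8);  /* scaled down 20% */
      double e_hi = energy(1.0, 4.0, 1.0);
      double e_lo = energy(1.0, 4.0, 0.8);
      assert(e_lo < e_hi);                     /* large energy saving...  */
      assert((t_lo - t_hi) / t_hi < 0.06);     /* ...for a small slowdown */
      printf("E: %.2f -> %.2f, T: %.2f -> %.2f\n", e_hi, e_lo, t_hi, t_lo);
      return 0;
  }
  ```

  The same model also suggests why per-phase frequency assignment can beat any fixed frequency: CPU-bound phases prefer high f while memory-bound phases tolerate low f with little time penalty.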
- HeapMon: a Low Overhead, Automatic, and Programmable Memory Bug Detector (2005-04-25)
  Shetty, Rithin Kumar; Yan Solihin, Committee Chair; Edward Gehringer, Committee Member; Jun Xu, Committee Member
  Enabling memory-related bug detection in production runs is important for detecting and pinpointing bugs that survive debugging. Left undetected, such bugs may manifest as behavior that is difficult to detect, such as wrong computation outputs, late or obscure system crashes, security attacks, and subtle performance loss. To be useful in production runs, a bug monitoring scheme must not slow down the monitored program much, must be automatic and not require programmer intervention, and must be easy to deploy. Unfortunately, existing tools and techniques either have a very high performance overhead or require a high degree of programmer involvement. This thesis presents HeapMon, a heap memory bug detection scheme that has a very low performance overhead, is automatic, and is easy to deploy. HeapMon relies on two new techniques. First, it completely decouples the application code from bug monitoring functions, which are implemented as a helper thread that runs on a separate core in a Chip Multi-Processor system. The helper thread monitors the status of each word on the heap by associating state bits with it. These bits indicate whether the word is unallocated, allocated but uninitialized, or allocated and initialized. Each state defines which accesses are legal to perform on the word and which are illegal (bugs). When a bug is detected, its type, PC, and data address are logged to enable developers to precisely pinpoint the bug's nature and location. The second new technique in HeapMon is to associate a filter bit with each cached word in order to safely and significantly reduce bug checking frequency (by 95% on average). We test the effectiveness of our approach with existing and injected memory bugs on SPEC 2000 applications. Our experimental results show that HeapMon effectively detects and identifies most forms of heap memory bugs, and incurs a performance overhead of only 3.5% on average, which is orders of magnitude smaller than existing tools. Finally, HeapMon requires modest storage overhead: 3.1% of the cache size and a 32KB victim cache for on-chip filter bits, and 6.2% of the allocated heap memory size for state bits, which are maintained by the helper thread as a software data structure.
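  The per-word state machine the abstract describes (unallocated → allocated-but-uninitialized → allocated-and-initialized, with each state defining which accesses are bugs) can be sketched as follows. This is a minimal software model of the checking rule only; the real HeapMon keeps these bits in a helper-thread data structure and filters most checks in hardware, and the function names here are illustrative.

  ```c
  /* Minimal model of HeapMon-style per-word state bits. Each heap word
   * is in one of three states; the state determines which accesses are
   * legal, and a legal first write moves the word to initialized. */
  #include <assert.h>
  #include <stdio.h>

  enum word_state { UNALLOCATED, ALLOC_UNINIT, ALLOC_INIT };
  enum access_kind { ACC_READ, ACC_WRITE };

  /* Returns 1 if the access is a bug, 0 if legal; updates the state on
   * a legal write (the first write initializes the word). */
  int check_access(enum word_state *st, enum access_kind acc)
  {
      switch (*st) {
      case UNALLOCATED:
          return 1;                  /* any access to unallocated: bug */
      case ALLOC_UNINIT:
          if (acc == ACC_READ)
              return 1;              /* read of uninitialized data: bug */
          *st = ALLOC_INIT;
          return 0;
      case ALLOC_INIT:
          return 0;                  /* reads and writes both legal */
      }
      return 1;
  }

  int main(void)
  {
      enum word_state w = UNALLOCATED;
      assert(check_access(&w, ACC_READ) == 1);   /* wild/dangling access */
      w = ALLOC_UNINIT;                          /* word is malloc'd */
      assert(check_access(&w, ACC_READ) == 1);   /* uninitialized read */
      assert(check_access(&w, ACC_WRITE) == 0);  /* first write is legal */
      assert(check_access(&w, ACC_READ) == 0);   /* now initialized */
      printf("ok\n");
      return 0;
  }
  ```

  The filter bit mentioned in the abstract is orthogonal to this state machine: it marks cached words whose checks are known to be redundant, so the helper thread is consulted far less often.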
- Memory Predecryption: Hiding the Latency Overhead of Memory Encryption (2005-04-22)
  Rogers, Brian Michael; Jun Xu, Committee Member; Yan Solihin, Committee Chair; Gregory Byrd, Committee Member
  Security has emerged as an important area in the field of computer research today. With the emergence of hardware-based attacks, research has been done not only on software solutions to security, but also on providing security with the help of architectural support. More specifically, hardware encryption and authentication of off-chip memory have recently been studied as ways to ensure that malicious agents cannot see data in its plaintext form or tamper with data in an undetected manner during an application's execution. When used in combination, encryption and authentication can help to provide a secure processing environment. While various techniques have been proposed for performing memory encryption in a secure processor, current schemes suffer from extra performance and storage overheads. This paper presents predecryption as a method of providing this encryption with less overhead. This is accomplished by using well-known prefetching techniques to retrieve data from memory and perform decryption before it is needed by the processor on latency-critical read operations. Our results, tested mostly on SPEC 2000 and NAS benchmarks, show that using this predecryption scheme can actually result in no increase in execution time over a system with no encryption, despite an extra 128 cycle decryption latency per memory block access.
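  The core timing argument can be made concrete with a back-of-envelope model: if a prefetch (and its decryption) is launched enough cycles before the demand access, the decryption latency is fully overlapped and the demand read sees no encryption penalty. The 128-cycle decryption latency comes from the abstract; the memory latency and prefetch lead time below are illustrative assumptions.

  ```c
  /* Back-of-envelope model of predecryption: a sufficiently early
   * prefetch hides both memory and decryption latency from the demand
   * access. The 128-cycle decryption figure is from the abstract; the
   * other numbers are illustrative. */
  #include <assert.h>
  #include <stdio.h>

  /* Cycles until data is usable after the demand access, given how many
   * cycles earlier a prefetch+decrypt of the block was launched. */
  int effective_latency(int mem_latency, int decrypt_latency, int prefetch_lead)
  {
      int total = mem_latency + decrypt_latency;
      int remaining = total - prefetch_lead;
      return remaining > 0 ? remaining : 0;
  }

  int main(void)
  {
      /* No prefetch: the full encryption penalty is exposed. */
      assert(effective_latency(200, 128, 0) == 328);
      /* Partial overlap: only the uncovered tail is exposed. */
      assert(effective_latency(200, 128, 250) == 78);
      /* Timely prefetch: decryption finishes before the demand access,
       * so encryption adds no execution time. */
      assert(effective_latency(200, 128, 400) == 0);
      printf("ok\n");
      return 0;
  }
  ```

  This is why the scheme can report no execution-time increase over an unencrypted baseline: accuracy and timeliness of the underlying prefetcher, not decryption speed, become the limiting factors.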
