NC State University Libraries
NC State Repository

Browsing by Author "Frank Mueller, Committee Member"

Now showing 1 - 11 of 11
  • Adding Coordination to the Management of High-End Storage Systems
    (2009-11-20) Zhang, Zhe; Xiaosong Ma, Committee Chair; William Stewart, Committee Co-Chair; Frank Mueller, Committee Member; Robert Handfield, Committee Member; Sudharshan Vazhkudai, Committee Member
    Today’s scientific and commercial applications rely heavily on high-end computing (HEC) facilities, including large-scale datacenters, supercomputers, and so forth. In these facilities, the storage subsystems play an increasingly important role in the overall computing experience perceived by users. Meanwhile, it is challenging to provide high performance and reliability in these high-end storage systems due to their high I/O demands, large scales, and complex architectures. We observe that, in addition to the well-recognized aggregate shortage of I/O resources relative to computing demands, one main challenge faced by high-end storage systems lies in the growing scale and complexity of the entire environment. Individually developed system components or algorithms often perform isolated local optimizations and handle concurrent user workloads without considering inter-workload relationships. The author’s Ph.D. research focuses on three novel instances of bringing adaptive coordination to the management of commercial and scientific high-end storage systems, at different levels of the HEC storage hierarchy. First, on a single storage server, we present a memory cache allocation mechanism that coordinates multiple concurrent sequential access streams with different request rates. Our work is based on the interesting observation that this problem bears a strong resemblance to situations long studied in the field of supply chain management (SCM), used by large vendors and retailers. Furthermore, in a multi-level storage architecture, we address the problem of information distortion in uncoordinated prefetching operations on different storage caches. We develop a simple information sharing mechanism, as well as a transparent hierarchy-aware optimization component named PreFetching-Coordinator (PFC), which monitors both upper- and lower-level caches and adjusts the aggressiveness of lower-level prefetching.
Finally, we improve the data availability in an entire distributed storage system by coordinating it with the HPC job scheduler and remote data sources. We implemented the proposed techniques in real software environments, including a state-of-the-art operating system kernel, a widely used job scheduler and a popular parallel file system, as well as verified simulators. Our experimental results collected from real system experiments and simulations show that our proposed techniques can significantly improve system performance and reliability by coordinating among system components and requests.
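The PFC idea above can be pictured in a few lines: the lower-level cache scales back its prefetch window for blocks the upper level already holds, using the shared cache-content information. This is a minimal sketch, not code from the dissertation; the class and method names are ours.

```python
# Toy hierarchy-aware prefetch coordination (names hypothetical): the
# lower level skips prefetching blocks already resident at the upper level.

class PrefetchCoordinator:
    def __init__(self, max_depth=8):
        self.max_depth = max_depth
        self.upper_cache = set()   # block IDs cached at the upper level

    def report_upper_hit(self, block):
        """Upper level shares its contents (the 'information sharing')."""
        self.upper_cache.add(block)

    def lower_prefetch_window(self, next_block):
        """Prefetch only blocks not already resident above, up to max_depth."""
        return [b for b in range(next_block, next_block + self.max_depth)
                if b not in self.upper_cache]

pfc = PrefetchCoordinator(max_depth=4)
pfc.report_upper_hit(11)
print(pfc.lower_prefetch_window(10))  # → [10, 12, 13]; 11 is cached above
```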
  • Analysis-Managed Processor (AMP): Exceeding the Complexity Limit in Safe-Real-Time Systems
    (2006-04-28) Anantaraman, Aravindh Venkataseshadri; Alexander G. Dean, Committee Member; Frank Mueller, Committee Member; Thomas M. Conte, Committee Member; Eric Rotenberg, Committee Chair
    Safe-real-time systems need tasks' worst-case execution times (WCETs) to guarantee deadlines. With increasing microarchitectural complexity, the analysis required to derive WCETs is becoming complicated and, in some cases, intractable. Thus, complex microarchitectural features are discouraged in safe-real-time systems. My thesis is that microarchitectural complexity is viable in safe-real-time systems if control is provided over this complexity. I propose a reconfigurable processor, the Analysis-Managed Processor (AMP), that offers complete control over its complex features. The ability to dynamically manage the AMP enables novel cooperative static and run-time WCET frameworks that break the limitations of the traditional static-only WCET model, allowing complex features to be safely included. (i) The Virtual Simple Architecture (VISA) framework avoids analyzing complex features. VISA derives tasks' WCETs assuming a simple processor. At run-time, tasks are speculatively attempted on the AMP with complex features enabled. A run-time framework dynamically confirms that WCETs are not exceeded. (ii) The Non-Uniform Program Analysis (NUPA) framework enables efficient analysis of complex features. NUPA matches different program segments to different operating modes of the AMP. NUPA yields reduced WCETs for program segments that can be analyzed in the context of complex features, without the severe burden of requiring all program segments to be analyzed this way. I propose that out-of-order execution is not inherently intractable; rather, its interaction with control-flow is. Out-of-order processors overlap the execution of tens to hundreds of in-flight instructions. Variable control-flow causes an explosion in the number of potential overlap schedules. I propose two timing analysis techniques that reduce the number of possible schedules.
(i) Repeatable Execution Constraints for Out-of-ORDER (RECORDER) eliminates variable control-flow and implied data-flow variations, guaranteeing a single input-independent execution schedule that can be derived via simulation, using arbitrary (random) program inputs. (ii) Drain-and-Branch (DNB) restricts instruction overlap by insulating a branch's control-dependent region from the effects of instructions before and after the region. RECORDER and DNB are complementary, as they work well for branches with short regions and long regions, respectively. Further, in the context of a NUPA framework, different branch regions may favor RECORDER, DNB, or in-order execution mode of the AMP, for achieving a highly optimized overall WCET. Moreover, branch regions analyzed for downgraded in-order execution can still benefit from the VISA run-time framework by speculatively enabling out-of-order mode of the AMP. The flexible combination of all the above techniques multiplies benefits, yielding a powerful framework for fully and safely capitalizing on complex microarchitectures in safe-real-time systems.
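The VISA run-time check can be illustrated with a toy watchdog. The checkpoint budgets and observed times below are invented, and the real framework operates in hardware at sub-task boundaries; this only shows the downgrade logic: run speculatively in complex mode, fall back to the analyzed simple mode on the first budget overrun.

```python
# Hypothetical VISA-style watchdog: budgets are cumulative simple-mode
# WCETs per checkpoint; elapsed are observed cumulative times.

def run_with_visa(segments, budgets, elapsed):
    """Return the mode the task finishes in: 'complex' if every checkpoint
    met its simple-mode budget, else 'simple' after the downgrade."""
    mode = "complex"
    for _seg, budget, t in zip(segments, budgets, elapsed):
        if mode == "complex" and t > budget:
            mode = "simple"   # downgrade: the static WCET guarantee holds again
    return mode

# Checkpoint s1 overruns its budget (11 > 10), triggering the downgrade.
print(run_with_visa(["s0", "s1", "s2"], [5, 10, 15], [4, 11, 14]))  # → simple
```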
  • Asymmetric Task Scheduling on Simultaneous Multithreading Processors
    (2005-06-26) Smith, Daniel M; Vincent W. Freeh, Committee Chair; Jun Xu, Committee Member; Frank Mueller, Committee Member
    The performance of a simultaneous multithreaded CPU is greatly impacted by the behavioral characteristics of the threads competing for resources during concurrent execution. Most of the research aimed at improving SMT performance, or characterizing beneficial workload mixes, has targeted a multi-process parallel computation environment. Even in cases where the thread mix was heterogeneous, the CPU contexts were still viewed as two semi-independent resources, both of which were unbiased in their task selection. We investigate an alternative method for operating system designers to utilize an SMT CPU. By confining user processes to a single context of the CPU, and allowing kernel tasks to utilize the other context when necessary, we are able to, in many cases, provide better application performance than either an equivalent uniprocessor system, or an SMT system that is being treated as an SMP. In addition to operating in this special mode, an operating system may also choose to alternate between it and a conventional multiprocessing configuration, depending on which provides better performance. A modification to the Linux 2.6 kernel to achieve this desired behavior is presented, as well as test results of SPEC benchmarks which show where our modification improves performance. We also demonstrate how our modifications are sufficiently transparent to allow conditional mode selection at runtime.
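A toy placement policy in the spirit of the kernel modification described above (the task names and two-context model are illustrative; the actual work patches the Linux 2.6 scheduler): user tasks are confined to one SMT context while kernel tasks get the other, with a conventional SMP-style placement available as the alternative mode.

```python
# Illustrative sketch, not the authors' kernel code: partition a two-context
# SMT CPU asymmetrically, or place round-robin as a conventional SMP would.

def place(tasks, asymmetric=True):
    """tasks: list of (name, kind), kind in {'user', 'kernel'}.
    Returns {context_id: [task names]}."""
    contexts = {0: [], 1: []}
    if asymmetric:
        for name, kind in tasks:
            contexts[1 if kind == "kernel" else 0].append(name)
    else:  # conventional SMP treatment: alternate contexts
        for i, (name, _) in enumerate(tasks):
            contexts[i % 2].append(name)
    return contexts

tasks = [("gcc", "user"), ("kswapd", "kernel"),
         ("perl", "user"), ("bash", "user")]
print(place(tasks))                    # user work confined to context 0
print(place(tasks, asymmetric=False))  # SMP-style alternative mode
```

Mode selection at runtime, as the abstract describes, amounts to choosing the `asymmetric` flag per workload, whichever placement measures faster.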
  • Automating and Simplifying Memory Corruption Attack Response Using Failure-Aware Computing
    (2006-07-21) Gauriar, Prachi; Jun Xu, Committee Chair; Laurie Williams, Committee Member; Frank Mueller, Committee Member
    Over the last two decades, advances in software engineering have produced new ways of creating robust, reliable software. Unfortunately, the dream of bug-free software still eludes us. When bugs are discovered in deployed software, software failures and service disruption can lead to significant losses, both monetary and otherwise. The typical failure response process is composed of three phases: failure detection, cause analysis, and solution formulation. To minimize the impact of software failures, it is critical that each of these phases be completed as quickly as possible. This thesis is separated into two parts. In the first part, we propose a general conceptual approach called "failure-aware computing" that aims to automate as much of the failure response process as possible. We describe the architecture of this proposed framework, some possible applications, and the challenges in implementing it. We then describe how this framework can be applied to responding to memory corruption errors. In the second part, we describe and evaluate an implementation of part of this framework for diagnosing memory corruption failures. In particular, we discuss a root cause analysis tool we have created that analyzes a program's source code to determine which memory-related program events potentially lead to a memory corruption error. Our tool then monitors the afflicted program's execution and outputs useful information to aid the developer in understanding the root cause of the failure. Finally, we evaluate our tool's effectiveness in identifying the root cause of memory access errors in both self-written and open-source code.
  • Hard-Real-Time Multithreading: A Combined Microarchitectural and Scheduling Approach.
    (2006-05-04) El-Haj Mahmoud, Ali Ahmad; Alexander G. Dean, Committee Member; Eric Rotenberg, Committee Chair; Thomas M. Conte, Committee Member; Frank Mueller, Committee Member
    Simultaneous Multithreading (SMT) enables fine-grain resource sharing of a single superscalar processor among multiple tasks, improving cost-performance. However, SMT cannot be safely exploited in hard-real-time systems. These systems require analytical frameworks for making worst-case performance guarantees. SMT violates simplifying assumptions for deriving worst-case execution times (WCET) of tasks. Classic real-time theory uses single-task WCET analysis, where a task is assumed to have dedicated processor resources; hence, its WCET can be derived independently of its task-set context. This is not true for SMT, where tasks interfere due to resource sharing. Modeling interference requires whole task-set WCET analysis, but this approach is futile since co-scheduled tasks vary and compete for resources arbitrarily. Thus, formally proving real-time guarantees for SMT is intractable. This dissertation proposes flexible interference-free multithreading. Interference-free partitioning guarantees that the performance of a single task is not affected by its workload context (hence preserving single-task WCET analysis), while flexible resource sharing emulates the fine-grain resource sharing of SMT to achieve similar cost-performance efficiency. The Real-time Virtual Multiprocessor (RVMP) paradigm virtualizes a single superscalar processor into multiple interference-free, different-sized virtual processors. This provides a flexible spatial dimension. In the time dimension, the number and sizes of virtual processors can be rapidly reconfigured. A simple real-time scheduling approach concentrates scheduling within a small time interval (the 'round'), producing a simple repeating space/time schedule that orchestrates virtualization. Worst-case schedulability experiments show that more task-sets are provably schedulable on RVMP than on conventional rigid multiprocessors with equal aggregate resources, and the advantage only intensifies with more demanding task-sets.
Run-time experiments show RVMP's statically-controlled coarser-grain resource sharing is as effective as unsafe SMT, and provides a real-time formalism that SMT does not currently provide. RVMP's round-based scheduling enables other optimizations for safely improving performance even more. A framework is developed on top of RVMP to safely, tractably, and tightly bound overlap between computation and memory accesses of different tasks to improve worst-case performance. This framework captures the throughput gain of dynamic switch-on-event multithreading, but in a way that is compatible with hard-real-time formalism.
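One way to picture RVMP's repeating space/time schedule (the round below is invented, not taken from the dissertation): each slot of the round splits a 4-wide core among interference-free virtual processors, so a task's per-round issue budget is fixed regardless of what other tasks do.

```python
# Hypothetical RVMP-style round: slot -> {task: issue ways}; the ways in
# each slot sum to the 4-wide core, and the pattern repeats every round.

ROUND = [
    {"A": 2, "B": 2},
    {"A": 2, "B": 1, "C": 1},
    {"C": 4},
]

def ways_per_round(task):
    """Issue slots a task is guaranteed per round, independent of co-runners."""
    return sum(slot.get(task, 0) for slot in ROUND)

for t in "ABC":
    print(t, ways_per_round(t))   # A → 4, B → 3, C → 5
```

Because the budget is a property of the schedule alone, single-task WCET analysis remains valid, which is the interference-free guarantee the abstract describes.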
  • High Performance Parallel and Distributed Genomic Sequence Search
    (2009-03-26) Lin, Heshan; Xiaosong Ma, Committee Chair; Steffen Heber, Committee Member; Frank Mueller, Committee Member; Nagiza Samatova, Committee Member; Douglas Reeves, Committee Member
    Genomic sequence database search identifies similarities between given query sequences and known sequences in a database. It forms a critical class of applications used widely and routinely in computational biology. Due to their wide application in diverse task settings, sequence search tools today are run on several types of parallel systems, including batch jobs on one or more supercomputers and interactive queries through web-based services. Despite successful parallelization of popular sequence search tools such as BLAST, in the past two decades the growth of sequence databases has outpaced that of computing hardware elements, making scalable and efficient parallel sequence-search processing crucial in helping life scientists deal with the ever-increasing amount of genomic information. In this thesis, we investigate efficient and scalable parallel and distributed sequence-search solutions by addressing unique problems and challenges in the aforementioned execution settings. Specifically, this thesis research 1) introduces parallel I/O techniques into sequence-search tools and proposes novel computation and I/O co-scheduling algorithms that enable genomic sequence search to scale efficiently on massively parallel computers; 2) presents a semantics-based distributed I/O framework that leverages application-specific meta-information to drastically reduce the amount of data transfer and thus enables distributed sequence-search collaboration at a global scale; 3) proposes a novel request scheduling technique for clustered sequence-search web servers that comprehensively takes into account both data locality and parallel search efficiency to optimize query response time under various server load levels and access scenarios. The efficacy of our proposed solutions has been verified on a broad range of parallel and distributed systems, including Peta-scale supercomputers, the NSF TeraGrid system, and small- or medium-sized clusters.
In addition, our optimizations of massively parallel sequence search have been incorporated into the official release of mpiBLAST-PIO, currently the only supported branch of mpiBLAST, a popular open-source sequence-search tool. mpiBLAST-PIO is able to achieve 93% parallel efficiency across 32,768 cores on the IBM Blue Gene/P supercomputer.
  • Mechanical and Transport Properties of Carbon Nanotube Systems
    (2004-02-05) Zhao, Qingzhong; Jerry Bernholc, Committee Chair; Frank Mueller, Committee Member; Christopher Roland, Committee Member; Marco Buongiorno Nardelli, Committee Member
    The mechanical and transport properties of carbon nanotube systems are studied by large-scale ab initio, tight-binding and classical molecular dynamics simulations. The ultimate strength of carbon nanotubes is investigated theoretically. While the formation energy of strain-induced topological defects determines the thermodynamic limits of the elastic response and of mechanical resistance to applied tension, it is found that the activation barriers for the formation of such defects are much larger than estimated previously. The theoretical results indicate a substantially greater resilience and strength, and show that the ultimate strength limit of carbon nanotubes has yet to be reached experimentally. Carbon nanotubes are indeed the strongest material known. Electronic transport in a new type of carbon nanotube material, the carbon nanotube-metal cluster assembly, is investigated upon gas adsorption. For an Al cluster attached to a metallic nanotube, we have observed that its electrical response dramatically changes upon NH3 adsorption onto the metal cluster. For a semiconducting nanotube-Al cluster assembly, the same gas adsorption enhances the system's conductivity. The results of our ab initio simulations explain the observed behavior in terms of interactions between the molecular species and the nanotube-cluster system, where successive charge transfers between the components tailor the electronic and transport properties. Carbon nanotube-metal cluster assemblies could serve as a new type of nanotube-based chemical/biological sensor.
  • An open solution to discover the graph structure of World Wide Web
    (2009-12-23) Chen, Kunsheng; Vincent W. Freeh, Committee Chair; Frank Mueller, Committee Member; Xuxian Jiang, Committee Member
    The World Wide Web is a large complex network of inter-linked web pages. Understanding this structure is of immense benefit both economically and socially. Currently, there is incomplete or sparse information about the graph structure of the Web in the public domain. The full data is closely guarded by a handful of corporations. Nevertheless, studies on the topological structure of the World Wide Web benefit not only scientists and e-commerce merchants but also common users. A better understanding of such a structure helps scientists to develop new technologies to improve the Internet. It also assists companies in building optimal e-commerce solutions to fulfill their business needs. The goal of this thesis is to evaluate an open source solution to mapping the structure of the Web. In support of this thesis, we have implemented a prototype using existing open source software, including the volunteer computing library BOINC (Berkeley Open Infrastructure for Network Computing) and the Hadoop MapReduce framework. We utilize the computing power and disk space of BOINC volunteers to perform data collection, and the Hadoop MapReduce framework to perform data analysis on a large data set. Contributions of our research include a low-cost open solution for distributed web crawling using BOINC and a URL ranking system utilizing the Hadoop MapReduce framework. We also provide a feasibility study on crawling the web using the above solution and present experimental results.
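The URL-ranking stage can be sketched as a toy in-link count in pure Python (the crawl data and function names are invented; the real system runs this as a Hadoop MapReduce job over BOINC-collected crawl data):

```python
# MapReduce-style in-link ranking: the map phase emits (target, 1) for
# every outgoing link; the reduce phase sums and sorts by in-link count.

from collections import defaultdict

def map_phase(page_links):
    """Emit (target_url, 1) for each link on each crawled page."""
    for _page, targets in page_links:
        for t in targets:
            yield (t, 1)

def reduce_phase(pairs):
    counts = defaultdict(int)
    for url, n in pairs:
        counts[url] += n
    return sorted(counts.items(), key=lambda kv: -kv[1])

crawl = [("a.com", ["b.com", "c.com"]),
         ("b.com", ["c.com"])]
print(reduce_phase(map_phase(crawl)))  # → [('c.com', 2), ('b.com', 1)]
```

In Hadoop the two functions become the mapper and reducer, and the framework handles the shuffle between them; the toy keeps everything in one process.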
  • Poseidon: Hardware Support for Buffer Overflow Attacks
    (2003-09-06) Vaidyanathan, Anuradha; Alex G Dean, Committee Member; Frank Mueller, Committee Member; Thomas M Conte, Committee Member; Gregory T Byrd, Committee Chair
    Stack smashing attacks were the most exploited security vulnerability in the past decade, according to CERT. Using a method called stack smashing, a malicious user overflows a buffer in the stack frame, overwriting critical stack state. The return address of the current function, which is saved in the function's stack frame, is overwritten when the buffer overflows. The new return address points to the attacker's code. So, when the function is exited, control is transferred to the attacker's code instead of back to the calling function. A common way to prevent overflow-based stack smashing is to insert bounds checking code or insert sentinel values on the stack, but this requires recompilation. We propose a hardware-based method that does not require recompilation, based on the idea that an attack of this kind produces an unexpected return address. The processor maintains a separate hardware stack, called the shadow stack, and monitors the dynamic instruction stream for subroutine calls and returns. When a call instruction is retired, its return address is pushed on the shadow stack. When a return instruction is retired, the address at the top of the shadow stack is popped and compared to the target of the return instruction. If the addresses differ, then the conventional subroutine call/return semantics have been violated. This may truly be an attack, or it may be a legitimate program construct (e.g., setjmp()/longjmp()) that also violates call/return semantics. Legitimate cases are distinguished from attacks by recording the stack pointer along with the return address at the time of a call: when a subroutine returns, the stack pointer appears consistent in the case of an attack but not in the case of setjmp()/longjmp(). There are three distinct parts to the evaluation of the usefulness and the practicality of this idea. The first part is identifying the generality of this solution. 
This means that we seek to answer the question: "Do we detect all forms of buffer overflow attacks without raising unnecessary false positives in the case of legal program constructs?" The second part is the actual design details of such a stack and the amount of state that needs to be recorded to facilitate the generality described above. The third part is the actual recovery mechanism, which could take the form of exceptions raised that are then handled by the operating system. This thesis answers the generality and design questions in their entirety, while laying a solid foundation and an initial set of experiments for the recovery scheme.
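A software sketch of the shadow-stack check described above (in hardware this happens at instruction retirement; the addresses, class, and method names here are ours, and the longjmp discrimination is simplified to searching deeper entries for a matching address/SP pair):

```python
# Toy shadow stack: record (return_address, stack_pointer) per retired call,
# compare on return; a mismatch with a matching deeper entry is a longjmp,
# a mismatch with no such entry is flagged as an attack.

class ShadowStack:
    def __init__(self):
        self.stack = []  # (return_address, stack_pointer) per retired call

    def on_call(self, ret_addr, sp):
        self.stack.append((ret_addr, sp))

    def on_return(self, target, sp):
        """Return 'ok', 'longjmp', or 'attack'."""
        if not self.stack:
            return "attack"
        ret_addr, saved_sp = self.stack.pop()
        if target == ret_addr and sp == saved_sp:
            return "ok"
        # Mismatch: a legitimate longjmp() unwinds several frames at once,
        # so a consistent (address, SP) pair sits deeper in the shadow stack.
        while self.stack:
            if self.stack.pop() == (target, sp):
                return "longjmp"
        return "attack"

ss = ShadowStack()
ss.on_call(0x400A10, 0x7FFE00)
ss.on_call(0x400B20, 0x7FFD00)
print(ss.on_return(0x400B20, 0x7FFD00))   # → ok (addresses match)
print(ss.on_return(0xDEADBEEF, 0x7FFE00)) # → attack (overwritten return)
```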
  • Slipstream Execution Mode for CMP-based Shared Memory Systems
    (2003-07-30) Ibrahim, Khaled Zakarya Moustafa; Gregory T. Byrd, Committee Chair; Thomas M. Conte, Committee Member; Eric Rotenberg, Committee Member; Frank Mueller, Committee Member
    Scalability of applications on distributed shared-memory (DSM) multiprocessors is limited by communication and synchronization overheads. At some point, using more processors to increase parallelism yields diminishing returns or even degrades performance. When increasing concurrency is futile, we propose an additional mode of execution, called slipstream mode, that instead enlists extra processors to assist parallel tasks by reducing perceived overheads. We consider DSM multiprocessors built from dual-processor chip multiprocessor (CMP) nodes (e.g., IBM Power-4 CMP) with shared L2 cache. A parallel task is allocated on one processor of each CMP node. The other processor of each node executes a reduced version of the same task. The reduced version skips shared-memory stores and synchronization, allowing it to run ahead of the true task. Even with the skipped operations, the reduced task makes accurate forward progress and generates an accurate reference stream, because branches and addresses depend primarily on private data. Slipstream execution mode yields multiple benefits. First, the reduced task prefetches data on behalf of the true task. Second, reduced tasks provide a detailed picture of future reference behavior, enabling a number of optimizations aimed at accelerating coherence events. We investigate a well-known optimization, self-invalidation. We also investigate providing a confidence mechanism for speculation after barrier synchronization. We investigate the implementation of an OpenMP compiler that supports slipstream execution mode. We discuss how each OpenMP construct can be implemented to take advantage of slipstream mode, and we present a minor extension that allows runtime or compile-time control of slipstream execution. We also investigate the interaction between slipstream mechanisms and OpenMP scheduling. Our implementation supports both static and dynamic scheduling in slipstream mode.
For multiprocessor systems with up to 16 CMP nodes, slipstream mode is 12-19% faster with prefetching only. With self-invalidation also enabled, performance is improved by as much as 29%. We extended slipstream mode to provide a confidence mechanism for barrier speculation. This mechanism identifies dependencies and tries to avoid dependency violations that lead to misspeculations (and subsequently rollbacks). Rollbacks are reduced by up to 95% and the improvement in performance is up to 13%. Slipstream execution mode enables a wide range of optimizations based on an accurate future image of the program behavior. It does not require custom auxiliary hardware tables used by history-based predictors.
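The reduced-task prefetching effect can be illustrated with a toy model (the loop, cache, and run-ahead window are invented; in real slipstream mode both tasks run concurrently on one CMP node, not sequentially as here):

```python
# Toy slipstream prefetch: the reduced task skips shared stores and
# synchronization, so it runs ahead and warms the shared L2 for the
# true task; only the true task's stores become architectural state.

def reduced_task(indices, cache, window):
    """Reduced version: loads only, running `window` iterations ahead."""
    for i in indices[:window]:
        cache.add(i)              # load pulls the line into the shared L2

def true_task(indices, shared, cache):
    hits = 0
    for i in indices:
        hits += i in cache        # line already prefetched by the reduced task?
        cache.add(i)
        shared[i] = i * i         # stores performed only by the true task
    return hits

cache, shared = set(), {}
idx = list(range(8))
reduced_task(idx, cache, window=4)    # runs ahead of the true task
print(true_task(idx, shared, cache))  # → 4 of 8 accesses hit warmed lines
```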
  • Towards Transparent Parallel Processing on Multi-core Computers
    (2009-12-22) Li, Jiangtian; Xiaosong Ma, Committee Chair; Xiaohui Gu, Committee Member; Frank Mueller, Committee Member; Nagiza Samatova, Committee Member
    With the trend towards an exponentially increasing number of cores per chip, parallelization of all application types becomes critical; it needs to happen at multiple levels to address the unique challenges and exploit the new opportunities brought by architectural advances. In this dissertation, we focus on enhancing the utilization of future-generation, many-core personal computers for high-performance and energy-efficient computing. On one hand, computation- and/or data-intensive tasks such as scientific data processing and visualization, which are typically performed sequentially on personal workstations, need to be parallelized to take advantage of the increasing hardware parallelism. Explicit parallel programming, however, is labor-intensive and requires sophisticated performance tuning for individual platforms and operating systems. In this Ph.D. study, we made a first step toward transparent parallelization of data processing codes by developing automatic parallelization tools for scripting languages. More specifically, we present pR, a framework that transparently parallelizes the R language for high-performance statistical computing. We apply parallelizing compiler technology to runtime whole-program dependence analysis and employ incremental code analysis assisted with evaluation results. Experimental results demonstrate that pR can exploit both task and data parallelism transparently and overall achieves good performance as well as scalability. Further, we attack the performance tuning problem for transparent parallel execution by proposing and designing a novel online task decomposition and scheduling approach for transparent parallel computing. This approach collects runtime task cost information transparently and performs online static scheduling, utilizing cost estimates generated by ANN-based runtime performance prediction as well as by loop iteration test runs.
We implement the above techniques in the pR framework, and our proposed approach is demonstrated to significantly improve task partitioning and scheduling over a variety of benchmarks. On the other hand, multi-core personal computers will inevitably be under-utilized when their owners perform light-weight tasks such as editing and web browsing, making volunteer computing more appealing than ever. In this study, we made a first step towards a novel computation model, energy-aware volunteer computing on multi-core processors, by evaluating the potential energy/performance trade-off of a more aggressive execution model that selects active nodes over idle nodes for scheduling foreign application tasks, in order to better utilize idle cores and achieve energy savings. Our experimental results suggest that aggressive volunteer computing can bring significant energy savings compared to common existing execution modes and provides an attractive computation model.
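The cost-driven static scheduling step can be sketched as greedy longest-processing-time (LPT) assignment of predicted task costs to workers. The task names and costs below are invented, and pR's actual estimates come from ANN-based prediction and loop-iteration test runs; this only shows the scheduling shape.

```python
# Greedy LPT scheduling over predicted costs: always hand the next-largest
# task to the currently least-loaded worker.

import heapq

def schedule(costs, workers):
    """costs: {task_name: predicted cost}. Returns {worker: [task names]}."""
    heap = [(0.0, w) for w in range(workers)]   # (current load, worker id)
    heapq.heapify(heap)
    assign = {w: [] for w in range(workers)}
    for name, c in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)           # least-loaded worker
        assign[w].append(name)
        heapq.heappush(heap, (load + c, w))
    return assign

predicted = {"lm_fit": 8.0, "plot": 1.0, "cor": 3.0, "pca": 6.0}
print(schedule(predicted, workers=2))  # both workers balanced at load 9.0
```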
