Browsing by Author "Tao Xie, Committee Member"
Now showing 1 - 13 of 13
- Achieving Communication Scalability in Collaborative Development Tools: Performance Modeling of the Jazz Development Environment(2010-05-12) Phadnis, Deepti; Mihail Devetsikiotis, Committee Chair; Tao Xie, Committee Member; Harilaos Perros, Committee Member
- Analytical Bounding Data Cache Behavior for Real-Time Systems(2008-07-21) Ramaprasad, Harini; Frank Mueller, Committee Chair; Eric Rotenberg, Committee Member; Vincent Freeh, Committee Member; Tao Xie, Committee Member
This dissertation presents data cache analysis techniques that make it feasible to predict data cache behavior and to bound the worst-case execution time for a large class of real-time programs. Data caches are an increasingly important architectural feature in most modern computer systems; they help bridge the gap between processor speeds and memory access times. One inherent difficulty of using data caches in a real-time system is the unpredictability of memory accesses, which makes it difficult to calculate worst-case execution times of real-time tasks. This dissertation presents an analytical framework that characterizes data cache behavior in the context of independent, periodic tasks with deadlines less than or equal to their periods, executing on a single, in-order processor. The framework has three major components. 1) The first component analytically derives data cache reference patterns for all scalar and non-scalar references in a task. Using these, it produces a safe and tight upper bound on the worst-case execution time of the task without considering interference from other tasks. 2) The second component calculates the worst-case execution time and response time of a task in a multi-task, prioritized, preemptive environment. This component calculates the data-cache-related preemption delay for tasks, assuming that all tasks in the system are fully preemptive. 3) In the third component, tasks are allowed to have critical sections in which they access shared resources. In this context, two analysis techniques are presented. In the first, a task executing in a critical section cannot be preempted by any other task.
In the second, the framework incorporates Resource Sharing Policies to arbitrate accesses to shared resources, thereby improving the responsiveness of high-priority tasks that do not use a particular resource. All components in this dissertation assume a direct-mapped data cache. Experimental results demonstrate the value of all the analysis techniques described above in the context of data cache usage in a hard real-time system.
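The preemptive analysis in the second component can be sketched as a standard fixed-point response-time iteration with a per-preemption delay term. This is a minimal illustration only; the task set and the CRPD values below are invented, not taken from the dissertation.

```python
import math

def response_time(task, higher_prio, max_iter=100):
    """Iterate R = C + sum_j ceil(R / T_j) * (C_j + crpd_j) to a fixed point."""
    r = task["C"]
    for _ in range(max_iter):
        interference = sum(
            math.ceil(r / t["T"]) * (t["C"] + t["crpd"])
            for t in higher_prio
        )
        new_r = task["C"] + interference
        if new_r == r:      # fixed point: response time found
            return r
        r = new_r
    return None             # iteration did not converge

# Hypothetical task set: period T, WCET C, and cache-related
# preemption delay (crpd) charged per preemption.
t1 = {"T": 10, "C": 2, "crpd": 1}
t2 = {"T": 25, "C": 5, "crpd": 2}
t3 = {"T": 60, "C": 9, "crpd": 0}

# Worst-case response time of the lowest-priority task t3.
print(response_time(t3, [t1, t2]))   # 25
```

The CRPD term charges an extra delay for every preemption by a higher-priority task, which is the role the second component's analysis plays in the schedulability test.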
- Architecture Support for Operating System Survivability and Efficient Bulk Memory Copying and Initialization(2010-01-05) Jiang, Xiaowei; Tao Xie, Committee Member; Edward Gehringer, Committee Member; Yan Solihin, Committee Chair; Gregory Byrd, Committee Member; William Cohen, Committee Member
The operating system (OS) is the fundamental layer that provides and mediates access to a computer system’s resources for user application programs. The ever-increasing size and complexity of OS code bring an inevitable increase in the number of security vulnerabilities that attackers can exploit. A successful security attack on the OS has a profound impact because the OS runs at the highest processor privilege level. An OS kernel crash can freeze the entire system, terminate all running processes, and cause a long period of system unavailability. Given the increasing trend of OS security faults and the dire consequences of successful OS kernel attacks, we strive to make the OS kernel survivable, i.e., able to maintain normal system operation despite security faults. This work makes several contributions. First, we propose an OS survivability scheme that consists of three inseparable components: (1) a security attack detection mechanism, (2) a security fault isolation mechanism, and (3) a recovery mechanism that resumes normal system operation. We analyze the underlying performance requirement of each component and propose simple but carefully designed architecture support to reduce the performance overhead. When tested with real-world security attacks, our survivability scheme automatically isolates the security faults before they corrupt the kernel state or affect other executing processes, recovers the kernel state, and resumes execution.
Second, to overcome the performance overhead incurred by the checkpointing-based recovery mechanism, which extensively uses bulk memory copying and initialization operations, we propose efficient architecture support for improving bulk memory copying and initialization performance. While many current systems rely on a loop of loads and stores, or use a single copying instruction to perform memory copying, in this work we demonstrate that the key to significantly improving performance is removing the pipeline and cache bottlenecks of the code that follows the copying instructions. We show that the bottlenecks arise from (1) the pipeline being clogged by the copying instruction, (2) the critical path being lengthened by dependent instructions stalling while waiting for the copying to complete, and (3) the inability to specify (separately) the cacheability of the source and destination regions. We propose FastBCI, architecture support that achieves the granularity efficiency of a bulk copying/initialization instruction without its pipeline and cache bottlenecks. When applied to OS kernel buffer management, FastBCI achieves speedups of 23% to 32% on average, roughly 3×–4× that of an alternative scheme and 1.5×–2× that of a highly optimistic DMA. When applied to our OS survivability scheme, FastBCI keeps the scheme's average performance overhead to 1.0%.
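The copying-granularity argument can be illustrated, far above the architectural level at which FastBCI operates, by comparing an element-wise copy loop against a single bulk copy of the same buffer. This Python sketch only demonstrates the general principle that copy granularity dominates cost; the buffer size is arbitrary and the measured ratio will vary by environment.

```python
# Element-wise copying vs. one bulk copy of the same buffer. FastBCI
# addresses this at the architecture level; this sketch only shows that
# the granularity of the copy operation dominates its cost.
import timeit

src = bytearray(range(256)) * 4096          # ~1 MB source buffer
dst = bytearray(len(src))

def loop_copy():                            # a loop of loads and stores
    for i in range(len(src)):
        dst[i] = src[i]

def bulk_copy():                            # one bulk operation
    dst[:] = src

t_loop = timeit.timeit(loop_copy, number=1)
t_bulk = timeit.timeit(bulk_copy, number=1)
print(f"bulk copy ran ~{t_loop / t_bulk:.0f}x faster in this run")
```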
- Automated Access Control Policy Testing through Code Generation(2008-07-26) Sivasubramanian, Dhivya; Peng Ning, Committee Member; Tao Xie, Committee Member; Ting Yu, Committee Chair
- Exploiting Hardware/Software Interactions for Analyzing Embedded Systems(2008-08-15) Mohan, Sibin; Frank Mueller, Committee Chair; Alex Dean, Committee Member; Purush Iyer, Committee Member; Tao Xie, Committee Member
Embedded systems are often subject to real-time timing constraints. Such systems require determinism to ensure that task deadlines are met. Knowledge of the bounds on worst-case execution times (WCETs) of tasks is a critical piece of information required to achieve this objective. One limiting factor in designing real-time systems is the class of processors that may be used. Contemporary processors with advanced architectural features, such as out-of-order execution, branch prediction, speculation, and prefetching, cannot be statically analyzed to obtain WCETs for tasks because these features introduce non-determinism into task execution that can only be resolved at run-time. Such microprocessors are tuned to reduce average-case execution times at the expense of predictability, and hence do not find use in hard real-time systems. On the other hand, static timing analysis derives bounds on WCETs but requires that bounds on loop iterations be known statically, i.e., at compile time. This limits the class of applications that may be analyzed by static timing analysis and, hence, used in a real-time system. Finally, many embedded systems have communication and/or synchronization constructs and need to function on a wide spectrum of hardware, ranging from small microcontrollers to modern multi-core architectures. Hence, no single analysis technique (be it static or dynamic) suffices to gauge the true nature of such systems. This thesis contributes novel techniques that use combinations of analysis methods, and constant interactions between them, to tackle the complexities of modern embedded systems.
To be more specific, this thesis (I) introduces a new paradigm that proposes minor enhancements to modern processor architectures which, through interaction with software modules, obtain tight, accurate timing analysis results for modern processors; (II) shows how the constraint concerning statically bound loops may be relaxed and applied to make dynamic decisions at run-time to achieve power savings; and (III) represents the temporal behavior of distributed real-time applications as colored graphs, coupled with graph reductions/transformations that attempt to capture the inherent "meaning" of the application. To the best of my knowledge, these methods, which utilize interactions between different sources of information to analyze modern embedded systems, are the first of their kind.
- Generation And Verification Of Software Robustness Properties Through Static Analysis(2006-01-06) Sharma, Tanu; Jun Xu, Committee Chair; David Thuente, Committee Co-Chair; Tao Xie, Committee Member
Increasing reliance on computers calls for robust software, especially in critical applications such as those used in military and hospital settings. Traditional software testing techniques focus on functionality and ignore stressful conditions and exception handling. Poor programming practices may lead to critical software robustness failures resulting in memory corruption, application crashes, and file system failures. Such robustness failures can be detected by many static analysis tools. However, the difficulty in using existing tools is that they require users to provide the robustness properties to be checked. Currently, these properties, which require source-code- and interface-level information, are mostly specified manually. This work proposes an FSA Generator framework that automatically generates concrete properties. Users only need to specify high-level generic properties as simple finite state machines. The framework converts these generic properties into concrete, language-specific properties using source code information from a pattern database and interface-level information from an API specification database. The automated, cost-effective generation of concrete properties makes static analysis scalable and efficient. Experimental evaluation using the generated properties and a static checker has found numerous robustness bugs in more than ten open source packages.
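As an illustration of the approach, a generic property expressed as a finite state machine might look like the sketch below. The property, states, and event names here are invented; the actual framework instantiates such properties from its pattern and API specification databases.

```python
# A miniature FSM property checker. The generic property encoded here
# ("a resource must be NULL-checked before use and freed before exit")
# and all event names are invented for illustration.

# Transition table: (state, event) -> next state; unknown pairs keep state.
PROPERTY = {
    ("start", "alloc"): "unchecked",
    ("unchecked", "use"): "error",       # used before the NULL check
    ("unchecked", "check"): "checked",
    ("checked", "use"): "checked",
    ("checked", "free"): "freed",
}

def check(trace):
    state = "start"
    for event in trace:
        state = PROPERTY.get((state, event), state)
        if state == "error":
            return f"violation at '{event}'"
    return "ok" if state in ("start", "freed") else "resource leaked"

print(check(["alloc", "check", "use", "free"]))   # ok
print(check(["alloc", "use"]))                    # violation at 'use'
print(check(["alloc", "check", "use"]))           # resource leaked
```

A static checker would drive such a machine over program paths rather than concrete event traces, but the property specification itself stays this simple.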
- In Regression Testing without Code(2007-08-03) Zheng, Jiang; Jason Osborne, Committee Member; Tao Xie, Committee Member; Mladen Vouk, Committee Member; Laurie Williams, Committee Chair
Software products are often built from commercial-off-the-shelf (COTS) components. When new releases of these components are made available for integration and testing, source code is usually not provided by the COTS vendors. Various regression test selection (RTS) techniques have been developed and shown to be cost-effective. However, the majority of these techniques rely on source code for change identification and impact analysis. This dissertation presents an RTS process called Integrated Black-box Approach for Component Change Identification (I-BACCI) for COTS-based applications. The I-BACCI process reduces the test suite based upon changes in the binary code of the COTS component, using the firewall analysis RTS method. This dissertation also presents Pallino, the supporting automation that statically analyzes binary code to identify code changes and their impact. Based on the output of Pallino and the original test suite, testers can determine which regression test cases are needed to exercise the application glue code affected by the changed areas in the new version of the COTS component. Five case studies were conducted on ABB internal products written in C/C++ to determine the effectiveness and efficiency of the I-BACCI process. The total size of the application and component for each release is about 340–930 KLOC. The results indicate this process can reduce the required number of regression tests by as much as 100% in the best case, when there are only a small number of changes in the new component. Similar to other RTS techniques, when there are many changes in the new component the I-BACCI process suggests a retest-all regression test strategy.
With the help of Pallino, RTS via the I-BACCI process can be completed in about one to two person-hours for each release in the case studies. Depending upon the percentage of test-case reduction determined by the I-BACCI process, the total time cost of the whole regression testing process can be reduced, in the best case, to 0.0003% of that of a retest-all strategy. Currently, Pallino works on components in Common Object File Format or Portable Executable format.
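The firewall-style test selection at the heart of I-BACCI can be sketched as follows. The coverage map, change set, and test names are hypothetical; the real process derives the change set from binary analysis via Pallino rather than from a hand-written list.

```python
# Firewall-style regression test selection: keep only the tests whose
# glue-code coverage reaches a changed component function. The coverage
# map, change set, and all names here are hypothetical.

coverage = {                      # test -> component functions it reaches
    "test_login": {"auth_init", "hash_pw"},
    "test_report": {"fmt_table", "fmt_date"},
    "test_export": {"fmt_table", "write_csv"},
}
changed = {"fmt_table"}           # functions that differ in the new release

selected = sorted(t for t, funcs in coverage.items() if funcs & changed)
print(selected)                   # ['test_export', 'test_report']
```

When the change set grows to cover most component functions, the selection degenerates to the full suite, matching the retest-all recommendation above.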
- Information Needs of Developers for Program Comprehension during Software Maintenance Tasks(2008-12-16) Layman, Lucas Michael; Laurie A. Williams, Committee Chair; Robert St. Amant, Committee Co-Chair; Tao Xie, Committee Member; Christopher B. Mayhorn, Committee Member; Jason A. Osborne, Committee Member
Software engineers undertaking maintenance tasks often work on unfamiliar code, requiring them to search for, relate, and collect information relevant to the maintenance task. The goal of this research is to create theories that describe the nature of the information sought by developers and how that information is used during two types of maintenance tasks: debugging (corrective maintenance) and enhancement (perfective maintenance). To meet this goal, six hypotheses are investigated regarding the navigation activities developers undertake to identify, relate, and collect information during software maintenance tasks. These hypotheses were investigated using data from two empirical studies of 18 developers performing enhancement and debugging tasks on three Java programs. Video recordings were used to annotate user interaction logs to create a history of user activities during the maintenance tasks. These data describe the activities developers undertake during maintenance tasks, which source code elements the developers examined, and the amount of time developers spent performing various activities. The data were analyzed using a combination of statistical and qualitative methods to compare the different methods of searching for and collecting information relevant to the software maintenance tasks. Analysis showed that the navigation styles developers use to find information (static navigation, normal navigation, and keyword searching) differ significantly in the amount of time spent collecting information. In particular, static navigation techniques were significantly shorter in duration than keyword search techniques.
No statistically significant differences were observed in the amount of time developers spent collecting information in debugging versus enhancement tasks. During debugging tasks, developers focused on information that controlled the state and behavior of a particular element. During enhancement tasks, developers focused on how an element used other elements, rather than on how an element is used by other elements. The analysis of the code relationships motivated further study of the nature of the information gathered by developers in enhancement and debugging tasks. The information read by developers (source code, Java documentation, and web search results) was analyzed with respect to its content, how it related to the task and the code elements being investigated, and how it was used. This qualitative analysis led to the following new theories on software maintenance: Theory 1: Developers are less likely to progress toward completing a maintenance task when the correct implementation of new code, or correct editing of existing code, requires logical connections to and/or evaluations of other code elements. Theory 2: New code that has been duplicated from another source acts as a self-reference, thereby requiring developers to make fewer logical evaluations and increasing the likelihood that the duplicated information will be used successfully in completing a task. Theory 3: Specific software behavior is often identified through analysis of a sequence of events and the control structures that propagate those events through the system, whereas a functional concept is often identified through comparisons, similarities, and references to existing functionality. These theories are new contributions to the field of software maintenance and program comprehension theory.
These theories can be further evaluated to help guide the creation of tools and strategies for assisting developers in finding relevant information during software maintenance tasks. One such tool, the Mimec Spotlight, has been proposed and evaluated in this research.
- Mechanisms for Protecting Software Integrity in Networked Systems(2008-12-02) Kil, Chongkyung; Tao Xie, Committee Member; Peng Ning, Committee Chair; Douglas S. Reeves, Committee Member; S. Purushothaman Iyer, Committee Member
Protecting software integrity is key to maintaining a system's credibility and to reducing the financial and technical risks caused by a lack of integrity. Although researchers have put effort into improving software development techniques and preventing human errors during the software development process, producing non-vulnerable software remains a daunting task in practice. For example, the National Vulnerability Database shows that new software vulnerabilities are discovered every day. Since developing non-vulnerable software is hardly achievable, this research looks for ways to preserve software integrity while the software is in use. In particular, this dissertation investigates three mechanisms to protect software integrity at runtime. First, this dissertation presents a protection mechanism that thwarts attacks exploiting memory corruption vulnerabilities in software. Protection is provided by randomizing the program's runtime memory address layout and its memory objects, which hinders memory corruption attacks by preventing an attacker from easily predicting target addresses. The mechanism is implemented by a novel binary rewriting tool that randomly places the code and data segments of programs and performs fine-grained permutation of function bodies in the code segment as well as global variables in the data segment. Our evaluation results show minimal performance overhead with orders-of-magnitude improvement in randomness. Second, this dissertation investigates a vulnerability identification mechanism named CBones that discovers how unknown vulnerabilities in C programs are exploited by verifying program structural constraints.
This mechanism is also useful for developing integrity patches for vulnerable programs, as applying security patches is increasingly common these days. CBones automatically extracts a set of program structural constraints via binary analysis of the compiled program executable. CBones then verifies these constraints while monitoring the program execution to detect and isolate security bugs. Our evaluation with real-world applications known to have vulnerabilities shows that CBones can discover all integrity vulnerabilities with no false alarms, pinpoint the corrupting instructions, and provide information that facilitates understanding of how an attack exploits a security bug. Lastly, this dissertation identifies the need for dynamic attestation to overcome the limitations of existing remote attestation approaches. To the best of our knowledge, we are the first to introduce the notion of dynamic attestation and to propose the use of dynamic system properties to provide an integrity proof of a running system. To validate the idea, we developed an application-level dynamic attestation system named ReDAS (Remote Dynamic Attestation System) that verifies the runtime integrity of software. ReDAS provides integrity evidence for running applications by checking their dynamic properties: structural integrity and global data integrity. These properties are collected from each application and represent the application's unique runtime behavior that must be satisfied at runtime. ReDAS also uses hardware support provided by the TPM to protect the integrity evidence from potential attacks. Our evaluation with real-world applications shows that ReDAS is effective in capturing runtime integrity violations with zero false alarms, and that ReDAS incurs 8% overhead on average while performing integrity measurements.
- Performance Requirements Improvement with an Evolutionary Model(2008-11-13) Ho, Chih-Wei; Laurie A. Williams, Committee Chair; Annie I. Antón, Committee Member; Tao Xie, Committee Member; Mladen A. Vouk, Committee Member
Performance is an important property of a software system. Performance requirements (PRs) should be specified and validated before detailed design starts. However, some factors that affect system performance may not be known during the early phases of software development. This dissertation proposes the Performance Refinement and Evolution Model (PREM), an evolutionary model for performance requirements specification. A development team may use PREM to specify simple PRs early in the software development process, and add more detail as the team gains knowledge of the system's performance. Two performance requirements improvement approaches were designed based on PREM. The first, called Performance Requirement Improvement from Failure Reports (PRIFF), uses field failures reported by customers to improve performance requirements. PRIFF was applied to the requirements and field failure reports of a commercial distributed software system. The results demonstrate that the information in the field failure reports was integrated into requirements that could be used for the next release; the resulting performance requirements are more complete and more specific than the original ones. The second approach, called DeNaP, improves PRs using defect reports that are designated as not a problem (NaP). If a defect report is designated NaP, the development team takes no action on it. A NaP defect report wastes the time of the development team and other key stakeholders, since resources are spent analyzing the problem but, in the end, the quality of the software is not improved. Reducing the NaP occurrence rate improves the efficiency of the development team.
DeNaP was applied to a firmware development project for an embedded control module from ABB Inc. and to a file processing system from EMC Corporation. After applying DeNaP, we were able to create new performance requirements and refine the original ones based on the NaP defect reports. A survey was conducted to examine the development teams’ reaction to the resulting performance requirements. The results show that more than half of the defect reports could have been avoided given the performance requirements produced by DeNaP.
- Predicting Attack-prone Components with Source Code Static Analyzers(2009-08-05) Gegick, Michael; Laurie Williams, Committee Chair; Tao Xie, Committee Member; Mladen Vouk, Committee Member; Jason Osborne, Committee Member
No single vulnerability detection technique can identify all vulnerabilities in a software system. However, the vulnerabilities identified by one detection technique may be predictive of the residual ones. We focus on creating and evaluating statistical models that predict which components contain the highest-risk residual vulnerabilities. The cost to find and fix faults grows with time in the software life cycle (SLC), so a challenge for our statistical models is to make predictions available early in the SLC to allow cost-effective fortification. Source code static analyzers (SCSA) are available during the coding phase and are capable of detecting code-level vulnerabilities. We use the code-level vulnerabilities identified by these tools to predict the presence of additional coding vulnerabilities and of vulnerabilities associated with the design and operation of the software. The goal of this research is to prevent vulnerabilities from escaping into the field by incorporating source code static analysis warnings into statistical models that predict which components are most susceptible to attack. The independent variable for our statistical model is the count of security-related SCSA warnings. To determine whether additional metrics increase the accuracy of the model, we also include the following independent variables: non-security SCSA warnings, code churn and size, the count of faults found manually during development, and a measure of coupling between components. The dependent variable is the count of vulnerabilities reported by testing and found in the field. We evaluated our model on three commercial telecommunications software systems.
Two case studies were performed at an anonymous vendor, and the third was performed at Cisco Systems. Each system is a different technology and consists of over one million source lines of C/C++ code. The results show positive, statistically significant correlations between the metrics and vulnerability counts. Additionally, the predictive models produce accurate probability rankings indicating which components are most susceptible to attack. The models are evaluated with receiver operating characteristic curves; each case study showed over 92% area under the curve. We also performed five-fold cross-validation to further demonstrate statistical confidence in the models. Based on these results we contribute the following theories: Theory 1: Large proportions of source code static analysis warnings are in the same components as other vulnerabilities that are likely to be exploited. Theory 2: Additional metrics, including non-security source code static analysis warnings, code churn and size, coupling, and faults found manually, increase the accuracy of a statistical model that uses security-related source code static analysis warnings alone. Components that contain security-related warnings identified by SCSA are also likely to contain other exploitable vulnerabilities, so software engineers should systematically inspect and test code for other vulnerabilities when a security-related warning is present. Fortifying these vulnerabilities may also help other techniques identify more undetected vulnerabilities.
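A correlation of the kind reported above can be sketched as follows. The per-component warning and vulnerability counts are invented for illustration (chosen so the relationship is perfectly monotone); the dissertation's actual models and data are not reproduced here.

```python
# Spearman rank correlation between per-component security warning
# counts and vulnerability counts. All data points are invented, and
# this simple ranking assumes no tied values.

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical components: warning counts line up monotonically with
# vulnerability counts, so the rank correlation is a perfect 1.0.
warning_counts = [12, 3, 25, 7, 0, 18]
vuln_counts = [4, 1, 9, 2, 0, 5]
print(round(spearman(warning_counts, vuln_counts), 2))   # 1.0
```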
- A Systematic Model Building Process for Predicting Actionable Static Analysis Alerts(2009-07-07) Heckman, Sarah Smith; Steffen Heber, Committee Member; Laurie Williams, Committee Chair; Tao Xie, Committee Member; Robert St. Amant, Committee Member
Automated static analysis tools can identify potential source code anomalies, like null pointers, buffer overflows, and unclosed streams, that could lead to field failures. These anomalies, which we call alerts, require inspection by a developer to determine whether an alert is important enough to fix. Actionable alert identification techniques can supplement automated static analysis tools by classifying or prioritizing the generated alerts so that a developer is more likely to inspect actionable alerts first. By classifying and prioritizing actionable static analysis alerts, the developer can focus his or her time on inspecting and fixing actionable alerts rather than inspecting and suppressing unactionable ones. The goal of my research is to reduce inspection time when using static analysis by creating and validating a systematic actionable alert identification model that accurately predicts actionable and unactionable alerts. The Systematic Actionable Alert Identification (SAAI) process uses machine learning to identify actionable alerts. Investigation of the following three hypotheses informs the goal of my research: - Hypothesis 1: The artifact characteristics of an alert and the surrounding source code are predictive of the actionability of an alert. - Hypothesis 2: A systematic actionable alert identification technique using machine learning can accurately identify actionable alerts. - Hypothesis 3: A systematic actionable alert identification technique using machine learning is project-specific.
A benchmark, FAULTBENCH, provides the evaluation framework for the proposed SAAI model building process and for comparison with other actionable alert identification techniques. The dissertation presents a feasibility study and three empirical studies evaluating the hypotheses above. The feasibility study evaluates an adaptive actionable alert identification technique that utilizes the alert’s type and code location, in addition to developer feedback, to prioritize actionable alerts. The first empirical study investigates Hypotheses 1–3 using FAULTBENCH on 15 SAAI models generated from five treatments for each of three subject programs; the treatments considered different groupings of alerts within revisions to train and test SAAI. The second empirical study is a comparative evaluation of the generated SAAI models against other actionable alert identification techniques, in further evaluation of Hypothesis 2. Additionally, an empirical user study was conducted in which students in the senior capstone project course used a custom SAAI model during development of their software project. Selection of predictive artifact characteristics as part of the SAAI process suggests acceptance of Hypothesis 1: all but four of the 58 artifact characteristics used to build SAAI models appeared in one or more of the artifact characteristic subsets. The SAAI model identified actionable and unactionable alerts with greater than 90% accuracy for eight of the 15 FAULTBENCH subject treatments, and comparison with other actionable alert identification techniques from the literature found that SAAI models had the highest accuracy for 11 of the 15 treatments when classifying the full alert sets. Both of these results support Hypothesis 2. Because accuracies greater than 90% were obtained when applying the artifact characteristic subsets and machine learning algorithms from one subject program to another, Hypothesis 3 is not supported on the evaluated subject programs.
The contributions of this work are as follows: - A systematic actionable alert identification model building process to predict actionable and unactionable automated static analysis alerts; - A benchmark, FAULTBENCH, for evaluating and comparing actionable alert identification techniques; and - A comparative evaluation of systematic actionable alert identification models with other actionable alert identification techniques from literature.
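As a minimal sketch of the kind of classification SAAI performs (not the actual SAAI models, which draw on 58 artifact characteristics and standard machine learners), a nearest-neighbor rule over invented alert features might look like:

```python
# A 1-nearest-neighbor rule over alert artifact characteristics.
# The features (recent file churn, alert age in revisions, method
# length) and the labeled training alerts are invented for illustration.

def classify(train, features):
    """Return the label of the nearest training alert (1-NN)."""
    def dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(train, key=dist)[1]

# (features, label): True = actionable, False = unactionable.
train = [
    ((9.0, 1.0, 80.0), True),     # churning code, brand-new alert
    ((8.0, 2.0, 60.0), True),
    ((1.0, 30.0, 10.0), False),   # stable code, alert ignored for 30 revs
    ((0.0, 45.0, 15.0), False),
]

new_alert = (7.0, 3.0, 70.0)
print(classify(train, new_alert))   # True: inspect this alert first
```

Ranking alerts by such predictions is what lets a developer spend inspection time on likely-actionable alerts first, the goal stated above.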
- Trace Based Performance Characterization and Optimization(2007-06-20) Marathe, Jaydeep Prakash; Vincent Freeh, Committee Member; Yan Solihin, Committee Member; Tao Xie, Committee Member; Frank Mueller, Committee Chair
Processor speeds have increased dramatically in the recent past, but improvement in memory access latencies has not kept pace. As a result, programs that do not make efficient use of the processor caches tend to become increasingly memory-bound and do not experience speedups with increasing processor frequency. In this thesis, we present tools to characterize and optimize the memory access patterns of software programs. Our tools use the program's memory access trace as the primary input for analysis. Our efforts encompass two broad areas: performance analysis and performance optimization. In performance analysis, our focus is on automating the analysis process as far as possible and on presenting the user with a rich set of metrics, for both single-threaded and multi-threaded programs. In performance optimization, we go one step further and perform automatic transformations based on observed program behavior. We make the following contributions in this thesis. First, we explore different tracing strategies: software tracing with dynamic binary instrumentation, hardware-based tracing exploiting support found in contemporary microprocessors, and a hybrid scheme that leverages hardware support with certain software modifications. Second, we present a range of performance analysis and optimization tools based on these trace inputs and additional auxiliary instrumentation. Our first tool, METRIC, characterizes the memory performance of single-threaded programs. Our second tool, ccSIM, extends METRIC to characterize the coherence behavior of multithreaded OpenMP benchmarks. Our third tool extends ccSIM to work with hardware-generated and hybrid trace inputs. These three tools represent our performance analysis efforts.
We also explore automated performance optimization with our remaining tools. Our fourth tool uses hardware-generated traces for automatic page placement on cache-coherent non-uniform memory architectures (ccNUMA). Finally, our fifth tool explores a novel trace-driven, instruction-level software data prefetching strategy. Overall, we demonstrate that memory traces are a rich source of information about a program's behavior and can be used effectively for a wide range of performance analysis and optimization strategies.
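The core of trace-driven cache characterization, the kind of processing tools like METRIC perform over a memory access trace, can be sketched as a tiny simulator. The cache geometry and the synthetic trace below are illustrative assumptions, not parameters from the thesis.

```python
# A minimal trace-driven, direct-mapped cache simulator. The cache
# geometry and the synthetic access trace are illustrative assumptions.

LINE = 64      # bytes per cache line
SETS = 256     # direct-mapped: one line per set

def simulate(trace):
    tags = [None] * SETS
    hits = 0
    for addr in trace:
        index = (addr // LINE) % SETS
        tag = addr // (LINE * SETS)
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag      # miss: fill the line
    return hits, len(trace)

# Touch 64 distinct lines, four back-to-back accesses each:
# one compulsory miss plus three hits per line.
trace = [i * LINE for i in range(64) for _ in range(4)]
hits, total = simulate(trace)
print(f"hit rate: {hits / total:.2f}")   # hit rate: 0.75
```

Real tools feed such simulators with traces captured by binary instrumentation or hardware support, then aggregate the per-reference hit/miss outcomes into the metrics described above.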
