Mining API Specifications from Source Code for Improving Software Reliability

dc.contributor.advisor David Thuente, Committee Member en_US
dc.contributor.advisor Helen Gu, Committee Member en_US
dc.contributor.advisor Tao Xie, Committee Chair en_US
dc.contributor.advisor Douglas Reeves, Committee Member en_US
dc.contributor.author Acharya, Mithun Puthige en_US
dc.date.accessioned 2010-04-02T18:37:51Z
dc.date.available 2010-04-02T18:37:51Z
dc.date.issued 2009-04-27 en_US
dc.identifier.other etd-03222009-100211 en_US
dc.description.abstract A software system interacts with third-party libraries through various Application Program Interfaces (APIs). Using these APIs correctly often requires following certain programming rules, i.e., API specifications. API specifications specify the required checks (on API input parameters and return values) and other APIs to be invoked before (preconditions) and after (postconditions) an API call. Incorrect usage of APIs (in short, API violations) can lead to security and robustness problems, two primary hindrances for the reliable operation of a software system. Hence, for a software system, adherence to the specifications, which govern the correct usage of APIs used by the system, is paramount for software reliability. Specifications, when known, can be formally written for third-party APIs and statically verified against a software system. This dissertation addresses two main problems faced by programmers in effectively and correctly reusing third-party APIs. (1) Formal API specifications are complicated and lengthy mainly due to the various API details (such as input/return type, error-flag codes, and return values for APIs on success/failure) and language-specific syntax considerations required for the specification to be accurate and complete. Hence, manually writing a large number of formal API specifications, when known, for static verification is often inaccurate or incomplete, apart from being cumbersome. (2) API specifications are not well documented by the API developers and are often not known to programmers who reuse third-party APIs in the first place. API specifications cut across procedural boundaries and an attempt to infer these specifications by manual inspection of source code (API client code) is often inefficient and inaccurate. This dissertation proposes a novel framework to address the aforementioned problems faced by programmers in reusing third-party APIs.
Our framework comprises related approaches to aid programmers in constructing API specifications for static verification. First, when API specifications are known, to encourage the use of formal verification in the software development cycle, we present an approach to automatically construct formal API specifications for static verification from generic, user-specified specification templates, which are free from language-specific syntax and API details. Second, when API specifications are not known, to automatically mine them from source code (API client code), we present an approach to generate static traces from the source code; we then present novel applications of data mining techniques on the generated static traces for specification mining. Our approach allows mining of software systems that reuse APIs without requiring environment setup for system executions or availability of sufficiently high-quality system tests. We apply our trace mining approach on several popular open-source packages to mine API specifications and detect violations, without requiring any user input. Finally, we conduct an empirical analysis of the characteristics of API specifications in practice such as the distance of pre/postcondition enforcement points in a program to their corresponding call sites and the extent of aliasing between these points and call sites (involving API input parameters and return values) in large open-source packages. These characteristics, as we demonstrate, have implications on the cost and precision of the inter-procedural and alias analysis required for specification mining and violation detection algorithms, and hence, on the scalability and the false-positive rate of the algorithms. en_US
dc.rights I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. en_US
dc.subject Reliability en_US
dc.subject Static Analysis en_US
dc.subject Mining API Specifications en_US
dc.title Mining API Specifications from Source Code for Improving Software Reliability en_US
dc.degree.name PhD en_US
dc.degree.level dissertation en_US
dc.degree.discipline Computer Science en_US

Files in this item

Files Size Format View
etd.pdf 1.258Mb PDF View/Open
