Fault Tolerance and Reliability in Scientific Workflows

dc.contributor.advisor: Peter Wurman, Committee Member (en_US)
dc.contributor.advisor: Munindar Singh, Committee Member (en_US)
dc.contributor.advisor: Mladen Vouk, Committee Chair (en_US)
dc.contributor.author: Mouallem, Pierre (en_US)
dc.date.accessioned: 2010-04-02T17:54:36Z
dc.date.available: 2010-04-02T17:54:36Z
dc.date.issued: 2005-05-31 (en_US)
dc.degree.discipline: Computer Science (en_US)
dc.degree.level: thesis (en_US)
dc.degree.name: MS (en_US)
dc.description.abstract: The emerging technologies of web services, agents and service-oriented workflows will enable scientific projects and experiments to be conducted on a larger scale than ever before. Data used and produced in such projects and experiments are becoming increasingly complex and heterogeneous. This creates a need for a tool (or a set of tools) to efficiently design, manage and maintain problem-solving flows (scientific workflows) built from various components. The DOE Scientific Data Management (SDM) initiative aims to develop a framework that helps scientists manage data in distributed and collaborative environments. It also provides tools that help them create and manage scientific workflows that use network-based (web) services, agent technologies and semantic mediation techniques. The current SDM framework is known as SPA/Kepler and is based on Ptolemy II. One of the vulnerabilities of service-dependent workflows is that they require the web services they use to be available whenever the workflow is run. If key web services are not available, the workflow cannot finish successfully. At that point, a scientist using such a service would have to wait for it to be restored. This, of course, impacts workflow reliability and availability, and may be sufficient for an end-user to stop using workflows that depend on those services. The work reported here uses the SPA/Kepler framework to explore the issue of reliability of service-based scientific workflows. For example, a workflow that invokes three services in series may have an unacceptably high overall failure probability. This thesis explores the issues related to improving overall workflow reliability using fault tolerance. Specifically, the work focuses on failure-masking and fail-over through redundancy, in the context of individual services, rather than on provision of checkpointing and recovery.
Analyses show that even a relatively simple redundancy-based fault-tolerance approach, such as duplication of key services, can improve reliability by an order of magnitude or more. In the context of an actual implementation, one option is to find locations of alternative (functionally equivalent) services during workflow design, and then use that information at run-time if the primary service fails. A more practical method is to publish the list of services used by the workflow to a UDDI-type service and to dynamically match needed services with functionally equivalent ones if a fail-over is required. A prototype solution of the latter, based on a commercially available brokering service, was developed for one of the SDM pilot workflows to show its viability. It is discussed in detail. (en_US)
dc.identifier.other: etd-05262005-123723 (en_US)
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/306
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. (en_US)
dc.subject: Fault Tolerance (en_US)
dc.subject: Reliability (en_US)
dc.subject: Scientific Workflows (en_US)
dc.subject: Web Services (en_US)
dc.title: Fault Tolerance and Reliability in Scientific Workflows (en_US)
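
Illustrative note on the abstract's reliability claim: the following is a minimal sketch, not taken from the thesis, of why a workflow that calls three services in series can fail often and why duplicating each service can help by roughly an order of magnitude. It assumes that services fail independently, that fail-over to the duplicate always works, and that each service has a per-invocation reliability of 0.95. That figure and the function names are hypothetical and chosen only for illustration.

```python
def series_reliability(reliabilities):
    """Probability that every service in a serial chain succeeds.

    Assumes failures are independent (an assumption of this sketch).
    """
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def duplicated(r, copies=2):
    """Reliability of one step backed by `copies` equivalent services.

    The step fails only if all copies fail; perfect fail-over is assumed.
    """
    return 1.0 - (1.0 - r) ** copies


r = 0.95  # hypothetical per-service reliability, not a figure from the thesis

plain = series_reliability([r] * 3)                 # three services, no redundancy
redundant = series_reliability([duplicated(r)] * 3)  # each service duplicated

print("failure probability without redundancy:", 1.0 - plain)
print("failure probability with duplication:  ", 1.0 - redundant)
print("improvement factor:", (1.0 - plain) / (1.0 - redundant))
```

Under these assumptions the workflow's failure probability drops from roughly 0.14 to roughly 0.0075, which matches the order-of-magnitude improvement the abstract describes. Correlated failures or imperfect fail-over would reduce the gain.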

Files

Original bundle

Name: etd.pdf
Size: 998.18 KB
Format: Adobe Portable Document Format