A Transparent Collaborative Framework for Efficient Data Analysis and Knowledge Annotation on the Web
| dc.contributor.advisor | Professor Nagiza F. Samatova, Committee Chair | en_US |
| dc.contributor.advisor | Professor Steffen Heber, Committee Member | en_US |
| dc.contributor.advisor | Professor Tao Xie, Committee Member | en_US |
| dc.contributor.advisor | Professor Mladen Vouk, Committee Member | en_US |
| dc.contributor.author | Breimyer, Paul William | en_US |
| dc.date.accessioned | 2010-04-02T18:42:38Z | |
| dc.date.available | 2010-04-02T18:42:38Z | |
| dc.date.issued | 2009-07-23 | en_US |
| dc.degree.discipline | Computer Science | en_US |
| dc.degree.level | dissertation | en_US |
| dc.degree.name | PhD | en_US |
| dc.description | North Carolina State University Theses Computer Science. | |
| dc.description.abstract | High-throughput experiments and ultrascale computing generate scientific data of growing size and complexity. These trends challenge traditional data analysis environments, most of which are based on scripting languages such as R, MATLAB or IDL, in a number of ways. To address some of these challenges, this research proposes a framework with the overarching goal to enable large-scale high-performance data analytics and collaborative knowledge annotation over the Web. The proposed framework has three major components, which parallel the three core steps of the knowledge discovery cycle. For the first step, defining the data analysis pipeline, the research designs and implements a Web-enabled interactive and collaborative statistical R-based environment. The component implements a memory management system that minimizes memory requirements thereby enabling multi-user scalability. To the best of our knowledge, this is the first Web-enabled R system that supports interactive remote access to R servers and enables users to share data, results and analysis sessions. For the second step, executing the data analysis pipeline, the research investigates and proposes a transparent and low-overhead means for executing external compiled language parallel codes from within R, thus seamlessly bridging two code development paradigms: efficient, compiled parallel codes and high abstraction and easy-to-use scripting codes. This component contains three elements: a transparent bidirectional translation of data objects between R and compiled languages, such as C/C++/Fortran; seamless integration of external parallel codes; and automatic parallelization of data-parallel computations in hybrid multi-core and multi-node execution environments. For the third step, annotating the predictive knowledge derived from community analysis pipelines, the research explores an environment for semantically rich, structured and queriable annotation of facts, relationships between those facts, and complex events reported in scientific literature. The social networking nature of this component allows the community to improve the predictions as well as generate new, higher-level inferences, thus filling in the gaps in the communities' understanding of physical phenomena. The environment offers mechanisms for streamlining the annotated and curated knowledge into distributed public databases, thus enabling a feedback loop into the database-publication cycle to allow scientists to make connections between data-driven predictions and published evidence. | en_US |
| dc.format | Thesis (Ph.D.)--North Carolina State University. | |
| dc.identifier.other | etd-06302009-132544 | en_US |
| dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/4020 | |
| dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US |
| dc.subject | statistical data analysis | en_US |
| dc.subject | Web | en_US |
| dc.subject | annotation | en_US |
| dc.title | A Transparent Collaborative Framework for Efficient Data Analysis and Knowledge Annotation on the Web | en_US |
| dcterms.abstract | Keywords: statistical data analysis, web, annotation. | |
| dcterms.extent | ix, 122 pages : illustrations (some color) | |
