| Title: | Data Mining Techniques to Enable Large-scale Exploratory Analysis of Heterogeneous Scientific Data |
| Authors: | Chopra, Pankaj |
| Advisors: | Dr. Steffen Heber, Committee Co-Chair; Dr. Xiaosong Ma, Committee Member; Dr. Donald L. Bitzer, Committee Chair; Dr. Ting Yu, Committee Member; Dr. Jaewoo Kang, Committee Member |
| Keywords: | pathway analysis; data mining; gene expression data mining; genetic pathways; microarray data mining; microarray clustering |
| Issue Date: | 24-Apr-2009 |
| Degree: | PhD |
| Discipline: | Computer Science |
| Abstract: | Recent advances in microarray technology have enabled scientists to gather data on thousands of genes simultaneously. However, because genetic interactions are complex, the function and purpose of many genes remain unclear. The cause and progression of many diseases, such as cancer and Alzheimer's disease, are increasingly attributed to the deregulation of critical genetic pathways. Data mining is now used extensively on biological datasets to infer gene function and to identify genetic biomarkers for disease prognosis and treatment. There is a considerable need for algorithms that explore and interpret the underlying microarray data from a biological perspective.
This thesis addresses three areas of data mining in heterogeneous biological datasets. First, it presents a new clustering algorithm that leverages information on known gene functions. Most conventional clustering algorithms generate a single set of clusters, irrespective of the biological context of the analysis, which is often inadequate for exploring the data from different biological perspectives and gaining new insights. The new clustering model instead generates multiple alternative clusterings from a single dataset, each highlighting a different aspect of the data. Second, it presents a new classification algorithm that uses gene pairs for cancer classification. This exploits the idea that, because of genetic interactions, gene pairs may be a better basis for cancer classification than single genes. Third, it conducts a meta-analysis of human and mouse cancer datasets and integrates the results with Gene Ontology and pathway knowledge to highlight pathways that are closely implicated in the cause and progression of cancer. |
| URI: | http://www.lib.ncsu.edu/resolver/1840.16/4168 |
| Appears in Collections: | Dissertations |
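The abstract's idea that gene pairs can be a better basis for cancer classification than single genes can be illustrated with the well-known top-scoring-pair (TSP) rule. This is a published technique swapped in for illustration, not necessarily the classifier developed in the thesis. The sketch below assumes a samples-by-genes expression matrix and binary class labels; the function names `fit_tsp` and `predict_tsp` and the toy data are hypothetical. It scores every gene pair by how differently the ordering "gene i below gene j" occurs in the two classes, then classifies a new sample by checking that single ordering.

```python
from itertools import combinations
from typing import Sequence, Tuple

# Hypothetical model layout: (gene_i, gene_j, score, less_means_class0)
TSPModel = Tuple[int, int, float, bool]


def fit_tsp(X: Sequence[Sequence[float]], y: Sequence[int]) -> TSPModel:
    """Find the gene pair (i, j) whose ordering X[i] < X[j] best separates
    class 0 from class 1.

    X is a samples-by-genes expression matrix; y holds labels in {0, 1}.
    The score of a pair is |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best: TSPModel = (0, 1, -1.0, True)
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        if score > best[2]:  # strict '>' keeps the first pair on ties
            best = (i, j, score, p0 > p1)
    return best


def predict_tsp(model: TSPModel, x: Sequence[float]) -> int:
    """Classify one sample by checking the ordering of the selected pair."""
    i, j, _, less_means_class0 = model
    if x[i] < x[j]:
        return 0 if less_means_class0 else 1
    return 1 if less_means_class0 else 0


# Toy data (hypothetical): 4 samples x 3 genes, two samples per class.
X = [[1.0, 5.0, 3.0], [2.0, 6.0, 2.0], [7.0, 1.0, 3.0], [8.0, 2.0, 2.5]]
y = [0, 0, 1, 1]
model = fit_tsp(X, y)
print(model)                                 # (0, 1, 1.0, True)
print(predict_tsp(model, [1.0, 4.0, 0.0]))  # 0
print(predict_tsp(model, [9.0, 3.0, 0.0]))  # 1
```

Because the rule depends only on the relative order of two genes within a sample, it is insensitive to monotone per-sample normalization, which is one reason pairwise comparisons are attractive for microarray data. The exhaustive search is quadratic in the number of genes, so a real analysis would typically prefilter genes first.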
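The abstract mentions integrating meta-analysis results with Gene Ontology and pathway knowledge to highlight implicated pathways. One standard way to do this kind of integration, shown here as an assumption and not as the thesis's procedure, is an over-representation test: given a set of genes flagged by the analysis, ask whether a pathway contains more of them than chance would predict, using the one-sided hypergeometric tail. The helper names, gene identifiers, and pathway names below are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Tuple


def enrichment_p_value(n_total: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(K >= n_overlap).

    K counts pathway members among n_selected genes drawn without
    replacement from n_total genes, n_pathway of which are in the pathway.
    """
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_selected)


def rank_pathways(
    universe: Iterable[str],
    selected: Iterable[str],
    pathways: Dict[str, Iterable[str]],
) -> List[Tuple[str, int, float]]:
    """Return (pathway, overlap, p_value) tuples sorted by increasing p-value."""
    universe_set = set(universe)
    selected_set = set(selected) & universe_set
    results = []
    for name, genes in pathways.items():
        members = set(genes) & universe_set  # only count measured genes
        overlap = len(members & selected_set)
        p = enrichment_p_value(len(universe_set), len(members), len(selected_set), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical example: 20 measured genes, 4 flagged by the analysis.
universe = [f"G{i}" for i in range(1, 21)]
selected = ["G1", "G2", "G3", "G15"]
pathways = {
    "pathway_A": ["G1", "G2", "G3", "G4", "G5"],
    "pathway_B": ["G10", "G11", "G12", "G13", "G14"],
}
for name, overlap, p in rank_pathways(universe, selected, pathways):
    print(name, overlap, round(p, 4))
```

The same tail probability is available as `scipy.stats.hypergeom.sf(n_overlap - 1, n_total, n_pathway, n_selected)`. When many pathways are tested at once, the raw p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before any pathway is reported as implicated.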
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.