Clustering of Mixed Data Types with Application to Toxicogenomics

dc.contributor.advisor | Greg C. Gibson, Committee Chair | en_US
dc.contributor.advisor | Russell D. Wolfinger, Committee Member | en_US
dc.contributor.advisor | Spencer V. Muse, Committee Member | en_US
dc.contributor.advisor | Robert C. Smart, Committee Member | en_US
dc.contributor.author | Bushel, Pierre Robert | en_US
dc.date.accessioned | 2010-04-02T18:41:19Z
dc.date.available | 2010-04-02T18:41:19Z
dc.date.issued | 2006-04-25 | en_US
dc.degree.discipline | Bioinformatics | en_US
dc.degree.level | dissertation | en_US
dc.degree.name | PhD | en_US
dc.description.abstract | DNA microarray analysis provides unprecedented capabilities for simultaneously measuring genome-wide changes in transcription levels. Toxicogenomics combines gene and protein expression analyses with conventional toxicology to provide a global view of the toxic outcomes and mechanistic changes that toxicant exposure and environmental stressors elicit in biological systems. Toxicogenomics data, however, inherently contain systematic error, stochastic variation, and measurements from disparate domains and of disparate types, all of which make it difficult to obtain significant, meaningful, and broad biological interpretations from the data. In this dissertation, a classification regimen consisting of replicate-data analysis, outlier diagnostics, and gene selection procedures was used to categorize sub-classes of biological samples exposed to pharmacologic agents on the basis of their microarray data. To contrast levels of centrilobular congestion severity in the rat liver after exposure to acetaminophen (APAP), microarray data, clinical chemistry evaluations, and histopathology observations were integrated in a database and analyzed with mixed linear models. Finally, the k-prototypes algorithm was modified to follow the specifications of k-means clustering (Modk-prototypes). Its mixed objective function sums two dissimilarity measures: the squared Euclidean distance between samples on the numeric microarray and clinical chemistry features, and simple matching between samples on the categorical histopathology features. The objective function also includes weighting terms for the microarray, clinical chemistry, and histopathology domains, both to integrate the data computationally and to constrain the clustering of the APAP-treated samples according to the similarity of their gene expression and toxicological profiles.
A simulated annealing optimization of the Modk-prototypes algorithm (SA-Modk-prototypes) was used to validate the clustering of the APAP-treated samples. The clusters were then vetted for the gene expression and toxicological k-prototype features (VETed k-prototypes) that distinguish one cluster from another. The VETed k-prototypes are shown to be ideal for distinguishing between zero, minimal, and moderate levels of necrosis of the hepatocytes and the centrilobular region of the rat liver, the end points that characterize the clusters of APAP-treated samples.
In this dissertation, chapter 1 introduces general toxicology; microarray gene expression platforms, experimental designs, data preprocessing, and gene selection approaches; toxicogenomics as applied to compound classification and phenotypic anchoring of gene expression; databases and informatics resources for toxicogenomics; and clustering of mixed data types. Chapter 2 is dedicated to the statistical validation and significance of differentially expressed genes and to the sub-categorization of samples exposed to phenobarbital and to the peroxisome proliferators clofibrate, gemfibrozil, and Wyeth 14,643. Chapter 3 presents the integration of microarray data with clinical chemistry and histopathology data to contrast levels of centrilobular congestion in the rat liver by mixed linear modeling of gene expression ratio values from rats exposed to APAP. Chapter 4 describes the modified k-prototypes (Modk-prototypes) objective function and algorithm, and its simulated annealing version (SA-Modk-prototypes), for computationally integrating mixed numeric and categorical microarray, clinical chemistry, and histopathology data. It also covers partitioning the APAP-treated samples into clusters whose vetted expression and toxicological (VETed) k-prototype features distinguish between levels of necrosis of the hepatocytes and the centrilobular region of the rat liver. Chapter 5 presents the conclusions of the research, development, and analyses in this dissertation. | en_US
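The weighted Modk-prototypes objective summarized in the abstract can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the dissertation's implementation. Each sample is assumed to be a triple of (microarray values, clinical chemistry values, histopathology categories), and the domain weights `w` are hypothetical placeholders. "The specifications of k-means clustering" is read here as batch (Lloyd-style) updating: every sample is reassigned before any prototype is recomputed, in contrast with the incremental prototype updates usually described for k-prototypes. Prototypes are per-feature means for the numeric domains and per-attribute modes for the categorical domain. The function names are hypothetical.

```python
import random
from collections import Counter


def sq_euclid(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mismatches(a, b):
    """Simple matching dissimilarity: number of categorical attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def mixed_cost(sample, proto, w):
    """Weighted dissimilarity of a sample (expr, chem, histo) to a prototype.

    w = (w_expr, w_chem, w_histo) are hypothetical domain weights.
    """
    expr, chem, histo = sample
    p_expr, p_chem, p_histo = proto
    return (w[0] * sq_euclid(expr, p_expr)
            + w[1] * sq_euclid(chem, p_chem)
            + w[2] * mismatches(histo, p_histo))


def update_prototype(members):
    """Means for the two numeric domains, modes for the categorical domain."""
    def mean(vectors):
        return [sum(col) / len(col) for col in zip(*vectors)]

    def mode(vectors):
        return [Counter(col).most_common(1)[0][0] for col in zip(*vectors)]

    return (mean([m[0] for m in members]),
            mean([m[1] for m in members]),
            mode([m[2] for m in members]))


def batch_k_prototypes(samples, k, w, max_iter=100, seed=0):
    """Batch (k-means-style) k-prototypes; returns labels, prototypes, total cost."""
    rng = random.Random(seed)
    # Initialize prototypes from k distinct randomly chosen samples.
    protos = [update_prototype([s]) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest prototype.
        new_labels = [min(range(k), key=lambda j: mixed_cost(s, protos[j], w))
                      for s in samples]
        if new_labels == labels:
            break  # assignments are stable, so the objective has converged
        labels = new_labels
        # Update step: recompute prototypes only after the full pass.
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # keep the previous prototype if a cluster empties
                protos[j] = update_prototype(members)
    total = sum(mixed_cost(s, protos[lab], w) for s, lab in zip(samples, labels))
    return labels, protos, total
```

For example, four toy samples that form two well-separated groups (two "none" and two "moderate" histopathology calls) end up in separate clusters with `batch_k_prototypes(samples, k=2, w=(1.0, 1.0, 1.0))`. Raising `w[2]` makes a histopathology mismatch count for more relative to numeric distances, which is one way domain weights can constrain the clustering toward toxicological similarity.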
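Simulated annealing can be applied to the same kind of weighted mixed objective by searching over cluster labellings instead of alternating assignment and update steps. The sketch below is an assumption about how an SA-Modk-style search might look, not the author's procedure. A move reassigns one randomly chosen sample to another cluster, and prototypes are recomputed from the labels. Worse labellings are accepted with the Metropolis probability exp(-Δ/T), and the temperature follows a geometric cooling schedule. The parameters `t0`, `cooling`, and `steps` and all function names are hypothetical, and `k` must be at least 2.

```python
import math
import random
from collections import Counter


def _prototype(members):
    """Means for the two numeric domains, modes for the categorical domain."""
    def mean(vectors):
        return [sum(col) / len(col) for col in zip(*vectors)]

    def mode(vectors):
        return [Counter(col).most_common(1)[0][0] for col in zip(*vectors)]

    return (mean([m[0] for m in members]),
            mean([m[1] for m in members]),
            mode([m[2] for m in members]))


def objective(samples, labels, k, w):
    """Weighted mixed cost of a labelling; prototypes are derived from the labels."""
    total = 0.0
    for j in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == j]
        if not members:
            continue  # an empty cluster contributes nothing
        p_expr, p_chem, p_histo = _prototype(members)
        for expr, chem, histo in members:
            total += (w[0] * sum((a - b) ** 2 for a, b in zip(expr, p_expr))
                      + w[1] * sum((a - b) ** 2 for a, b in zip(chem, p_chem))
                      + w[2] * sum(a != b for a, b in zip(histo, p_histo)))
    return total


def anneal(samples, k, w, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing over cluster labels; returns the best labelling seen."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in samples]
    cost = objective(samples, labels, k, w)
    best, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        # Move: send one random sample to a different cluster.
        i = rng.randrange(len(samples))
        candidate = labels[:]
        candidate[i] = rng.choice([j for j in range(k) if j != labels[i]])
        new_cost = objective(samples, candidate, k, w)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            labels, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = labels[:], cost
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor
    return best, best_cost
```

Because the best labelling seen is tracked separately from the current one, the result can be compared directly with a batch k-prototypes solution. Agreement between the two is one way to check that a partition is not merely an artifact of initialization.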
dc.identifier.other | etd-03172005-091928 | en_US
dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/3976
dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US
dc.subject | toxicology | en_US
dc.subject | database | en_US
dc.subject | gene expression | en_US
dc.subject | toxicogenomics | en_US
dc.subject | genomic sciences | en_US
dc.subject | microarray | en_US
dc.subject | clustering | en_US
dc.subject | statistics | en_US
dc.subject | bioinformatics | en_US
dc.title | Clustering of Mixed Data Types with Application to Toxicogenomics | en_US

Files

Original bundle

Name: etd.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format
