Clustering of Mixed Data Types with Application to Toxicogenomics

dc.contributor.advisor Greg C. Gibson, Committee Chair en_US
dc.contributor.advisor Russell D. Wolfinger, Committee Member en_US
dc.contributor.advisor Spencer V. Muse, Committee Member en_US
dc.contributor.advisor Robert C. Smart, Committee Member en_US
dc.contributor.author Bushel, Pierre Robert en_US
dc.date.accessioned 2010-04-02T18:41:19Z
dc.date.available 2010-04-02T18:41:19Z
dc.date.issued 2006-04-25 en_US
dc.identifier.other etd-03172005-091928 en_US
dc.description.abstract DNA microarray analysis provides unprecedented capabilities for simultaneously measuring genome-wide alterations in transcription levels. Toxicogenomics bridges gene and protein expression analyses with conventional toxicology to provide a global view of the toxic outcomes and mechanistic changes that toxicant exposure and environmental stressors elicit in biological systems. Toxicogenomics data inherently contain systematic error, stochastic variation, and disparate measurement domains and types, all of which complicate the extraction of significant, meaningful, and broad biological interpretations. In this dissertation, a classification regimen consisting of analysis of replicate data, outlier diagnostics, and gene selection procedures was used to categorize sub-classes of biological samples exposed to pharmacologic agents on the basis of microarray data. To contrast levels of centrilobular congestion severity in the rat liver following exposure to acetaminophen (APAP), microarray data, clinical chemistry evaluations, and histopathology observations were integrated in a database and analyzed using mixed linear model approaches. Finally, the k-prototypes algorithm was modified to the specifications of k-means clustering (Modk-prototypes). Its mixed objective function is the sum of two terms: the squared Euclidean distance, which measures the dissimilarity of samples on the numeric microarray and clinical chemistry features, and simple matching, which measures the dissimilarity of samples on the categorical histopathology features. The objective function also includes weighting terms for the microarray, clinical chemistry, and histopathology domains, both to integrate the data computationally and to constrain the clustering of the APAP-treated samples according to the similarity of their gene expression and toxicological profiles. 
Simulated annealing optimization of the Modk-prototypes algorithm (SA-Modk-prototypes) was used to validate the clustering of the APAP-treated samples. The clusters were vetted for gene expression and toxicological (VETed) k-prototypes features that discern the clusters from one another. The VETed k-prototypes are shown to be ideal for distinguishing between zero, minimal, and moderate levels of necrosis of the hepatocytes and the centrilobular region of the rat liver, which are the end-point representations of the clusters of APAP-treated samples. In this dissertation, Chapter 1 introduces general toxicology; microarray gene expression platforms, experimental designs, data preprocessing, and gene selection approaches; toxicogenomics as it applies to compound classification and phenotypic anchoring of gene expression; databases and informatics resources for toxicogenomics; and clustering of mixed data types. Chapter 2 is dedicated to the statistical validation and significance of differentially expressed genes, as well as the sub-categorization of samples exposed to phenobarbital and to the peroxisome proliferators clofibrate, gemfibrozil, and Wyeth 14,643. Chapter 3 presents the integration of microarray data with clinical chemistry and histopathology data to contrast levels of centrilobular congestion in the rat liver by mixed linear modeling of gene expression ratio values acquired from rats exposed to APAP. Chapter 4 describes a modified k-prototypes (Modk-prototypes) objective function and algorithm, and a simulated annealing version of the Modk-prototypes objective function (SA-Modk-prototypes), for the computational integration of microarray, clinical chemistry, and histopathology data of mixed numeric and categorical types. 
It also covers the partitioning of APAP-treated biological samples into clusters containing vetted expression and toxicological (VETed) k-prototypes features that distinguish between levels of necrosis of the hepatocytes and the centrilobular region of the rat liver. Chapter 5 presents the conclusions of the research, development, and analyses in this dissertation. en_US
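The mixed objective described in the abstract can be illustrated with a minimal sketch. This assumes per-feature numeric weights (a domain weight such as the microarray or clinical chemistry weight would simply be repeated for every feature in that domain), a single weight on the simple-matching term for the categorical histopathology features, random initialization, and batch k-means-style updates as one reading of "modified to the specifications of k-means clustering". The function names `mixed_dissimilarity`, `objective`, and `k_prototypes` are hypothetical; this is not the dissertation's Modk-prototypes implementation.

```python
import random
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, num_weights, cat_weight):
    """Weighted squared Euclidean distance on the numeric features plus a
    weighted simple-matching count (number of mismatches) on the categorical ones."""
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(num_weights, x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + cat_weight * mismatches


def objective(num, cat, labels, protos, num_weights, cat_weight):
    """Total mixed dissimilarity of every sample to its assigned prototype."""
    return sum(
        mixed_dissimilarity(num[i], cat[i], *protos[labels[i]], num_weights, cat_weight)
        for i in range(len(num))
    )


def k_prototypes(num, cat, k, num_weights, cat_weight, max_iter=100, seed=0):
    """Batch (k-means-style) k-prototypes: reassign all samples, then recompute
    each prototype as the per-feature mean (numeric) and mode (categorical)."""
    rng = random.Random(seed)
    n = len(num)
    # Hypothetical initialization: prototypes copied from k distinct random samples.
    protos = [(list(num[i]), list(cat[i])) for i in rng.sample(range(n), k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: mixed_dissimilarity(
                num[i], cat[i], protos[j][0], protos[j][1], num_weights, cat_weight))
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the prototypes are too
        labels = new_labels
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue  # leave an empty cluster's prototype unchanged
            mean = [sum(num[i][f] for i in members) / len(members)
                    for f in range(len(num[0]))]
            mode = [Counter(cat[i][f] for i in members).most_common(1)[0][0]
                    for f in range(len(cat[0]))]
            protos[j] = (mean, mode)
    return labels, protos
```

With made-up toy data (two numeric features standing in for expression and clinical chemistry values, one categorical "necrosis grade"), `k_prototypes(num, cat, 2, [1.0, 1.0], 1.0)` separates the three "none" samples from the three "moderate" samples. Raising `cat_weight` relative to the numeric weights makes histopathology agreement count more heavily in the assignment step, which is the role the abstract assigns to the domain weighting terms.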
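Simulated annealing, which the abstract uses to validate the clustering (SA-Modk-prototypes), can be sketched generically as a search over cluster labelings that minimizes the same kind of weighted mixed objective. The move set (relabel one sample), the Metropolis acceptance rule, the geometric cooling schedule, and the parameter values `t0`, `cooling`, `steps_per_temp`, and `t_min` are all assumptions chosen for illustration, and the names `assignment_cost` and `anneal` are hypothetical; the dissertation's actual SA-Modk-prototypes procedure may differ.

```python
import math
import random
from collections import Counter


def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, num_weights, cat_weight):
    """Weighted squared Euclidean distance plus weighted simple-matching count."""
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(num_weights, x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + cat_weight * mismatches


def assignment_cost(labels, num, cat, k, num_weights, cat_weight):
    """Cost of a labeling: build mean/mode prototypes for the non-empty
    clusters, then sum each sample's dissimilarity to its own prototype."""
    total = 0.0
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if not members:
            continue
        mean = [sum(num[i][f] for i in members) / len(members)
                for f in range(len(num[0]))]
        mode = [Counter(cat[i][f] for i in members).most_common(1)[0][0]
                for f in range(len(cat[0]))]
        total += sum(mixed_dissimilarity(num[i], cat[i], mean, mode,
                                         num_weights, cat_weight) for i in members)
    return total


def anneal(num, cat, k, num_weights, cat_weight, t0=10.0, cooling=0.95,
           steps_per_temp=50, t_min=1e-3, seed=0):
    """Simulated annealing over labelings; returns the best labeling seen and its cost."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    n = len(num)
    labels = [rng.randrange(k) for _ in range(n)]
    current = assignment_cost(labels, num, cat, k, num_weights, cat_weight)
    best_labels, best = list(labels), current
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            # Propose moving one random sample to a different, uniformly chosen cluster.
            i = rng.randrange(n)
            new = rng.randrange(k - 1)
            if new >= labels[i]:
                new += 1
            candidate = list(labels)
            candidate[i] = new
            cost = assignment_cost(candidate, num, cat, k, num_weights, cat_weight)
            delta = cost - current
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                labels, current = candidate, cost
                if current < best:
                    best_labels, best = list(labels), current
        t *= cooling  # geometric cooling schedule
    return best_labels, best
```

Because uphill moves are occasionally accepted at high temperature, this search is less dependent on the starting labeling than the batch update, which is the usual motivation for using annealing to check a k-means-style partition.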
dc.rights I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. en_US
dc.subject toxicology en_US
dc.subject database en_US
dc.subject gene expression en_US
dc.subject toxicogenomics en_US
dc.subject genomic sciences en_US
dc.subject microarray en_US
dc.subject clustering en_US
dc.subject statistics en_US
dc.subject bioinformatics en_US
dc.title Clustering of Mixed Data Types with Application to Toxicogenomics en_US
dc.degree.name PhD en_US
dc.degree.level dissertation en_US
dc.degree.discipline Bioinformatics en_US

Files in this item

Files Size Format
etd.pdf 1.185 MB PDF
