Browsing by Author "Russell D. Wolfinger, Committee Member"
Now showing 1 - 3 of 3
- Clustering of Mixed Data Types with Application to Toxicogenomics (2006-04-25) Bushel, Pierre Robert; Greg C. Gibson, Committee Chair; Russell D. Wolfinger, Committee Member; Spencer V. Muse, Committee Member; Robert C. Smart, Committee Member
DNA microarray analysis provides unprecedented capabilities for simultaneous measurement of genome-wide alterations in transcription levels. Toxicogenomics bridges gene and protein expression analyses with conventional toxicology to elucidate a global view of the toxic outcomes and mechanistic changes elicited by toxicant exposure and environmental stressors to biological systems. Inherent in toxicogenomics data are systematic error, stochastic variation and disparate measurement domains and types, which complicate the acquisition of significant, meaningful and broad biological interpretations from analysis of the data. In this dissertation, a classification regimen comprised of analysis of replicate data, outlier diagnostics and gene selection procedures was employed to utilize microarray data for categorization of sub-classes of biological samples exposed to pharmacologic agents. To assess contrasts of centrilobular congestion severity of the rat liver subsequent to exposure to acetaminophen (APAP), microarray data, clinical chemistry evaluations and histopathology observations were integrated in a database and analyzed using mixed linear model approaches. Finally, the k-prototypes algorithm was modified (Modk-prototypes) to the specifications of k-means clustering. Its mixed objective function is the sum of two terms: the squared Euclidean distance, which measures the dissimilarity of samples based on numeric microarray and clinical chemistry features, and simple matching, which measures the dissimilarity of samples based on histopathology features with categorical values.
In addition, the objective function included weighting terms for the microarray, clinical chemistry and histopathology domain data in order to computationally integrate the data as well as constrain the clustering of the APAP-treated samples according to similarity of gene expression and toxicological profiles. Simulated annealing optimization of the Modk-prototypes algorithm (SA-Modk-prototypes) was used to validate the clustering of the APAP-treated samples. The clusters were vetted for expression and toxicological (VETed) k-prototypes features that distinguished the clusters from one another. The VETed k-prototypes are shown to be ideal for distinguishing between zero, minimal, and moderate levels of necrosis of the hepatocytes and centrilobular region of the rat liver that are end-point representations of the clusters of APAP-treated samples. In this dissertation, chapter 1 is an introduction to general toxicology, microarray gene expression platforms, experimental designs, preprocessing of the data and gene selection approaches, toxicogenomics as it applies to compound classification and phenotypic anchoring of gene expression, databases and informatics resources for toxicogenomics and clustering of mixed data types. Chapter 2 is dedicated to statistical validation and significance of differentially expressed genes as well as sub-categorization of samples exposed to phenobarbital and the peroxisome proliferators clofibrate, gemfibrozil and Wyeth 14,643. Chapter 3 presents integration of microarray data with clinical chemistry and histopathology data to contrast levels of centrilobular congestion of the rat liver by mixed linear modeling of gene expression ratio values acquired from rats exposed to APAP.
Chapter 4 describes the use of a modified k-prototypes (Modk-prototypes) objective function and algorithm, and of a simulated annealing optimized version (SA-Modk-prototypes), for computational integration of microarray, clinical chemistry and histopathology mixed numeric and categorical data. It also covers the partitioning of APAP-treated biological samples into clusters that contain vetted expression and toxicological (VETed) k-prototypes features that distinguish between levels of necrosis of the hepatocytes and centrilobular region of the rat liver. Chapter 5 concludes the research, development and analyses presented in this dissertation.
- Computational Methods for Identifying and Characterizing the Human Gene Regulatory Regions and Cis-elements (2005-11-23) Huang, Weichun; Leping Li, Committee Member; Bruce S. Weir, Committee Chair; William R. Atchley, Committee Member; Jeffrey L. Thorne, Committee Member; Russell D. Wolfinger, Committee Member
The identification of functional regulatory regions and cis-elements is a preliminary step toward the reconstruction of gene regulatory networks. Comparative genomics has been demonstrated to be a powerful approach for motif discovery. However, the accurate alignment of complex genomic sequences, especially those of mammals, remains a computational challenge. In chapter 2, we propose a novel pairwise alignment system, ACANA, to improve the alignment quality of genomic sequences. Compared with top competing alignment tools, ACANA achieves better alignment quality in aligning divergent sequences for both local and global alignments. When applied to the upstream sequences of human-mouse orthologs, ACANA is able to reliably detect the conserved functional regions containing most cis-elements. Statistical motif modeling is another fundamental computational approach for motif prediction in large genomic sequences. In chapter 3, we introduce the mixture of optimized Markov models to reduce the false motif discovery rate in large genomic sequences. Our model is not only able to incorporate most dependency information within a motif by optimizing the arrangement of motif positions, but is also flexible in adjusting model complexity to the limits set by the size of the training data. We implement the mixture model in our OMiMa system. Using OMiMa, we demonstrate that our model can improve motif prediction accuracy. Although the reconstruction of complete human gene regulatory networks, at present, remains a distant hope, it is still possible to infer some distinct features of the networks from the available data.
In chapter 4, we present an example of inferring major evolutionary features of human gene regulatory networks by combining information from both gene sequence data and functional annotations. We systematically analyze the association between gene function and upstream region conservation for human-rodent orthologs. Our study shows that upstream regulatory regions of developmental transcription regulators, such as Hox genes, are extremely conserved while those of catalytic enzymes are significantly less conserved. We suggest that developmental and other important regulators constitute the central hub of human gene regulatory networks.
- Statistical methods for the analysis of genetics marker and microarray data (2004-05-18) Yu, Xiang; Bruce S. Weir, Committee Chair; Dahlia M. Nielsen, Committee Co-Chair; Greg Gibson, Committee Member; Russell D. Wolfinger, Committee Member
With the advent of high-throughput technologies in genomics, a large volume of data has accumulated, leaving bioinformaticists with the challenge of how to manage, analyze, and interpret the data. The analyses of genetic marker data and of microarray data are two important aspects of current bioinformatics studies. In this dissertation, we explore some statistical issues that arise in these problems. We discuss two extensions of the EM algorithm to infer haplotypes from genotype data, each for a particular sampling scenario. The first one applies to a random sample of both diploid and haploid individuals from the population, in which the haplotype information from the haploid individuals is incorporated into the estimation process. The second one applies to a sample of parent-offspring trios, in which the dependencies between the parental and the offspring genotypes are correctly handled in the analysis. We show that each of these two modified EM algorithms performs better than the usual one when applied to its corresponding type of sample. We study the experimental designs in two-color microarray experiments and resolve some of the outstanding controversies over the use of different experimental designs. We show that the loop and balanced block designs analyzed in a mixed model are more efficient than the reference designs from a statistical point of view. We also provide general guidelines on how to optimize experimental resources to get maximal efficiency using these designs. We present an application of the mixed model to identify transcription factor-gene interactions and to infer transcriptional regulatory structures in Saccharomyces cerevisiae using microarray experiments.
We demonstrate that the mixed model, which pools the observations across all experiments, is a powerful approach.
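
As an illustration of the weighted mixed objective described in the first entry above (Clustering of Mixed Data Types with Application to Toxicogenomics), the following Python sketch computes a sample-to-prototype dissimilarity as a weighted sum of squared Euclidean distances over the numeric domains and simple-matching mismatches over the categorical domain. It is a minimal sketch, not the dissertation's implementation: the function names, the dictionary keys (`microarray`, `clin_chem`, `histopath`) and the example weights are all assumptions made for illustration, and the abstract does not specify how the weights were chosen.

```python
from typing import Dict, List, Sequence

# Hypothetical domain names; the abstract only names the three data domains.
NUMERIC_DOMAINS = ("microarray", "clin_chem")
CATEGORICAL_DOMAINS = ("histopath",)


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two numeric feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_matching(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of categorical features whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def mixed_dissimilarity(sample: Dict[str, list],
                        prototype: Dict[str, list],
                        weights: Dict[str, float]) -> float:
    """Weighted sum of per-domain dissimilarities (assumed form of the objective)."""
    total = 0.0
    for domain in NUMERIC_DOMAINS:
        total += weights[domain] * squared_euclidean(sample[domain], prototype[domain])
    for domain in CATEGORICAL_DOMAINS:
        total += weights[domain] * simple_matching(sample[domain], prototype[domain])
    return total


def nearest_prototype(sample: Dict[str, list],
                      prototypes: List[Dict[str, list]],
                      weights: Dict[str, float]) -> int:
    """Index of the prototype with the smallest mixed dissimilarity to the sample."""
    distances = [mixed_dissimilarity(sample, p, weights) for p in prototypes]
    return distances.index(min(distances))


# Made-up values for illustration only; not data from the dissertation.
weights = {"microarray": 1.0, "clin_chem": 0.5, "histopath": 2.0}
sample = {"microarray": [1.0, 2.0], "clin_chem": [10.0], "histopath": ["minimal", "none"]}
prototype_a = {"microarray": [0.0, 2.0], "clin_chem": [12.0], "histopath": ["minimal", "moderate"]}
prototype_b = {"microarray": [1.0, 2.0], "clin_chem": [10.0], "histopath": ["minimal", "none"]}

print(mixed_dissimilarity(sample, prototype_a, weights))
print(nearest_prototype(sample, [prototype_a, prototype_b], weights))
```

In a k-means-style loop, each sample would be assigned with `nearest_prototype` and the prototypes then recomputed, for example as the mean of the numeric features and the mode of the categorical features within each cluster. That update rule is the standard k-prototypes convention and is assumed here rather than taken from the abstract.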
