Methods for Accurate Analysis of High-Throughput Transcriptome Data


dc.contributor.advisor Steffen Heber, Committee Chair en_US
dc.contributor.advisor David Bird, Committee Member en_US
dc.contributor.advisor Dahlia Nielsen, Committee Member en_US
dc.contributor.advisor Heike Winter-Sederoff, Committee Member en_US
dc.contributor.advisor Hao Helen Zhang, Committee Member en_US
dc.contributor.author Howard, Brian E en_US
dc.date.accessioned 2010-04-02T18:48:58Z
dc.date.available 2010-04-02T18:48:58Z
dc.date.issued 2009-11-30 en_US
dc.identifier.other etd-10132009-213553 en_US
dc.description.abstract A detailed understanding of the transcriptome is a prerequisite for deciphering the flow of information from genotype to phenotype. Fortunately, modern high-throughput technologies now provide an unprecedented ability to observe the full complement of transcriptional events, which extend far beyond the classical "one gene, one protein" hypothesis to include alternatively spliced genes, microRNAs, RNA interference, antisense transcription, and a variety of other phenomena that were, until recently, unknown. However, to interpret the results of these assays accurately, new statistical and bioinformatic methods must be developed in parallel with biotechnological advances. In this thesis, we present several methods for improving the accuracy of inferences obtained from the high-throughput transcriptome data generated by these new technologies. First, we present a novel method for microarray quality assessment. Because accurate inference depends on the quality of the underlying data, quality assessment is a critical component of any microarray data analysis. Our method, which uses an unsupervised classifier to discriminate between high- and low-quality microarray datasets, exhibits performance comparable to that of supervised learners constructed using the same training data. However, because our approach requires only unannotated data, it is easy to customize and to keep up to date as technology evolves. Next, we present an alternative method for microarray quality assessment, which identifies low-quality microarrays by simulating a set of differentially expressed genes. This method directly measures the ability of a planned statistical analysis to identify differential gene expression when suspected low-quality arrays are included in the dataset.
A key advantage of this approach is that, unlike other methods, it provides a specific recommendation about whether to retain or discard low-quality chips in the context of a particular experimental setting. Finally, we introduce a procedure for accurately quantifying alternative splicing using RNA-Seq data. Our method uses a familiar linear models approach but improves upon similar methods that assume uniform coverage of RNA-Seq reads along the targeted transcripts. We first show, through simulation, that using an incorrect read sampling distribution can lead to incorrect conclusions about the expression of isoforms in a mixture. Applying our method to an example dataset, we identify 438 differentially spliced genes exhibiting a range of expression patterns, including genes with switch-like differential splicing between two tissues as well as genes with more subtle variations in isoform expression. Taken together, we expect that these methods can increase the accuracy of inferences drawn from high-throughput transcriptome data and, in doing so, advance our understanding of the biology of genome expression. en_US
dc.rights I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. en_US
dc.subject RNA-Seq en_US
dc.subject quality assessment en_US
dc.subject microarray en_US
dc.subject alternative splicing en_US
dc.subject transcriptome en_US
dc.title Methods for Accurate Analysis of High-Throughput Transcriptome Data en_US
thesis.degree.name PhD en_US
thesis.degree.level dissertation en_US
thesis.degree.discipline Bioinformatics en_US
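The abstract describes quantifying isoforms with a linear model and warns that assuming uniform read coverage, when the true read sampling distribution is non-uniform, can mislead conclusions about isoform mixtures. The toy Python sketch below illustrates that general idea only; it is not the dissertation's implementation. Everything in it is a hypothetical assumption chosen for illustration: the three-exon gene, the exon lengths, the true abundances (600 and 400 reads), the 3'-biased read density f(u) = 0.2 + 1.6u along each transcript, and the helper names `design_matrix` and `least_squares`. Counts are noise-free expected values, not real data.

```python
"""Toy illustration: isoform quantification as a linear model over exon bins.

Hypothetical gene: three exons; isoform A = exons 1-2-3, isoform B skips exon 2.
All lengths, abundances and the bias curve are made up for illustration.
"""

EXON_LENGTHS = [100, 100, 200]            # hypothetical exon lengths (bp)
ISOFORMS = {"A": [0, 1, 2], "B": [0, 2]}  # exon indices included in each isoform


def uniform_cdf(u):
    """CDF of uniformly distributed reads along a transcript (u in [0, 1])."""
    return u


def biased_cdf(u):
    """CDF of an assumed 3'-biased read density f(u) = 0.2 + 1.6u on [0, 1]."""
    return 0.2 * u + 0.8 * u * u


def design_matrix(cdf):
    """W[j][k] = expected fraction of isoform k's reads that fall in exon j."""
    names = sorted(ISOFORMS)
    W = [[0.0] * len(names) for _ in EXON_LENGTHS]
    for k, name in enumerate(names):
        exons = ISOFORMS[name]
        length = sum(EXON_LENGTHS[e] for e in exons)
        start = 0
        for e in exons:
            end = start + EXON_LENGTHS[e]
            # Probability mass of the sampling density over this exon's span.
            W[e][k] = cdf(end / length) - cdf(start / length)
            start = end
    return W


def least_squares(W, y):
    """Solve min ||W theta - y||^2 via the normal equations (W^T W) theta = W^T y."""
    n = len(W[0])
    A = [[sum(row[i] * row[j] for row in W) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yj for row, yj in zip(W, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the small n x n system.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    theta = [0.0] * n
    for i in reversed(range(n)):
        theta[i] = (b[i] - sum(A[i][c] * theta[c] for c in range(i + 1, n))) / A[i][i]
    return theta


# Noise-free exon counts generated under the biased model: 600 reads from A, 400 from B.
true_theta = [600.0, 400.0]
W_true = design_matrix(biased_cdf)
counts = [sum(w * t for w, t in zip(row, true_theta)) for row in W_true]

fit_correct = least_squares(W_true, counts)
fit_uniform = least_squares(design_matrix(uniform_cdf), counts)
print("correct sampling model:", [round(x, 1) for x in fit_correct])
print("uniform assumption:    ", [round(x, 1) for x in fit_uniform])
```

Fitting with the same sampling model that generated the counts recovers 600 and 400 reads, whereas assuming uniform coverage yields roughly 480 and 623, so the exon-skipping isoform B wrongly appears dominant. A realistic analysis would also model count noise (for example, Poisson variation), enforce non-negative abundances, and use junction-spanning reads; the sketch omits these to keep the effect of the sampling distribution visible.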

Files in this item

File Size Format
etd.pdf 1.590 MB PDF
