dc.contributor.advisor |
Steffen Heber, Committee Chair |
en_US |
dc.contributor.advisor |
David Bird, Committee Member |
en_US |
dc.contributor.advisor |
Dahlia Nielsen, Committee Member |
en_US |
dc.contributor.advisor |
Heike Winter-Sederoff, Committee Member |
en_US |
dc.contributor.advisor |
Hao Helen Zhang, Committee Member |
en_US |
dc.contributor.author |
Howard, Brian E |
en_US |
dc.date.accessioned |
2010-04-02T18:48:58Z |
|
dc.date.available |
2010-04-02T18:48:58Z |
|
dc.date.issued |
2009-11-30 |
en_US |
dc.identifier.other |
etd-10132009-213553 |
en_US |
dc.identifier.uri |
http://www.lib.ncsu.edu/resolver/1840.16/4206 |
|
dc.description.abstract |
A detailed understanding of the transcriptome is a prerequisite for deciphering the flow of information from genotype to phenotype. Fortunately, modern high-throughput technologies now provide an unprecedented ability to observe the full complement of transcriptional events, which extend far beyond the classical "one gene, one protein" hypothesis to include alternatively spliced genes, microRNAs, RNA interference, anti-sense transcription, and a variety of other phenomena that were unknown until recently. However, to interpret the results of these assays accurately, new statistical and bioinformatic methods must be developed in parallel with biotechnological advances. In this thesis, we present several methods for improving the accuracy of inferences obtained from the high-throughput transcriptome data generated by these new technologies.
First, we present a novel method for microarray quality assessment. Since accurate inference depends on the quality of the underlying data, quality assessment is a critical component in any microarray data analysis. Our method, which uses an unsupervised classifier to discriminate between high- and low-quality microarray datasets, exhibits performance comparable to supervised learners constructed using the same training data. However, because our approach requires only unannotated data, it is easy to customize and to keep up-to-date as technology evolves.
Next, we present an alternative method for microarray quality assessment, which identifies low-quality microarrays by simulating a set of differentially expressed genes. This method directly measures the ability of a planned statistical analysis to identify differential gene expression when suspected low-quality arrays are included in the dataset. A key advantage of this approach is that, unlike other methods, it provides a specific recommendation about whether to retain or discard low-quality chips in the context of a particular experimental setting.
Finally, we introduce a procedure for accurately quantifying alternative splicing using RNA-Seq data. Our method uses a familiar linear models approach, but improves upon similar methods that assume uniform coverage of RNA-Seq reads along the targeted transcripts. We first show, through simulation, that using an incorrect read sampling distribution can lead to incorrect conclusions about the expression of isoforms in a mixture. Applying our method to an example dataset, we identify 438 differentially spliced genes, exhibiting a range of expression patterns including genes with switch-like differential splicing between two tissues, as well as genes with more subtle variations in isoform expression.
Taken together, we expect that these methods can serve to increase the accuracy of inferences drawn from high-throughput transcriptome data, and in doing so, lead to an advancement of our understanding of the biology of genome expression. |
en_US |
dc.rights |
I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee.
I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. |
en_US |
dc.subject |
RNA-Seq |
en_US |
dc.subject |
quality assessment |
en_US |
dc.subject |
microarray |
en_US |
dc.subject |
alternative splicing |
en_US |
dc.subject |
transcriptome |
en_US |
dc.title |
Methods for Accurate Analysis of High-Throughput Transcriptome Data |
en_US |
dc.degree.name |
PhD |
en_US |
dc.degree.level |
dissertation |
en_US |
dc.degree.discipline |
Bioinformatics |
en_US |