Site-to-site Rate Variation in Protein Coding Genes

dc.contributor.advisor: Spencer Muse, Committee Chair (en_US)
dc.contributor.advisor: William Atchley, Committee Member (en_US)
dc.contributor.advisor: Jeffrey Thorne, Committee Member (en_US)
dc.contributor.advisor: Bruce Weir, Committee Member (en_US)
dc.contributor.author: Mannino, Frank Vincent (en_US)
dc.date.accessioned: 2010-04-02T18:53:13Z
dc.date.available: 2010-04-02T18:53:13Z
dc.date.issued: 2006-04-28 (en_US)
dc.degree.discipline: Bioinformatics (en_US)
dc.degree.level: dissertation (en_US)
dc.degree.name: PhD (en_US)
dc.description.abstract: The ability to realistically model gene evolution improved dramatically with the rejection of the assumption that rates are constant across sites. Rate heterogeneity models allow for better parameter estimates and for site-specific inferences such as the detection of positive selection. Recently developed models of codon evolution allow synonymous and nonsynonymous rates to vary independently according to discretized gamma distributions. I applied this model to mitochondrial genomes and concluded that synonymous rate variation is present in many genes and is of appreciable magnitude relative to the amount of nonsynonymous heterogeneity. I then extended this model to allow the two rates to vary according to a dependent bivariate distribution, permitting tests of the significance of the correlation between rates within a gene. I present here the algorithm to discretize this bivariate distribution and the application of the model to many real data sets. Significant correlation between synonymous and nonsynonymous rates exists in roughly half of the data sets that I examined, and the correlation is typically positive. These data sets span a wide range of taxa and genes, implying that the correlation is a general trend. Finally, I performed a thorough investigation of the statistical properties of using discretized gamma distributions to model rate variation, examining the bias and variance of parameter estimates. These discretized distributions are common in modeling heterogeneity, but they have weaknesses that must be well understood before making inferences. (en_US)
dc.identifier.other: etd-03272006-140300 (en_US)
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/4400
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. (en_US)
dc.subject: codon model (en_US)
dc.subject: synonymous and nonsynonymous substitution rates (en_US)
dc.subject: site-to-site rate heterogeneity (en_US)
dc.subject: molecular evolution (en_US)
dc.subject: gamma distribution (en_US)
dc.subject: rate correlation (en_US)
dc.title: Site-to-site Rate Variation in Protein Coding Genes (en_US)
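The abstract refers to discretized gamma distributions for modeling site-to-site rate variation. As an illustration only, the sketch below implements one widely used scheme, Yang's (1994) equal-probability discretization in which each of k categories is represented by its mean rate under a mean-one gamma distribution. It is an assumption that this matches the discretization used in the dissertation, and it does not cover the bivariate extension described in the abstract. The helper names (reg_lower_gamma, gamma_quantile, discrete_gamma_rates) are hypothetical; the incomplete-gamma routine uses the standard series and continued-fraction expansions so that the example needs only the Python standard library.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion for P(a, x); converges quickly when x < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction for the upper tail Q(a, x) (modified Lentz method).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p: float, alpha: float) -> float:
    """Quantile of Gamma(shape=alpha, rate=alpha), which has mean 1, by bisection."""
    def cdf(x: float) -> float:
        return reg_lower_gamma(alpha, alpha * x)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """Mean rate within each of k equal-probability categories of Gamma(alpha, alpha)."""
    # Category boundaries at the 1/k, 2/k, ... quantiles, with outer bounds 0 and infinity.
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    # For X ~ Gamma(alpha, rate alpha), E[X; X < b] = P(alpha + 1, alpha * b),
    # so each category mean is k times the difference of that partial expectation.
    partial = [reg_lower_gamma(alpha + 1.0, alpha * b) for b in cuts]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]


if __name__ == "__main__":
    print([round(r, 4) for r in discrete_gamma_rates(alpha=0.5, k=4)])
```

Each site is then assigned one of the k rates with probability 1/k. Because the rates are category means of a mean-one gamma, they average to exactly 1 by construction; using category medians instead, the other option in Yang's scheme, would require renormalizing.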

Files

Original bundle

Name: etd.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format
