Statistical Inference and Biological Interpretation via Comparatively Realistic Models of Molecular Evolution


Date

2008-12-06

Abstract

Recently, advances in statistical inference techniques have allowed analyses of molecular evolution to proceed without the biologically implausible assumption of independent change among DNA sequence sites. These techniques permit molecular phenotypes such as RNA secondary structure and protein tertiary structure to be incorporated directly into models of DNA sequence evolution, and they thereby facilitate assessment of the impact of molecular phenotype on rates of sequence evolution. Our analysis of 1,195 non-redundant protein-coding sequences suggests that solvent accessibility and pairwise interactions among amino acids have important and roughly comparable impacts on rates of evolution. We show how solvent accessibility and pairwise amino acid interactions can be used with protein-coding single nucleotide polymorphism (SNP) data to predict which SNP allele is ancestral and which is derived. Our analysis of 142 non-synonymous SNPs indicates that ancestral alleles are more selectively advantageous with respect to tertiary structure than derived alleles are. In other work, we show how recently developed models of molecular evolution with dependent change among sites can be adapted to generate stationary distributions that match a desired variable-length Markov model or profile hidden Markov model of protein sequence organization. Departures of the variable-length Markov model or profile hidden Markov model from a neutral model of protein evolution are attributed to natural selection. We show how these departures yield a crude approximation of the product of effective population size and the difference in relative fitness between sequences.
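
The last result can be illustrated with a small sketch. It assumes a common population-genetic form for the stationary distribution, in the style of Sella and Hirsh: the probability of a sequence under selection is proportional to its neutral probability times exp(c·N·f), where N is effective population size and f is relative fitness. Under that assumption, comparing the log departure from neutrality of two sequences cancels the unknown normalizing constants and leaves c·N·(f_a − f_b). The function name `scaled_fitness_difference`, the log-probability inputs, and the default factor c = 2 are illustrative assumptions, not the dissertation's method. The correct factor depends on the population model (for example, haploid or diploid).

```python
import math


def scaled_fitness_difference(logp_target_a: float, logp_neutral_a: float,
                              logp_target_b: float, logp_neutral_b: float,
                              selection_factor: float = 2.0) -> float:
    """Crude estimate of N * (f_a - f_b) for two sequences a and b.

    Hypothetical assumption (Sella-Hirsh style): the stationary probability
    under selection is proportional to
        pi_neutral(s) * exp(selection_factor * N * f(s)).
    Inputs are natural-log probabilities, which avoids underflow for long
    sequences. The unknown normalizing constants cancel in the difference.
    """
    # Log departure of each sequence from the neutral model.
    departure_a = logp_target_a - logp_neutral_a
    departure_b = logp_target_b - logp_neutral_b
    # Under the assumption, this difference equals selection_factor * N * (f_a - f_b).
    return (departure_a - departure_b) / selection_factor


# Example: sequence a is twice as enriched over neutral as sequence b.
estimate = scaled_fitness_difference(math.log(0.02), math.log(0.01),
                                     math.log(0.01), math.log(0.01))
print(f"N * (f_a - f_b) is approximately {estimate:.4f}")
```

Working in log space is a deliberate choice. Probabilities of whole protein sequences under a variable-length Markov model or profile HMM are typically far too small to represent directly as floating-point numbers.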

Keywords

Gene Ontology, MCMC, Bayes factor, indel models, protein structure impact, phenotype-genotype mapping, molecular fitness, population genetic interpretation, ancestral allele prediction

Degree

PhD

Discipline

Bioinformatics