Statistical Inference and Biological Interpretation via Comparatively Realistic Models of Molecular Evolution
Date
2008-12-06
Abstract
Recently, advances in statistical inference techniques have
allowed analyses of molecular evolution to proceed without the biologically
implausible assumption of independent change among DNA sequence sites.
These techniques permit incorporation of molecular phenotypes such
as RNA secondary and protein tertiary structure directly into the
models of DNA sequence evolution, and they thereby facilitate assessment
of the impact of molecular phenotype on the rates of sequence evolution.
Our analysis of 1,195 non-redundant protein-coding sequences suggests
that solvent accessibility and pairwise interactions among amino acids
have important and roughly comparable impacts on the rates of evolution.
We show how solvent accessibility and pairwise amino acid interactions
can be used with protein-coding single nucleotide polymorphism (SNP)
data to predict which SNP allele is ancestral and which is derived.
Our analysis of 142 non-synonymous SNPs indicates that ancestral alleles
are more selectively advantageous with respect to tertiary structure
than are derived alleles. In other work, we show how recently developed
models of molecular evolution with dependent change among sites can
be adapted to generate stationary distributions that match a desired
variable length Markov model or profile hidden Markov model for protein
sequence organization. Departures between a neutral model for protein
evolution and the variable length Markov model or profile hidden Markov
model are attributed to natural selection. We show how these departures
lead to a crude approximation of the product of effective population
size and the difference in relative fitnesses between sequences.
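
As a rough illustration of the final step described above, the sketch below
assumes (this is not stated in the abstract) a mutation-selection balance in
which the stationary probability of a sequence under selection is proportional
to its neutral stationary probability multiplied by an exponential of its
scaled fitness. Under that assumption, the log odds ratio between the target
model (variable length Markov model or profile hidden Markov model) and the
neutral model gives a crude estimate of the scaled selection difference
between two sequences, up to a constant factor that depends on the population
model. The function name and the toy probabilities are hypothetical.

```python
import math


def scaled_selection_difference(logp_target_i, logp_target_j,
                                logp_neutral_i, logp_neutral_j):
    """Crude estimate of the scaled fitness difference of sequence j vs. i.

    Assumes stationary probability under selection is proportional to
    neutral probability * exp(scaled fitness), so the difference in
    log odds (target vs. neutral) isolates the fitness term.
    Inputs are natural-log stationary probabilities of each sequence.
    """
    log_odds_target = logp_target_j - logp_target_i
    log_odds_neutral = logp_neutral_j - logp_neutral_i
    return log_odds_target - log_odds_neutral


# Toy example with two hypothetical sequences "A" and "B".
neutral = {"A": 0.5, "B": 0.5}   # neutral model: both equally likely
target = {"A": 0.2, "B": 0.8}    # target model favors B

estimate = scaled_selection_difference(
    math.log(target["A"]), math.log(target["B"]),
    math.log(neutral["A"]), math.log(neutral["B"]),
)
print(estimate)  # ln(4), about 1.386: B is favored over A
```

A positive value suggests that the second sequence is favored by selection
relative to the first; a value near zero suggests the target model and the
neutral model agree on their relative frequencies.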
Keywords
Gene Ontology, MCMC, Bayes factor, indel models, protein structure impact, phenotype-genotype mapping, molecular fitness, population genetic interpretation, ancestral allele prediction
Degree
PhD
Discipline
Bioinformatics