Protein Evolution From Sequence To Structure.

Abstract

The purpose of this research is to elucidate how natural selection shapes protein evolution. The question was addressed by exploring protein sequence evolution and 3D structural evolution, and by analyzing the multidimensional nature of amino acid covariation. This thesis begins with a study of protein sequence evolution. In total, 118 different bHLH genes in the completely sequenced Arabidopsis thaliana genome and 131 bHLH genes in the rice genome were identified and characterized using phylogenetic analysis. These plant proteins were classified into 15 distinct plant clades and were found to be under weaker selective constraints than their animal counterparts. Additionally, it was shown that lineage-specific expansions and subfunctionalization have fashioned regulatory proteins for plant-specific functions. To further characterize the bHLH domain, a canonical 3D structure was created from solved structures. This canonical structure was used as a template for producing 3D models of other representative bHLH proteins, which were then compared, contrasted, and grouped based on structural characteristics. Structural similarities were discovered within the bHLH domain among three clades (Max, Myc, and PbHLH-LZ). In addition, structural models of the Sat proteins suggest a strong similarity to other bHLH proteins, which is in disagreement with previous functional characterization. To further understand the dimensionality of protein evolution, the independence of amino acid sites was explored using multivariate factor analysis. A matrix of pairwise normalized mutual information values was computed among amino acid sites for the serpin proteins. The normalized mutual information matrix was partitioned into orthogonal dimensions by factor analysis. Each eigenvector from the factor analysis can be interpreted as having a phylogenetic explanation, a structural/functional explanation, or a combination of both. This approach discerns strong amino acid covariation within several key functional regions, including the RCL, shutter, and breach. In addition, this approach elucidates hydrogen bonding, hydrophobic, and electrostatic interactions within the serpin protein family.
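The covariation analysis described in the abstract (a pairwise normalized mutual information matrix over alignment columns, followed by decomposition into orthogonal dimensions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the normalization (mutual information divided by joint entropy), the use of a plain eigendecomposition as a principal-component style stand-in for factor analysis, the function names `entropy`, `nmi_matrix`, and `extract_factors`, and the four-sequence toy alignment.

```python
import math
from collections import Counter

import numpy as np


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nmi_matrix(alignment):
    """Pairwise normalized mutual information between alignment columns.

    Assumption: NMI(i, j) = MI(i, j) / H(i, j), where H(i, j) is the joint
    entropy. Column pairs with zero joint entropy are assigned 0.
    """
    columns = list(zip(*alignment))  # one tuple of residues per site
    n = len(columns)
    nmi = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            h_joint = entropy(list(zip(columns[i], columns[j])))
            if h_joint > 0:
                mi = entropy(columns[i]) + entropy(columns[j]) - h_joint
                nmi[i, j] = nmi[j, i] = mi / h_joint
    return nmi


def extract_factors(matrix, n_factors):
    """Principal-component style extraction from a symmetric matrix.

    Returns the largest eigenvalues and the corresponding loadings
    (eigenvectors scaled by the square root of their eigenvalues).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1][:n_factors]
    top_values = eigenvalues[order]
    loadings = eigenvectors[:, order] * np.sqrt(np.clip(top_values, 0, None))
    return top_values, loadings


# Hypothetical toy alignment: sites 0-2 covary perfectly, site 3 is independent.
toy_alignment = ["ACDA", "ACDG", "GTEA", "GTEG"]
nmi = nmi_matrix(toy_alignment)
values, loadings = extract_factors(nmi, n_factors=2)
```

In this toy case the three covarying sites load together on the first dimension and the independent site loads on the second. A real analysis would also need to handle alignment gaps, sequence weighting, and factor rotation, none of which are attempted here.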

Keywords

basic helix-loop-helix, Arabidopsis thaliana, rice, phylogeny, transcription factors, G-box, R genes, E-box, genome searching, BLAST search, lineage-specific expansion, subfunctionalization, homology modeling, structural parsimony, root mean square deviation, serpins, factor analysis

Degree

PhD

Discipline

Genetics