Multivariate Statistical Analysis of Protein Variation

Title: Multivariate Statistical Analysis of Protein Variation
Author: Zhao, Jieping
Advisors: Bruce S. Weir, Committee Member
Zhao-Bang Zeng, Committee Member
Thomas M. Gerig, Committee Member
William R. Atchley, Committee Chair
Abstract: The purpose of this research is to study a solution to the protein sequence metric problem and to apply it to explore the structural, functional, and evolutionary aspects of the basic helix-loop-helix (bHLH) protein family. The sequence metric problem arises from the alphabetic coding of amino acids and has long hindered efficient protein sequence analysis. This dissertation begins by revisiting the solution to the sequence metric problem introduced by Atchley et al. (2005) [PNAS 102(18):6401-6]. Several unresolved issues were studied, including information loss, model robustness, and concordance between factor analysis and principal component analysis. Further, classification of the 20 amino acids was explored in the numerical factor space resolved by Atchley et al. (2005). The next two parts of the dissertation focus on computational and statistical studies of the bHLH protein family. All protein sequence data were transformed into numerical vectors using the amino acid factor scores from the sequence metric solution. In the second part, protein sequence variability at the level of statistically supported lineages (clades) was studied using stepwise discriminant analysis. Some of the important sites for discriminating among clades were selected, and hierarchical classification strategies for the clades were proposed. In the third part, 147 Arabidopsis bHLH proteins were studied using a series of multivariate analyses. Results showed significant sequence differences between plant (e.g., Arabidopsis) and animal bHLH proteins, and some of the contributing discriminant sites were selected and discussed. A binding property was assigned to each Arabidopsis bHLH protein using classification rules derived from animal bHLH proteins.
Date: 2006-03-09
Degree: PhD
Discipline: Bioinformatics

Files in this item

Files Size Format
etd.pdf 2.008Mb PDF
