Multivariate Statistical Analysis of Protein Variation

dc.contributor.advisor: Bruce S. Weir, Committee Member [en_US]
dc.contributor.advisor: Zhao-Bang Zeng, Committee Member [en_US]
dc.contributor.advisor: Thomas M. Gerig, Committee Member [en_US]
dc.contributor.advisor: William R. Atchley, Committee Chair [en_US]
dc.contributor.author: Zhao, Jieping [en_US]
dc.date.accessioned: 2010-04-02T19:14:01Z
dc.date.available: 2010-04-02T19:14:01Z
dc.date.issued: 2006-03-09 [en_US]
dc.degree.discipline: Bioinformatics [en_US]
dc.degree.level: dissertation [en_US]
dc.degree.name: PhD [en_US]
dc.description.abstract: The purpose of this research is to study the solution to the protein sequence metric problem and apply it to explore the structural, functional, and evolutionary aspects of the basic helix-loop-helix (bHLH) protein family. The sequence metric problem is caused by the alphabetic coding of the amino acids and has long been a hindrance to efficient protein sequence analysis. This dissertation started by revisiting the solution to the sequence metric problem introduced by Atchley et al. (2005) [PNAS 102(18):6401-6]. Several unresolved issues, including information loss, model robustness, and the concordance between factor analysis and principal component analysis, were studied. Further, classification of the 20 amino acids was explored in the numerical factor space resolved by Atchley et al. (2005). The next two parts of the dissertation focused on computational and statistical studies of the bHLH protein family. All protein sequence data were transformed into numerical vectors using the amino acid factor scores from the sequence metric solution. In the second part of the dissertation, protein sequence variability at the level of statistically supported lineages (clades) was studied using stepwise discriminant analysis. Some of the sites important for discriminating among clades were selected, and hierarchical classification strategies for the clades were proposed. In the third part of the dissertation, 147 Arabidopsis bHLH proteins were studied by a series of multivariate analyses. Results showed significant sequence differences between plant (e.g., Arabidopsis) and animal bHLH proteins, and some of the contributing discriminant sites were selected and discussed. The binding property of each Arabidopsis bHLH protein was assigned using classification rules derived from animal bHLH proteins. [en_US]
dc.identifier.other: etd-12092005-003538 [en_US]
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/5446
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: basic helix-loop-helix [en_US]
dc.subject: multivariate analysis [en_US]
dc.subject: sequence metric problem [en_US]
dc.title: Multivariate Statistical Analysis of Protein Variation [en_US]

Files

Original bundle

Name: etd.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format
