Gene family evolution of monolignol biosynthetic genes in loblolly pine (Pinus taeda L.)

Date

2002-10-28

Abstract

Gene family organization in a genome is an important area of genomics because it provides a structured basis for approaching larger, more complex problems such as gene expression, biochemical pathways and metabolism. Genes control and mediate many of the biological activities of the cell. Although some genes are single copy, a significant proportion of genes belong to larger gene families. Members of a gene family are related by homology and share significant sequence identity and modular structure, and their products display similar biochemical function and a common structural fold. To convey the importance of grouping and characterizing genes into families, Chapter 1 provides an overview of gene family organization in genomes, presenting current delineation schemes for recognizing members of a family, concepts in current use in this area of study, and a summary of the computational methods applied to the characterization of gene families. The ability to recognize patterns that allow genes to be distinguished and assigned to discrete family groups requires the generation and use of multiple classes of information. For any given family, these classes of knowledge include (1) the sequence properties of the genes and their corresponding proteins, including the presence of defined motifs; (2) the phylogenetic history and pattern of conservation of the gene and the gene product; and (3) the structural features of the gene product. To gather this information and furnish the data needed to advance inferences on gene families of the monolignol biosynthetic pathway, Chapter 2 describes three-dimensional structural models of 4-coumarate:CoA ligase, cinnamoyl-CoA reductase and cinnamyl alcohol dehydrogenase that were built using a comparative modeling approach. The three proteins each show a Rossmann fold domain but are evolutionarily unrelated.
Comparing the modular structure of the gene (exons) with that of the protein (domains) revealed no clear pattern relating the one-dimensional gene structure to the three-dimensional structure of its protein product. The 3D structural models served to map conserved and variable sites and to identify active-site positions and positions involved in cofactor or substrate recognition. At present, a major challenge in the study of gene families is how to characterize the sequence features that distinguish subfamilies. The ability to detect positions in a multiple sequence alignment where amino acid composition distinguishes subfamilies provides a basis for studying functional differences. Chapter 3 tackles this problem by determining a minimal set of homologous positions whose amino acid configuration is associated with subfamily groups. Two measures, Gu's criterion of divergence and the mutual information criterion, detected aligned positions where molecular evolution differed by subfamily. Some of these positions were found in the substrate binding site, and others were related to the active site. The specific role of the remaining discriminant positions could not be elucidated at present. Recognizing homologous positions with discriminant capability raises the question of whether these positions also have predictive ability. Could the signature positions that effectively distinguish subfamilies be used to predict subfamily structure in loblolly pine DNA sequences? Does the loblolly pine subfamily structure for a given gene follow an organization similar to that known in other plant species? Chapter 4 describes the application of the signature in this context using loblolly pine expressed sequence tag (EST) data. The rationale of the method is to use an alignment and phylogeny of translated sequences of known genes to detect the signature positions (the scaffold alignment).
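The mutual information criterion named above can be illustrated with a minimal sketch. It scores each alignment column by the mutual information between the residue observed in a sequence and that sequence's subfamily label, so that columns whose residues track subfamily membership rank highest. The function names, the toy alignment and the "I"/"II" labels are hypothetical, gaps are treated as ordinary symbols, and this is not the thesis's own implementation; Gu's criterion is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(column, labels):
    """Mutual information (bits) between one alignment column and subfamily labels."""
    n = len(column)
    joint = Counter(zip(column, labels))   # counts of (residue, subfamily) pairs
    residue_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (residue, label), count in joint.items():
        p_joint = count / n
        # p(r, s) / (p(r) * p(s)) simplifies to count * n / (n_r * n_s)
        ratio = count * n / (residue_counts[residue] * label_counts[label])
        mi += p_joint * log2(ratio)
    return mi


def rank_positions(alignment, labels):
    """Return (score, column index) pairs, highest mutual information first."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    scores = [(mutual_information(col, labels), i) for i, col in enumerate(columns)]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: two sequences from each of two subfamilies.
toy_alignment = ["MKAGL", "MKAGV", "MRSGL", "MRSGV"]
toy_labels = ["I", "I", "II", "II"]

for score, index in rank_positions(toy_alignment, toy_labels):
    print(f"column {index}: {score:.2f} bits")
```

In this toy case, the two columns whose residues differ consistently between subfamilies score one bit, while invariant columns and columns that vary independently of subfamily score zero. A real analysis would also need to account for small sample sizes and phylogenetic non-independence, which this sketch ignores.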
The scaffold is then used to align translated EST sequences (obtained from a tBLASTn report). This EST-scaffold alignment allows the amino acid occupying each signature position in the EST to be determined, which in turn allocates ESTs to the different subfamilies. This knowledge-based approach revealed that loblolly pine contained only one type of 4-coumarate:CoA ligase, whereas cinnamoyl-CoA reductase and cinnamyl alcohol dehydrogenase each had members of two different classes. One by-product of the method was the assembly of full-length, biologically meaningful contigs from the same subfamily. This study revealed that the integrated use of phylogenetic, informatic and structural techniques, together with pertinent biological sequence data, could provide a sound basis for effectively extending the capacity for inference in the study of gene family structure in the loblolly pine genome.
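The allocation of ESTs to subfamilies by signature positions can be sketched as follows. The sketch assumes each EST has already been translated and aligned to the scaffold, so that its string indices correspond to scaffold columns. The signature table, the majority-vote rule, the class names and the example sequences are all hypothetical and serve only to illustrate the idea; the abstract does not specify the actual decision rule.

```python
from collections import Counter


def assign_subfamily(aligned_est, signature):
    """Assign a scaffold-aligned, translated EST to a subfamily.

    signature maps a scaffold column index to a dict of
    residue -> subfamily (hypothetical format). Returns the subfamily
    with the most matching signature positions, or None when the EST
    covers no signature position or the vote is tied.
    """
    votes = Counter()
    for position, residue_to_class in signature.items():
        if position >= len(aligned_est):
            continue  # EST fragment does not reach this column
        residue = aligned_est[position]
        if residue in residue_to_class:  # gaps and unknown residues cast no vote
            votes[residue_to_class[residue]] += 1
    ranked = votes.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: equal support for two subfamilies
    return ranked[0][0]


# Hypothetical signature: columns 1 and 2 discriminate two classes.
toy_signature = {
    1: {"K": "class I", "R": "class II"},
    2: {"A": "class I", "S": "class II"},
}

# Hypothetical EST fragments; "-" marks columns the fragment does not cover.
toy_ests = {"est_a": "-KAG-", "est_b": "MRS--", "est_c": "---GL", "est_d": "MKSGL"}

for name, sequence in toy_ests.items():
    print(name, assign_subfamily(sequence, toy_signature))
```

Because ESTs are partial sequences, some fragments will not overlap any signature position and remain unassigned, as `est_c` does here. Returning `None` for ties keeps conflicting evidence visible instead of forcing a call.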

Keywords

monolignols, bioinformatics, lignin biosynthesis, gene family, functional diversification

Degree

PhD

Discipline

Forestry
