Browsing by Author "Jeffrey Thorne, Committee Member"
Now showing 1 - 12 of 12
- Analysis of Genes Expressed by Meloidogyne incognita Males: Generation of ESTs and Comparative Genomics (2004-07-21) Snyder, Daniel Watchorn; Charles H. Opperman, Committee Co-Chair; Jeffrey Thorne, Committee Member; David Mckenzie Bird, Committee Chair
Over the last decade, genomic and molecular research in model systems, including Caenorhabditis elegans, has generated a body of knowledge that has revolutionized the study of plant-parasitic nematode biology and disease. Meloidogyne incognita is one of the most devastating plant-parasitic nematodes worldwide and has proven to be a prime example of the utility of comparative genomics in the investigation of plant diseases. The role of males and sex determination in this species presents a particularly interesting evolutionary and biological issue but is poorly understood at the molecular level. A method was developed for generating and collecting male M. incognita tissue that is free of other life stages, females, and contaminating organisms. The method confirmed previous results that host-pruning stress stimulates male sex differentiation and that the effects of this stimulus are limited to the period of early second-stage larval feeding. Approximately 5,000 ESTs were generated by 5' sequencing of cDNAs from four male M. incognita libraries constructed from the tissue obtained above. Two cDNA libraries were constructed using a nematode splice leader (SL1), and two were constructed with a universal template-switch method. The ESTs were clustered within each library and as a complete set across all four libraries, providing insight into the composition and abundance of genes expressed by these males. Searching the public databases for homologous sequences provided a second degree of clustering and a putative identification of the genes represented by these clusters. 
Approximately 89% of the clusters had significant homology to sequences in the public databases, and approximately 31% of these had homology to C. elegans. Additional annotation of these genes was conducted using several public resources, providing further insight into the molecular basis of male M. incognita biology and allowing comparative analyses with other stages and nematode species. Protein phosphatases, transthyretin-like families and major sperm proteins are among the most abundant sequences in these male libraries and are also highly expressed in C. elegans males. These results indicate not only that these M. incognita libraries are representative of male gene expression, but also that male gene expression profiles may be similar across families.
- Computational Biology of Ras Proteins (2008-04-07) Dellinger, Andrew Everette; William R. Atchley, Committee Chair; Carla Mattos, Committee Member; Jeffrey Thorne, Committee Member; Jon Doyle, Committee Member
In this research, computational biology is used to elucidate how evolutionary history has changed the roles of structure and function among Ras proteins, with a focus on the Ras family. This dissertation begins with phylogenetic analyses of the Ras superfamily and Ras family. Phylogenetic trees of the Ras family were estimated using Neighbor-Joining, Weighted Neighbor-Joining, Parsimony, Quartet Puzzling, Maximum Likelihood and Bayesian methods. In nearly all cases, each clade represented a subfamily. Clade members and clade divisions were consistent among all the trees, increasing the probability of a correct estimation of the evolutionary history. Further investigation into the evolution of sequence involved decomposing sequence covariation into its respective components. The roles of the functional and structural components of covariation were the focus of several multivariate analyses. Decision tree analysis, a data mining method, found that sequence divergence in critical sites of the hydrophobic core, dimerization regions and ligand binding regions was sufficient to divide Ras subfamilies. Alignments of GDP-bound and GTP-bound crystal structures revealed that only Ral and M-Ras proteins have structural variation in the effector binding switch I region, while all Ras structures vary in the protein binding switch II region. Di-Ras2-GDP was shown to have a unique C-terminal loop which binds to the interswitch region. Next, a common factor analysis was computed. The factors contain the set of sites that both discriminate among the subfamilies and have a unique functional or structural role, such as Ral tree-determinant sites. Finally, sequence signatures were developed for each of the families of the Ras superfamily using Boltzmann-Shannon entropy. 
This method was compared to the PROSITE signature, profile hidden Markov model and MEME position-specific scoring matrix methods. The Entropy method identified approximately 8% fewer proteins than the best of the other methods, MEME. Comparative analyses of these sequence signatures determined which sites and amino acids played important roles in the changes in protein function and structure among Ras families.
- Gene family evolution of monolignol biosynthetic genes in loblolly pine (Pinus taeda L.) (2002-10-28) Vasquez-Kool, Jorge; Arthur Johnson, Committee Member; David M. O'Malley, Committee Chair; Jeffrey Thorne, Committee Member; Brenda Temple, Committee Member
Gene family organization in a genome is an important area in the study of genomics because it provides a structured basis to approach larger, more complex problems like gene expression, biochemical pathways and metabolism. Genes control and mediate many of the biological activities of the cell. Although some genes are single copy, a significant proportion of genes belong to larger gene families. Members of a gene family are related by homology and share significant sequence identity and modular structure, and their products display similarity in biochemical function and a common structural fold. To gain an understanding of the importance of grouping and characterizing genes into families, Chapter 1 provides an overview of gene family organization in genomes, presenting current delineation schemes to recognize members of a family, concepts currently in use in this area of study, and a summary of different computational methods applied to the characterization of gene families. The ability to recognize patterns that allow the distinction of genes and the formation of discrete family groups requires the generation and use of multiple classes of information. For any given family, these classes of knowledge would include (1) the sequence properties of the gene and its corresponding protein, including its possession of defined motifs; (2) the phylogenetic history and pattern of conservation of the gene and the gene product; and (3) the structural features of the gene product. 
In order to gather this information and furnish the data needed to advance inferences on gene families of the monolignol biosynthetic pathway, Chapter 2 describes the three-dimensional structural models of 4-coumarate:CoA ligase, cinnamoyl-CoA reductase and cinnamyl alcohol dehydrogenase, which were built using a comparative modeling approach. The three proteins show a Rossmann fold domain, but are evolutionarily unrelated. The comparison of the modular structure of the gene (exons) and the protein (domains) provided no clear patterns to relate the uni-dimensional gene structure to the three-dimensional structure of its protein product. The 3D structural models served to map conserved and variable sites and to identify active sites or positions involved in cofactor or substrate recognition. At present, a major challenge facing the study of gene families is how to characterize sequence features that distinguish subfamilies. The ability to detect positions in a multiple sequence alignment where amino acid composition distinguishes subfamilies provides a basis to pursue studies on functional differences. Chapter 3 tackles this problem by determining a minimal set of homologous positions whose amino acid configuration is associated with subfamily groups. Two measures, Gu's criterion of divergence and the mutual information criterion, detected aligned positions where molecular evolution differed by subfamily. Some of these positions were found in the substrate binding site; others were related to the active site. The specific role of other discriminant positions could not be elucidated at present. Recognizing homologous positions with discriminant capability raises the question of whether these positions also have predictive ability. Could these signature positions, which effectively distinguish subfamilies, be applied to predict subfamily structure in loblolly pine DNA sequences? 
Does loblolly pine subfamily structure for a given gene follow an organization similar to that known in other plant species? Chapter 4 describes the application of the signature in this context using loblolly pine expressed sequence tag (EST) data. The rationale of the method involves using an alignment and phylogeny of translated sequences of known genes to detect the signature positions (scaffold alignment). The scaffold is then used to align translated EST sequences (obtained from a tBLASTn report). This EST-scaffold alignment allowed the determination of the amino acid occupancy in the corresponding signature position in the EST. This operation allocates ESTs into the different subfamilies. This knowledge-based approach revealed that loblolly pine contained only one type of 4-coumarate:CoA ligase, whereas cinnamoyl-CoA reductase and cinnamyl alcohol dehydrogenase each had members of two different classes. One by-product of the method was the assemblage of full-length, biologically meaningful contigs from the same subfamily. This study revealed that the integrated use of phylogenetic, informatic and structural techniques, together with pertinent biological sequence data, could provide a sound basis to effectively extend the inference capability in the study of gene family structure in the loblolly pine genome.
- Genomic and Molecular Analyses of the Core DNA Replication Machinery in Plants (2008-04-04) Shultz, Randall William; Rebecca Boston, Committee Member; Jeffrey Thorne, Committee Member; William Thompson, Committee Co-Chair; George Allen, Committee Co-Chair; Steven Spiker, Committee Member
Accurate and complete DNA replication is essential for maintaining the integrity of the genome. In eukaryotes, this process requires the coordinated action of numerous molecular machines. Based on yeast and animal model systems, we defined a set of fifty-one "core DNA replication proteins" that are integral to the initiation, DNA synthesis, and Okazaki fragment maturation functions of DNA replication. We used computational analyses to identify putative homologs in the genomes of two plants, Arabidopsis thaliana (Arabidopsis) and Oryza sativa (rice), providing the first comprehensive view of the core DNA replication machinery in plants. Our results indicated that the overall composition of this apparatus is conserved, but plants are unique in that multiple DNA replication genes exist as small gene families. Fourteen of the genes we annotated in this study have not been previously reported in the literature, and we have provided revised gene models for seventeen plant proteins. To better understand how the DNA replication machinery functions in plants, we cloned multiple subunits of the pre-replication complex (pre-RC) from Arabidopsis and generated antibodies against four key components of this complex: AtORC1, AtORC2, AtMCM5, and AtMCM7. We demonstrated that the pre-RC is developmentally regulated in Arabidopsis and, consistent with a role in DNA replication, is abundant in proliferating tissues. We used immunocytochemical and biochemical methods to characterize MCM7 in plants. We observed two distinct localization patterns for plant MCM7 proteins. In most cells, MCM7 was nuclear and colocalized with DNA. 
In a small fraction of cells, MCM7 was dispersed throughout the cytoplasmic compartment. Biochemical analysis confirmed that MCM7 binds to chromatin and that it is present in the nucleus at least during the G1, S and G2 cell cycle stages. Together, these analyses support a model where the MCM complex is loaded onto DNA in late M and early G1, released into the nucleoplasm during S phase followed by a brief dispersion into the cytoplasmic compartment concurrent with nuclear envelope breakdown in mitosis.
- Insight into Filamentous Fungal Secretion and Evolution through Genomic Analysis (2005-12-08) Diener, Stephen Ericson; Gary Payne, Committee Member; Ignazio Carbone, Committee Member; Ralph Dean, Committee Chair; Jeffrey Thorne, Committee Member
Filamentous fungi are of broad economic importance due to their roles in industry, medicine and agriculture. There are several filamentous fungi, such as Trichoderma reesei, which have been harnessed as protein factories due to their immense secretion capacity. Unfortunately, their full potential cannot be exploited due to an incomplete understanding of the pathways and genes involved in the fungal secretion system. Through the development of bioinformatic tools and the use of genomic technologies including expressed sequence tags and bacterial artificial chromosomes, the genome of T. reesei has been partially characterized and a number of genes involved in the secretion system have been identified. Pathogenic fungi, such as Magnaporthe grisea, are of great economic importance due to their devastating effect on agriculture. M. grisea is responsible for the loss of incredible amounts of rice crop yearly and has recently had its genome completely sequenced and annotated. The genome sequence has revealed the set of transposable elements in M. grisea which have then been analyzed using gene genealogies and the coalescent. The genealogies and coalescent have revealed that all elements analyzed showed a rapid expansion at some point in the past. This can be explained as a genomic event leading to the acceptance of transposable element activity most likely caused by the loss of genomic defense mechanisms. As a pathogen, the ability to evolve quickly in the face of plant defense mechanisms is essential. Transposable element activity can provide means for rapid evolution and this acceptance may represent a shift of these elements from genomic parasitism to mutualism.
- Modeling the Molecular Evolution of Protein Domains and Networks. (2010-10-20) McFerrin, Lisa; William Atchley, Committee Chair; Eric Stone, Committee Chair; Spencer Muse, Committee Member; Jeffrey Thorne, Committee Member; Ignazio Carbone, Committee Member
- Molecular Characterization of Listeria monocytogenes Epidemic Clone I (ECI) like isolates from Food and Food Environments (2007-08-08) Yildirim, Suleyman; Jeffrey Thorne, Committee Member; Craig Altier, Committee Member; Sophia Kathariou, Committee Chair; Lee-Ann Jaykus, Committee Co-Chair; Todd R Klaenhammer, Committee Member
- Molecular Evolution of Phytophthora infestans (Mont.) de Bary, The Late Blight Pathogen (2005-03-01) Gomez-Alpizar, Luis E; Trudy Mackay, Committee Member; Jeffrey Thorne, Committee Member; Marc Cubeta, Committee Member; Jean B. Ristaino, Committee Chair
Phytophthora infestans (Mont.) de Bary causes late blight of potato and tomato, one of the world's most devastating plant diseases. P. infestans left its footprint in human history when, in the 19th century, it was responsible for the Irish Potato Famine. Nuclear and mitochondrial DNA variability was used to examine the population history of P. infestans. DNA sequence data from three nuclear regions (Intron Ras, Ras, and β-tubulin) and two mitochondrial regions (P3 and P4) were obtained from ninety isolates from various locations including Brazil, Bolivia, Ecuador, Peru, Costa Rica, Mexico (Toluca Valley), the USA and Ireland. Population summary statistics show that the Mexican population, from the presumed center of origin of P. infestans, harbored less nucleotide and haplotype diversity than South American populations and was genetically differentiated from other populations, particularly at the mitochondrial loci. Coalescent-based genealogies of mitochondrial (rpl14, rpl5, tRNAs, cox1) and nuclear (Intron Ras+Ras) loci were congruent and demonstrated the existence of two lineages leading to the present-day haplotypes of P. infestans associated with potatoes. A third lineage, associated with a group of isolates from Solanum tetrapetalum collected in the Andean Highlands of Ecuador, was also found. In the mitochondrial genealogy, the two potato lineages corresponded to the mitochondrial haplotypes Type I and Type II described elsewhere. Mitochondrial haplotypes were associated with different nuclear backgrounds. 
Haplotypes found in the Toluca Valley population were derived from only one of the two lineages in both mitochondrial and nuclear genealogies, whereas haplotypes found in South American populations (Peru and Ecuador) were derived from both lineages. Haplotypes found in USA and Ireland populations were also derived from both lineages and these populations were not genetically differentiated from the Peruvian populations, suggesting a common ancestry among these populations. Evidence for recombination was found for Mexican and USA populations. Solanum tetrapetalum isolates were highly polymorphic within the regions analyzed and may be a new species. The results support a South American origin of P. infestans and are discussed in relation to previous hypotheses regarding the geographic origin of this plant pathogen.
- Not Just Another Trait: Methods for the Genetic Analysis of Gene Expression (2008-04-21) Aylor, David Lawrence; Zhao-Bang Zeng, Committee Chair; Jeffrey Thorne, Committee Member; Ignazio Carbone, Committee Member; Philip Awadalla, Committee Member
- Site-to-site Rate Variation in Protein Coding Genes (2006-04-28) Mannino, Frank Vincent; Spencer Muse, Committee Chair; William Atchley, Committee Member; Jeffrey Thorne, Committee Member; Bruce Weir, Committee Member
The ability to realistically model gene evolution improved dramatically with the rejection of the assumption that rates are constant across sites. Rate heterogeneity models allow for better estimates of parameters and site-specific inferences such as the detection of positive selection. Recently developed models of codon evolution allow for both synonymous and nonsynonymous rates to vary independently according to discretized gamma distributions. I applied this model to mitochondrial genomes and concluded that synonymous rate variation is present in many genes, and is of appreciable magnitude relative to the amount of nonsynonymous heterogeneity. I then extended this model to allow the two rates to vary according to a dependent bivariate distribution, permitting tests for the significance of correlation of rates within a gene. I present here the algorithm to discretize this bivariate distribution and the application of the model to many real data sets. Significant correlation between synonymous and nonsynonymous rates exists in roughly half of the data sets that I examined, and the correlation is typically positive. These data sets range over a wide group of taxa and genes, implying that the trend of correlation is general. Finally, I performed a thorough investigation of the statistical properties of using discretized gamma distributions to model rate variation, looking at the bias and variance in parameter estimates. These discretized distributions are common in modeling heterogeneity, but have weaknesses that must be well understood before making inferences.
- Spectral Analysis of Protein Sequences (2005-10-25) Wang, Zhi; Jeffrey Thorne, Committee Member; William Atchley, Committee Co-Chair; Charles Smith, Committee Chair; Bruce Weir, Committee Member
The purpose of this research is to elucidate how to apply spectral analysis methods to understand the structure, function and evolution of protein sequences. In the first part of this research, spectral analyses have been applied to the basic helix-loop-helix (bHLH) family of transcription factors. It is shown that the periodicity of the bHLH variability pattern (entropy profile) conforms to the classical alpha-helix periodicity of 3.6 amino acids per turn. Further, the underlying physicochemical attribute profiles (factor score profiles) are examined, and their periodicities also reflect the alpha-helix secondary structure. It is suggested that the entropy profile can be well explained by the five factor score variance components that reflect the polarity/hydrophobicity, secondary structure information, molecular volume, codon composition and electrostatic charge attributes of amino acids. In the second part of this research, the complex demodulation (CDM) method is introduced in an attempt to quantify the amplitude of periodic components in protein sequences. Proteins are often considered to be 'multiple domain entities' because they are composed of a number of functionally and structurally distinct domains with potentially independent origins. The analyses of bZIP and bHLH-PAS protein domains found that complex demodulation procedures can provide important insight about functional and structural attributes. It is found that the local amplitude minima or maxima are associated with the boundary between two structural or functional components. 
In the third part of this research, the periodicity evaluation of a leucine zipper protein domain with a well-known structure is used to rank 494 published amino acid indices summarized in a database (http://www.genome.jp/dbget/aaindex.html). This application allows us to select those amino acid indices that are strongly associated with the protein structure and thereby to aid protein structure prediction. This procedure can also be used to reduce redundancy among the amino acid indices.
- Spontaneous Mutation Discovery via High-Throughput Sequencing of Pedigrees (2010-04-20) Keebler, Jonathan Edward Myers; Alison Motsinger, Committee Member; Jeffrey Thorne, Committee Member; Ignazio Carbone, Committee Member; Eric Stone, Committee Chair; Philip Awadalla, Committee Member
Recent technological advances have made high-throughput DNA sequencing a routine laboratory experiment. This progression in technology has been made possible by the parallel production of millions of short fragments of sequence. The responsibility of garnering biological information from these DNA fragments has shifted from the wet-lab scientist to the bioinformatician. As sequencing technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors’ genotypes, a task that is not necessarily trivial using high-throughput sequencing reads. A violation of Mendelian inheritance laws observed amid the resequenced genomes of family members can indicate the presence of a de novo mutation. A method for locating de novo mutations by probabilistically inferring genotypes across a pedigree using high-throughput sequencing is presented and applied to two resequenced nuclear families: one as a collaborative effort within the 1000 Genomes Project, and the second in an attempt to discover candidate driver and passenger mutations within the genome of an Acute Lymphoblastic Leukemia. The mutation findings within these projects are presented, and the approach is examined in detail, highlighting areas where method improvements may be made. 
Considering the challenges experienced in these studies within the larger context of the nascent field of Personal Genomics, an honest assessment is presented of developments that must be made before the application of whole-genome sequencing on the scale of an individual human can unequivocally be used to predict, diagnose, or treat human disease.
