NC State University Libraries

Browsing by Author "Bruce S. Weir, Committee Chair"

    Accounting for Within- and Between-Locus Dependencies in Marker Association Tests
    (2003-06-26) Czika, Wendy Ann; Dennis Boos, Committee Member; David Dickey, Committee Member; Dahlia Nielsen, Committee Member; Bruce S. Weir, Committee Chair; Russell Wolfinger, Committee Member
    Marker association tests have recently become established as important tools for locating disease susceptibility genes in the human genome: by detecting linkage disequilibrium (LD), they attain finer-scale maps than linkage tests. Many of these association tests were originally defined for biallelic markers under ideal assumptions, and their multiallelic extensions are often complicated by the covariance among genotype or allele proportions. The well-established allele and genotype case-control tests based on Pearson chi-square statistics are exceptions, since they extend easily to multiallelic markers; however, each has its shortcomings. We demonstrate that the multiallelic trend test is an attractive alternative without these limitations. We present a formula for marker genotype frequencies that incorporates the coefficients quantifying various disequilibria and accommodates any type of disease model. This enables the simulation of samples for estimating significance levels and for calculating the sample sizes needed to achieve a given power. Extending family-based association tests to markers with more than two alleles is similarly complex. Fortunately, the nonparametric sibling disequilibrium test (SDT) statistic extends naturally to a quadratic form for multiallelic markers. In the original presentation of the statistic, however, information from one of the marker alleles is needlessly discarded. Discarding one allele is necessary for the parametric form of the statistic because of a linear dependency among the allele-specific statistics, but the nonparametric representation eliminates this dependency. We show how to form a statistic that uses all the allelic information. Obstacles also arise when multiple loci affect disease susceptibility: in the presence of gene-gene interaction, single-marker tests may fail to detect an association between individual markers and disease status.
We implement and evaluate tree-based methods for mapping multiple susceptibility genes. Adjustments to correlated p-values from markers in LD with each other are also examined. This study of epistatic gene models reveals the importance of three-locus disequilibria, for which we discuss various statistical tests.
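As a concrete reference point for the trend test mentioned above, the sketch below implements the standard Cochran-Armitage trend test for a single biallelic marker with additive genotype scores 0, 1, 2. It is an illustrative assumption, not the multiallelic trend test developed in the dissertation; the function name `trend_test` and the genotype ordering (aa, Aa, AA) are hypothetical choices.

```python
import math

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for one biallelic marker.

    cases, controls: genotype counts in the order (aa, Aa, AA).
    scores: additive scores assigned to the three genotypes.
    Returns (chi-square statistic with 1 df, asymptotic two-sided p-value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # genotype totals
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    sum_x2n = sum(x * x * ni for x, ni in zip(scores, n))
    denominator = R * S * (N * sum_x2n - sum_xn ** 2)
    if denominator == 0:
        raise ValueError("monomorphic marker or empty group: test undefined")
    chi2 = N * (N * sum_xr - R * sum_xn) ** 2 / denominator
    # Upper tail of chi-square(1) equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, cases (10, 20, 30) against controls (30, 20, 10) give a statistic of 20.0, while identical case and control genotype distributions give 0.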
    Computational Methods for Identifying and Characterizing the Human Gene Regulatory Regions and Cis-elements
    (2005-11-23) Huang, Weichun; Leping Li, Committee Member; Bruce S. Weir, Committee Chair; William R. Atchley, Committee Member; Jeffrey L. Thorne, Committee Member; Russell D. Wolfinger, Committee Member
    The identification of functional regulatory regions and cis-elements is a preliminary step toward the reconstruction of gene regulatory networks. Comparative genomics has been demonstrated to be a powerful approach for motif discovery. However, the accurate alignment of complex genomic sequences, especially those of mammals, remains a computational challenge. In Chapter 2, we propose a novel pairwise alignment system, ACANA, to improve the alignment quality of genomic sequences. Compared with top competing alignment tools, ACANA achieves better alignment quality on divergent sequences for both local and global alignments. When applied to the upstream sequences of human-mouse orthologs, ACANA reliably detects the conserved functional regions containing most cis-elements. Statistical motif modeling is another fundamental computational approach to motif prediction in large genomic sequences. In Chapter 3, we introduce a mixture of optimized Markov models to reduce the false motif discovery rate in large genomic sequences. Our model not only incorporates most of the dependency information within a motif by optimizing the arrangement of motif positions, but is also flexible enough to adjust model complexity to the limits imposed by the size of the training data. We implement the mixture model in our OMiMa system and use OMiMa to demonstrate that our model can improve motif prediction accuracy. Although reconstructing complete human gene regulatory networks remains a distant hope at present, it is still possible to infer some distinct features of these networks from the available data. In Chapter 4, we present an example of inferring major evolutionary features of human gene regulatory networks by combining information from gene sequence data and functional annotations. We systematically analyze the association between gene function and upstream region conservation for human-rodent orthologs.
Our study shows that upstream regulatory regions of developmental transcription regulators, such as Hox genes, are extremely conserved while those of catalytic enzymes are significantly less conserved. We suggest that developmental and other important regulators constitute the central hub of human gene regulatory networks.
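Statistical motif modeling, as described above, scores candidate sites by how much more likely they are under a motif model than under a background model. The sketch below shows only the simplest baseline, a position weight matrix that treats motif positions as independent; OMiMa's mixture of optimized Markov models adds dependencies between positions and is not reproduced here. The function names, the pseudocount of 1 and the example counts are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a background.

    counts: one dict per motif position mapping each base to its count.
    background: dict mapping each base to its background probability.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix

def scan(sequence, matrix):
    """Score every window of the motif's width; return (best_score, start).

    The sequence is assumed to contain only uppercase A, C, G, T.
    """
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best
```

With hypothetical counts for a 3-bp motif "TAT" (10 observations per position) and a uniform background, scanning "GGCTATGG" returns the window starting at position 3 as the best, and only positive-scoring, site.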
    Development of Linkage and Association Methods to Map Disease Genes
    (2002-10-28) Liu, Wenlei; Gregory C. Gibson, Committee Member; Bruce S. Weir, Committee Chair; Zhao-Bang Zeng, Committee Member; Dahlia M. Nielsen, Committee Member
    Identification of disease susceptibility genes is one of the primary aims of contemporary genetic research. With recent developments in molecular biology techniques, large-scale gene mapping with a dense, genome-spanning set of markers has become a reality. The availability of markers throughout the genome has made linkage and association studies more feasible. In the first chapter, we review many linkage and association methods and point out potential problems with current linkage and association analysis. In the second chapter, we modify two identity-by-state (IBS) test statistics of Lange (Lange K. 1986a, A test statistic for the affected-sib-set method. Annals of Human Genetics 50, 283-290; Lange K. 1986b, The affected sib-pair method using identity by descent relations. American Journal of Human Genetics 39, 148-150.) to allow for inbreeding in the population. We evaluate the power and false positive rates of the modified tests under three disease models using simulated data. When the population inbreeding coefficient is large, both the false positive rates and the power are reduced when the modified test statistics are applied, although power remains high under a recessive disease model. Allowing for inbreeding is therefore appropriate, at least for diseases known to be recessive. In the third chapter, we compute the proportions of affected sib pairs sharing 0, 1, and 2 marker alleles identical by descent (IBD) in an inbred population and express them in terms of higher-order descent measures. Two consistency checks on the identity state probabilities verify our calculations. We perform the same calculations for affected sib pairs from first-cousin marriages in an inbred population. In the fourth chapter, we study linkage and linkage disequilibrium (LD) simultaneously for a single QTL using family data, in an attempt to increase mapping resolution and reduce false positive rates.
We use an EM algorithm to estimate the QTL allele frequencies, the LD and recombination fractions between the marker loci and the QTL, and the QTL model parameters. After the single-marker analysis, we extend our model to study two marker loci simultaneously to increase the accuracy of the estimates. Our simulation results show that our EM algorithm gives consistent estimates of all the parameters considered.
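The IBD-sharing proportions discussed above have a simple null distribution in an outbred population: under no linkage, an affected sib pair shares 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4, so the proportion shared (IBD/2) has mean 1/2 and variance 1/8. The sketch below uses that fact for a basic mean-sharing test. It assumes no inbreeding, which is exactly the assumption the dissertation relaxes, so it is a baseline rather than Lange's IBS statistics or their modified versions; `asp_mean_test` is a hypothetical name.

```python
import math

def asp_mean_test(ibd_counts):
    """Mean IBD-sharing test for affected sib pairs (outbred null).

    ibd_counts: alleles shared IBD (0, 1 or 2) by each affected sib pair.
    Null: P(0, 1, 2) = 1/4, 1/2, 1/4, so the proportion shared has
    mean 1/2 and variance 1/8 per pair.
    Returns a z statistic and a one-sided p-value (excess sharing suggests linkage).
    """
    n = len(ibd_counts)
    mean_pi = sum(k / 2 for k in ibd_counts) / n
    z = (mean_pi - 0.5) / math.sqrt(1 / (8 * n))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for standard normal Z
    return z, p_value
```

Under inbreeding the null sharing probabilities shift, which is why a test calibrated on the 1/4, 1/2, 1/4 null can be miscalibrated in inbred populations.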
    Disease Gene Mapping in General Pedigrees
    (2005-02-28) Li, Li; Michael D. Purugganan, Committee Member; Bruce S. Weir, Committee Chair; Sharon R. Browning, Committee Member; Zhao-Bang Zeng, Committee Member; Margarate G. Ehm, Committee Member
    Disease gene mapping is one of the main focuses of genetic epidemiology and statistical genetics. This dissertation explores some methods and algorithms in this area, especially for pedigrees. The first chapter gives an introduction to human genetics and disease gene mapping; existing linkage and association methods are introduced and compared. Probabilities of genotypic data from multiple linked marker loci on related individuals are used as likelihoods of gene locations for gene mapping, or as likelihoods of other parameters of interest in human genetics. With recent developments in genetics and molecular biology techniques, large-scale marker data have become available, and these require highly efficient likelihood calculations, especially for complex pedigrees. Algorithms for likelihood calculation on pedigree data are reviewed in Chapter 2. Besides exact likelihood calculation methods and MCMC, a Sequential Importance Sampling (SIS) approach has been proposed to enable calculations for large pedigrees with large numbers of markers. However, as the system grows, the variance of the importance sampling weights increases, and both the efficiency and the accuracy of the method decrease. In Chapter 3, we propose an optimization algorithm for calculating the likelihood of general pedigrees. We incorporate a resampling strategy into SIS to reduce the variance inflation problem. A successful linkage analysis may identify a linkage region of interest spanning perhaps ten to thirty centiMorgans and containing hundreds of genes. A follow-up association (so-called linkage disequilibrium) analysis can provide much finer gene mapping but is subject to greater multiple testing problems. In Chapter 4, we present a method for determining whether an association result is responsible for a non-parametric linkage result for binary traits in general pedigrees.
The correlation between the family frequency of a variant of interest and the family LOD score is used as a measure of whether the association between a given variant at a marker and the disease status can help to explain a significant linkage result seen in the collection of families in the region around the marker.
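The measure described above is a correlation, taken across families, between the frequency of a variant and the family LOD score. A minimal sketch, assuming each family is already summarized by one variant frequency and one LOD score, computes the Pearson correlation and, as an added assumption not taken from the abstract, a one-sided permutation p-value. The function names, the seed and any example data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_permutation_test(freqs, lods, n_perm=10000, seed=1):
    """Observed correlation and one-sided permutation p-value.

    Family LOD scores are shuffled relative to the variant frequencies;
    the p-value is the share of permutations whose correlation is at
    least the observed one, with the usual +1 correction.
    """
    observed = pearson(freqs, lods)
    rng = random.Random(seed)
    shuffled = list(lods)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(freqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A strongly positive correlation suggests that families carrying the variant more often are the ones driving the linkage signal; how to calibrate this in real pedigrees is the subject of the dissertation, not of this sketch.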
    Statistical Methods for the Analysis of Forensic DNA Mixtures
    (2006-07-11) Beecham, Gary Wayne Jr.; Sujit Ghosh, Committee Member; Dahlia Nielsen, Committee Member; Bruce S. Weir, Committee Chair; Gene Eisen, Committee Member
    Forensic DNA mixtures are often interpreted statistically using a likelihood ratio (LR). These ratios take the form, "The evidence is LR times more likely under the prosecution's hypothesis than under the defense's hypothesis." The likelihood ratio calculations rest on allele frequencies, yet these frequencies are estimated from only a small portion of the population. Because of sampling error, the likelihood ratio is therefore an estimate, that is, a random variable. In Chapter 2, we propose using a confidence interval to report the variation of likelihood ratios; the formula for the confidence interval is explained, and a computer program has been made available. In Chapter 3, a maximum likelihood method is given for including peak intensities in forensic DNA mixture likelihood ratio calculations. Observed peak intensities result from the underlying composition of the mixture: the amounts contributed and the genotypes of the contributors. This chapter proposes using the maximum likelihood method to weight each possible genotype combination by the likelihood of that combination given the peak intensities. Models based on the Normal and Dirichlet distributions are described. Both models tend to assign higher weights to the more correct genotypes, though the Normal model puts much more emphasis on the best model(s) than the Dirichlet model does. This method can also be applied to certain cases of allele dropout. In the final chapter, several different situations are explored. Four standard cases are considered: single-contributor evidence, two-contributor evidence, the paternity index, and the consideration of relationships by pedigree. These four standard cases introduce basic concepts, which are in turn used to discuss more complicated cases later in the chapter.
The more complicated cases discussed include analysis of a paternity index from a mixture, relatives and mixtures, consideration of relatives in the presence of population substructure, and a case of canine parentage under varying degrees of relatedness.
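For the simplest of the standard cases, single-contributor evidence, the likelihood ratio compares Hp (the suspect is the source) with Hd (an unknown, unrelated person is the source), so it reduces to the reciprocal of the probability that the unknown person has the matching genotype. The sketch below is an assumption rather than the dissertation's code: it uses the Balding-Nichols conditional genotype probabilities (also given in the NRC II report), which reduce to the Hardy-Weinberg values p^2 and 2pq when theta = 0 and allow for population substructure when theta > 0. Function names and allele frequencies are hypothetical, and the sampling variation of the frequencies, the subject of Chapter 2, is ignored.

```python
def match_probability(p, q=None, theta=0.0):
    """Balding-Nichols probability that an unknown person has the matching genotype.

    p, q: frequencies of the two alleles; q=None means a homozygote (pp).
    theta: coancestry coefficient for population substructure.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom

def single_source_lr(profile, theta=0.0):
    """LR = P(E | Hp) / P(E | Hd) = 1 / (product of match probabilities over loci).

    profile: one (p, q) pair per locus, with q=None for homozygous loci;
    loci are assumed independent.
    """
    lr = 1.0
    for p, q in profile:
        lr /= match_probability(p, q, theta)
    return lr
```

For a heterozygous locus with allele frequencies 0.1 and 0.2 plus a homozygous locus with frequency 0.1, the LR is 25 x 100 = 2500 when theta = 0; any positive theta lowers it, which is why theta corrections are regarded as conservative.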
    Statistical methods for the analysis of genetics marker and microarray data
    (2004-05-18) Yu, Xiang; Bruce S. Weir, Committee Chair; Dahlia M. Nielsen, Committee Co-Chair; Greg Gibson, Committee Member; Russell D. Wolfinger, Committee Member
    With the advent of high-throughput technologies in genomics, a large volume of data has accumulated, leaving bioinformaticians with the challenge of how to manage, analyze, and interpret the data. Analyses of genetic marker and microarray data are two important aspects of current bioinformatics studies. In this dissertation, we explore some statistical issues in these problems. We discuss two extensions of the EM algorithm for inferring haplotypes from genotype data, each for a particular sampling scenario. The first applies to a random sample of both diploid and haploid individuals from the population, in which the haplotype information from the haploid individuals is incorporated into the estimation process. The second applies to a sample of parent-offspring trios, in which the dependencies between the parental and offspring genotypes are correctly handled in the analysis. We show that these two modified EM algorithms perform better than the usual one when applied to their corresponding samples. We study experimental designs for two-color microarray experiments and resolve some outstanding controversies over the use of different designs. We show that, from a statistical point of view, loop and balanced block designs analyzed with a mixed model are more efficient than reference designs. We also provide general guidelines on how to allocate experimental resources to achieve maximal efficiency with these designs. We present an application of the mixed model to identify transcription factor-gene interactions and to infer transcriptional regulatory structures in Saccharomyces cerevisiae using microarray experiments. We demonstrate that a mixed model that pools observations across all experiments is a powerful approach.
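The haplotype-inference problem above can be illustrated with the basic EM algorithm for two biallelic SNPs in a random sample of unrelated diploid individuals, the "usual" case that the dissertation's two extensions modify. In this minimal sketch, which is an assumption rather than the dissertation's code, each genotype is coded as the count (0, 1, 2) of allele 1 at each locus. Only double heterozygotes have ambiguous phase, and the E-step splits them between the cis (00/11) and trans (01/10) resolutions in proportion to the current haplotype-pair probabilities. The function name and tolerance are hypothetical.

```python
from itertools import product

# Haplotypes as (allele at locus 1, allele at locus 2), alleles coded 0/1.
HAPLOTYPES = list(product((0, 1), repeat=2))

def _alleles(g):
    """Sorted allele pair implied by a single-locus genotype count."""
    return [(0, 0), (0, 1), (1, 1)][g]

def em_haplotypes(genotypes, max_iter=1000, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    genotypes: non-empty list of (g1, g2), each g the count of allele 1.
    Returns a dict mapping each haplotype to its estimated frequency.
    """
    freqs = {h: 0.25 for h in HAPLOTYPES}
    n = len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: expected phase of a double heterozygote.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At most one heterozygous locus: the haplotype pair is known.
                a, b = _alleles(g1), _alleles(g2)
                counts[(a[0], b[0])] += 1
                counts[(a[1], b[1])] += 1
        # M-step: haplotype frequencies are expected counts over 2n chromosomes.
        new = {h: c / (2 * n) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs
```

The trio and haploid-plus-diploid extensions would change which haplotype pairs are compatible with each observation, while the E-step and M-step structure stays the same.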
    Statistical Methods in Genetic Association Studies
    (2007-08-01) Gao, Xiaoyi; Dahlia M. Nielsen, Committee Co-Chair; Bruce S. Weir, Committee Chair; Philip Awadalla, Committee Member; Jason A. Osborne, Committee Member
    Population structure is a serious confounding factor in genetic association studies: it may lead to false positive results or to failure to detect true associations. We propose a hierarchical clustering algorithm, AW-clust, that uses single nucleotide polymorphism (SNP) data to assign individuals to populations. We show that the algorithm assigns sample individuals highly accurately to their corresponding ethnic groups (CEU, YRI, and CHB+JPT) in our tests using HapMap SNP data, and that it is also robust to admixed populations when tested on Perlegen SNP data. Moreover, by using genome-wide high-diversity SNP loci, it can detect fine-scale population structure as subtle as that between Chinese and Japanese. Genotyping errors exist in most genetic data and can influence the biological conclusions of studies. A simple method for detecting them is to conduct the Hardy-Weinberg equilibrium (HWE) test in population-based association studies. We investigate the power of the HWE test to detect genotyping errors under current genotyping technologies. Multiple testing is a challenging issue in genetic studies using SNPs that are in linkage disequilibrium (LD) with each other; failing to adjust for multiple testing appropriately may produce excess false positives or overlook true positive signals. We propose a new multiple testing correction method, CLDMeff, for association studies using SNP markers. Using both simulated and real data, it is shown to be simpler and more accurate than recently developed methods and comparable to permutation-based correction. The efficiency and accuracy of CLDMeff make it an attractive choice for multiple testing correction when there is high intermarker LD in the SNP dataset.
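The HWE test mentioned above compares observed genotype counts with those expected from the estimated allele frequency; genotyping errors such as heterozygotes miscalled as homozygotes show up as a heterozygote deficit. A minimal sketch of the standard one-degree-of-freedom chi-square version for a biallelic SNP follows; the function name is hypothetical, and this is not the dissertation's power analysis, which concerns how often such errors are actually detected.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (statistic, asymptotic p-value).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value
```

Counts of (25, 50, 25) match HWE exactly and give a statistic of 0, whereas (50, 0, 50), with no heterozygotes at all, gives 100. Modest error rates produce far smaller departures, which is why the power of this check is worth investigating.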
    Statistical Topics in Disease Gene Mapping
    (2003-04-14) Meng, Zhaoling; Bruce S. Weir, Committee Chair; Margaret G. Ehm, Committee Co-Chair; Zhao-Bang Zeng, Committee Member; Russ Wolfinger, Committee Member; Greg Gibson, Committee Member; Jonathan Allen, Committee Member
    Efforts in disease gene mapping have achieved a great deal of success for Mendelian diseases but have made slower progress for common diseases because of their complexity. The rapid development of genetic and molecular technologies provides an immense amount of DNA data, and powerful, efficient statistical methodologies are in high demand. This dissertation explores some aspects of the problem. The power of two genome-wide disease gene mapping strategies is investigated. One applies linkage analysis and then linkage disequilibrium (LD) tests to markers within linked regions; the other looks for LD with disease using all markers. The results show that the genome-wide association-based tests are much more likely to identify genes. Genotyping closely spaced single nucleotide polymorphisms (SNPs) frequently yields highly correlated data because of extensive LD, imposing an unnecessary and unaffordable burden on association studies when these markers do not yield significantly different information. Two procedures are developed that use genotypes of a large initial set of SNPs in a small number of samples to select an optimal subset of SNPs, which can then be genotyped efficiently in larger numbers of samples while retaining most of the information. One uses a spectral decomposition method based on matrices of pairwise LD, and the other extends David Clayton's htSNP selection method. Properties of the procedures are studied, minimum sample sizes needed to achieve consistent results are recommended, and the procedures are evaluated using experimental data. Studying gene-treatment interaction is a long-standing problem. When the genetic variation being tested consists not of specific functional sites but of randomly selected polymorphisms, a source of randomness is introduced.
A mixed-effects model is developed to fit fixed treatment effects, random haplotypic effects, and random gene-treatment interactions in this scenario; likelihood ratio tests are applied to test the random effects. Our simulation results show that the mixed-effects model is valid and generally performs better than the fixed haplotypic effects model in the exploratory phase of a study.
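Both SNP-selection procedures above start from pairwise LD between SNPs. A minimal sketch, assuming phased haplotypes with each SNP coded 0/1, computes the r^2 matrix and then picks tag SNPs greedily until every SNP has r^2 >= 0.8 with some chosen tag. The greedy rule, the 0.8 threshold and the function names are assumptions for illustration; this is neither the spectral decomposition procedure nor the extension of Clayton's htSNP method described in the abstract.

```python
def r_squared(snp_i, snp_j):
    """LD r^2 between two biallelic SNPs, each a list of 0/1 alleles per haplotype."""
    n = len(snp_i)
    p_a = sum(snp_i) / n
    p_b = sum(snp_j) / n
    p_ab = sum(1 for a, b in zip(snp_i, snp_j) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom

def greedy_tags(snps, threshold=0.8):
    """Greedily choose tag SNPs so every SNP has r^2 >= threshold with a tag.

    snps: list of SNP allele vectors over the same haplotypes.
    Returns the indices of the chosen tags in selection order.
    """
    m = len(snps)
    r2 = [[r_squared(snps[i], snps[j]) for j in range(m)] for i in range(m)]
    untagged = set(range(m))
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(untagged,
                   key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
        untagged.discard(best)  # guarantees progress even for monomorphic SNPs
    return tags
```

For two identical SNPs and a third in linkage equilibrium with them, the sketch keeps one of the identical pair plus the independent SNP, which is the kind of redundancy reduction the abstract motivates.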
