Statistical Analysis of Genetic Associations

Abstract

There is an increasing need for a statistical treatment of genetic data, prompted by recent advances in molecular genetics and molecular technology. The study of associations between genes is one of the most important applications of population genetics theory and statistical methodology to genetic data. The development of these methods is important for conservation biology, experimental population genetics, forensic science, and the mapping of human disease genes. Over the next several years, genotypic data will be collected in an attempt to locate the positions of multiple genes affecting disease phenotypes. Adequate statistical methodology is required to analyze these data. Special attention should be paid to the multiple testing issues that result from searching through many genetic markers, and to the high risk of false associations. In this research we develop theory and methods needed to treat some of these problems. We introduce exact conditional tests for analyzing associations within and between genes in samples of multilocus genotypes, together with efficient algorithms to perform them. These tests are formulated for the general case of multiple alleles at arbitrary numbers of loci and lead to multiple testing adjustments based on the closed testing principle, thus providing strong control of the family-wise error rate. We discuss an application of the closed testing method to testing for Hardy-Weinberg equilibrium, and computationally efficient shortcuts, arising from methods for combining p-values, that make it possible to handle large numbers of loci. We also discuss efficient Bayesian tests for heterozygote excess and deficiency, as a special case of testing for Hardy-Weinberg equilibrium, and the frequentist properties of a p-value-like quantity resulting from them. We further develop new methods for validating experiments and for combining and adjusting independent and correlated p-values, and apply them to simulated as well as actual gene expression data sets.
These methods prove to be especially useful in situations with large numbers of statistical tests, such as in whole-genome screens for associations of genetic markers with disease phenotypes and in analyzing gene expression data obtained from DNA microarrays.
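
To make the idea of an exact conditional test concrete, the following is a minimal sketch of the standard textbook exact test for Hardy-Weinberg equilibrium at a single locus with two alleles. It is not the multilocus, multi-allele procedure developed in this work, and the function names are hypothetical. Conditional on the observed allele counts, the probability of each possible heterozygote count is computed, and the p-value is the total probability of all outcomes no more probable than the one observed.

```python
from math import exp, lgamma, log


def log_factorial(k):
    """Return log(k!) using the log-gamma function."""
    return lgamma(k + 1)


def het_probabilities(n_aa, n_ab, n_bb):
    """Conditional distribution of the heterozygote count under HWE,
    given the allele counts implied by the observed genotype counts."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n_bb + n_ab           # copies of allele B
    const = (log_factorial(n) + log_factorial(n_a)
             + log_factorial(n_b) - log_factorial(2 * n))
    probs = {}
    # The heterozygote count must have the same parity as n_a.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        log_p = (const + het * log(2)
                 - log_factorial(hom_a) - log_factorial(het)
                 - log_factorial(hom_b))
        probs[het] = exp(log_p)
    return probs


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact p-value: sum of the probabilities of all outcomes
    that are no more probable than the observed heterozygote count."""
    probs = het_probabilities(n_aa, n_ab, n_bb)
    p_obs = probs[n_ab]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

For example, `hwe_exact_pvalue(40, 10, 50)` tests a sample of 100 individuals with far fewer heterozygotes than the allele frequencies would suggest. With many loci, each such p-value would still need a multiple testing adjustment of the kind discussed above.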

Degree

PhD

Discipline

Statistics
