Estimation and Sampling Properties of Gene Diversity, Heterozygosity and F_ST

Abstract

Estimates of the coancestry coefficient F_ST, gene diversity, and heterozygosity are used in many fields, including conservation biology, evolutionary biology, and forensic studies. Although the sampling properties of the estimators of these parameters can affect the inferences drawn from them, they are frequently overlooked in published analyses. This dissertation characterizes the estimators of these three measures by presenting relevant theoretical developments, approaches to estimation, and evaluations of their sampling properties. The biological focus is on inferences about genetic variation among populations of a single species, rather than at a larger, between-species scope. The accuracy and precision of the method-of-moments and maximum-likelihood estimators of population-specific F_ST developed in [Weir02] are evaluated through population simulation and analysis of an empirical data set. Of the two, the method-of-moments estimator is found to be relatively unbiased but to have a large sampling variance, which increases as coancestry in a population increases. Sampling more loci reduces this sampling variance much more strongly than sampling more individuals. The maximum-likelihood estimator evaluated here estimates the coancestry in a population poorly under two iterative approaches and one non-iterative approach, and is not recommended for future analyses. Problems that arise for both estimators when estimates are obtained from individual loci with very low levels of polymorphism are discussed, and practical measures for proceeding with analyses are suggested. Properties of several methods for inferring the variance of sample heterozygosity or gene diversity are evaluated, including a new random model for the total variance of sample heterozygosity. Large differences from a previous mixed model are observed in a case where the variance component due to loci is large. Several approximations are evaluated and compared with variances obtained from exact expressions. With unbalanced data, four variance component methods give different results for the total variance of sample heterozygosity, as expected from statistical theory. The likelihood-based methods considered here are shown to be robust to violations of the normality assumption, even for very small sample sizes.

Keywords

F-statistics, population genetics, linear models, drift model

Degree

PhD

Discipline

Genetics