Accounting for Within- and Between-Locus Dependencies in Marker Association Tests

Abstract

Marker association tests have recently become established as important tools for locating disease susceptibility genes in the human genome. By detecting linkage disequilibrium (LD), they attain finer-scale maps than linkage tests. Many of these association tests were originally defined for biallelic markers under ideal assumptions, and their multiallelic extensions are often complicated by the covariance among genotype or allele proportions. The well-established allele and genotype case-control tests based on Pearson chi-square statistics are exceptions, since they adapt easily to multiallelic versions; however, each has its shortcomings. We demonstrate that the multiallelic trend test is an attractive alternative that lacks these limitations. We present a formula for marker genotype frequencies that incorporates the coefficients quantifying various disequilibria and accommodates any type of disease model. This formula enables the simulation of samples for estimating the significance level and for calculating the sample sizes necessary to achieve a given level of power.

Extending the family-based tests of association to markers with more than two alleles is similarly complex. Fortunately, the nonparametric sibling disequilibrium test (SDT) statistic has a natural extension to a quadratic form for multiallelic markers. In the original presentation of the statistic, however, information from one of the marker alleles is needlessly discarded. Discarding an allele is necessary for the parametric form of the statistic because of a linear dependency among the statistics for the alleles, but the nonparametric representation eliminates this dependency. We show how a statistic that uses all the allelic information can be formed.

Obstacles also arise when multiple loci affect disease susceptibility. In the presence of gene-gene interaction, single-marker tests may be unable to detect an association between individual markers and disease status. We implement and evaluate tree-based methods for mapping multiple susceptibility genes. Adjustments to correlated p-values from markers in LD with each other are also examined. This study of epistatic gene models reveals the importance of three-locus disequilibria, for which we discuss various statistical tests.
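As an illustration of the allele-based case-control test mentioned in the abstract, the following is a minimal sketch of a Pearson chi-square statistic on a 2 x k table of allele counts, with a simulation-based (permutation) estimate of its significance level. It is not the thesis's own implementation: the function names, the genotype encoding as pairs of allele indices, and the choice of permuting whole genotypes between groups are all assumptions made for this example.

```python
import random


def allele_counts(genotypes, n_alleles):
    """Count alleles over genotypes given as (a, b) pairs of allele indices."""
    counts = [0] * n_alleles
    for a, b in genotypes:
        counts[a] += 1
        counts[b] += 1
    return counts


def pearson_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x k table of allele counts."""
    n_case = sum(case_counts)
    n_control = sum(control_counts)
    if n_case == 0 or n_control == 0:
        raise ValueError("both groups must contain at least one allele")
    total = n_case + n_control
    stat = 0.0
    for c, d in zip(case_counts, control_counts):
        col = c + d
        if col == 0:
            continue  # an allele unobserved in both groups contributes nothing
        for observed, row in ((c, n_case), (d, n_control)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(cases, controls, n_alleles, n_perm=1000, seed=0):
    """Estimate the p-value by shuffling case/control labels.

    Whole genotypes are permuted (an assumption of this sketch), so the
    two alleles carried by one individual always stay together.
    """
    rng = random.Random(seed)
    observed = pearson_chi_square(
        allele_counts(cases, n_alleles), allele_counts(controls, n_alleles)
    )
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = pearson_chi_square(
            allele_counts(pooled[:n_cases], n_alleles),
            allele_counts(pooled[n_cases:], n_alleles),
        )
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_p_value(cases, controls, n_alleles=3)` takes two lists of genotype pairs, for example `[(0, 1), (2, 2)]`, and returns an estimated p-value. Repeating the call over many simulated samples would give a rough estimate of significance level or power.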

Keywords

decision trees, mSDT, haplotype-based tests

Degree

PhD

Discipline

Statistics
