Probabilistic Allele Calling to Improve Population Size Estimates from Non-Invasive Genetic Mark-Recapture Analysis

Date

2009-08-10

Abstract

Accurate estimates of population sizes are often necessary to help researchers understand how wildlife populations change over time. Researchers often use traditional mark-recapture methods to estimate wildlife population sizes, and a variety of models, with varying assumptions, are available to analyze traditional mark-recapture data. The utility of these methods is limited when sampling rare or elusive species: capture probabilities may be too low because capturing the animals is difficult and costly. In addition, physical capture can be stressful, even deadly, for the animals.

The limitations of traditional mark-recapture methods can sometimes be addressed with non-invasive genetic mark-recapture methods. With the non-invasive genetic method, individuals are not physically captured and tagged. Instead, non-invasive genetic samples, such as hair or scat, are collected and genotyped at multiple microsatellite markers. An individual's genotype serves as a DNA tag, uniquely identifying that individual. DNA is extracted from each sample and the extracted DNA is PCR amplified multiple times at several microsatellite loci. The results of each PCR amplification are visualized using capillary electrophoresis, resulting in an electropherogram. Alleles are called by interpreting the peak heights and/or peak areas on the electropherogram.

While non-invasive genetic methods solve some of the problems of traditional mark-recapture, they also introduce new ones. One major problem is the misidentification of individuals. The DNA from non-invasive samples is often low in quality and/or low in quantity, which increases the probability of genotyping errors. In addition, poor marker selection can result in individuals sharing a genotype. Traditional mark-recapture methods are not robust to violations of the assumption that individuals are correctly identified. Genotyping errors cause overestimation of population size; markers that lack the power to distinguish between individuals cause underestimation of population size.

To achieve better population size estimates, I propose a new probabilistic allele calling method. In the traditional method, definitive allele calls are made independently for each PCR replicate of a sample, and the definitive allele calls are then examined to determine the sample's genotype. The new method assigns probabilities to allele calls rather than committing to a single definitive allele call. Probabilities are assigned to possible allele calls based on electropherogram peak heights. For cases of possible allelic dropout, a portion of the probability distribution for the PCR replicate is assigned to a heterozygous allele call with one undesignated allele. For each sample, the allele call probabilities at each locus, including allele calls with undesignated alleles, are averaged over the PCR replicates. Allele calls with undesignated alleles are then assigned based on the allele frequencies in the averaged probabilities. The genotype with the highest probability is assigned as the sample's genotype. With the probabilistic method, uncertainty is retained in the allele calls until all the PCR replicates of a sample have been examined. This allows more information from the electropherograms to be used when determining genotypes.

To examine the proposed probabilistic allele calling method, I compared it to a traditional method by running computer simulations covering a variety of scenarios. For each simulation scenario, a population was generated and sampled using non-invasive genetic mark-recapture methods. Each sample, which contained DNA of low quality and quantity, was genotyped at multiple microsatellite loci, with multiple PCR replicates for each locus. Genotypes were determined for samples using a traditional allele calling method and the new probabilistic allele calling method. The resulting genotypes were matched and the data were analyzed using four traditional closed mark-recapture models.

The probabilistic method performed better than the traditional method in almost all cases. When more than two PCR replicates were examined, the estimates from the probabilistic method were less biased and more precise than estimates from the traditional method. With the probabilistic method, good estimates can be achieved using fewer PCR replicates. This new method of analyzing non-invasive genetic mark-recapture data has the potential to allow wildlife population sizes to be accurately estimated using non-invasive methods in less time and at lower cost than current methods.
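
The averaging and assignment steps described above can be illustrated with a minimal Python sketch for a single locus. It is not the thesis implementation. The abstract does not specify how peak heights map to probabilities, so the per-replicate distributions below are made-up inputs. The function names, the "?" marker for an undesignated allele, the proportional reassignment rule, and the homozygous fallback are all assumptions made for illustration.

```python
from collections import defaultdict

UNKNOWN = "?"  # hypothetical marker for an undesignated allele


def average_replicates(replicates):
    """Average the allele-call distributions of all PCR replicates at one locus."""
    avg = defaultdict(float)
    for dist in replicates:
        for call, p in dist.items():
            avg[call] += p / len(replicates)
    return dict(avg)


def resolve_undesignated(avg):
    """Reassign probability on (a, '?') calls to concrete (a, b) calls.

    Assumption: each candidate allele b is weighted by its frequency in the
    averaged distribution. Designated calls are expected as sorted tuples,
    and undesignated calls as (allele, UNKNOWN).
    """
    # Allele frequencies implied by the averaged call probabilities.
    freq = defaultdict(float)
    for (a1, a2), p in avg.items():
        for allele in (a1, a2):
            if allele != UNKNOWN:
                freq[allele] += p / 2

    resolved = defaultdict(float)
    for (a1, a2), p in avg.items():
        if a2 != UNKNOWN:
            resolved[(a1, a2)] += p
            continue
        others = {b: f for b, f in freq.items() if b != a1}
        total = sum(others.values())
        if total == 0:
            # Assumption: with no other allele observed, fall back to homozygous.
            resolved[(a1, a1)] += p
            continue
        for b, f in others.items():
            resolved[tuple(sorted((a1, b)))] += p * f / total
    return dict(resolved)


def call_genotype(replicates):
    """Return the most probable genotype at one locus and its probability."""
    resolved = resolve_undesignated(average_replicates(replicates))
    return max(resolved.items(), key=lambda kv: kv[1])


# Made-up example: one replicate shows both alleles; two show a single peak
# at 152, so part of their probability goes to a possible dropout call.
replicates = [
    {("152", "156"): 1.0},
    {("152", "152"): 0.6, ("152", UNKNOWN): 0.4},
    {("152", "152"): 0.6, ("152", UNKNOWN): 0.4},
]
genotype, prob = call_genotype(replicates)
print(genotype, round(prob, 3))
```

In this made-up example, the heterozygote 152/156 ends up with a probability of about 0.6, even though two of the three replicates, taken alone, favor the homozygote 152/152. This is the kind of situation in which deferring the decision until all replicates are combined can change the resulting genotype.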

Keywords

genotyping errors, DNA tags, allele calling, population size estimation, mark-recapture, non-invasive

Degree

MS

Discipline

Biomathematics
