Software and Methods for Analyzing Molecular Genetic Marker Data

Liu, Kejun

Files

etd.pdf (888.64 KB)

Date

2003-07-18

Authors

Liu, Kejun

Advisors

Edward Buckler, Committee Member

Montserrat Fuentes, Committee Member

Bruce S. Weir, Committee Member

Spencer V. Muse, Committee Chair

Abstract

Genetic analysis of molecular markers has allowed biologists to address a wide variety of questions. This dissertation explores statistical and computational issues in the analysis of genetic marker data. Chapter 1 introduces genetic marker data and briefly describes each chapter. Chapter 2 presents the genetic analyses performed on a large data set and discusses the use of microsatellites to describe the maize germplasm and to improve germplasm maintenance. Considerable attention is given to how the maize germplasm is organized and how genetic variation is distributed within it. A novel maximum likelihood method is developed to estimate the historical contributions to maize inbred lines. Chapter 3 presents a new method for selecting an optimal core set of lines from a large germplasm collection. A simulated annealing algorithm for choosing an optimal k-subset is described and evaluated using the maize germplasm as an example; general constraints are incorporated into the algorithm, and its efficiency is compared with that of existing methods. Chapter 4 presents a two-stage strategy that partitions a chromosomal region into blocks with extensive within-block linkage disequilibrium and then selects the optimal subset of SNPs that essentially captures the haplotype variation within each block. Population simulations suggest that the recursive bisection algorithm for block partitioning is generally reliable for identifying recombination hotspots. Maximum entropy theory is applied to choose the optimal subset of SNPs. The procedures are evaluated both analytically and by simulation. The final chapter, Chapter 5, describes a new software package for genetic marker data analysis, lists the methods it implements, and includes a brief tutorial illustrating its features. Chapter 5 also describes a new method for estimating population-specific F-statistics and an extended algorithm for estimating haplotype frequencies.
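The abstract names simulated annealing for choosing an optimal k-subset of lines but does not state the objective being optimized. The following is a minimal sketch, assuming the objective is the number of distinct alleles captured across all markers (a common core-set criterion; the dissertation's actual criterion and its general constraints may differ). The function names `allele_coverage` and `anneal_core_set`, the single-swap move, and the geometric cooling schedule are hypothetical illustrative choices, not the CoreSet implementation.

```python
import math
import random


def allele_coverage(genotypes, subset):
    """Count distinct (marker, allele) pairs present in the chosen lines.

    Assumed data layout (hypothetical): genotypes[i][m] is the set of alleles
    that line i carries at marker m.
    """
    seen = set()
    for i in subset:
        for m, alleles in enumerate(genotypes[i]):
            for a in alleles:
                seen.add((m, a))
    return len(seen)


def anneal_core_set(genotypes, k, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing search for a k-subset maximizing allele coverage."""
    n = len(genotypes)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of lines")
    if k == n:
        everything = list(range(n))
        return everything, allele_coverage(genotypes, everything)

    rng = random.Random(seed)
    current = rng.sample(range(n), k)
    current_score = allele_coverage(genotypes, current)
    best, best_score = list(current), current_score
    t = t0
    for _ in range(steps):
        # Neighbor move: swap one selected line for one unselected line.
        chosen = set(current)
        outside = [i for i in range(n) if i not in chosen]
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        score = allele_coverage(genotypes, candidate)
        delta = score - current_score
        # Metropolis rule: always accept improvements; accept a worse
        # subset with probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = list(current), current_score
        t = max(t * cooling, 1e-8)  # geometric cooling with a floor
    return sorted(best), best_score
```

Accepting occasional downhill moves at high temperature lets the search escape subsets that are only locally optimal; tracking the best subset seen means the returned answer never gets worse as the temperature drops. Constraints such as lines that must always be retained could be added by excluding them from the swap move.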

Keywords

Genetic Data Analysis, Inbreds, Maize, Marker, CoreSet, PowerMarker

URI

http://www.lib.ncsu.edu/resolver/1840.16/3308

Degree

PhD

Discipline

Bioinformatics

Collections

Dissertations
