Probe Design and Data Analysis for Gene Expression Microarrays

Title: Probe Design and Data Analysis for Gene Expression Microarrays
Author: Warren, Liling Li
Advisors: Greg Gibson, Committee Member
Spencer Muse, Committee Member
Ben Liu, Committee Chair
Bruce Weir, Committee Member
Abstract: This thesis focuses on several bioinformatics aspects of DNA microarray experiments. DNA microarrays are a breakthrough technology for large-scale gene expression profiling. Instead of measuring transcription levels one gene at a time, expression levels for many thousands of genes can be quantified simultaneously on one microarray. Depending on the array format, cDNA or pre-synthesized oligonucleotides can be deposited as probes onto the array. Oligo probes can also be synthesized directly on the array. Over the complete course of a DNA microarray experiment, many steps involve bioinformatics tasks, from probe design, image analysis, and data normalization to data analysis and data mining. This thesis deals with oligo probe design issues and with comparisons of data normalization methods. Methods for selecting a relatively small number of short probes and using them in a combinatorial fashion to quantify large-scale expression levels are also explored. In Chapter one, a novel algorithm for designing gene-specific probes is described. When gene-specific oligos are used as probes, it is crucial to select a set of probes with desirable properties so that many hybridization reactions can take place in parallel on an array. Given a set of sequences, the algorithm first finds the range of melting temperatures over all possible probe choices. Then, for each possible melting temperature within that range, the probe with the closest melting temperature is picked from each sequence to form a probe set. Among all the probe sets, the one with the most homogeneous melting temperatures is the optimized choice. The main advantage of our approach is reduced computation: the amount of computation increases linearly, rather than exponentially, with the number of genes. Detailed implementation steps are outlined and examples are given.
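The selection procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the abstract does not say how melting temperature is computed or how homogeneity is scored, so the sketch assumes the simple Wallace rule (2 °C per A/T, 4 °C per G/C), a fixed probe length, a fixed temperature step, and the max-minus-min spread as the homogeneity measure. All function names and parameters are hypothetical.

```python
from typing import List, Tuple


def wallace_tm(probe: str) -> float:
    """Assumed Tm model (Wallace rule): 2 C per A/T plus 4 C per G/C."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2.0 * at + 4.0 * gc


def candidates(seq: str, length: int) -> List[Tuple[float, str]]:
    """All fixed-length windows of one sequence, paired with their Tm."""
    if len(seq) < length:
        raise ValueError("sequence shorter than probe length")
    windows = (seq[i:i + length] for i in range(len(seq) - length + 1))
    return [(wallace_tm(w), w) for w in windows]


def select_probe_set(seqs: List[str], length: int = 20,
                     step: float = 1.0) -> Tuple[float, float, List[str]]:
    """Return (spread, target Tm, probes) for the most homogeneous set."""
    per_gene = [candidates(s, length) for s in seqs]
    all_tms = [tm for cands in per_gene for tm, _ in cands]
    lo, hi = min(all_tms), max(all_tms)
    best = None
    # Scan target temperatures across the observed Tm range.
    for k in range(int((hi - lo) / step) + 1):
        target = lo + k * step
        # One pass per gene: keep the probe whose Tm is closest to target.
        chosen = [min(cands, key=lambda c: abs(c[0] - target))
                  for cands in per_gene]
        tms = [tm for tm, _ in chosen]
        spread = max(tms) - min(tms)
        if best is None or spread < best[0]:
            best = (spread, target, [p for _, p in chosen])
    return best


# Toy input: three short made-up sequences and 4-base probes.
spread, target, probes = select_probe_set(
    ["AAAAGGGG", "ATATATGC", "GGGGCCCC"], length=4)
```

For each target temperature the work is one pass over every gene's candidate list, so the cost grows linearly with the number of genes, in contrast to enumerating every combination of one probe per gene.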
With some modifications, the algorithm can also be applied to design allele-specific probes for SNP genotyping or point mutation detection. In Chapter two, five normalization methods are compared with each other and with an analysis that skips the normalization step. Overall, performing normalization can reduce systematic variation and identify more genes as differentially expressed than analysis without normalization. Among the normalization methods compared, the ANOVA-based method has the most power to detect differentially expressed genes. When the same normalization and analysis methods are used, the ratio-based method has more power than the one based on absolute signal intensity values. When different normalization methods detect different numbers of genes, one way to plan future experiments is to use the set of genes detected by all methods. Alternatively, one can design further experiments using all genes identified as differentially expressed, regardless of which method identified them. Insights from this study on how to incorporate biological variation into future experimental designs are also discussed. In Chapter three, we present methods for choosing a set of short oligos to design a genome- or tissue-specific biochip and then solving a set of equations for gene expression levels to determine which genes are differentially expressed between samples. The methods have been tested by defining a set of 4000 8mers as probes to identify genes that have fold changes for more than 6000 identified yeast ORFs. These methods can also be extended to design genome-specific or tissue-specific biochips for other organisms with full gene sequence information. The major advantage of our methods is a significant reduction in the overall cost of array fabrication and oligo synthesis. The process of mining probe sets depends on knowing the gene sequence information for a specific genome or tissue.
As more genomes are sequenced, this method holds great promise for enabling more accurate and less expensive microarray experiments.
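The Chapter three idea of solving a set of equations for expression levels can be illustrated with a toy example. The abstract does not state how the equations are set up or solved, so everything here is an assumption made for illustration: each probe signal is modeled as the sum, over genes, of the number of times the probe occurs in the gene multiplied by that gene's expression level, and the toy system uses as many probes as genes so that it can be solved by plain Gaussian elimination. All names, sequences, and signal values are hypothetical.

```python
from typing import List


def count_occurrences(seq: str, probe: str) -> int:
    """Number of (possibly overlapping) occurrences of probe in seq."""
    k = len(probe)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == probe)


def design_matrix(genes: List[str], probes: List[str]) -> List[List[int]]:
    """Rows are probes, columns are genes, entries are occurrence counts."""
    return [[count_occurrences(g, p) for g in genes] for p in probes]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("probe set does not determine expression levels")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Toy data: two made-up "genes" and two 2-mer probes.
genes = ["AAC", "ACG"]
probes = ["AA", "AC"]
matrix = design_matrix(genes, probes)      # [[1, 0], [1, 1]]

# Hypothetical probe signals measured in two samples.
levels_a = solve_linear(matrix, [3.0, 8.0])
levels_b = solve_linear(matrix, [6.0, 11.0])

# Fold change per gene between the two samples.
fold_changes = [b / a for a, b in zip(levels_a, levels_b)]
```

With real data the number of probes and genes would differ (the abstract mentions 4000 probes and more than 6000 ORFs), so a square solver like this one would not apply directly; the sketch only shows how shared short probes can, in principle, be untangled into per-gene levels.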
Date: 2003-04-13
Degree: PhD
Discipline: Bioinformatics

Files in this item

Files Size Format
etd.pdf 901.9Kb PDF
