Analysis on Microarray Data and DNA Regulatory Elements Prediction

Date

2002-10-22

Abstract

Transcription profiling with microarray technology has significantly accelerated our understanding of complex biological processes by allowing genome-wide measurement of messenger RNA levels. Microarrays are commonly used to identify genes whose expression differs between two or more samples (e.g., treatments vs. controls), to search for gene expression patterns among a set of samples or genes, and to study gene regulation networks. Here, we first address the variation intrinsic to microarray experiments. Analysis of variance was applied to partition and quantify several sources of variation likely to be present in a typical cDNA microarray experiment. Based on a pilot experiment with intensive replication at several levels, we showed that significant amounts of variation can be attributed to slide, plate, and pin differences. The origins of these sources of variation are discussed, and suggestions are made on how to minimize or avoid them when designing future microarray experiments. Next, we demonstrated that molecular cancer classification can be approached by discriminant analysis. We analyzed a public Affymetrix chip dataset, selected predictor genes based on t-values and stepwise discriminant analysis, and evaluated the resulting model's performance in predicting 34 test samples. With the 25 predictor genes we chose, only two samples were predicted incorrectly. We also evaluated the parsimony of our model by determining, through a stepwise method, the minimum number of genes required to maintain a high level of accuracy in predicting cancer types. The accumulation of microarray data can help elucidate gene regulation mechanisms in cells. Finally, we attempted to find an improved matrix description for transcription factor binding sites. We applied a genetic algorithm (GA) to derive matrices trained on a set of true binding sequences and random sequences.
Preliminary results indicate that the derived matrix shows higher specificity in binding site prediction than the regular position weight matrix (PWM) within a range of cutoff scores. The binding site of the cell-cycle-related E2F transcription factors was taken as an example to illustrate our method. When both the GA-derived and regular matrices were used to scan human gene upstream sequences, the GA-derived matrix gave significantly fewer predictions than the regular matrix at the same false negative rate observed in the training dataset.

Keywords

Microarray, ANOVA, Regulatory elements

Degree

PhD

Discipline

Bioinformatics
