Finding Patterns in DNA Sequences through Visualization with Symbolic Scatter Plots

Date

2010-03-30

Abstract

Visualization is frequently mentioned as a technique for analyzing large amounts of data. It has been widely anticipated for many years that visualization would become a major tool for the analysis of rapidly growing genomic databases. However, beyond the dot plot, which was introduced in 1981, there have been few successful attempts at visualizing these data. This thesis introduces a new technique for visualizing DNA sequences, the symbolic scatter plot. First, it is shown how the symbolic scatter plot addresses two problems: finding complex patterns in DNA sequences and comparing sequences. Second, the symbolic scatter plot is analyzed in terms of human visual perception, particularly in terms of Gestalt theory and pre-attentive visual processing. Third, examples are presented of how specific pre-attentive visual cues can be manipulated or added to find motifs and to visualize information content (i.e., entropy). Fourth, the practicality of symbolic scatter plots is demonstrated by using them to visualize and compare the human and chimpanzee genes responsible for Huntington’s disease.

Keywords

pattern finding, visualization, symbolic scatter plots, DNA sequences

Degree

PhD

Discipline

Computer Science