Statistical Design and Analysis of High Throughput Screening Data Using Pooling Experiments and Data Mining Techniques
Date
2004-07-02
Abstract
Discovering a new drug involves screening large chemical libraries to identify new and diverse active compounds, yet only a very small percentage of the compounds in a library are active. Naive screening, which tests every compound in the library, is undesirable: besides being expensive, it provides little information about which aspects of the chemical structure of active compounds are related to activity.
This work investigates pooling experiments as one possible approach to improving screening efficiency and gaining insight into structure-activity relationships. Four pooling designs are proposed based on two design criteria: optimal coverage of the chemical space and minimal collision between compounds. Each design is evaluated by how well it meets these criteria and whether it finds many diverse active compounds. One pooling design emerges as the best, but all designed pools clearly outperform randomly created pools. Different approaches to analyzing the pooling designs are also investigated: multiple trees are compared with model-based likelihood approaches using different covariate class definitions. The results show that a model-based likelihood approach with a multiple-trees-lower-bound covariate class definition gives the best performance.
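The abstract does not define "collision." A common reading in pooling designs is that two compounds collide when they appear together in more than one pool, which makes their individual activities hard to separate. Under that assumption, the minimal sketch below counts colliding pairs for a design given as a list of pools. The function name `count_collisions` and the toy pools are hypothetical illustrations, not the thesis's method or data.

```python
from collections import Counter
from itertools import combinations


def count_collisions(pools):
    """Count compound pairs that share more than one pool.

    Assumption: a "collision" means two compounds co-occur in 2+ pools.
    pools: list of sets of compound IDs (hypothetical representation).
    """
    shared = Counter()
    for pool in pools:
        # Every unordered pair of compounds placed together in this pool.
        for pair in combinations(sorted(pool), 2):
            shared[pair] += 1
    # Pairs seen together in more than one pool are collisions.
    return sum(1 for n in shared.values() if n > 1)


# Toy designs with 6 compounds, each compound placed in 2 pools.
colliding_design = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 2, 4, 5}]
clean_design = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}]

print(count_collisions(colliding_design))  # 2: pairs (1, 2) and (4, 5)
print(count_collisions(clean_design))      # 0
```

In the first toy design, compounds 1 and 2 share both of their pools, so their pool results cannot tell them apart; a design minimizing collisions avoids such pairs.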
Another possible approach to improving screening efficiency and gaining insight into structure-activity relationships is the use of data mining techniques such as RandomForest and ChemTree, which are applied to individual compounds.
Keywords
Uniform cell coverage designs, Chemical descriptors, Drug discovery
Degree
PhD
Discipline
Statistics