Statistical Analysis of Compounds Using OBSTree and Compound Mixtures Using Nonlinear Models

Abstract

A novel tree-structured data-mining tool is proposed to automatically search large data sets for high-performance classifications and important quantitative structure-activity relationships (QSARs) hidden in them. The presence or absence of multiple chemical features is used jointly to identify more informative splitting rules. A stochastic optimization scheme, combined with a new splitting criterion and a post-trimming procedure, is developed to search for globally optimal splitting variables. The algorithm can also serve as a predictive tool for estimating unknown biological activities from chemical structures.

We also investigate several statistical issues in chemical mixture studies. Following a thorough review of different concepts of additivity, the criteria for evaluating a concept of additivity are discussed, and one particular concept of additivity is generalized to more complicated studies. A nonlinear dose-response model is first developed for binary mixtures and can be readily generalized to a mixture of $M$ chemicals. Different types of test statistics with multiplicity adjustments are proposed to test for interactions.
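
The idea of splitting on the joint presence of several chemical features, with the feature set found by a stochastic search, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the names `split_score` and `random_search` are hypothetical, the between-group sum of squares stands in for the new splitting criterion (which the abstract does not specify), and plain random sampling stands in for the stochastic optimization scheme.

```python
import random
from itertools import product


def split_score(X, y, features):
    """Between-group sum of squares for a split on a feature set.

    Compounds that contain ALL of `features` go to one node, the rest
    to the other. This is a stand-in criterion, not the thesis's own.
    """
    inside = [yi for xi, yi in zip(X, y) if all(xi[f] for f in features)]
    outside = [yi for xi, yi in zip(X, y) if not all(xi[f] for f in features)]
    if not inside or not outside:
        return 0.0  # degenerate split: one node is empty
    mean_all = sum(y) / len(y)
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return (len(inside) * (mean_in - mean_all) ** 2
            + len(outside) * (mean_out - mean_all) ** 2)


def random_search(X, y, k, n_iter=500, seed=0):
    """Sample random k-feature sets and keep the best-scoring one.

    A deliberately simple stochastic search used only as a placeholder.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = sorted(rng.sample(range(n_features), k))
        score = split_score(X, y, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy data (made up): 4 binary features, activity is 1.0 only when
# features 0 and 3 are both present.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [1.0 if x[0] and x[3] else 0.0 for x in X]

best_features, best_score = random_search(X, y, k=2)
print(best_features, best_score)
```

On this toy data no single feature separates the active compounds cleanly, while the pair of features 0 and 3 does, which is the motivation for multi-feature splitting rules.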

Keywords

hypothesis test, statistics, data mining, chemical mixtures, nonlinear model, QSAR

Degree

PhD

Discipline

Statistics
