Tree-structured Classification for Multivariate Binary Responses
Date
2003-12-19
Abstract
This work proposes the majority-vote method, a new tree-structured classification algorithm for multivariate binary responses. The method is a variation on the Classification And Regression Trees (CART) of Breiman et al. (1984). Like CART, it uses node impurity as the basis of its splitting rules. Unlike CART, which determines the optimal tree size by cost-complexity pruning, the majority-vote method determines tree size by choosing the threshold value that maximizes the cross-validated hit rate. The original motivation is to handle incomplete data, both missing and censored, in a Quantitative Structure-Activity Relationship (QSAR) context, where the responses are continuous measurements of activity levels. We proceed by discretizing the responses into binary variables and analyzing the resulting binary responses with the majority-vote method. Its performance is compared with that of its continuous-response counterpart, MultiSCAM, a tree-structured algorithm for analyzing multivariate continuous responses. Multivariate analysis of variance (MANOVA) is used to evaluate the relative information loss due to discretization. Predictivity is evaluated by hit rate, a commonly used criterion in drug discovery. Simulation studies show that the majority-vote method yields higher hit rates than MultiSCAM for censored data.
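The discretization step and the hit-rate criterion named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names, the activity cutoff, the convention that larger values mean "active", and the definition of hit rate as the fraction of predicted-active cases that are truly active are all assumptions made for illustration. The sketch shows why a binary coding can absorb censoring: a right-censored measurement that already exceeds the cutoff has a known class even though its exact value is unknown.

```python
def discretize(value, cutoff, censored_above=False):
    """Return 1 (active), 0 (inactive), or None (class undetermined).

    value: measured activity, or None if the measurement is missing.
    censored_above: True if the true activity is only known to be >= value.
    The cutoff and the "higher is active" convention are hypothetical.
    """
    if value is None:
        return None
    if censored_above:
        # A right-censored reading at or above the cutoff is certainly active;
        # below the cutoff its class cannot be determined.
        return 1 if value >= cutoff else None
    return 1 if value >= cutoff else 0


def hit_rate(predicted, actual):
    """Fraction of predicted-active cases that are truly active (assumed definition).

    Cases whose true class is undetermined (None) are skipped.
    """
    selected = [a for p, a in zip(predicted, actual) if p == 1 and a is not None]
    if not selected:
        return float("nan")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical activity readings for one assay: (value, censored_above).
    readings = [(12.0, False), (3.0, False), (15.0, True), (5.0, True), (None, False)]
    labels = [discretize(v, cutoff=10.0, censored_above=c) for v, c in readings]
    print(labels)
    print(hit_rate([1, 1, 0, 1], [1, 0, 1, None]))
```

Under these assumptions, a censored reading contributes a definite binary label whenever it lies on the informative side of the cutoff, which a continuous-response analysis cannot use directly.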
Keywords
hit rate, recursive partitioning, multivariate binary response, QSAR, classification tree
Degree
PhD
Discipline
Statistics