Controlling Variable Selection By the Addition of Pseudo-Variables

Date

2004-08-09

Abstract

Many variable selection procedures have been developed in the literature for linear regression models. We propose a new and general approach, the False Selection Rate (FSR) method, for controlling variable selection. Its advantage is that it applies to a broader class of regression models, for example binary regression and Poisson regression. By adding a number of pseudo-variables to the real data set and monitoring the proportion of pseudo-variables falsely selected into the model, we are able to control the model's false selection rate, selecting as many important variables as possible while keeping the proportion of unimportant variables selected relatively low. We focus on forward selection because it remains applicable when there are more variables than observations. Because analytical results are difficult to obtain, we study our approach by Monte Carlo simulation and compare it with a variety of commonly used procedures. We first focus on linear regression models and then extend the approach to logistic regression models. The new method is illustrated on four real data sets.
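The pseudo-variable idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's FSR procedure, whose details the abstract does not give. It assumes that pseudo-variables are generated by permuting the rows of the real predictor matrix (so they are unrelated to the response by construction), that forward selection adds the column giving the largest drop in residual sum of squares, and that the number of steps is fixed in advance. The function names `forward_selection` and `pseudo_fraction` are hypothetical.

```python
import numpy as np


def forward_selection(X, y, n_steps):
    """Greedy forward selection for a linear model.

    At each step, add the column of X that gives the smallest residual
    sum of squares when fitted together with the columns already chosen.
    Returns the column indices in order of entry.
    """
    n, p = X.shape
    selected = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            # Design matrix: intercept plus the candidate set of columns.
            A = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected


def pseudo_fraction(X, y, n_steps, rng):
    """Append pseudo-variables and report how many were selected.

    Assumption (not from the source): pseudo-variables are row-permuted
    copies of the real predictors, so they keep the predictors' joint
    distribution but carry no information about y.
    """
    n, p = X.shape
    pseudo = X[rng.permutation(n), :]
    X_aug = np.hstack([X, pseudo])  # columns 0..p-1 real, p..2p-1 pseudo
    order = forward_selection(X_aug, y, n_steps)
    n_pseudo = sum(j >= p for j in order)
    return order, n_pseudo / n_steps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    # Simulated response: only the first three predictors matter.
    y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + rng.normal(size=200)
    for k in (3, 6, 9):
        order, frac = pseudo_fraction(X, y, k, rng)
        print(k, order, frac)
```

Running the selection for more steps than there are important variables forces unimportant columns into the model, and the share of pseudo-variables among them gives a rough signal of how often the procedure is selecting noise. Averaging that share over many independent sets of pseudo-variables would make the estimate less noisy.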

Keywords

forward selection, false selection rate, subset selection, variable selection

Degree

PhD

Discipline

Statistics