Browsing by Author "Dennis Boos, Committee Member"

Now showing 1 - 14 of 14
  • Accounting for Within- and Between-Locus Dependencies in Marker Association Tests
    (2003-06-26) Czika, Wendy Ann; Dennis Boos, Committee Member; David Dickey, Committee Member; Dahlia Nielsen, Committee Member; Bruce S. Weir, Committee Chair; Russell Wolfinger, Committee Member
    The importance of marker association tests has recently been established for locating disease susceptibility genes in the human genome, attaining finer-scaled maps than the linkage variety of tests through the detection of linkage disequilibrium (LD). Many of these association tests were originally defined for biallelic markers under ideal assumptions, with multiallelic extensions often complicated by the covariance among genotype or allele proportions. The well-established allele and genotype case-control tests based on Pearson chi-square test statistics are exceptions since they adapt easily to multiallelic versions; however, each of these has its shortcomings. We demonstrate that the multiallelic trend test is an attractive alternative that lacks these limitations. A formula for marker genotype frequencies that incorporates the coefficients quantifying various disequilibria is presented, accommodating any type of disease model. This enables the simulation of samples for estimating the significance level and calculating sample sizes necessary for achieving a certain level of power. There is a similar complexity in extending the family-based tests of association to markers with more than two alleles. Fortunately, the nonparametric sibling disequilibrium test (SDT) statistic has a natural extension to a quadratic form for multiallelic markers. In the original presentation of the statistic, however, information from one of the marker alleles is needlessly discarded. This is necessary for the parametric form of the statistic due to a linear dependency among the statistics for the alleles, but the nonparametric representation eliminates this dependency. We show how a statistic making use of all the allelic information can be formed. Obstacles also arise when multiple loci affect disease susceptibility. In the presence of gene-gene interaction, single-marker tests may be unable to detect an association between individual markers and disease status. We implement and evaluate tree-based methods for the mapping of multiple susceptibility genes. Adjustments to correlated p-values from markers in LD with each other are also examined. This study of epistatic gene models reveals the importance of three-locus disequilibria, for which we discuss various statistical tests.
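A rough orientation to the kind of case-control trend test discussed above: the sketch below implements the standard biallelic Cochran-Armitage trend test on a 2 x 3 genotype table, not the multiallelic extension developed in the dissertation; the genotype counts and scores are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k case-control genotype table.

    cases, controls: counts per genotype category (e.g. AA, Aa, aa).
    scores: numeric codes for the genotype categories.
    Returns the 1-df chi-square statistic and its p-value.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    t = np.asarray(scores, dtype=float)

    n_col = cases + controls          # genotype totals
    n = n_col.sum()                   # total sample size
    p_case = cases.sum() / n          # overall proportion of cases

    # Observed minus expected case counts, weighted by the genotype scores
    u = np.sum(t * (cases - n_col * p_case))
    # Null variance of the weighted sum
    var_u = p_case * (1 - p_case) * (np.sum(t**2 * n_col) - np.sum(t * n_col)**2 / n)

    stat = u**2 / var_u
    return stat, chi2.sf(stat, df=1)

# Hypothetical genotype counts for cases and controls at one biallelic marker
stat, p = trend_test(cases=[30, 55, 15], controls=[50, 40, 10])
print(f"trend chi-square = {stat:.2f}, p = {p:.4f}")
```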
  • Analysis of Gene Expression Profiles with Linear Mixed Models
    (2005-04-25) Hsieh, Wen-Ping; Greg Gibson, Committee Chair; Russ Wolfinger, Committee Co-Chair; Dennis Boos, Committee Member; Spencer Muse, Committee Member
    With the emergence of high-throughput technology, proper interpretation of data has become critical for many aspects of biomedical research. My dissertation explores two major issues in gene expression profile microarray data analysis. One is quantification of variation across and among species and its effect on biological interpretation. The second part of my work is to develop better statistical estimates that can account for different sources of variation for significant gene detection. A previously published dataset of oligonucleotide array data for three primate species was analyzed with linear mixed models. By decomposing the variation of expression into different explanatory factors, the differences among species as well as between tissues were revealed at the expression level. Issues of cross-species hybridization and expression divergence compared to mutation-drift equilibrium were addressed. The power and flexibility of the linear mixed model framework for detection of differentially expressed genes was then explored with a dataset that includes spiked-in controls. The impact of probe-level sequence variation on cross-hybridization was detected through a Gibbs sampling method that highlights potential problems for short oligonucleotide microarray data analysis. A motif as short as fifteen bases can cause significant cross-hybridization. Finally, a bivariate model using information from both perfect match probes and mismatch probes was proposed as a means to increase the statistical power for detection of significant differences in gene expression. The improved performance of the method was demonstrated through Monte Carlo simulation. The detection power can increase by as much as 20% at a 5% false positive rate under certain circumstances.
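As a rough illustration of the gene-by-gene linear mixed model analysis described above, the sketch below fits a small mixed model with statsmodels; the column names, species, and expression values are hypothetical placeholders, not the published primate dataset.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data for one gene: two tissues measured on
# two individuals from each of three primate species.
df = pd.DataFrame({
    "log_expr":   [7.1, 6.8, 7.3, 6.9, 8.0, 7.6, 8.2, 7.7, 6.5, 6.2, 6.7, 6.4],
    "species":    ["human"]*4 + ["chimp"]*4 + ["orangutan"]*4,
    "tissue":     ["brain", "liver"]*6,
    "individual": ["h1", "h1", "h2", "h2", "c1", "c1",
                   "c2", "c2", "o1", "o1", "o2", "o2"],
})

# Species and tissue enter as fixed effects, so their contributions to the
# expression variation can be separated; individual is a random effect.
model = smf.mixedlm("log_expr ~ species + tissue", data=df, groups="individual")
fit = model.fit()
print(fit.summary())
```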
  • Asymptotic behavior of some Bayesian nonparametric and semi-parametric procedures
    (2009-03-23) Wu, Yuefeng; Subhashis Ghosal, Committee Chair; Dennis Boos, Committee Member; Sujit K. Ghosh, Committee Member; Huixia Wang, Committee Member
    This dissertation extends some established results about the asymptotic behavior of Bayesian nonparametric and semi-parametric procedures in three aspects. First, positivity of the prior probability of the Kullback-Leibler neighborhood around the true density, commonly known as the Kullback-Leibler property, plays a fundamental role in posterior consistency. A popular prior for Bayesian estimation is given by a Dirichlet mixture, where the kernels are chosen depending on the sample space and the class of densities to be estimated. The Kullback-Leibler property of the Dirichlet mixture prior has been shown for some special kernels like the normal density or Bernstein polynomial, under appropriate conditions. We obtain easily verifiable sufficient conditions under which a prior obtained by mixing a general kernel possesses the Kullback-Leibler property. We study a wide variety of kernels used in practice, including the normal, $t$, histogram, gamma, and Weibull densities, among others, and show that the Kullback-Leibler property holds if some easily verifiable conditions are satisfied at the true density. This gives a catalog of conditions required for the Kullback-Leibler property, which can be readily used in applications. Second, the Bayesian approach to analyzing semi-parametric models is gaining popularity in practice. For the Cox proportional hazard model, it has been shown recently that the posterior is consistent and leads to asymptotically accurate confidence intervals under a Lévy process prior on the cumulative hazard rate. The explicit expression of the posterior distribution, together with the independent increment structure of the Lévy process, plays a key role in the development. However, except for one-dimensional linear regression with an unknown error distribution and binary response regression with unknown link function, even consistency of Bayesian procedures has not been studied for a general prior distribution. We consider consistency of Bayesian inference for several semi-parametric models including multiple linear regression with an unknown error distribution, the exponential frailty model, the generalized linear model with unknown link function, the Cox proportional hazard model where the baseline hazard function is unknown, accelerated failure time models, and the partial linear regression model. We give sufficient conditions under which the posterior distribution of the parametric part is consistent in the Euclidean distance while the non-parametric part is consistent with respect to some topology such as the weak topology. Our results are obtained by verifying the conditions of an appropriate modification of a celebrated result of Schwartz. Our general consistency result applies also to the case of independent, non-identically distributed observations. Application of our theorem requires showing the existence of exponentially consistent tests for the complement of the neighborhoods of the "true" value of the parameter and the prior positivity of a Kullback-Leibler type of neighborhood of the true distribution of the observations. We construct the required tests and give sufficient conditions for positivity of prior probabilities of Kullback-Leibler neighborhoods in all the examples we consider in the corresponding chapter of this dissertation. Third, Dirichlet mixtures have been used for multivariate density estimation in practice for quite some time. However, the consistency of such models has not been studied. Valuable results have been given on posterior consistency of Dirichlet mixtures in univariate density estimation, but these results cannot be generalized directly to multivariate cases. By controlling the tail behavior of the base measure of the Dirichlet process, and through the technique of calculating entropy, we give sufficient conditions on the true density and the model prior under which posterior consistency holds.
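For reference, the two standard ingredients the abstract appeals to, stated from the usual definitions (Schwartz's theorem) rather than from the dissertation itself:

```latex
% Kullback-Leibler property of a prior \Pi on densities: for every
% \varepsilon > 0 the prior charges a KL neighborhood of the true density f_0,
\Pi\Bigl( f : \int f_0 \log\tfrac{f_0}{f}\,d\mu < \varepsilon \Bigr) > 0
  \quad\text{for all } \varepsilon > 0 .
% Schwartz-type consistency: if, in addition, for every neighborhood U of f_0
% there exist tests \varphi_n of f_0 against U^c with exponentially small errors,
\mathbb{E}_{f_0}\varphi_n \le C e^{-bn}, \qquad
\sup_{f \in U^c} \mathbb{E}_f (1-\varphi_n) \le C e^{-bn},
% then \Pi(U^c \mid X_1,\dots,X_n) \to 0 almost surely under f_0.
```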
  • Catch Curve and Capture Recapture Models: A Bayesian Combined Approach
    (2009-03-19) Griffith, Emily Hohmeister; Dennis Boos, Committee Member; Kenneth H. Pollock, Committee Chair; Sujit K. Ghosh, Committee Co-Chair; Kevin Gross, Committee Member
    When studying animal populations, one demographic parameter of interest is the annual rate of survival. Methods for estimating survival rates of animal populations fall into two general categories: those based on marked animals and those based on unmarked animals. Catch curve analysis falls into the latter category of unmarked-animal methods and is based on strong assumptions about population dynamics. Capture-recapture methods, on the other hand, use marked animals and require assumptions about homogeneous individual capture and survival probabilities. We focus specifically on Chapman and Robson's catch curve analysis, the Cormack-Jolly-Seber (CJS) open population model, and Udevitz and Ballachey's augmentation of catch curve data with ages-at-death data, which are a random sample from the natural deaths that occur in a population between two time periods. In Chapter 1, we develop the Bayesian approach to catch curve analysis, beginning with the simple situation of a single catch curve. After extending our method to multiple years, we relax the model assumptions to include random effects for survival across years. The proposed model is validated using predictive distributions and compared with the traditional methods. We conclude that many benefits can be obtained from the Bayesian approach to the analysis of a single or multiple year catch curve. In Chapter 2, we augment catch curve data with capture-recapture data in a hierarchical Bayesian framework. We estimate the fidelity rate and the population growth rate. We illustrate these models with a data set and simulation study. In Chapter 3, we develop a Bayesian method for analyzing catch curve and ages-at-death data together, based on the likelihoods developed in Udevitz and Ballachey. We utilize the Bayesian framework and relax both the assumption of a stable age distribution and that of a known population growth rate.
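For orientation, the classical (non-Bayesian) Chapman-Robson catch-curve estimator that the Bayesian development builds on can be written in a few lines; the age counts below are hypothetical.

```python
import numpy as np

def chapman_robson_survival(age_counts):
    """Chapman-Robson estimator of annual survival from catch-at-age data.

    age_counts[i] is the number of animals of coded age i, where age 0 is the
    first fully recruited age class.  Assumes a stable age distribution and
    constant survival, as in the classical catch-curve setting.
    """
    counts = np.asarray(age_counts, dtype=float)
    ages = np.arange(len(counts))
    n = counts.sum()           # total catch
    t = np.sum(ages * counts)  # sum of coded ages
    return t / (n + t - 1)

# Hypothetical catch-at-age counts (coded ages 0, 1, 2, ... after full recruitment)
print(chapman_robson_survival([120, 74, 43, 26, 15, 9, 5]))
```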
  • Caution Using Bootstrap Tolerance Limits with Application to Dissolution Specification Limits
    (2007-11-22) Bergquist, Mandy; Marcia Gumpertz, Committee Member; Daowen Zhang, Committee Member; Dennis Boos, Committee Member; Marie Davidian, Committee Chair
  • Improving Efficiency and Robustness of Doubly Robust Estimators in the Presence of Coarsened Data
    (2009-11-03) Cao, Weihua; Marie Davidian, Committee Chair; Anastasios A. Tsiatis, Committee Co-Chair; Daowen Zhang, Committee Member; Dennis Boos, Committee Member
    Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The "usual" doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. In part one of this dissertation, we propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero. The second part of this dissertation focuses on drawing inference on parameters in general models in the presence of monotonely coarsened data, which can be viewed as a generalization of longitudinal data with a monotone missingness pattern, as is the case when subjects drop out of a study. Estimators for parameters of interest include both inverse probability weighted estimators and doubly robust estimators. As a generalization of methods in part one, we propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods. We apply the proposed method to data from an AIDS clinical trial.
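A minimal sketch of the standard augmented inverse-probability-weighted (AIPW) doubly robust estimator of a mean that the abstract takes as its starting point, assuming a logistic propensity model and a linear outcome model; the data and working models are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

def aipw_mean(y, r, X):
    """Standard doubly robust (AIPW) estimator of E[Y], with Y observed when r == 1.

    Propensity model: logistic regression of r on X.
    Outcome model: linear regression of y on X among the complete cases.
    Consistent if either working model is correctly specified.
    """
    Xd = sm.add_constant(X)

    # Estimated propensity scores pi(X) = P(R = 1 | X)
    pi = sm.GLM(r, Xd, family=sm.families.Binomial()).fit().predict(Xd)

    # Outcome regression m(X), fit on complete cases and predicted for everyone
    m = sm.OLS(y[r == 1], Xd[r == 1]).fit().predict(Xd)

    # AIPW estimating function: IPW term plus augmentation term
    return np.mean(r * y / pi - (r - pi) / pi * m)

# Hypothetical data: outcomes missing more often when x is large
rng = np.random.default_rng(0)
x = rng.normal(size=500).reshape(-1, 1)
y = 2 + x[:, 0] + rng.normal(size=500)
r = rng.binomial(1, 1 / (1 + np.exp(-(0.5 - x[:, 0]))))
y = np.where(r == 1, y, 0.0)   # placeholder where unobserved; never used when r == 0
print(aipw_mean(y, r, x))
```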
  • Model Selection and Estimation in Additive Regression Models
    (2009-09-14) Miao, Huiping; Hao Zhang, Committee Member; Marie Davidian, Committee Member; Dennis Boos, Committee Member; Daowen Zhang, Committee Chair
    We propose a method of simultaneous model selection and estimation in additive regression models (ARMs) for independent normal data. We use the mixed model representation of the smoothing spline estimators of the nonparametric functions in ARMs, where the importance of these functions is controlled by treating the inverse of the smoothing parameters as extra variance components. The selection of important nonparametric functions is achieved by maximizing the penalized likelihood with an adaptive LASSO. A unified EM algorithm is provided to obtain the maximum penalized likelihood estimates of the nonparametric functions and the residual variance. In the same framework, we also consider forward selection based on score tests, and a two-stage approach that imposes an early-stage screening using an individual score test on each variance component induced by the smoothing parameters. For longitudinal data, we propose to extend the adaptive LASSO and the two-stage selection with score test screening to additive mixed models (AMMs), by introducing subject-specific random effects to the additive models to accommodate the correlation in responses. We use the eigenvalue-eigenvector decomposition approach to approximate the working random effects in the linear mixed model representation of the AMMs, so as to reduce the dimensions of the matrices involved in the algorithm while retaining most of the information in the data, thereby tackling the computational problems caused by large sample sizes in longitudinal data. Simulation studies are provided and the methods are illustrated with data applications.
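The adaptive LASSO step used above has a simple generic form; the sketch below shows it for ordinary linear regression (not the dissertation's variance-component formulation), with weights from an initial OLS fit absorbed by rescaling the design matrix.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
    """Adaptive LASSO for linear regression.

    Coefficient-specific weights 1/|beta_ols|^gamma are absorbed by rescaling
    the columns of X, so an ordinary lasso solver can be reused.
    """
    beta_init = LinearRegression().fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)   # penalty weights
    X_scaled = X / w                                # column-wise rescaling
    fit = Lasso(alpha=alpha).fit(X_scaled, y)
    return fit.coef_ / w                            # map back to original scale

# Hypothetical example: only the first two of ten predictors matter
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
print(np.round(adaptive_lasso(X, y), 3))
```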
  • Recursive Quantile Estimation with Application to Value at Risk
    (2008-04-25) Ruan, Chen; Dave Dickey, Committee Member; Dennis Boos, Committee Member; Denis Pelletier, Committee Member; Peter Bloomfield, Committee Chair
  • Robust Variable Selection
    (2009-04-20) Schumann, David Heinz; Dennis Boos, Committee Member; Judy Wang, Committee Member; Leonard Stefanski, Committee Co-Chair; Lexin Li, Committee Member
    The prevalence of extreme outliers in many regression data sets has led to the development of robust methods that can handle these observations. While much attention has been placed on the problem of estimating regression coefficients in the presence of outliers, few methods address variable selection. We develop and study robust versions of the forward selection algorithm, one of the most popular standard variable selection techniques. Specifically we modify the VAMS procedure, a version of forward selection tuned to control the false selection rate, to simultaneously select variables and eliminate outliers. In an alternative approach, robust versions of the forward selection algorithm are developed using the robust forward addition sequence associated with the generalized score statistic. Combining the robust forward addition sequence with robust versions of BIC and the VAMS procedure, a final model is obtained. Monte Carlo simulation compares these robust methods to current robust methods like the LSA and LAD-LASSO. Further simulation investigates the relationship between the breakdown point of the estimation methods central to each procedure and the breakdown point of the final variable selection method.
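A rough sketch of one way to build a robust forward-addition sequence, using Huber M-estimation and a BIC-style criterion on the robust residual scale; this illustrates the general idea only and is not the VAMS or generalized-score procedure studied in the dissertation.

```python
import numpy as np
import statsmodels.api as sm

def robust_forward_selection(X, y, max_terms=5):
    """Forward selection built on Huber M-estimation (a rough illustration).

    At each step the candidate variable minimizing a robust BIC-style
    criterion n*log(scale^2) + p*log(n) is added, where `scale` is the
    robust residual scale from statsmodels' RLM fit.
    """
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def criterion(cols):
        design = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
        fit = sm.RLM(y, design, M=sm.robust.norms.HuberT()).fit()
        return n * np.log(fit.scale ** 2) + (len(cols) + 1) * np.log(n)

    current = criterion(selected)
    while remaining and len(selected) < max_terms:
        scores = {j: criterion(selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= current:          # no candidate improves the criterion
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

# Hypothetical data with gross outliers in the response
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
y = 2 * X[:, 0] - 3 * X[:, 3] + rng.normal(size=150)
y[:10] += 15
print(robust_forward_selection(X, y))
```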
  • Topics in Design and Analysis of Clinical Trials (DRAFT)
    (2005-08-04) Lokhnygina, Yuliya; Marie Davidian, Committee Member; Dennis Boos, Committee Member; Anastasios A. Tsiatis, Committee Chair; Daowen Zhang, Committee Member
    In the first part of this dissertation we derive optimal two-stage adaptive group-sequential designs for normally distributed data which achieve the minimum of a mixture of expected sample sizes at the range of plausible values of a normal mean. Unlike standard group-sequential tests, our method is adaptive in that it allows the group size at the second look to be a function of the observed test statistic at the first look. Using optimality criteria, we construct two-stage designs which we show have an advantage over other popular adaptive methods. The employed computational method is a modification of the backward induction algorithm applied to a Bayesian decision problem. Two-stage randomization designs (TSRDs) are becoming increasingly common in oncology and AIDS clinical trials as they make more efficient use of study participants to examine therapeutic regimens. In these designs patients are initially randomized to an induction treatment, followed by randomization to a maintenance treatment conditional on their induction response and consent to further study treatment. Broader acceptance of TSRDs in drug development may hinge on the ability to make appropriate intent-to-treat type inference as to whether an experimental induction regimen is better than a standard regimen in the absence of maintenance treatment within this design framework. Lunceford, Davidian, and Tsiatis (2002, Biometrics 58, 48-57) introduced an inverse-probability-weighting based analytical framework for estimating survival distributions and mean restricted survival times, as well as for comparing treatment policies in the TSRD setting. In practice Cox regression is widely used, and in the second part of this dissertation we extend the analytical framework of Lunceford et al. to derive a consistent estimator for the log hazard in the Cox model and a robust score test to compare treatment policies. Large sample properties of these methods are derived and illustrated via a simulation study. Considerations regarding the application of TSRDs compared to single randomization designs are discussed.
  • Topics Involving the Gamma Distribution: the Normal Coefficient of Variation and Conditional Monte Carlo.
    (2007-01-19) Boyer, Joseph Guenther; William Swallow, Committee Chair; Dennis Boos, Committee Member; Cavell Brownie, Committee Member; Thomas Gerig, Committee Member; Michael Boyette, Committee Member
    A transformation of the sample coefficient of variation ($CV$) for normal data is shown to be nearly proportional to a $\chi^2$ random variable. The associated density is applied to inference on the common $CV$ of $k$ populations, testing $CV$ homogeneity across populations, and confidence intervals for the ratio of two $CV$s. The resulting tests and confidence intervals are shown via theory and simulation to be valid and powerful. In other work on the coefficient of variation, a sample of scientific abstracts is used to characterize the values of the $CV$ encountered in practice, point estimation for a common $CV$ in normal populations is studied, and the literature on testing $CV$ homogeneity in normal populations is reviewed. There is very little literature on the problem of conducting inference in models for continuous data conditional on sufficient statistics for nuisance parameters. This thesis explores Monte Carlo approaches to conditional $p$-value calculation in such models, including Dirichlet data generation, importance sampling, Markov chain Monte Carlo, and a method related to fiducial inference. Importance sampling is used to create a conditional test of $CV$ homogeneity in normal populations using the $\chi^2$ approximation mentioned above. A Markov chain Monte Carlo solution is given to the long-standing problem of testing the homogeneity of exponential populations subject to Type I censoring. Conditional Monte Carlo algorithms are also applied to testing for an effect of a factor in an experiment with exponential data, testing for a dispersion effect in a replicated experiment with normal data, and testing a null value of a coefficient in exponential regression with an inverse link; brief consideration is also given to the problem of testing the homogeneity of $k$ gamma distributions.
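As a generic illustration of simulation-based testing of $CV$ homogeneity (an unconditional parametric Monte Carlo test, not the conditional or importance-sampling procedures developed in the thesis), one might proceed as follows; the statistic and data are hypothetical.

```python
import numpy as np

def cv_homogeneity_mc_test(samples, n_sim=5000, seed=0):
    """Monte Carlo test of equal coefficients of variation in k normal samples.

    Test statistic: sample-size-weighted dispersion of the group CVs.  The null
    distribution is approximated by simulating normal samples (mean 1) whose
    common CV equals the pooled estimate, so the p-value is only approximate.
    """
    rng = np.random.default_rng(seed)
    sizes = np.array([len(s) for s in samples])

    def stat(groups):
        cvs = np.array([np.std(g, ddof=1) / np.mean(g) for g in groups])
        cv_bar = np.sum(sizes * cvs) / sizes.sum()
        return np.sum(sizes * (cvs - cv_bar) ** 2)

    observed = stat(samples)
    cv0 = np.sum(sizes * [np.std(s, ddof=1) / np.mean(s) for s in samples]) / sizes.sum()

    exceed = 0
    for _ in range(n_sim):
        sim = [rng.normal(loc=1.0, scale=cv0, size=n) for n in sizes]
        exceed += stat(sim) >= observed
    return observed, exceed / n_sim

# Hypothetical data: three normal samples, the third with a larger CV
rng = np.random.default_rng(3)
groups = [rng.normal(10, 1.0, 40), rng.normal(20, 2.0, 40), rng.normal(15, 3.0, 40)]
print(cv_homogeneity_mc_test(groups))
```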
  • Variable Selection in Linear Mixed Model for Longitudinal Data
    (2006-08-17) Lan, Lan; Daowen Zhang, Committee Chair; Hao Helen Zhang, Committee Co-Chair; Marie Davidian, Committee Member; Dennis Boos, Committee Member
    Fan and Li (JASA, 2001) proposed a family of variable selection procedures for certain parametric models via a nonconcave penalized likelihood approach, where significant variable selection and parameter estimation were done simultaneously, and the procedures were shown to have the oracle property. In this presentation, we extend the nonconcave penalized likelihood approach to linear mixed models for longitudinal data. Two new approaches are proposed to select significant covariates and estimate fixed effect parameters and variance components. In particular, we show the new approaches also possess the oracle property when the tuning parameter is chosen appropriately. We assess the performance of the proposed approaches via simulation and apply the procedures to data from the Multicenter AIDS Cohort Study.
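For reference, the nonconcave (SCAD) penalty of Fan and Li (2001) that the abstract extends, stated through its derivative, together with the generic penalized log-likelihood being maximized:

```latex
% SCAD penalty of Fan and Li (2001), defined through its derivative for \theta > 0
% (a > 2, commonly a = 3.7):
p'_{\lambda}(\theta) = \lambda \Bigl\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\, I(\theta > \lambda) \Bigr\}.
% Generic penalized log-likelihood maximized for simultaneous selection and
% estimation of the fixed effects \beta (with variance components \theta):
Q(\beta, \theta) = \ell(\beta, \theta) - n \sum_{j} p_{\lambda}\bigl(|\beta_j|\bigr).
```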
  • Variable Selection in Semi-parametric Additive Models with Extensions to High Dimensional Data and Additive Cox Models
    (2008-06-27) Liu, Song; Hao Helen Zhang, Committee Chair; Dennis Boos, Committee Member; Wenbin Lu, Committee Member; John Monahan, Committee Member
  • Variable Selection Procedures for Generalized Linear Mixed Models in Longitudinal Data Analysis
    (2007-08-03) Yang, Hongmei; Daowen Zhang, Committee Chair; Hao Helen Zhang, Committee Co-Chair; Dennis Boos, Committee Member; Marie Davidian, Committee Member
    Model selection is important for longitudinal data analysis, but to date little work has been done on variable selection for generalized linear mixed models (GLMMs). In this paper we propose and study a class of variable selection methods. A full likelihood (FL) approach is proposed for simultaneous model selection and parameter estimation. Due to the intensive computation involved in the FL approach, a penalized quasi-likelihood (PQL) procedure is developed so that model selection in GLMMs can proceed in the framework of linear mixed models. Since the PQL approach produces biased parameter estimates for sparse binary longitudinal data, a two-stage penalized quasi-likelihood (TPQL) approach is proposed to bias-correct PQL in terms of estimation: PQL is used for model selection at the first stage and existing software for parameter estimation at the second stage. A marginal approach for some special types of data is also developed. A robust estimator of the standard error for the fitted parameters is derived based on a sandwich formula. A bias correction is proposed to improve the estimation accuracy of PQL for binary data. The sampling performance of the four proposed procedures is evaluated through extensive simulations and their application to real data analysis. In terms of model selection, all of them perform similarly. As for parameter estimation, FL, AML and TPQL yield similar results. Compared with FL, the other procedures greatly reduce the computational load. The proposed procedures can be extended to longitudinal data analysis involving missing data, and the shrinkage penalty based approach allows them to work even when the number of observations n is less than the number of parameters d.
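The PQL reduction the abstract relies on replaces the GLMM fit by iterative linear mixed model fits on a working response (Breslow and Clayton, 1993); in standard notation, with link g and variance function v:

```latex
% One PQL iteration: given current estimates of \beta, b and
% \mu_i = g^{-1}(x_i^\top\beta + z_i^\top b), form the working response
\tilde{y}_i = x_i^{\top}\beta + z_i^{\top} b + g'(\mu_i)\,(y_i - \mu_i),
% and refit the linear mixed model
\tilde{y} = X\beta + Zb + \varepsilon, \qquad
\operatorname{Var}(\varepsilon) = W^{-1}, \quad
W = \operatorname{diag}\bigl\{ \bigl[v(\mu_i)\, g'(\mu_i)^2 \bigr]^{-1} \bigr\},
% iterating to convergence; variable selection penalties are then applied
% within this linear mixed model framework.
```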
