Collection of Biostatistics Research Archive
Copyright (c) 2015 COBRA. All rights reserved.
http://biostats.bepress.com
Recent documents in Collection of Biostatistics Research Archive (en-us), Fri, 27 Feb 2015 01:37:19 PST

Simulation of Semicompeting Risk Survival Data and Estimation Based on Multistate Frailty Model
http://biostats.bepress.com/harvardbiostat/paper188
Wed, 04 Feb 2015 11:57:56 PST
We develop a procedure to simulate semicompeting risks survival data. In addition, we introduce an EM algorithm and a B-spline based estimation procedure to evaluate and implement the nonparametric likelihood estimation approach of Xu et al. (2010). The simulation procedure provides a route to generate samples from the likelihood introduced in Xu et al. (2010). Further, the EM algorithm and the B-spline methods stabilize the estimation and give accurate results. We illustrate the simulation and estimation procedures with simulation examples and a real data analysis.
Fei Jiang et al.

Optimal Dynamic Treatments in Resource-Limited Settings
http://biostats.bepress.com/ucbbiostat/paper333
Fri, 30 Jan 2015 14:19:03 PST
A dynamic treatment rule (DTR) assigns treatments to individuals based on (a subset of) their measured covariates. An optimal DTR is one that maximizes the population mean outcome. Previous work in this area has assumed that treatment is an unlimited resource, so that the entire population can be treated if this strategy maximizes the population mean outcome. We consider optimal DTRs in settings where the treatment resource is limited, so that there is a maximum proportion of the population which can be treated. We give a general closed-form expression for an optimal stochastic DTR in this resource-limited setting, and a closed-form expression for the optimal deterministic DTR under an additional assumption. We also present an estimator of the mean outcome under the optimal stochastic DTR in a large semiparametric model that at most places restrictions on the probability of treatment assignment given covariates. We give conditions under which our estimator is efficient among all regular and asymptotically linear estimators. All of our results are supported by simulations.
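The paper's closed-form expression is not reproduced here, but the idea of a capacity-constrained stochastic rule can be sketched. The following is a minimal illustration, not the authors' estimator: it assumes a binary treatment and an already-estimated conditional average treatment effect for each individual, treats those with the largest positive effects up to the capacity limit, and gives individuals tied at the rationing threshold a fractional treatment probability so the constraint binds exactly.

```python
import numpy as np

def resource_limited_rule(cate, kappa):
    """Stochastic treatment rule under a capacity constraint.

    cate  : estimated conditional average treatment effects, one per person
    kappa : maximum proportion of the population that may be treated

    Treat everyone whose estimated effect exceeds a threshold chosen so that
    at most a proportion kappa is treated; rationing only kicks in when
    treating everyone with a positive effect would exceed capacity.
    """
    cate = np.asarray(cate, dtype=float)
    n = len(cate)
    benefit = cate > 0
    # If everyone who benefits fits within capacity, no rationing is needed.
    if benefit.mean() <= kappa:
        return benefit.astype(float)
    # Otherwise ration: threshold at the (1 - kappa) quantile of the effects.
    eta = np.quantile(cate, 1 - kappa)
    probs = np.zeros(n)
    probs[cate > eta] = 1.0
    # Individuals exactly at the threshold share the remaining capacity.
    at_threshold = np.isclose(cate, eta)
    deficit = kappa * n - (cate > eta).sum()
    if at_threshold.any() and deficit > 0:
        probs[at_threshold] = min(1.0, deficit / at_threshold.sum())
    return probs
```

With `kappa = 0.5` and effects `[0.9, 0.5, 0.1, -0.2]`, the two individuals with the largest effects are treated with probability one and the rest not at all; with four identical effects, each is treated with probability 0.5.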
Alexander R. Luedtke et al.

FACETS: Fraction and Allele-Specific Copy Number Estimates from Tumor Sequencing
http://biostats.bepress.com/mskccbiostat/paper29
Tue, 06 Jan 2015 21:20:29 PST
Intratumor heterogeneity is characterized by the presence of genetically and phenotypically distinct subclones of tumor cells. Such genetic diversity within a tumor is increasingly recognized as a driver of rapid disease progression, resistance to targeted therapies, and poor survival outcome. It also has important implications in defining "actionable" driver genes in the cancer genome. To facilitate intratumor heterogeneity analysis, we developed a unified analysis pipeline called FACETS for DNA sequencing of tumor-normal pairs (including whole-exome, whole-genome, and targeted capture sequencing), to 1) perform joint segmentation of total and allelic copy ratio, and 2) estimate tumor purity, ploidy, allele-specific copy number, and the associated cell fraction profile. The output can be used to precisely identify copy number states including gains, losses, copy number-neutral regions, and loss of heterozygosity (LOH). In addition, a cell fraction profile is generated to determine whether each aberration is clonal (present in 100% of cancer cells) versus subclonal (present in a fraction of cancer cells). We demonstrate its application using The Cancer Genome Atlas (TCGA) dataset.
Ronglai Shen et al.

Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy
http://biostats.bepress.com/ucbbiostat/paper332
Thu, 18 Dec 2014 11:31:22 PST
We consider challenges that arise in the estimation of the value of an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular asymptotically linear (RAL) estimator of this parameter. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-n rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. In particular, we develop an estimator that, when properly standardized, converges to a normal limiting distribution. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We outline an extension of our approach to a multiple time point problem in the appendix. All of our results are supported by simulations.
Alexander R. Luedtke et al.

Confidence intervals for the treatment effect on the treated
http://biostats.bepress.com/cobra/art111
Thu, 18 Dec 2014 09:36:58 PST
The average effect of the treatment on the treated is a quantity of interest in observational studies in which no definite parameter can be used to quantify the treatment effect, such as those where only a random subset of the data obtained by stratification can be used for analysis. Nonparametric confidence intervals for this quantity appear to be known only in the case where the responses to the treatment are binary and the data fall into a single stratum. We propose nonparametric confidence intervals for the average effect of the treatment on the treated in studies involving one or more strata and general numerical responses.
José A. Ferreira

A Marginalized Zero-Inflated Negative Binomial Regression Model with Overall Exposure Effects
http://biostats.bepress.com/uncbiostat/art43
Thu, 18 Dec 2014 06:58:32 PST
The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine, to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the performance of MZINB with respect to test size is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren.
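The distinction the paper draws can be seen numerically: in a zero-inflated negative binomial mixture with structural-zero probability ψ and negative binomial mean μ, the latent-class (susceptible-subpopulation) mean is μ, while the overall population mean is (1 − ψ)μ, and a marginalized model such as MZINB parameterizes the latter directly. A small simulation sketch (parameter values are arbitrary illustrations, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, psi, mu, size = 200_000, 0.3, 4.0, 2.0   # size = NB dispersion parameter

# Zero-inflated negative binomial draws: structural zeros with probability
# psi, otherwise NB with mean mu (numpy parameterizes NB by (size, p)).
p = size / (size + mu)
counts = np.where(rng.random(n) < psi, 0, rng.negative_binomial(size, p, n))

# The overall (marginal) mean is (1 - psi) * mu -- the quantity an MZINB-type
# model regresses on covariates directly, unlike the latent-class mean mu.
print(counts.mean())   # ~ 2.8 = (1 - 0.3) * 4.0
```

Note that observed zeros exceed ψ, since the negative binomial component contributes zeros of its own; this excess is exactly what complicates overall exposure-effect interpretation in the standard ZINB parameterization.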
John S. Preisser et al.

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING
http://biostats.bepress.com/jhubiostat/paper273
Mon, 15 Dec 2014 11:44:33 PST
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. Such designs have been proposed, for example, when the population of interest consists of biomarker positive and biomarker negative individuals. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment and the multiple testing procedure. We provide the first general method for simultaneously optimizing both of these components for two-stage adaptive enrichment designs. We minimize expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization methods to solve this problem to high accuracy, revealing new, approximately optimal designs.
Michael Rosenblum et al.

Higher-order Targeted Minimum Loss-based Estimation
http://biostats.bepress.com/ucbbiostat/paper331
Thu, 11 Dec 2014 11:31:37 PST
Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available -- in many cases, this requirement is prohibitively difficult to satisfy. In this article, we propose a generalization of TMLE utilizing a higher-order approximation of the target parameter. This approach yields asymptotically linear and efficient estimators when a higher-order remainder term is asymptotically negligible. The latter condition is often much less stringent than that arising in a regular first-order TMLE. Beyond relaxing regularity conditions, use of a higher-order TMLE can improve inference accuracy in finite samples due to its explicit reliance on a higher-order approximation. We provide the theoretical foundations of higher-order TMLE and study its use for estimating a counterfactual mean when all potential confounders have been measured. We show, in particular, that the implementation of a higher-order TMLE is nearly identical to that of a regular first-order TMLE. Since higher-order TMLE requires higher-order differentiability of the target parameter, a requirement that often fails to hold, we also discuss and study practicable approximation strategies that allow us to circumvent this failure in applications.
Marco Carone et al.

Doubly Robust Learning for Estimating Individualized Treatment with Censored Data
http://biostats.bepress.com/uncbiostat/art42
Wed, 10 Dec 2014 07:08:44 PST
Individualized treatment rules recommend treatments based on individual patient characteristics in order to maximize clinical benefit. When the clinical outcome of interest is survival time, estimation is often complicated by censoring. We develop nonparametric methods for estimating an optimal individualized treatment rule in the presence of censored data. To adjust for censoring, we propose a doubly robust estimator which requires correct specification of either the censoring model or the survival model, but not both; the method is shown to be Fisher consistent when either model is correct. Furthermore, we establish the convergence rate of the expected survival under the estimated optimal individualized treatment rule to the expected survival under the optimal individualized treatment rule. We illustrate the proposed methods using a simulation study and data from a Phase III clinical trial on non-small cell lung cancer.
Ying-Qi Zhao et al.

On the Restricted Mean Survival Time Curve in Survival Analysis
http://biostats.bepress.com/harvardbiostat/paper187
Mon, 24 Nov 2014 07:52:46 PST
Lihui Zhao et al.

Quantifying an Adherence Path-Specific Effect of Antiretroviral Therapy in the Nigeria PEPFAR Program
http://biostats.bepress.com/harvardbiostat/paper186
Mon, 24 Nov 2014 07:38:34 PST
Caleb Miles et al.

Constrained Bayesian Estimation of Inverse Probability Weights for Nonmonotone Missing Data
http://biostats.bepress.com/harvardbiostat/paper185
Wed, 19 Nov 2014 09:45:07 PST
BaoLuo Sun et al.

Testing Gene-Environment Interactions in the Presence of Measurement Error
http://biostats.bepress.com/uwbiostat/paper405
Tue, 18 Nov 2014 09:18:35 PST
Complex diseases result from an interplay between genetic and environmental risk factors, and it is of great interest to study gene-environment interaction (GxE) to understand the etiology of complex diseases. Recent developments in genetics allow one to study GxE systematically. However, one difficulty with GxE arises from the fact that environmental exposures are often measured with error. In this paper, we focus on testing GxE when the environmental exposure E is subject to measurement error. Surprisingly, in contrast to the well-established result that the naive test ignoring measurement error is valid for testing main effects, we find that the naive test for GxE leads to inflated type I error under the null hypothesis of no interaction. The naive test also leads to biased estimates of the GxE effect. The analytic form of the bias term for general linear models is obtained, and is shown to be closely related to regression calibration. We then propose a regression calibration based approach to correct for measurement error in testing GxE when either validation data or replicates are available. Extensive simulation studies are conducted to illustrate the performance of the various tests with moderate sample sizes. Based on both theoretical and empirical results, we recommend the proposed test, as its type I error is properly controlled and it has at least comparable power to the naive test even when the naive test is valid. The proposed methods are applied to study the gene-blood pressure interaction for cardiovascular diseases in an ancillary study of the Women's Health Initiative.
Chongzhi Di et al.

Personalized Evaluation of Biomarker Value: A Cost-benefit Perspective
http://biostats.bepress.com/uwbiostat/paper404
Thu, 13 Nov 2014 15:05:39 PST
For a patient who is facing a treatment decision, the added value of information provided by a biomarker depends on the individual patient’s expected response to treatment with and without the biomarker, as well as his/her tolerance of disease and treatment harm. However, individualized estimators of the value of a biomarker are lacking. We propose a new graphical tool named the subject-specific expected benefit curve for quantifying the personalized value of a biomarker in aiding a treatment decision. We develop semiparametric estimators for two general settings: i) when biomarker data are available from a randomized trial; and ii) when biomarker data are available from a cohort or a cross-sectional study, together with external information about a multiplicative treatment effect. We also develop adaptive bootstrap confidence intervals for consistent inference in the presence of non-regularity. The proposed method is used to evaluate the individualized value of the serum creatinine marker in informing treatment decisions for the prevention of renal artery stenosis.
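The adaptive bootstrap construction for non-regular settings is the paper's contribution; for orientation, the ordinary percentile bootstrap that it adapts can be sketched in a few lines (a generic illustration, not the authors' procedure):

```python
import numpy as np

def percentile_bootstrap_ci(data, stat, level=0.95, n_boot=2000, seed=0):
    """Plain percentile bootstrap confidence interval for stat(data).

    (The paper uses an *adaptive* bootstrap to obtain valid inference under
    non-regularity; this is the ordinary version, shown only to fix ideas.)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    alpha = 1 - level
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

lo, hi = percentile_bootstrap_ci(np.arange(100.0), np.mean)
# The interval straddles the sample mean of 49.5.
```

The ordinary version fails when the estimator's limiting distribution changes discontinuously with the underlying parameter, which is precisely the non-regularity the adaptive construction is designed to handle.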
Ying Huang et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART II. APPLICATION AND EXTERNAL VALIDATION
http://biostats.bepress.com/jhubiostat/paper272
Wed, 05 Nov 2014 10:51:24 PST
Randomized controlled trials (RCTs) generally provide the most reliable evidence. When participants in RCTs are selected with respect to characteristics that are potential treatment effect modifiers, the average treatment effect from the trials may not be applicable to a specific target population. We present a new method to project the treatment effect from an RCT to a target group that is inadequately represented in the trial when there is heterogeneity in the treatment effect (HTE). The method integrates RCT and observational data through cross-design synthesis. An essential component is to identify HTE and a calibration factor for unmeasured confounding for the observational study relative to the RCT. The estimate of treatment effect adjusted for unmeasured confounding is projected onto the target sample using G-computation with standardization weights. We call the method Calibrated Risk-Adjusted Modeling (CRAM) and apply it to estimate the effect of angiotensin converting enzyme inhibition to prevent heart failure hospitalization or death. External validation shows that when there is adequate overlap between the RCT and the target sample, risk-based standardization is less biased than CRAM. However, when there is poor overlap between the trial and the target sample, CRAM provides superior estimates of treatment effect.
Carlos Weiss et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART I. METHODOLOGY
http://biostats.bepress.com/jhubiostat/paper271
Wed, 05 Nov 2014 10:41:00 PST
Randomized controlled trials (RCTs) provide reliable evidence for approval of new treatments, informing clinical practice, and coverage decisions. The participants in RCTs are often not a representative sample of the larger at-risk population. Hence it is argued that the average treatment effect from the trial is not generalizable to the larger at-risk population. An essential premise of this argument is that there is significant heterogeneity in the treatment effect (HTE). We present a new method to extrapolate the treatment effect from a trial to a target group that is inadequately represented in the trial, when HTE is present. Our method integrates trial and observational data (cross-design synthesis). The target group is assumed to be well-represented in the observational database. An essential component of the methodology is the estimation of calibration adjustments for unmeasured confounding in the observational sample. The estimate of treatment effect, adjusted for unmeasured confounding, is projected onto the target sample using a weighted G-computation approach. We present simulation studies to demonstrate the methodology for estimating the marginal treatment effect in a target sample that differs from the trial sample to varying degrees. In a companion paper, we demonstrate and validate the methodology in a clinical application.
Ravi Varadhan et al.

sanon: An R Package for Stratified Analysis with Nonparametric Covariable Adjustment
http://biostats.bepress.com/uncbiostat/art41
Wed, 29 Oct 2014 07:15:38 PDT
Kawaguchi et al. (2011) provided methodology and applications for a stratified Mann-Whitney estimator that addresses the same comparison between two randomized groups for a strictly ordinal response variable as the van Elteren test statistic for randomized clinical trials with strata. The sanon package implements the method within the R programming environment (R Core Team, 2012). The usage of sanon is illustrated with five examples. The first example is a randomized clinical trial with eight strata and a univariate ordinal response variable. The second example is a randomized clinical trial with four strata, two covariables, and four ordinal response variables. The third example is a crossover-design randomized clinical trial with two strata, one covariable, and two ordinal response variables. The fourth example is a randomized clinical trial with seven strata (which are managed as a categorical covariable), three ordinal covariables with missing values, and three ordinal response variables with missing values. The fifth example is a randomized clinical trial with six strata, a categorical covariable with three levels, and three ordinal response variables with missing values.
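The estimand behind the package, the stratified Mann-Whitney "win" probability, is easy to state in code. The sketch below is in Python rather than R and uses simple sample-size weights for illustration; sanon itself implements the specific weighting schemes and covariable adjustment of Kawaguchi et al. (2011).

```python
import numpy as np

def mann_whitney_prob(treat, ctrl):
    """P(Y_treat > Y_ctrl) + 0.5 * P(Y_treat = Y_ctrl) over all pairs:
    the probability a random treated response beats a random control one,
    with ties counted as half."""
    t = np.asarray(treat)[:, None]
    c = np.asarray(ctrl)[None, :]
    return (t > c).mean() + 0.5 * (t == c).mean()

def stratified_mann_whitney(strata):
    """strata: list of (treat_values, ctrl_values) pairs, one per stratum.
    Combines per-stratum estimates with sample-size weights (a simplified
    choice made here for illustration)."""
    ests, wts = [], []
    for treat, ctrl in strata:
        ests.append(mann_whitney_prob(treat, ctrl))
        wts.append(len(treat) + len(ctrl))
    return np.average(ests, weights=wts)
```

For example, a stratum where every treated response exceeds every control response contributes an estimate of 1.0, and a stratum with a single tied pair contributes 0.5.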
Atsushi Kawaguchi et al.

ENHANCED PRECISION IN THE ANALYSIS OF RANDOMIZED TRIALS WITH ORDINAL OUTCOMES
http://biostats.bepress.com/jhubiostat/paper270
Wed, 22 Oct 2014 10:36:19 PDT
We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to match or improve the precision of the standard, unadjusted estimator. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.
Iván Díaz et al.

Nonparametric Adjustment for Measurement Error in Time to Event Data
http://biostats.bepress.com/harvardbiostat/paper184
Wed, 22 Oct 2014 09:37:50 PDT
Measurement error in time to event data used as a predictor will lead to inaccurate predictions. This arises in the context of self-reported family history, a time to event predictor often measured with error, used in Mendelian risk prediction models. Using a validation data set, we propose a method to adjust for this type of measurement error. We estimate the measurement error process using a nonparametric smoothed Kaplan-Meier estimator, and use Monte Carlo integration to implement the adjustment. We apply our method to simulated data in the context of both Mendelian risk prediction models and multivariate survival prediction models, as well as illustrate our method using a data application for Mendelian risk prediction models. Results from simulations are evaluated using measures of mean squared error of prediction (MSEP), area under the receiver operating characteristic curve (ROC-AUC), and the ratio of observed to expected number of events. These results show that our adjusted method mitigates the effects of measurement error mainly by improving calibration and total accuracy. In some scenarios discrimination is also improved.
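For reference, the Kaplan-Meier estimator at the heart of the adjustment can be sketched as follows. The paper smooths this estimator; the sketch is the plain, unsmoothed product-limit form.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    Returns the distinct event times and the survival curve S(t) at them,
    where each event time multiplies S by (1 - deaths / number at risk).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = (times >= t).sum()
        d = ((times == t) & (events == 1)).sum()
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

For instance, with follow-up times 1, 2, 3, 4 and the third subject censored, the curve steps through 0.75, 0.5, and 0 at the three observed event times.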
Danielle Braun et al.

Extending Mendelian Risk Prediction Models to Handle Misreported Family History
http://biostats.bepress.com/harvardbiostat/paper183
Wed, 22 Oct 2014 09:37:48 PDT
Mendelian risk prediction models calculate the probability of a proband being a mutation carrier based on family history and known mutation prevalence and penetrance. Family history in this setting is self-reported and is often reported with error. Various studies in the literature have evaluated misreporting of family history. Using a validation data set which includes both error-prone self-reported family history and error-free validated family history, we propose a method to adjust for misreporting. We estimate the measurement error process in a validation data set from the University of California, Irvine (UCI) using nonparametric smoothed Kaplan-Meier estimators, and use Monte Carlo integration to implement the adjustment. In this paper, we extend BRCAPRO, a Mendelian risk prediction model for breast and ovarian cancers, to adjust for misreporting in family history. We apply the extended model to data from the Cancer Genetics Network (CGN).
Danielle Braun et al.