Johns Hopkins University, Dept. of Biostatistics Working Papers
Copyright (c) 2015 Johns Hopkins University. All rights reserved.
http://biostats.bepress.com/jhubiostat
Recent documents in Johns Hopkins University, Dept. of Biostatistics Working Papers
Mon, 11 May 2015 12:50:27 PDT

Nested Partially-Latent Class Models for Dependent Binary Data; Estimating Disease Etiology
http://biostats.bepress.com/jhubiostat/paper276
Fri, 24 Apr 2015 14:00:08 PDT
The Pneumonia Etiology Research for Child Health (PERCH) study seeks to use modern measurement technology to infer the causes of pneumonia. This paper describes a latent variable model designed to infer from case-control data the etiology distribution for the population of cases, and for an individual case given his or her measurements, taking account of dependence among pathogen measurements due to sources other than class membership. We assume each observation is drawn from a mixture model in which each component represents one pathogen. Conditional dependence among multivariate binary measurements on a single subject is induced by nesting latent subclasses within each disease class. Measurement precision and covariation can be estimated using the control sample, for whom the etiologic class is known. We use stick-breaking priors on the subclass weights to estimate the population and individual etiologic distributions, averaged across models indexed by different numbers of subclasses. Assessment of model fit and individual diagnosis is done using posterior samples drawn by Gibbs sampling. We demonstrate the method on simulated data and on the motivating PERCH data.
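The stick-breaking prior on the subclass weights can be illustrated with a short sketch. This is a generic truncated stick-breaking draw in Python, not the paper's Gibbs sampler; the function name and parameter values are ours:

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw K subclass weights from a truncated stick-breaking prior.

    Illustrative only: v_k ~ Beta(1, alpha); the k-th weight is the
    piece of the unit stick broken off after k-1 earlier breaks.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncate so the weights sum exactly to one
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, K=5, rng=rng)
print(w, w.sum())  # non-negative weights summing to 1
```

Because the last stick break is set to one, the truncated weights form a proper probability vector for any concentration parameter alpha.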
Zhenke Wu et al.

ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS WITH DELAYED ENDPOINTS, USING LOCALLY EFFICIENT ESTIMATORS TO IMPROVE PRECISION
http://biostats.bepress.com/jhubiostat/paper275
Fri, 24 Apr 2015 14:00:07 PDT
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accrued data in an ongoing trial. For example, enrollment of a subpopulation where there is sufficient evidence of treatment efficacy, futility, or harm could be stopped, while enrollment for the remaining subpopulations is continued. Most existing methods for constructing adaptive enrichment designs are limited to situations where patient outcomes are observed soon after enrollment. This is a major barrier to the use of such designs in practice, since for many diseases the outcome of most clinical importance does not occur shortly after enrollment. We propose a new class of adaptive enrichment designs for delayed endpoints. At each analysis, semiparametric, locally efficient estimators leverage information in baseline variables and short-term outcomes to improve precision. This can reduce the sample size required to achieve a desired power. We propose new multiple testing procedures tailored to this problem, which we prove to strongly control the family-wise Type I error rate, asymptotically. These methods are illustrated through simulations of a trial for a new surgical intervention for stroke.
Michael Rosenblum et al.

INEQUALITY IN TREATMENT BENEFITS: CAN WE DETERMINE IF A NEW TREATMENT BENEFITS THE MANY OR THE FEW?
http://biostats.bepress.com/jhubiostat/paper274
Fri, 13 Mar 2015 10:07:18 PDT
The primary analysis in many randomized controlled trials focuses on the average treatment effect and does not address whether treatment benefits are widespread or limited to a select few. This problem affects many disease areas, since it stems from how randomized trials, often the gold standard for evaluating treatments, are designed and analyzed. Our goal is to estimate the fraction who benefit from a treatment, based on randomized trial data. We consider cases where the primary outcome is continuous, discrete, or ordinal. In general, the fraction who benefit is a non-identifiable parameter, and the best that can be obtained are sharp lower and upper bounds on it. We develop a method to estimate these bounds using a novel application of linear programming, which allows fast implementation. MATLAB software is provided. The method can incorporate information from prognostic baseline variables in order to improve precision, without requiring parametric model assumptions. Also, assumptions based on subject matter knowledge can be incorporated to improve the bounds. We apply our general method to estimate lower and upper bounds on the fraction who benefit from a new surgical intervention for stroke.
Emily Huang et al.

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING
http://biostats.bepress.com/jhubiostat/paper273
Mon, 15 Dec 2014 11:44:33 PST
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. Such designs have been proposed, for example, when the population of interest consists of biomarker positive and biomarker negative individuals. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment, and the multiple testing procedure. We provide the first general method for simultaneously optimizing both of these components for two stage, adaptive enrichment designs. We minimize expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization methods to solve this problem to high accuracy, revealing new, approximately optimal designs.
Michael Rosenblum et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART II. APPLICATION AND EXTERNAL VALIDATION
http://biostats.bepress.com/jhubiostat/paper272
Wed, 05 Nov 2014 10:51:24 PST
Randomized controlled trials (RCTs) generally provide the most reliable evidence. When participants in RCTs are selected with respect to characteristics that are potential treatment effect modifiers, the average treatment effect from the trials may not be applicable to a specific target population. We present a new method to project the treatment effect from an RCT to a target group that is inadequately represented in the trial when there is heterogeneity in the treatment effect (HTE). The method integrates RCT and observational data through cross-design synthesis. An essential component is to identify HTE and a calibration factor for unmeasured confounding in the observational study relative to the RCT. The estimate of treatment effect adjusted for unmeasured confounding is projected onto the target sample using G-computation with standardization weights. We call the method Calibrated Risk-Adjusted Modeling (CRAM) and apply it to estimate the effect of angiotensin-converting enzyme inhibition in preventing heart failure hospitalization or death. External validation shows that when there is adequate overlap between the RCT and the target sample, risk-based standardization is less biased than CRAM. However, when there is poor overlap between the trial and the target sample, CRAM provides superior estimates of the treatment effect.
Carlos Weiss et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART I. METHODOLOGY
http://biostats.bepress.com/jhubiostat/paper271
Wed, 05 Nov 2014 10:41:00 PST
Randomized controlled trials (RCTs) provide reliable evidence for approval of new treatments, informing clinical practice, and coverage decisions. The participants in RCTs are often not a representative sample of the larger at-risk population. Hence it is argued that the average treatment effect from the trial is not generalizable to the larger at-risk population. An essential premise of this argument is that there is significant heterogeneity in the treatment effect (HTE). We present a new method to extrapolate the treatment effect from a trial to a target group that is inadequately represented in the trial, when HTE is present. Our method integrates trial and observational data (cross-design synthesis). The target group is assumed to be well-represented in the observational database. An essential component of the methodology is the estimation of calibration adjustments for unmeasured confounding in the observational sample. The estimate of treatment effect, adjusted for unmeasured confounding, is projected onto the target sample using a weighted G-computation approach. We present simulation studies to demonstrate the methodology for estimating the marginal treatment effect in a target sample that differs from the trial sample to varying degrees. In a companion paper, we demonstrate and validate the methodology in a clinical application.
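The projection step can be sketched in a few lines. This is an illustrative weighted G-computation with a simple linear outcome model; the calibration adjustment for unmeasured confounding described above is omitted, and all names and data-generating choices are ours:

```python
import numpy as np

def weighted_g_computation(X, A, Y, X_target, w_target):
    """Marginal treatment effect in a target sample via G-computation.

    Fits a linear outcome model E[Y | A, X] in the analysis sample,
    predicts Y under A=1 and A=0 for each target subject, and takes a
    weighted average of the differences using standardization weights.
    """
    D = np.column_stack([np.ones(len(Y)), A, X])
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    n_t = len(X_target)
    D1 = np.column_stack([np.ones(n_t), np.ones(n_t), X_target])
    D0 = np.column_stack([np.ones(n_t), np.zeros(n_t), X_target])
    return np.average(D1 @ beta - D0 @ beta, weights=w_target)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 1))                       # confounder
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # confounded treatment
Y = 1.0 + 2.0 * A + 0.5 * X[:, 0] + rng.normal(size=2000)
X_t = rng.normal(loc=0.5, size=(500, 1))             # shifted target sample
est = weighted_g_computation(X, A, Y, X_t, np.ones(500))
print(est)  # close to the true effect of 2.0
```

With uniform weights this reduces to ordinary G-computation standardized to the target covariate distribution; non-uniform weights allow the target sample itself to be reweighted, as in the paper's approach.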
Ravi Varadhan et al.

ENHANCED PRECISION IN THE ANALYSIS OF RANDOMIZED TRIALS WITH ORDINAL OUTCOMES
http://biostats.bepress.com/jhubiostat/paper270
Wed, 22 Oct 2014 10:36:19 PDT
We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to match or improve the precision of the standard, unadjusted estimator. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.
Iván Díaz et al.

APPLYING MULTIPLE IMPUTATION FOR EXTERNAL CALIBRATION TO PROPENSITY SCORE ANALYSIS
http://biostats.bepress.com/jhubiostat/paper269
Fri, 26 Sep 2014 13:26:30 PDT
Although covariate measurement error is likely the norm rather than the exception, methods for handling covariate measurement error in propensity score methods have not been widely investigated. We consider a multiple imputation-based approach that uses an external calibration sample with information on the true and mismeasured covariates, Multiple Imputation for External Calibration (MI-EC), to correct for the measurement error, and investigate its performance using simulation studies. As expected, using the covariate measured with error leads to bias in the treatment effect estimate. In contrast, the MI-EC method can eliminate almost all the bias. We confirm that the outcome must be used in the imputation process to obtain good results, a finding related to the idea of congenial imputation and analysis in the broader multiple imputation literature. We illustrate the MI-EC approach using a motivating example estimating the effects of living in a disadvantaged neighborhood on mental health and substance use outcomes among adolescents. These results show that estimating the propensity score using covariates measured with error leads to biased estimates of treatment effects, but when a calibration data set is available, MI-EC can be used to help correct for such bias.
Yenny Webb-Vargas et al.

A BAYESIAN APPROACH TO JOINT MODELING OF MENSTRUAL CYCLE LENGTH AND FECUNDITY
http://biostats.bepress.com/jhubiostat/paper268
Fri, 26 Sep 2014 09:49:48 PDT
Female menstrual cycle length is thought to play an important role in couple fecundity, or the biologic capacity for reproduction irrespective of pregnancy intentions. A complete assessment of the association between menstrual cycle length and fecundity requires a model that accounts for multiple risk factors (both male and female) and the couple's intercourse pattern relative to ovulation. We employ a Bayesian joint model consisting of a mixed-effects accelerated failure time model for longitudinal menstrual cycle lengths and a hierarchical model for the conditional probability of pregnancy in a menstrual cycle given no pregnancy in previous cycles of trying; the latter includes covariates for the male and the female and a flexible spline function of intercourse timing. Using our joint modeling approach to analyze data from the Longitudinal Investigation of Fertility and the Environment Study, a couple-based prospective pregnancy study, we found a significant quadratic relation between menstrual cycle length and the probability of pregnancy, even after adjustment for other risk factors, including male semen quality, age, and smoking status.
Kirsten J. Lum et al.

Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
http://biostats.bepress.com/jhubiostat/paper267
Wed, 27 Aug 2014 12:50:27 PDT
In population studies on the etiology of disease, one goal is the estimation of the fraction of cases attributable to each of several causes. For example, pneumonia is a clinical diagnosis of lung infection that may be caused by viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology is challenging because directly sampling from the lung to identify the etiologic pathogen is not standard clinical practice in most settings. Instead, measurements from multiple peripheral specimens are made. This paper considers the problem of estimating the population etiology distribution and the individual etiology probabilities. We formulate the scientific problem in statistical terms as estimating the posterior distribution of mixing weights and latent class indicators under a partially-latent class model (pLCM) that combines heterogeneous measurements with different error rates obtained from a case-control study. We introduce the pLCM as an extension of the latent class model. We also introduce graphical displays of the population data and inferred latent-class frequencies. The methods are illustrated with simulated and real data sets. The paper closes with a brief description of extensions of the pLCM to the regression setting and to the case where conditional independence among the measures is relaxed.
Zhenke Wu et al.

TARGETED MAXIMUM LIKELIHOOD ESTIMATION USING EXPONENTIAL FAMILIES
http://biostats.bepress.com/jhubiostat/paper266
Mon, 02 Jun 2014 10:14:08 PDT
Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. Each iteration of TMLE involves fitting a parametric submodel that targets the parameter of interest. We investigate the use of exponential families to define the parametric submodel. This implementation of TMLE gives a general approach for estimating any smooth parameter in the nonparametric model. A computational advantage of this approach is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel, focusing on the first of these problems. To the best of our knowledge, this is the first study investigating robustness of TMLE to different specifications of the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples.
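A minimal sketch of a TMLE targeting step may help fix ideas. For simplicity this uses the elementary linear fluctuation with clever covariate H = Delta/g(W), not the exponential-family submodels studied in the paper, and all working models and names are our own illustrative choices:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(D, t, iters=25):
    """Plain Newton-Raphson logistic regression; returns fitted probabilities."""
    b = np.zeros(D.shape[1])
    for _ in range(iters):
        p = expit(D @ b)
        b = b + np.linalg.solve(D.T @ (D * (p * (1 - p))[:, None]), D.T @ (t - p))
    return expit(D @ b)

def tmle_mean(W, Delta, Y):
    """TMLE sketch for E[Y] when Y is missing at random given W.

    Delta indicates observed outcomes. Working models: a linear initial
    fit for E[Y | W], a logistic missingness model g(W), and a linear
    fluctuation Q + eps*H with clever covariate H = Delta / g(W).
    """
    obs = Delta == 1
    D = np.column_stack([np.ones(len(W)), W])
    beta, *_ = np.linalg.lstsq(D[obs], Y[obs], rcond=None)
    Q = D @ beta                  # initial outcome regression
    g = fit_logistic(D, Delta)    # estimated P(Delta = 1 | W)
    # Targeting step: regress the residual on H among the observed.
    eps = np.sum((Y[obs] - Q[obs]) / g[obs]) / np.sum(1.0 / g[obs] ** 2)
    return np.mean(Q + eps / g)   # updated plug-in estimate

rng = np.random.default_rng(2)
W = rng.normal(size=4000)
Delta = rng.binomial(1, expit(0.5 + 0.5 * W))
Y = np.where(Delta == 1, 1.0 + W + rng.normal(size=4000), np.nan)
est = tmle_mean(W, Delta, Y)
print(est)  # close to the true mean E[Y] = 1
```

The paper's point is that replacing this ad hoc linear fluctuation with an exponential-family submodel turns each targeting step into a convex optimization problem with reliable off-the-shelf solvers.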
Iván Díaz et al.

Estimating population treatment effects from a survey sub-sample
http://biostats.bepress.com/jhubiostat/paper265
Wed, 21 May 2014 08:20:23 PDT
We consider the problem of estimating an average treatment effect for a target population from a survey sub-sample. Our motivating example is generalizing a treatment effect estimated in a sub-sample of the National Comorbidity Survey Replication Adolescent Supplement to the population of U.S. adolescents. To address this problem, we evaluate easy-to-implement methods that account for both non-random treatment assignment and a non-random two-stage selection mechanism. We compare the performance of a Horvitz-Thompson estimator using inverse probability weighting (IPW) and two double robust estimators in a variety of scenarios. We demonstrate that the two double robust estimators generally outperform IPW in terms of mean-squared error even under misspecification of one of the treatment, selection, or outcome models. Moreover, the double robust estimators are easy to implement, providing an attractive alternative to IPW for applied epidemiologic researchers. We demonstrate how to apply these estimators to our motivating example.
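The two estimator classes compared above can be sketched side by side. This sketch covers treatment weighting only, with known working models; the two-stage survey-selection weights used in the paper are omitted, and the data-generating setup is ours:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ipw_and_aipw(A, Y, g, Q1, Q0):
    """Horvitz-Thompson IPW and augmented (double robust) estimates of
    the average treatment effect, given propensity scores g(W) and
    outcome-model predictions Q1(W), Q0(W).
    """
    ipw = np.mean(A * Y / g) - np.mean((1 - A) * Y / (1 - g))
    aipw = np.mean(A * (Y - Q1) / g - (1 - A) * (Y - Q0) / (1 - g) + Q1 - Q0)
    return ipw, aipw

rng = np.random.default_rng(4)
W = rng.normal(size=5000)
g = expit(0.3 * W)                        # true propensity score
A = rng.binomial(1, g)
Y = 2.0 * A + W + rng.normal(size=5000)   # true average effect = 2
est_ipw, est_aipw = ipw_and_aipw(A, Y, g, Q1=2.0 + W, Q0=W)
print(est_ipw, est_aipw)  # both near 2 when the working models are correct
```

The augmented estimator keeps its consistency if either the propensity model or the outcome model is misspecified (but not both), which is the double robustness property the paper exploits.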
Kara E. Rudolph et al.

COX REGRESSION MODELS WITH FUNCTIONAL COVARIATES FOR SURVIVAL DATA
http://biostats.bepress.com/jhubiostat/paper264
Mon, 12 May 2014 10:27:34 PDT
We extend the Cox proportional hazards model to cases where the exposure is a densely sampled functional process measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed-effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time-varying coefficients, and missing or unequally spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome.
Jonathan E. Gellar et al.

LEVERAGING PROGNOSTIC BASELINE VARIABLES TO GAIN PRECISION IN RANDOMIZED TRIALS
http://biostats.bepress.com/jhubiostat/paper263
Mon, 05 May 2014 12:39:07 PDT
We focus on estimating the average treatment effect in a randomized trial. If baseline variables are correlated with the outcome, then appropriately adjusting for these variables can improve precision. An example is the analysis of covariance (ANCOVA) estimator, which applies when the outcome is continuous, the quantity of interest is the difference in mean outcomes comparing treatment versus control, and a linear model with only main effects is used. ANCOVA is guaranteed to be at least as precise as the standard unadjusted estimator, asymptotically, under no parametric model assumptions, and also is locally, semiparametric efficient. Recently, several estimators have been developed that extend these desirable properties to more general settings that allow: any real-valued outcome (e.g., binary or count), contrasts other than the difference in mean outcomes (such as the relative risk), and estimators based on a large class of generalized linear models (including logistic regression). To the best of our knowledge, this is the first simulation study comparing these estimators in the context of randomized trials. Furthermore, our simulations are not based on parametric models; instead, they are based on resampling data from completed randomized trials in stroke and HIV in order to assess estimator performance in realistic scenarios. We provide practical guidance on when these estimators are likely to provide substantial precision gains, and describe a quick assessment method that allows clinical investigators to determine whether these estimators could be useful in their specific trial contexts.
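The precision gain from covariate adjustment can be seen in a toy simulation. This is an illustrative ANCOVA sketch with simulated data (not the trial-resampling study described above); the model and parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(3)

def one_trial(n=500):
    """Simulate one randomized trial with a prognostic baseline variable X."""
    X = rng.normal(size=n)                       # prognostic baseline variable
    A = rng.binomial(1, 0.5, size=n)             # randomized assignment
    Y = 1.0 * A + 2.0 * X + rng.normal(size=n)   # outcome; true effect = 1
    unadj = Y[A == 1].mean() - Y[A == 0].mean()  # unadjusted estimator
    D = np.column_stack([np.ones(n), A, X])      # main-effects linear model
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return unadj, beta[1]                        # (unadjusted, ANCOVA)

ests = np.array([one_trial() for _ in range(500)])
print(ests.mean(axis=0))  # both columns unbiased for the true effect of 1
print(ests.std(axis=0))   # the ANCOVA column has the smaller spread
```

Both estimators are unbiased under randomization; because X explains most of the outcome variance here, the ANCOVA estimates are markedly less variable across the 500 simulated trials.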
Elizabeth Colantuoni et al.

INTERADAPT -- AN INTERACTIVE TOOL FOR DESIGNING AND EVALUATING RANDOMIZED TRIALS WITH ADAPTIVE ENROLLMENT CRITERIA
http://biostats.bepress.com/jhubiostat/paper262
Fri, 14 Mar 2014 12:09:58 PDT
The interAdapt R package is designed to be used by statisticians and clinical investigators to plan randomized trials. It can be used to determine whether certain adaptive designs offer tangible benefits compared to standard designs, in the context of investigators' specific trial goals and constraints. Specifically, interAdapt compares the performance of trial designs with adaptive enrollment criteria against standard (non-adaptive) group sequential trial designs. Performance is compared in terms of power, expected trial duration, and expected sample size. Users can either work directly in the R console or use a user-friendly Shiny application that requires no programming experience. Several added features are available when using the Shiny application. For example, the application allows users to immediately download the results of the performance comparison as a CSV table or as a printable, HTML-based report.
Aaron Joel Fisher et al.

VARIABLE-DOMAIN FUNCTIONAL REGRESSION FOR MODELING ICU DATA
http://biostats.bepress.com/jhubiostat/paper261
Wed, 05 Feb 2014 09:18:44 PST
We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of the association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. The methods are generally applicable to the many new studies that record continuous variables over unequal domains.
Jonathan E. Gellar et al.

ADAPTIVE RANDOMIZED TRIAL DESIGNS THAT CANNOT BE DOMINATED BY ANY STANDARD DESIGN AT THE SAME TOTAL SAMPLE SIZE
http://biostats.bepress.com/jhubiostat/paper260
Fri, 31 Jan 2014 13:12:05 PST
Prior work has shown that certain types of adaptive designs can always be dominated by a suitably chosen, standard, group sequential design. This applies to adaptive designs with rules for modifying the total sample size. A natural question is whether analogous results hold for other types of adaptive designs. We focus on adaptive enrichment designs, which involve preplanned rules for modifying enrollment criteria based on accrued data in a randomized trial. Such designs often involve multiple hypotheses, e.g., one for the total population and one for a predefined subpopulation, such as those with high disease severity at baseline. We fix the total sample size, and consider overall power, defined as the probability of rejecting at least one false null hypothesis. We present adaptive enrichment designs whose overall power at two alternatives cannot simultaneously be matched by any standard design. In some scenarios there is a substantial gap between the overall power achieved by these adaptive designs and that of any standard design. We also prove that such gains in overall power come at a cost. To attain overall power above what is achievable by certain standard designs, it is necessary to increase power to reject some hypotheses and reduce power to reject others. We conclude by showing that the class of adaptive enrichment designs allows certain power tradeoffs that are not available when restricting to standard designs. We illustrate our results in the context of planning a hypothetical, randomized trial of a new antidepressant, using data distributions from Kirsch et al. (2008).
Michael Rosenblum

Joint Estimation of Multiple Graphical Models from High Dimensional Time Series
http://biostats.bepress.com/jhubiostat/paper259
Thu, 26 Dec 2013 07:09:18 PST
In this manuscript, the problem of jointly estimating multiple graphical models in high dimensions is considered. It is assumed that the data are collected from n subjects, each contributing m non-independent observations. The graphical models of subjects vary, but are assumed to change smoothly according to a measure of the closeness between subjects. A kernel-based method for jointly estimating all graphical models is proposed. Theoretically, under a double asymptotic framework in which both (m,n) and the dimension d can increase, an explicit rate of convergence in parameter estimation is provided, characterizing both the strength one can borrow across different individuals and the impact of data dependence on parameter estimation. Empirically, experiments on both synthetic and real resting-state functional magnetic resonance imaging (rs-fMRI) data illustrate the effectiveness of the proposed method.
Huitong Qiu et al.

Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
http://biostats.bepress.com/jhubiostat/paper258
Thu, 26 Dec 2013 07:05:22 PST
In this manuscript a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Utilizing the concept of median graphs in summarizing the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.
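As a toy illustration of summarizing commonality across graphical structures, one can take an entrywise median of edge-weight matrices and threshold it. This is a simple stand-in for the paper's robust semiparametric estimator (which also handles non-Gaussian data); the matrices and names below are ours:

```python
import numpy as np

def median_graph(weights, threshold=1e-8):
    """Entrywise median of a stack of symmetric edge-weight matrices,
    thresholded to a sparse adjacency pattern.
    """
    med = np.median(np.asarray(weights), axis=0)
    adj = (np.abs(med) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return med, adj

# Three noisy graphs sharing edge (0, 1); edge (1, 2) appears only once.
G1 = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
G2 = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.5], [0.0, 0.5, 1.0]])
G3 = np.array([[1.0, 0.7, 0.0], [0.7, 1.0, 0.0], [0.0, 0.0, 1.0]])
med, adj = median_graph([G1, G2, G3])
print(adj)  # only the common edge (0, 1) survives the median
```

Taking a median rather than a mean makes the summary robust: an edge present in only a minority of the individual graphs is zeroed out rather than diluted.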
Fang Han et al.

Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions
http://biostats.bepress.com/jhubiostat/paper257
Wed, 26 Jun 2013 13:01:11 PDT
This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject and a series of images is acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify and quantify lesion enhancement location at the subject level and lesion enhancement patterns at the population level. With this example, we aim to address the difficult problem of transforming a qualitative scientific null hypothesis, such as "this voxel does not enhance", to a well-defined and numerically testable null hypothesis based on existing data. We call the procedure "soft null hypothesis" testing as opposed to the standard "hard null hypothesis" testing. This problem is fundamentally different from: 1) testing when a quantitative null hypothesis is given; 2) clustering using a mixture distribution; or 3) identifying a reasonable threshold with a parametric null assumption. We analyze a total of 20 subjects scanned at 63 visits (~30 GB), the largest population of such clinical brain images.
Haochang Shou et al.