Johns Hopkins University, Dept. of Biostatistics Working Papers
Copyright (c) 2017 Johns Hopkins University. All rights reserved.
http://biostats.bepress.com/jhubiostat
Recent documents in Johns Hopkins University, Dept. of Biostatistics Working Papers
Wed, 21 Jun 2017 08:53:43 PDT

Constructing a Confidence Interval for the Fraction Who Benefit from Treatment, Using Randomized Trial Data
http://biostats.bepress.com/jhubiostat/paper287
Mon, 05 Jun 2017 11:25:39 PDT
The fraction who benefit from treatment is defined as the proportion of patients whose potential outcome under treatment is better than that under control. Statistical inference for this parameter is challenging because it is only partially identifiable, even in our context of a randomized trial. We propose and evaluate a new method for constructing a confidence interval for the fraction who benefit when the outcome is ordinal-valued (with binary outcomes as a special case). This confidence interval procedure is proved to be pointwise consistent. Our method does not require any assumptions about the joint distribution of the potential outcomes, although it has the flexibility to incorporate a wide range of user-defined assumptions. A potential advantage of our approach is that, unlike existing confidence interval methods for partially identified parameters (such as the m-out-of-n bootstrap and subsampling), we do not need to select m or the subsample size, which is generally a challenging problem. Our method is based on a stochastic optimization technique involving a second-order asymptotic approximation that, to the best of our knowledge, has not been applied to biomedical studies. This approximation leads to statistics that are solutions to quadratic programs, so they can be computed efficiently using existing optimization tools. In all of our simulations, our method attains the nominal coverage probability or higher, and it can have substantially narrower average width than the m-out-of-n bootstrap. We also apply our method to data from a completed trial of a new surgical intervention for severe stroke.
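The binary special case makes the partial-identification problem concrete. The sketch below uses a hypothetical function name. It assumes outcome 1 is the better outcome and treats the two arm-level success probabilities as known. With no assumption on the joint distribution of the potential outcomes, the fraction who benefit, P(Y(1) = 1, Y(0) = 0), can only be placed within the classical Fréchet-type interval computed here. This illustrates why the parameter is only bounded; it is not the paper's confidence interval procedure.

```python
def binary_benefit_bounds(p_treat: float, p_control: float) -> tuple[float, float]:
    """Sharp bounds on P(Y(1) = 1, Y(0) = 0) for a binary outcome (1 = better).

    p_treat   = P(Y(1) = 1), identified from the treatment arm
    p_control = P(Y(0) = 1), identified from the control arm
    Assumes nothing about how Y(1) and Y(0) are jointly distributed.
    """
    if not (0.0 <= p_treat <= 1.0 and 0.0 <= p_control <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_treat - p_control)  # at least the net improvement
    upper = min(p_treat, 1.0 - p_control)  # cannot exceed either relevant margin
    return lower, upper


# Hypothetical arm-level success rates, not taken from any trial.
lower, upper = binary_benefit_bounds(0.45, 0.30)
print(f"fraction who benefit lies in [{lower:.2f}, {upper:.2f}]")
```

Even with exact marginals, the interval here is wide (0.15 to 0.45). Narrowing it requires assumptions on the joint distribution, which the method above allows users to add.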
Emily J. Huang et al.

ESTIMATING AUTOANTIBODY SIGNATURES TO DETECT AUTOIMMUNE DISEASE PATIENT SUBSETS
http://biostats.bepress.com/jhubiostat/paper286
Wed, 19 Apr 2017 13:29:16 PDT
Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method that greatly simplifies autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on the migration of labeled, immunoprecipitated proteins on sodium dodecyl sulfate (SDS) gels and autoradiography [Gel Electrophoresis and band detection on Autoradiograms (GEA)]. Human recognition of patterns is not optimal when the patterns are complex or scattered across many samples. Multiple sources of error, including irrelevant intensity differences and warping of gels, have hindered automated pattern discovery from autoradiograms. In this paper, we address these limitations using a Bayesian hierarchical model with shrinkage priors for pattern alignment and spatial dewarping. The Bayesian model combines information from multiple gel sets and corrects spatial warping for coherent estimation of autoantibody signatures defined by the presence or absence of a grid of landmark proteins. We show that the preprocessing creates better-separated clusters and improves the accuracy of autoantibody subset detection via hierarchical clustering. Finally, we demonstrate the utility of the proposed methods with GEA data from scleroderma patients.
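The following sketch illustrates the subset-detection step that follows the Bayesian preprocessing. It assumes each patient's aligned signature is already a binary presence/absence vector over landmark proteins. The example data are hypothetical. Jaccard distance, average linkage, and two clusters are illustrative choices, not the paper's settings. The dewarping model itself is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical aligned signatures: rows = patients, columns = landmark proteins,
# True = band present at that landmark after dewarping.
signatures = np.array([
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
], dtype=bool)

# Jaccard distance ignores shared absences, which dominate sparse band patterns.
distances = pdist(signatures, metric="jaccard")
tree = linkage(distances, method="average")
subsets = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 groups
print(subsets)
```

Jaccard distance is used because two patients who both lack most bands should not look similar for that reason alone.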
Zhenke Wu et al.

IMPROVING POWER IN GROUP SEQUENTIAL, RANDOMIZED TRIALS BY ADJUSTING FOR PROGNOSTIC BASELINE VARIABLES AND SHORT-TERM OUTCOMES
http://biostats.bepress.com/jhubiostat/paper285
Fri, 10 Feb 2017 08:54:28 PST
In group sequential designs, adjusting for baseline variables and short-term outcomes can lead to increased power and reduced sample size. We derive formulas for the precision gain from such variable adjustment using semiparametric estimators for the average treatment effect, and give new results on which conditions lead to substantial power gains and sample size reductions. The formulas reveal how the impact of prognostic variables on the precision gain is modified by the number of pipeline participants, analysis timing, enrollment rate, and treatment effect heterogeneity, when the semiparametric estimator uses correctly specified models. For a fixed prognostic value of baseline variables and short-term outcomes within each arm, the precision gain is maximal when there is no treatment effect heterogeneity. In contrast, a purely predictive baseline variable, which only explains treatment effect heterogeneity but is marginally uncorrelated with the outcome, can lead to no precision gain. The theory is supported by simulation studies based on data from a trial of a new surgical intervention for treating stroke.
Tianchen Qian et al.

IT'S ALL ABOUT BALANCE: PROPENSITY SCORE MATCHING IN THE CONTEXT OF COMPLEX SURVEY DATA
http://biostats.bepress.com/jhubiostat/paper284
Wed, 08 Feb 2017 14:12:32 PST
Many research studies aim to draw causal inferences using data from large, nationally representative survey samples, and many of these studies use propensity score matching to make those causal inferences as rigorous as possible given the non-experimental nature of the data. However, very few applied studies carefully incorporate the survey design into the propensity score analysis, which may mean that their results do not support inferences about the target population. This may be because few methodological studies examine how best to combine these methods. Furthermore, even fewer of the methodological studies incorporate different non-response mechanisms in their analysis. This study examines methods for handling survey weights in propensity score matching analyses of survey data under different non-response mechanisms. Based on the results from Monte Carlo simulations implemented on synthetic data, as well as a data-based application, we developed suggestions regarding the implementation of propensity score methods to make causal inferences relevant to the target population of a sample survey. Our main conclusions are: (1) whether the survey weights are incorporated in the estimation of the propensity score does not impact estimation of the population treatment effect, as long as good population balance is achieved across confounders; (2) survey weights must be taken into account in the outcome analysis; and (3) transfer of survey weights (i.e., matched comparison units are assigned the sampling weight of the treated unit they have been matched to) can be beneficial under certain non-response mechanisms.
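Conclusion (3) can be made concrete for 1:1 matching. This minimal sketch assumes each treated unit has exactly one matched comparison unit. The function name and data are hypothetical. Design-based variance estimation (strata, clusters) is omitted. Each matched comparison unit inherits the sampling weight of its treated partner, so both arms are averaged with the same weights.

```python
def weighted_att_with_weight_transfer(treated_y, treated_w, matched_control_y):
    """Survey-weighted treatment effect on the treated after 1:1 matching.

    treated_y[i]          outcome of treated unit i
    treated_w[i]          survey (sampling) weight of treated unit i
    matched_control_y[i]  outcome of the comparison unit matched to treated unit i;
                          under weight transfer it is weighted by treated_w[i]
    """
    if not (len(treated_y) == len(treated_w) == len(matched_control_y)):
        raise ValueError("inputs must have equal length")
    total_w = sum(treated_w)
    mean_treated = sum(w * y for w, y in zip(treated_w, treated_y)) / total_w
    # Weight transfer: the comparison arm reuses the treated units' weights.
    mean_control = sum(w * y for w, y in zip(treated_w, matched_control_y)) / total_w
    return mean_treated - mean_control


# Hypothetical matched sample of three pairs.
att = weighted_att_with_weight_transfer([5.0, 7.0, 6.0], [1.0, 3.0, 2.0], [4.0, 5.0, 6.0])
print(round(att, 4))
```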
David Lenis et al.

Using Sensitivity Analyses for Unobserved Confounding to Address Covariate Measurement Error in Propensity Score Methods
http://biostats.bepress.com/jhubiostat/paper283
Fri, 18 Nov 2016 10:48:58 PST
Propensity score methods are a popular tool to control for confounding in observational data, but their bias-reduction properties are threatened by covariate measurement error. There are few easy-to-implement methods to correct for such bias. We describe and demonstrate how existing sensitivity analyses for unobserved confounding (propensity score calibration, VanderWeele and Arah's bias formulas, and Rosenbaum's sensitivity analysis) can be adapted to address this problem. In a simulation study, we examine the extent to which these sensitivity analyses can correct for several measurement error structures: classical, systematic differential, and heteroscedastic covariate measurement error. We then apply these approaches to address covariate measurement error in estimating the association between depression and weight gain in a cohort of adults in Baltimore City. We recommend the use of VanderWeele and Arah's bias formulas and propensity score calibration (assuming it is adapted appropriately for the measurement error structure), as both approaches perform well for a variety of propensity score estimators and measurement error structures.
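Here is the simplest form of one recommended tool, VanderWeele and Arah's bias formula. The sketch assumes a binary unmeasured factor U, an additive (risk or mean difference) scale, and no U-by-treatment interaction. Under those assumptions, the bias equals the effect of U on the outcome times the difference in U's prevalence between treatment groups. In the measurement-error adaptation, U would stand in for the part of the true covariate that the mismeasured version misses. That mapping, the function name, and the numbers below are illustrative assumptions, not the paper's parameterization.

```python
def bias_corrected_effect(observed_effect, gamma, prev_treated, prev_control):
    """Additive-scale bias correction for a binary unmeasured factor U.

    Assumes no U-by-treatment interaction (an assumption of this sketch).
    gamma         mean outcome difference for U = 1 vs U = 0, given treatment and covariates
    prev_treated  P(U = 1 | treated, measured covariates)
    prev_control  P(U = 1 | control, measured covariates)
    """
    bias = gamma * (prev_treated - prev_control)
    return observed_effect - bias


# Hypothetical sensitivity parameters, not taken from the Baltimore cohort.
observed = 2.0
corrected = bias_corrected_effect(observed, gamma=1.5, prev_treated=0.6, prev_control=0.2)
# A small grid over gamma, as a sensitivity analysis would report.
grid = {g: bias_corrected_effect(observed, g, 0.6, 0.2) for g in (0.5, 1.0, 1.5)}
print(corrected, grid)
```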
Kara E. Rudolph et al.

Censoring Unbiased Regression Trees and Ensembles
http://biostats.bepress.com/jhubiostat/paper282
Mon, 31 Oct 2016 09:41:49 PDT
This paper proposes a novel approach to building regression trees and ensemble learning in survival analysis. By first extending the theory of censoring unbiased transformations, we construct observed data estimators of full data loss functions in cases where responses can be right censored. This theory is used to construct two specific classes of methods for building regression trees and regression ensembles that respectively make use of Buckley-James and doubly robust estimating equations for a given full data risk function. For the particular case of squared error loss, we further show how to implement these algorithms using existing software (e.g., CART, random forests) by making use of a related form of response imputation. Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
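A censoring unbiased transformation can be illustrated with its simplest member, the inverse-probability-of-censoring-weighted (IPCW) response Y* = ΔT / G(T-), where G is the censoring survival function. This is a swapped-in illustration; the paper builds on Buckley-James and doubly robust transformations instead. The sketch assumes censoring is independent of survival time. It estimates G by Kaplan-Meier, using the convention that deaths precede censorings at tied times. Function names and data are hypothetical.

```python
import numpy as np


def censoring_survival_left_limit(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    Censoring (events == 0) is treated as the 'event' when estimating G.
    """
    times = np.asarray(times, dtype=float)
    censored = np.asarray(events) == 0
    unique_times = np.unique(times)
    g_left = np.empty(len(unique_times))
    g = 1.0
    for idx, t in enumerate(unique_times):
        g_left[idx] = g  # value just before t
        at_risk = np.sum(times >= t)
        n_censored = np.sum(censored & (times == t))
        g *= 1.0 - n_censored / at_risk
    return g_left[np.searchsorted(unique_times, times)]


def ipcw_transform(times, events):
    """IPCW response: observed death times are up-weighted, censored rows get 0."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    g = censoring_survival_left_limit(times, events)
    return np.where(events == 1, times / g, 0.0)


# Hypothetical right-censored data: event = 1 means the time was observed.
ystar = ipcw_transform([2.0, 3.0, 3.0, 5.0, 8.0], [1, 0, 1, 1, 0])
print(ystar)
```

Censored rows contribute zero, and uncensored rows are up-weighted. Under the stated assumptions, E[Y* | X] equals E[T | X], which is what makes transformed responses usable in squared-error loss.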
Jon Arni Steingrimsson et al.

Matching the Efficiency Gains of the Logistic Regression Estimator While Avoiding its Interpretability Problems, in Randomized Trials
http://biostats.bepress.com/jhubiostat/paper281
Fri, 26 Aug 2016 09:55:22 PDT
Adjusting for prognostic baseline variables can lead to improved power in randomized trials. For binary outcomes, a logistic regression estimator is commonly used for such adjustment. This has resulted in substantial efficiency gains in practice, e.g., gains equivalent to reducing the required sample size by 20-28% were observed in a recent survey of traumatic brain injury trials. Robinson and Jewell (1991) proved that the logistic regression estimator is guaranteed to have equal or better asymptotic efficiency compared to the unadjusted estimator (which ignores baseline variables). Unfortunately, the logistic regression estimator has the following dangerous vulnerabilities: it is only interpretable when the treatment effect is identical within every stratum of baseline covariates; also, it is inconsistent under model misspecification, which is virtually guaranteed when the baseline covariates are continuous or categorical with many levels. An open problem was whether there exists an equally powerful, covariate-adjusted estimator with no such vulnerabilities, i.e., one that (i) is interpretable and consistent without requiring any model assumptions, and (ii) matches the efficiency gains of the logistic regression estimator. Such an estimator would provide the best of both worlds: interpretability and consistency under no model assumptions (like the unadjusted estimator) and power gains from covariate adjustment (that match the logistic regression estimator). We prove a new asymptotic result showing that, surprisingly, there are simple estimators satisfying the above properties. We argue that these rarely used estimators have substantial advantages over the more commonly used logistic regression estimator for covariate adjustment in randomized trials with binary outcomes. Though our focus is binary outcomes and logistic regression models, our results extend to a large class of generalized linear models.
Michael Rosenblum et al.

IMPROVING PRECISION BY ADJUSTING FOR BASELINE VARIABLES IN RANDOMIZED TRIALS WITH BINARY OUTCOMES, WITHOUT REGRESSION MODEL ASSUMPTIONS
http://biostats.bepress.com/jhubiostat/paper280
Wed, 22 Jun 2016 10:45:50 PDT
In randomized clinical trials with baseline variables that are prognostic for the primary outcome, there is potential to improve precision and reduce sample size by appropriately adjusting for these variables. A major challenge is that there are multiple statistical methods to adjust for baseline variables, but little guidance on which is best to use in a given context. The choice of method can have important consequences. For example, one commonly used method leads to uninterpretable estimates if there is any treatment effect heterogeneity, which would jeopardize the validity of trial conclusions. We give practical guidance on how to avoid this problem, while retaining the advantages of covariate adjustment. This can be achieved by using simple (but less well-known) standardization methods from the recent statistics literature. We discuss these methods and give software in R and Stata implementing them. A data example from a recent stroke trial is used to illustrate these methods.
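The standardization idea fits in a few lines. This minimal sketch assumes one baseline covariate and a logistic working model fit by unpenalized Newton-Raphson. It is not the R or Stata software the paper provides, and all names and the simulated data are hypothetical. The working model serves only to predict each participant's risk under both treatment assignments. Those predictions are averaged over the whole trial sample, so the quantity estimated is a marginal risk difference rather than a stratum-specific odds ratio.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Unpenalized logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def standardized_risk_difference(a, w, y) -> float:
    """Standardization estimate of P(Y(1) = 1) - P(Y(0) = 1).

    a: 0/1 treatment, w: one baseline covariate, y: 0/1 outcome.
    The logistic working model in (1, a, w) is used only for prediction.
    """
    a, w, y = (np.asarray(v, dtype=float) for v in (a, w, y))
    X = np.column_stack([np.ones_like(a), a, w])
    beta = fit_logistic(X, y)
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0  # everyone treated / everyone on control
    risk1 = 1.0 / (1.0 + np.exp(-X1 @ beta))
    risk0 = 1.0 / (1.0 + np.exp(-X0 @ beta))
    return float(risk1.mean() - risk0.mean())


# Hypothetical simulated trial with 1:1 randomization.
rng = np.random.default_rng(0)
n = 4000
a = rng.integers(0, 2, size=n)
w = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + a + w))))
rd_adjusted = standardized_risk_difference(a, w, y)
rd_unadjusted = y[a == 1].mean() - y[a == 0].mean()
print(round(rd_adjusted, 3), round(rd_unadjusted, 3))
```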
Jon Arni Steingrimsson et al.

STOCHASTIC OPTIMIZATION OF ADAPTIVE ENRICHMENT DESIGNS FOR TWO SUBPOPULATIONS
http://biostats.bepress.com/jhubiostat/paper279
Fri, 29 Apr 2016 13:06:43 PDT
An adaptive enrichment design is a randomized trial that allows enrollment criteria to be modified at interim analyses, based on a preset decision rule. When there is prior uncertainty regarding treatment effect heterogeneity, these trial designs can provide improved power for detecting treatment effects in subpopulations. We present a simulated annealing approach to search over the space of decision rules and other parameters for an adaptive enrichment design. The goal is to minimize the expected number enrolled or expected duration, while preserving the appropriate power and Type I error rate. We also explore the benefits of parallel computation in the context of this goal. We find that optimized designs can be substantially more efficient than simpler designs using Pocock or O'Brien-Fleming boundaries.
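A generic simulated annealing loop of the kind described looks like this. The objective is a toy, hypothetical stand-in. In a real design search it would be the simulated expected sample size or duration, with power and Type I error constraints enforced, for example through a penalty as assumed below. The Gaussian proposal, geometric cooling schedule, and penalty weight are all illustrative assumptions.

```python
import math
import random


def simulated_annealing(objective, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    """Minimize `objective` over R^d with Gaussian proposals and geometric cooling."""
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    best_x, best_f = list(x), fx
    temp = t0
    for _ in range(n_iter):
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = objective(cand)
        # Accept improvements always; accept worse moves with probability exp(-increase / temp).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        temp *= cooling  # geometric cooling schedule
    return best_x, best_f


def toy_objective(x):
    # Hypothetical stand-in for a design criterion: a smooth cost plus a quadratic
    # penalty that is active only if the made-up constraint x[0] + x[1] <= -0.5 fails.
    cost = (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
    violation = max(0.0, x[0] + x[1] + 0.5)
    return cost + 100.0 * violation ** 2


best_x, best_f = simulated_annealing(toy_objective, x0=[0.0, 0.0])
print([round(v, 2) for v in best_x], round(best_f, 4))
```

Accepting occasional uphill moves while the temperature is high is what lets the search escape local minima in a non-convex design space.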
Aaron Fisher et al.

SENSITIVITY OF TRIAL PERFORMANCE TO DELAY OUTCOMES, ACCRUAL RATES, AND PROGNOSTIC VARIABLES BASED ON A SIMULATED RANDOMIZED TRIAL WITH ADAPTIVE ENRICHMENT
http://biostats.bepress.com/jhubiostat/paper277
Thu, 20 Aug 2015 13:47:30 PDT
Adaptive enrichment designs involve rules for restricting enrollment to a subset of the population during the course of an ongoing trial. This can be used to target those who benefit from the experimental treatment. To leverage prognostic information in baseline variables and short-term outcomes, we use a semiparametric, locally efficient estimator, and investigate its strengths and limitations compared to standard estimators. Through simulation studies, we assess how sensitive the trial performance (Type I error, power, expected sample size, trial duration) is to different design characteristics. Our simulation distributions mimic features of data from the Alzheimer’s Disease Neuroimaging Initiative, and involve two subpopulations of interest based on a generic marker. We investigate the impact of the following design characteristics: the accrual rate, the delay time between enrollment and observation of the primary outcome, and the prognostic value of baseline variables and short-term outcomes. We apply information-based monitoring, and evaluate how accurately information can be estimated in an ongoing trial.
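Information-based monitoring compares the information accrued so far with the information the design needs. This minimal sketch uses an unadjusted difference in means in place of the paper's locally efficient estimator and a fixed-sample target. For a group sequential design, the target would be inflated by a design-specific factor, which is not shown. Function names and data are hypothetical.

```python
from statistics import NormalDist, variance


def target_information(delta: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Fixed-sample information for a two-sided level-alpha z-test to have the
    stated power at effect size delta: ((z_{1-alpha/2} + z_{power}) / delta)^2."""
    z = NormalDist().inv_cdf
    return ((z(1.0 - alpha / 2.0) + z(power)) / delta) ** 2


def estimated_information(y_treat, y_control) -> float:
    """Information = 1 / estimated variance of the difference in arm means."""
    var_hat = variance(y_treat) / len(y_treat) + variance(y_control) / len(y_control)
    return 1.0 / var_hat


# Hypothetical interim data: only participants whose primary outcome is observed.
i_max = target_information(delta=0.5)
i_now = estimated_information([1.0, 2.0, 3.0, 2.5], [2.0, 1.0, 1.5, 0.5])
print(f"information fraction so far: {i_now / i_max:.3f}")
```

Because information depends on the estimated variance, the information fraction can differ from the enrolled fraction. This difference matters when outcomes are delayed or when baseline adjustment shrinks the variance.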
Tianchen Qian et al.

NESTED PARTIALLY-LATENT, CLASS MODELS FOR DEPENDENT BINARY DATA, ESTIMATING DISEASE ETIOLOGY
http://biostats.bepress.com/jhubiostat/paper276
Fri, 24 Apr 2015 14:00:08 PDT
The Pneumonia Etiology Research for Child Health (PERCH) study seeks to use modern measurement technology to infer the causes of pneumonia for which gold-standard evidence is unavailable. The paper describes a latent variable model designed to infer from case-control data the etiology distribution for the population of cases, and for an individual case given his or her measurements. We assume each observation is drawn from a mixture model for which each component represents one cause or disease class. The model addresses a major limitation of the traditional latent class approach by taking account of residual dependence among the multivariate binary outcomes given disease class; it hence reduces estimation bias, retains efficiency, and offers more valid inference. Such "local dependence" within a single subject is induced in the model by nesting latent subclasses within each disease class. Measurement precision and covariation can be estimated using the control sample, for whom the class is known. In a Bayesian framework, we use stick-breaking priors on the subclass indicators for model-averaged inference across different numbers of subclasses. Model fit is assessed and individual diagnoses are made using posterior samples drawn by Gibbs sampling. We demonstrate the utility of the method on simulated data and on the motivating PERCH data.
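The stick-breaking construction behind such priors is short to write down. This is a minimal, truncated version with hypothetical names. The concentration α = 1 and truncation level K = 10 are illustrative assumptions. Only the prior draw of subclass weights is shown, not the Gibbs sampler.

```python
import numpy as np


def stick_breaking_weights(alpha: float, n_components: int, rng: np.random.Generator) -> np.ndarray:
    """Draw mixture weights from a truncated stick-breaking prior.

    V_k ~ Beta(1, alpha) for k < K and V_K = 1, so the weights sum to one;
    w_k = V_k * prod_{j<k} (1 - V_j).
    """
    v = rng.beta(1.0, alpha, size=n_components)
    v[-1] = 1.0  # truncation: the last piece takes whatever stick remains
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * stick_left


rng = np.random.default_rng(2015)
weights = stick_breaking_weights(alpha=1.0, n_components=10, rng=rng)
print(np.round(weights, 3))
```

Small α concentrates mass on the first few subclasses, which is how such priors let the data choose an effective number of subclasses.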
Zhenke Wu et al.

ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS WITH DELAYED ENDPOINTS, USING LOCALLY EFFICIENT ESTIMATORS TO IMPROVE PRECISION
http://biostats.bepress.com/jhubiostat/paper275
Fri, 24 Apr 2015 14:00:07 PDT
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accrued data in an ongoing trial. For example, enrollment of a subpopulation where there is sufficient evidence of treatment efficacy, futility, or harm could be stopped, while enrollment for the remaining subpopulations is continued. Most existing methods for constructing adaptive enrichment designs are limited to situations where patient outcomes are observed soon after enrollment. This is a major barrier to the use of such designs in practice, since for many diseases the outcome of most clinical importance does not occur shortly after enrollment. We propose a new class of adaptive enrichment designs for delayed endpoints. At each analysis, semiparametric, locally efficient estimators leverage information in baseline variables and short-term outcomes to improve precision. This can reduce the sample size required to achieve a desired power. We propose new multiple testing procedures tailored to this problem, which we prove to strongly control the family-wise Type I error rate, asymptotically. These methods are illustrated through simulations of a trial for a new surgical intervention for stroke.
Michael Rosenblum et al.

INEQUALITY IN TREATMENT BENEFITS: CAN WE DETERMINE IF A NEW TREATMENT BENEFITS THE MANY OR THE FEW?
http://biostats.bepress.com/jhubiostat/paper274
Fri, 13 Mar 2015 10:07:18 PDT
The primary analysis in many randomized controlled trials focuses on the average treatment effect and does not address whether treatment benefits are widespread or limited to a select few. This problem affects many disease areas, since it stems from how randomized trials, often the gold standard for evaluating treatments, are designed and analyzed. Our goal is to learn about the fraction who benefit from a treatment, based on randomized trial data. We consider the case where the outcome is ordinal, with binary outcomes as a special case. In general, the fraction who benefit is a non-identifiable parameter, and the best that can be obtained are sharp lower and upper bounds on it. Our main contributions include (i) showing that the naive (plug-in) estimator of the bounds can be inconsistent, in the case that support restrictions are made on the joint distribution of the potential outcomes (such as the no harm assumption); (ii) developing the first consistent estimator for this case; (iii) applying this estimator to a randomized trial dataset of a medical treatment to determine whether the estimates can be informative. Our estimator is computed using linear programming, allowing fast implementation. R and MATLAB software are provided (https://github.com/emhuang1/fraction-who-benefit).
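The bounds can be written as a pair of linear programs over the unknown joint distribution of (Y(1), Y(0)). The two marginals enter as equality constraints, and support restrictions such as no harm enter as zero bounds. The sketch below is the naive plug-in version in Python with hypothetical names, not the authors' R or MATLAB code. When estimated marginals are plugged in under a support restriction, the abstract notes this version can be inconsistent; that is the problem the paper's estimator addresses. The sketch assumes SciPy's HiGHS-based `linprog` is available.

```python
import numpy as np
from scipy.optimize import linprog


def benefit_bounds_lp(p_treat, p_control, no_harm=False):
    """Bounds on P(Y(1) > Y(0)) for an ordinal outcome (higher level = better).

    Decision variable: joint pmf pi[i, j] = P(Y(1) = i, Y(0) = j), flattened row-major.
    """
    p1, p0 = np.asarray(p_treat, float), np.asarray(p_control, float)
    L = len(p1)
    benefit = np.array([[1.0 if i > j else 0.0 for j in range(L)] for i in range(L)]).ravel()
    A_eq, b_eq = [], []
    for i in range(L):  # marginal distribution of Y(1)
        row = np.zeros((L, L))
        row[i, :] = 1.0
        A_eq.append(row.ravel())
        b_eq.append(p1[i])
    for j in range(L):  # marginal distribution of Y(0)
        col = np.zeros((L, L))
        col[:, j] = 1.0
        A_eq.append(col.ravel())
        b_eq.append(p0[j])
    # Support restriction: "no harm" forces pi[i, j] = 0 whenever i < j.
    bounds = [(0.0, 0.0) if (no_harm and i < j) else (0.0, None)
              for i in range(L) for j in range(L)]
    A_eq = np.array(A_eq)
    lo = linprog(benefit, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    hi = linprog(-benefit, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not (lo.success and hi.success):
        raise ValueError("marginals are incompatible with the support restriction")
    return lo.fun, -hi.fun


# Hypothetical binary example: level 1 is the better outcome.
p_treat = [0.55, 0.45]    # P(Y(1) = 0), P(Y(1) = 1)
p_control = [0.70, 0.30]  # P(Y(0) = 0), P(Y(0) = 1)
bounds_any = benefit_bounds_lp(p_treat, p_control)
bounds_no_harm = benefit_bounds_lp(p_treat, p_control, no_harm=True)
print(bounds_any, bounds_no_harm)
```

In this binary example, the no-harm restriction collapses the interval to a single point. This shows how much a support restriction can sharpen the bounds.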
Emily Huang et al.

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING
http://biostats.bepress.com/jhubiostat/paper273
Mon, 15 Dec 2014 11:44:33 PST
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. We focus on designs where the overall population is partitioned into two predefined subpopulations, e.g., based on a biomarker or risk score measured at baseline. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment, and the multiple testing procedure. We provide a general method for simultaneously optimizing these components for two stage, adaptive enrichment designs. We minimize the expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem due to its nonconvexity. The key to our approach is a novel, discrete representation of this optimization problem as a sparse linear program, which is large but computationally feasible to solve using modern optimization techniques. Applications of our approach produce new, approximately optimal designs.
Michael Rosenblum et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART II. APPLICATION AND EXTERNAL VALIDATION
http://biostats.bepress.com/jhubiostat/paper272
Wed, 05 Nov 2014 10:51:24 PST
Randomized controlled trials (RCTs) generally provide the most reliable evidence. When participants in RCTs are selected with respect to characteristics that are potential treatment effect modifiers, the average treatment effect from the trials may not be applicable to a specific target population. We present a new method to project the treatment effect from an RCT to a target group that is inadequately represented in the trial when there is heterogeneity in the treatment effect (HTE). The method integrates RCT and observational data through cross-design synthesis. An essential component is to identify HTE and a calibration factor for unmeasured confounding for the observational study relative to the RCT. The estimate of treatment effect adjusted for unmeasured confounding is projected onto the target sample using G-computation with standardization weights. We call the method Calibrated Risk-Adjusted Modeling (CRAM) and apply it to estimate the effect of angiotensin converting enzyme inhibition to prevent heart failure hospitalization or death. External validation shows that when there is adequate overlap between the RCT and the target sample, risk-based standardization is less biased than CRAM. However, when there is poor overlap between the trial and the target sample, CRAM provides superior estimates of treatment effect.
Carlos Weiss et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS-I. METHODOLOGY
http://biostats.bepress.com/jhubiostat/paper271
Wed, 05 Nov 2014 10:41:00 PST
Randomized controlled trials (RCTs) provide reliable evidence for approval of new treatments, informing clinical practice, and coverage decisions. The participants in RCTs are often not a representative sample of the larger at-risk population. Hence it is argued that the average treatment effect from the trial is not generalizable to the larger at-risk population. An essential premise of this argument is that there is significant heterogeneity in the treatment effect (HTE). We present a new method to extrapolate the treatment effect from a trial to a target group that is inadequately represented in the trial, when HTE is present. Our method integrates trial and observational data (cross-design synthesis). The target group is assumed to be well-represented in the observational database. An essential component of the methodology is the estimation of calibration adjustments for unmeasured confounding in the observational sample. The estimate of treatment effect, adjusted for unmeasured confounding, is projected onto the target sample using a weighted G-computation approach. We present simulation studies to demonstrate the methodology for estimating the marginal treatment effect in a target sample that differs from the trial sample to varying degrees. In a companion paper, we demonstrate and validate the methodology in a clinical application.
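The final projection step can be illustrated with a weighted G-computation. This minimal sketch assumes a single covariate, a linear outcome model with a treatment-by-covariate interaction fit in the trial, and user-supplied weights for the target sample. It omits the calibration for unmeasured confounding and the use of observational data that the method relies on. All names and data are hypothetical.

```python
import numpy as np


def project_effect(rct_a, rct_x, rct_y, target_x, target_w=None) -> float:
    """Fit y ~ 1 + a + x + a:x in the trial, then take the weighted average of the
    predicted individual effect (beta_a + beta_ax * x) over the target sample."""
    a, x, y = (np.asarray(v, dtype=float) for v in (rct_a, rct_x, rct_y))
    design = np.column_stack([np.ones_like(a), a, x, a * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    tx = np.asarray(target_x, dtype=float)
    w = np.ones_like(tx) if target_w is None else np.asarray(target_w, dtype=float)
    effect = beta[1] + beta[3] * tx  # effect varies with x (HTE)
    return float(np.sum(w * effect) / np.sum(w))


# Hypothetical noise-free trial generated from y = 1 + 2a + 0.5x + 1.5ax.
rct_a = [0, 0, 0, 1, 1, 1]
rct_x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
rct_y = [1.0, 1.5, 2.0, 3.0, 5.0, 7.0]
target_x = [3.0, 3.0, 4.0]  # hypothetical target-sample covariate values
effect_unweighted = project_effect(rct_a, rct_x, rct_y, target_x)
effect_weighted = project_effect(rct_a, rct_x, rct_y, target_x, target_w=[1.0, 1.0, 2.0])
print(effect_unweighted, effect_weighted)
```

The example deliberately places the target group beyond the trial's covariate range. That situation is where projection leans most heavily on the modeled effect heterogeneity.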
Ravi Varadhan et al.

ENHANCED PRECISION IN THE ANALYSIS OF RANDOMIZED TRIALS WITH ORDINAL OUTCOMES
http://biostats.bepress.com/jhubiostat/paper270
Wed, 22 Oct 2014 10:36:19 PDT
We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to match or improve the precision of the standard, unadjusted estimator. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.
Iván Díaz et al.

APPLYING MULTIPLE IMPUTATION FOR EXTERNAL CALIBRATION TO PROPENSITY SCORE ANALYSIS
http://biostats.bepress.com/jhubiostat/paper269
Fri, 26 Sep 2014 13:26:30 PDT
Although covariate measurement error is likely the norm rather than the exception, methods for handling covariate measurement error in propensity score methods have not been widely investigated. We consider a multiple imputation-based approach that uses an external calibration sample with information on the true and mismeasured covariates, Multiple Imputation for External Calibration (MI-EC), to correct for the measurement error, and investigate its performance using simulation studies. As expected, using the covariate measured with error leads to bias in the treatment effect estimate. In contrast, the MI-EC method can eliminate almost all the bias. We confirm that the outcome must be used in the imputation process to obtain good results, a finding related to the idea of congenial imputation and analysis in the broader multiple imputation literature. We illustrate the MI-EC approach using a motivating example estimating the effects of living in a disadvantaged neighborhood on mental health and substance use outcomes among adolescents. These results show that estimating the propensity score using covariates measured with error leads to biased estimates of treatment effects, but when a calibration data set is available, MI-EC can be used to help correct for such bias.
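After imputation, the results from the completed data sets must be pooled. The sketch below shows the standard Rubin's rules that multiple imputation analyses typically use. That MI-EC pools exactly this way is an assumption here. The imputation model that uses the calibration sample is not shown, and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    Returns the pooled estimate and its total variance W + (1 + 1/m) * B.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)  # pooled point estimate
    w_bar = mean(variances)  # within-imputation variance
    b = variance(estimates)  # between-imputation variance (sample variance)
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total


# Hypothetical treatment-effect estimates and squared standard errors from m = 3 imputations.
q_pooled, t_pooled = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q_pooled, t_pooled)
```

The between-imputation term is what carries the extra uncertainty from not observing the true covariate. Dropping it would understate the standard error.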
Yenny Webb-Vargas et al.

A BAYESIAN APPROACH TO JOINT MODELING OF MENSTRUAL CYCLE LENGTH AND FECUNDITY
http://biostats.bepress.com/jhubiostat/paper268
Fri, 26 Sep 2014 09:49:48 PDT
Female menstrual cycle length is thought to play an important role in couple fecundity, or the biologic capacity for reproduction irrespective of pregnancy intentions. A complete assessment of the association between menstrual cycle length and fecundity requires a model that accounts for multiple risk factors (both male and female) and the couple's intercourse pattern relative to ovulation. We employ a Bayesian joint model consisting of a mixed effects accelerated failure time model for longitudinal menstrual cycle lengths and a hierarchical model for the conditional probability of pregnancy in a menstrual cycle given no pregnancy in previous cycles of trying, in which we include covariates for the male and the female and a flexible spline function of intercourse timing. Using our joint modeling approach to analyze data from the Longitudinal Investigation of Fertility and the Environment Study, a couple-based prospective pregnancy study, we found a significant quadratic relation between menstrual cycle length and the probability of pregnancy even with adjustment for other risk factors, including male semen quality, age, and smoking status.
Kirsten J. Lum et al.

Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
http://biostats.bepress.com/jhubiostat/paper267
Wed, 27 Aug 2014 12:50:27 PDT
In population studies on the etiology of disease, one goal is the estimation of the fraction of cases attributable to each of several causes. For example, pneumonia is a clinical diagnosis of lung infection that may be caused by viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology is challenging because directly sampling from the lung to identify the etiologic pathogen is not standard clinical practice in most settings. Instead, measurements from multiple peripheral specimens are made. This paper considers the problem of estimating the population etiology distribution and the individual etiology probabilities. We formulate the scientific problem in statistical terms as estimating the posterior distribution of mixing weights and latent class indicators under a partially-latent class model (pLCM) that combines heterogeneous measurements with different error rates obtained from a case-control study. We introduce the pLCM as an extension of the latent class model. We also introduce graphical displays of the population data and inferred latent-class frequencies. The methods are illustrated with simulated and real data sets. The paper closes with a brief description of extensions of the pLCM to the regression setting and to the case where conditional independence among the measures is relaxed.
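Individual etiology probabilities rest on Bayes' rule within the latent class model. This minimal sketch makes several assumptions. The parameters are fixed rather than estimated, each case has one causal pathogen, there is one binary measurement per pathogen, and measurements are conditionally independent given class. In the pLCM, the false positive rates would be informed by the controls, whose class is known. Names and values are hypothetical.

```python
import numpy as np


def posterior_class_probs(measurements, pi, tpr, fpr):
    """P(class = k | binary measurements) under a simple latent class model.

    pi[k]   prior (population etiology) fraction for cause k
    tpr[j]  P(M_j = 1 | pathogen j is the cause)      (sensitivity)
    fpr[j]  P(M_j = 1 | pathogen j is not the cause)  (1 - specificity)
    Assumes one measurement per pathogen and conditional independence given class.
    """
    m = np.asarray(measurements, dtype=float)
    tpr, fpr = np.asarray(tpr, dtype=float), np.asarray(fpr, dtype=float)
    log_post = np.log(np.asarray(pi, dtype=float))
    for k in range(len(log_post)):
        rate = np.where(np.arange(len(m)) == k, tpr, fpr)  # positive rate under class k
        log_post[k] += np.sum(m * np.log(rate) + (1.0 - m) * np.log(1.0 - rate))
    post = np.exp(log_post - log_post.max())  # stabilize before normalizing
    return post / post.sum()


# Hypothetical case: only pathogen 0 tests positive.
post = posterior_class_probs([1, 0, 0], pi=[0.5, 0.3, 0.2], tpr=[0.9] * 3, fpr=[0.1] * 3)
print(np.round(post, 4))
```

Working on the log scale keeps the product of many small likelihood terms from underflowing when there are many measurements.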
Zhenke Wu et al.