Collection of Biostatistics Research Archive

A simple and robust alternative to Bland-Altman method of assessing clinical agreement

Abhaya Indrayan Prof — Sat, 05 Nov 2022 15:05:45 PDT

Clinical agreement between two quantitative measurements on a group of subjects is generally assessed with the help of the Bland-Altman (B-A) limits. These limits only describe the dispersion of disagreements in 95% of cases and do not measure the degree of agreement. The interpretation regarding the presence or absence of agreement by this method is based on whether the B-A limits are within the pre-specified, externally determined clinical tolerance limits. Thus, clinical tolerance limits are necessary for this method. We argue in this communication that the direct use of clinical tolerance limits for assessing agreement without the B-A limits is more effective and has tremendous merits. This nonparametric approach is simple, is robust to the distribution pattern and outliers, has more flexibility, and exactly measures the degree of clinical agreement. This is explained with the help of two examples, including setups where clinical tolerance limits can be set up to follow varying trends if required in the clinical context, a feature not available in the B-A method.
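The contrast drawn in this abstract can be sketched in a few lines of Python. The abstract does not give the formula for the degree of agreement, so the sketch assumes one plausible reading: the share of paired differences that fall inside the clinical tolerance limits. The readings, the tolerance of ±5 units, and both function names are hypothetical.

```python
from statistics import mean, stdev

def bland_altman_limits(x, y):
    """Conventional 95% B-A limits: mean difference +/- 1.96 SD of the differences."""
    d = [a - b for a, b in zip(x, y)]
    m, s = mean(d), stdev(d)
    return m - 1.96 * s, m + 1.96 * s

def agreement_within_tolerance(x, y, lower, upper):
    """Share of paired differences inside the tolerance limits (assumed measure)."""
    d = [a - b for a, b in zip(x, y)]
    inside = sum(lower <= di <= upper for di in d)
    return inside / len(d)

# Hypothetical paired readings from two methods; the last pair is an outlier.
method_a = [120, 118, 135, 142, 110, 128, 131, 125, 119, 160]
method_b = [122, 117, 133, 145, 111, 126, 134, 124, 121, 140]

ba_low, ba_high = bland_altman_limits(method_a, method_b)
share = agreement_within_tolerance(method_a, method_b, -5, 5)

print(f"B-A limits: ({ba_low:.1f}, {ba_high:.1f})")
print(f"Share within tolerance: {share:.0%}")
```

In this toy data the single outlier widens the B-A limits well beyond ±5, while the tolerance-based share changes by only one subject out of ten, which is the kind of robustness the abstract refers to.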

Marginal Proportional Hazards Models for Clustered Interval-Censored Data with Time-Dependent Covariates

Kaitlyn Cook et al. — Mon, 21 Mar 2022 06:57:26 PDT

The Botswana Combination Prevention Project was a cluster-randomized HIV prevention trial whose follow-up period coincided with Botswana’s national adoption of a universal test-and-treat strategy for HIV management. Of interest is whether, and to what extent, this change in policy (i) modified the observed preventative effects of the study intervention and (ii) was associated with a reduction in the population-level incidence of HIV in Botswana. To address these questions, we propose a stratified proportional hazards model for clustered interval-censored data with time-dependent covariates and develop a composite expectation maximization algorithm that facilitates estimation of model parameters without placing parametric assumptions on either the baseline hazard functions or the within-cluster dependence structure. We show that the resulting estimators for the regression parameters are consistent and asymptotically normal. We also propose and provide theoretical justification for the use of the profile composite likelihood function to construct a robust sandwich estimator for the variance. We characterize the finite-sample performance and robustness of these estimators through extensive simulation studies. Finally, we conclude by applying this stratified proportional hazards model to a re-analysis of the Botswana Combination Prevention Project, with the national adoption of a universal test-and-treat strategy now modeled as a time-dependent covariate.

On assessing survival benefit of immunotherapy using long-term restricted mean survival time

Miki Horiguchi et al. — Thu, 27 Jan 2022 06:16:45 PST

The pattern of the difference between two survival curves we often observe in randomized clinical trials for evaluating immunotherapy is not proportional hazards; the treatment effect typically appears several months after the initiation of the treatment (i.e., delayed difference pattern). The commonly used logrank test and hazard ratio estimation approach will be suboptimal concerning testing and estimation for those trials. The long-term restricted mean survival time (LT-RMST) approach is a promising alternative for detecting the treatment effect that potentially appears later in the study. A challenge in employing the LT-RMST approach is that it must specify a lower end of the time window in addition to a truncation time point that the RMST requires. There are several investigations and suggestions regarding the choice of the truncation time point for the RMST. However, little has been investigated to address the choice of the lower end of the time window. In this paper, we propose a flexible LT-RMST-based test/estimation approach that does not require users to specify a lower end of the time window. Numerical studies demonstrated that the potential power loss by adopting this flexibility was minimal, compared to the standard LT-RMST approach using a prespecified lower end of the time window. The proposed method is flexible and can offer higher power than the RMST-based approach when the delayed treatment effect is expected. Also, it provides a robust estimate of the magnitude of the treatment effect and its confidence interval that corresponds to the test result.
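As a rough illustration of the quantities involved, the sketch below computes the RMST as the area under a Kaplan-Meier curve from 0 to a truncation time, and a long-term RMST as the area over a window that starts at a later lower end. This is the standard LT-RMST with a prespecified window, not the flexible procedure proposed in the paper; the data, the window [4, 8], and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def area_under_step(curve, lower, upper):
    """Integrate the right-continuous step function S(t) over [lower, upper]."""
    area = 0.0
    current_s = 1.0
    left = lower
    for t, s in curve:
        if t <= lower:
            current_s = s  # survival level already reached at the lower end
            continue
        if t >= upper:
            break
        area += current_s * (t - left)
        left, current_s = t, s
    area += current_s * (upper - left)
    return area

# Hypothetical follow-up times (months) and event indicators for one arm.
times = [2, 4, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
rmst = area_under_step(curve, 0, 8)      # RMST with truncation time 8
lt_rmst = area_under_step(curve, 4, 8)   # long-term RMST over the window [4, 8]
print(round(rmst, 3), round(lt_rmst, 3))
```

Comparing `lt_rmst` between arms ignores the early period in which the curves have not yet separated, which is why the choice of the lower end matters under a delayed treatment effect.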

Nonlinear Mixed-Effects Models for HIV Viral Load Trajectories Before and After Antiretroviral Therapy Interruption, Incorporating Left Censoring

Sihaoyu Gao et al. — Tue, 07 Dec 2021 07:52:29 PST

Characterizing features of the viral rebound trajectories and identifying host, virological, and immunological factors that are predictive of the viral rebound trajectories are central to HIV cure research. In this paper, we investigate if key features of HIV viral decay and CD4 trajectories during antiretroviral therapy (ART) are associated with characteristics of HIV viral rebound following ART interruption. Nonlinear mixed effect (NLME) models are used to model viral load trajectories before and following ART interruption, incorporating left censoring due to lower detection limits of viral load assays. A stochastic approximation EM (SAEM) algorithm is used for parameter estimation and inference. To circumvent the computational intensity associated with maximizing the joint likelihood, we propose an easy-to-implement three-step method. We evaluate the performance of this method through simulation studies and apply it to data from the Zurich Primary HIV Infection Study. We find that some key features of viral load during ART (e.g., viral decay rate) are significantly associated with important characteristics of viral rebound following ART interruption (e.g., viral set point).

Causal Mediation Analysis for Difference-in-Difference Design and Panel Data

Pei-Hsuan Hsia et al. — Fri, 15 Oct 2021 06:52:24 PDT

Advantages of panel data, i.e., difference-in-difference (DID) design data, are a large sample size and easy availability. Therefore, panel data are widely used in epidemiology and in all social science fields. The literature on causal inference in panel data settings or DID designs is growing, but no theory or mediation analysis method has been proposed for such settings. In this study, we propose a methodology for conducting causal mediation analysis in DID designs and panel data settings. We provide formal counterfactual definitions for the controlled direct effect and the natural direct and indirect effects in panel data settings and DID designs, including the identification and required assumptions. We also demonstrate that, under the assumptions of linearity and additivity, controlled direct effects can be estimated by contrasting marginal and conditional DID estimators, whereas natural indirect effects can be estimated by calculating the product of the exposure-mediator DID estimator and the mediator-outcome DID estimator. A panel regression-based approach is also proposed. The proposed method is then used to investigate mechanisms of the effects of the COVID-19 pandemic on the mental health status of the population. The results revealed that mobility restrictions mediated approximately 45% of the causal effect of COVID-19 on mental health status.

Ratio and Difference of Average Hazard with Survival Weight: New Measures to Quantify Survival Benefit of New Therapy

HAJIME UNO et al. — Fri, 10 Sep 2021 12:58:40 PDT

The hazard ratio (HR) has been the most popular measure to quantify the magnitude of treatment effect on time-to-event outcomes in clinical research. However, the HR estimated by Cox's method has several drawbacks. One major issue is that there is no clear interpretation when the proportional hazards (PH) assumption does not hold, because it is affected by study-specific censoring time distribution in non-PH cases. Another major issue is that the lack of a group-specific absolute hazard value in each group obscures the clinical significance of the magnitude of the treatment effect. Given these, we propose average hazard with survival weight (AH-SW) as a summary metric of event time distribution and will use difference in AH-SW (DAH-SW) or ratio of AH-SW (RAH-SW) to quantify the treatment effect magnitude. The AH-SW we propose is a new digestible metric interpreted as a person-years event rate when random censoring would not exist. It is defined as the ratio of tau-year event rate and restricted mean survival time, which can be estimated non-parametrically. Numerical studies demonstrate that DAH-SW and RAH-SW offer almost identical power to Cox's method under PH scenarios and can be more powerful for delayed-difference patterns that are often seen in immunotherapy trials. The proposed metrics (i.e., AH-SW, DAH-SW and RAH-SW) and the inferential methods for them offer a digestible interpretation that the conventional Cox's method could not provide about the survival benefit of a new therapy. These metrics will increase the likelihood that results from clinical studies are correctly interpreted.
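The verbal definition in this abstract can be written out as a formula. The notation is an assumption, not taken from the abstract: S(t) is the survival function, τ is the truncation time, and the subscripts 1 and 0 denote the treatment and control arms.

```latex
\mathrm{AH}(\tau) = \frac{1 - S(\tau)}{\int_0^{\tau} S(t)\,dt},
\qquad
\mathrm{DAH}(\tau) = \mathrm{AH}_1(\tau) - \mathrm{AH}_0(\tau),
\qquad
\mathrm{RAH}(\tau) = \frac{\mathrm{AH}_1(\tau)}{\mathrm{AH}_0(\tau)}
```

As a hypothetical worked example, if S(5) = 0.6 and the restricted mean survival time up to 5 years is 4 years, then AH(5) = 0.4 / 4 = 0.1 events per person-year.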

On The Conventional Definition Of Path-Specific Effects - fully mediated interaction with multiple ordered mediators

An-Shun Tai et al. — Fri, 23 Jul 2021 11:21:56 PDT

Path-specific effects (PSEs) are a critical measure for assessing mediation in the presence of multiple mediators. However, the conventional definition of PSEs has generated controversy because it often causes misinterpretation of the results of multiple mediator analysis. For in-depth analysis of this issue, we propose the concept of decomposing fully mediated interaction (FMI) from the average causal effect. We show that FMI misclassification is the main cause of PSE misinterpretation. Two strategies for specifying FMI are proposed: isolating FMI and reclassifying FMI. The choice of strategy depends on the objective. Isolating FMI is the superior strategy when the main objective is elucidating the mediation mechanism whereas reclassifying FMI is superior when the main objective is precisely interpreting the mediation analysis results. To compare performance, this study used the two proposed strategies and the conventional decomposition strategy to analyze the mediating roles of dyspnea and anxiety in the effect of impaired lung function on poor health status in a population of patients with chronic obstructive pulmonary disease. The estimation result showed that the conventional decomposition strategy underestimates the importance of dyspnea as a mechanism of this disease. Specifically, the strategy of reclassifying FMI revealed that 50% of the average causal effect is attributable to mediating effects, particularly the mediating effect of dyspnea.

Causal Mediation Analysis with Multiple Time-Varying Mediators

An-Shun Tai et al. — Mon, 19 Jul 2021 08:42:23 PDT

In longitudinal studies with time-varying exposures and mediators, the mediational g-formula is an important method for the assessment of direct and indirect effects. However, current methodologies based on the mediational g-formula can deal with only one mediator. This limitation makes these methodologies inapplicable to many scenarios. Hence, we develop a novel methodology by extending the mediational g-formula to cover cases with multiple time-varying mediators. We formulate two variants of our approach that are each suited to a distinct set of assumptions and effect definitions and present nonparametric identification results of each variant. We further show how complex causal mechanisms (whose complexity derives from the presence of multiple time-varying mediators) can be untangled. A parametric method along with a user-friendly algorithm was implemented in R software. We illustrate our method by investigating the complex causal mechanism underlying the progression of chronic obstructive pulmonary disease. We found that the effects of lung function impairment mediated by dyspnea symptoms and mediated by physical activity accounted for 13.7% and 10.8% of the total effect, respectively. Our analyses thus illustrate the power of this approach, providing evidence for the mediating role of dyspnea and physical activity on the causal pathway from lung function impairment to health status.

Power calculation for cross-sectional stepped-wedge cluster randomized trials with binary outcomes

Linda J. Harrison et al. — Tue, 13 Apr 2021 05:52:31 PDT

Power calculation for stepped-wedge cluster randomized trials (SW-CRTs) presents unique challenges, beyond those of standard cluster randomized trials (CRTs), due to the need to consider temporal within cluster correlations and background period effects. To date, power calculation methods specific to SW-CRTs have primarily been developed under a linear model. When the outcome is binary, the use of a linear model corresponds to assessing a prevalence difference; yet trial analysis often employs a non-linear link function. We assess power for cross-sectional SW-CRTs under a logistic model fitted by generalized estimating equations. Firstly, under an exchangeable correlation structure, we show the power based on a logistic model is lower than that from assuming a linear model in the absence of period effects. We then evaluate the impact of background prevalence changes over time on power. To allow the correlation among outcomes in the same cluster to change over time and with treatment status, we generalize the methods to more complex correlation structures. Our simulation studies demonstrate that the proposed power calculation methods perform well with the model-based variance under the true correlation structure and reveal that a working independence structure can result in substantial efficiency loss, while a working exchangeable structure performs well even when the underlying correlation structure deviates from exchangeable. An extension to our methods accounts for variable cluster sizes and reveals unequal cluster sizes have a modest impact on power. We illustrate the approaches by application to a quality of care improvement trial for acute coronary syndrome.

Identification And Robust Estimation Of Swapped Direct And Indirect Effects: Mediation Analysis With Unmeasured Mediator–Outcome Confounding And Intermediate Confounding

An-Shun Tai et al. — Wed, 27 Jan 2021 06:47:30 PST

Counterfactual-model-based mediation analysis can yield substantial insight into the causal mechanism through the assessment of natural direct effects (NDEs) and natural indirect effects (NIEs). However, the assumptions regarding unmeasured mediator–outcome confounding and intermediate mediator–outcome confounding that are required for the determination of NDEs and NIEs present practical challenges. To address this problem, we introduce an instrumental blocker, a novel quasi-instrumental variable, to relax both of these assumptions, and we define a swapped direct effect (SDE) and a swapped indirect effect (SIE) to assess the mediation. We show that the SDE and SIE are identical to the NDE and NIE, respectively, based on a causal interpretation. Moreover, the empirical expressions of the SDE and SIE are derived with and without an intermediate mediator–outcome confounder. Then, a bias formula is developed to examine the plausibility of the proposed instrumental blocker. Moreover, a multiply robust estimation method is derived to mitigate the model misspecification problem. We prove that the proposed estimator is consistent, asymptotically normal, and achieves the semiparametric efficiency bound. As an illustration, we apply the proposed method to genomic datasets of lung cancer to investigate the potential role of the epidermal growth factor receptor in the treatment of lung cancer.

A modular framework for early-phase seamless oncology trials

Philip S. Boonstra et al. — Fri, 18 Dec 2020 08:12:07 PST

Background: As our understanding of the etiology and mechanisms of cancer becomes more sophisticated and the number of therapeutic options increases, phase I oncology trials today have multiple primary objectives. Many such designs are now 'seamless', meaning that the trial estimates both the maximum tolerated dose and the efficacy at this dose level. Sponsors often proceed with further study only with this additional efficacy evidence. However, with this increasing complexity in trial design, it becomes challenging to articulate fundamental operating characteristics of these trials, such as (i) what is the probability that the design will identify an acceptable, i.e. safe and efficacious, dose level? or (ii) how many patients will be assigned to an acceptable dose level on average? Methods: In this manuscript, we propose a new modular framework for designing and evaluating seamless oncology trials. Each module is comprised of either a dose assignment step or a dose-response evaluation, and multiple such modules can be implemented sequentially. We develop modules from existing phase I/II designs as well as a novel module for evaluating dose-response using a Bayesian isotonic regression scheme. Results: We also demonstrate a freely available R package called seamlesssim to numerically estimate, by means of simulation, the operating characteristics of these modular trials. Conclusions: Together, this design framework and its accompanying simulator allow the clinical trialist to compare multiple different candidate designs, more rigorously assess performance, better justify sample sizes, and ultimately select a higher quality design.

Shrinkage Priors for Isotonic Probability Vectors and Binary Data Modeling

Philip S. Boonstra et al. — Fri, 18 Dec 2020 08:11:59 PST

This paper outlines a new class of shrinkage priors for Bayesian isotonic regression modeling a binary outcome against a predictor, where the probability of the outcome is assumed to be monotonically non-decreasing with the predictor. The predictor is categorized into a large number of groups, and the set of differences between outcome probabilities in consecutive categories is equipped with a multivariate prior having support over the set of simplexes. The Dirichlet distribution, which can be derived from a normalized cumulative sum of gamma-distributed random variables, is a natural choice of prior, but using mathematical and simulation-based arguments, we show that the resulting posterior can be numerically unstable, even under simple data configurations. We propose an alternative prior motivated by horseshoe-type shrinkage that is numerically more stable. We show that this horseshoe-based prior is not subject to the numerical instability seen in the Dirichlet/gamma-based prior and that the posterior can estimate the underlying true curve more efficiently than the Dirichlet distribution. We demonstrate the use of this prior in a model predicting the occurrence of radiation-induced lung toxicity in lung cancer patients as a function of dose delivered to normal lung tissue.
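The Dirichlet/gamma construction mentioned in this abstract can be sketched as follows: independent gamma draws are normalized to lie on the simplex, and their running sum gives outcome probabilities that are non-decreasing across categories. This is only the baseline Dirichlet-type prior, not the horseshoe-based alternative the paper proposes. The extra "leftover" increment, the shape value, and the function name are assumptions made for illustration.

```python
import random
from itertools import accumulate

def draw_isotonic_probabilities(n_categories, shape, rng):
    """Draw one monotone probability vector from a Dirichlet-type prior.

    Assumption: one extra increment holds the leftover mass, so the
    probability in the last category can stay below 1.
    """
    gammas = [rng.gammavariate(shape, 1.0) for _ in range(n_categories + 1)]
    total = sum(gammas)
    increments = [g / total for g in gammas]   # a point on the simplex
    probs = list(accumulate(increments[:-1]))  # running sum is non-decreasing
    return increments, probs

rng = random.Random(2020)
increments, probs = draw_isotonic_probabilities(8, 0.5, rng)
print([round(p, 3) for p in probs])
```

With a small shape parameter many increments are close to zero, so most draws have a few large jumps. Very small increments of this kind are one place where numerical trouble could arise, which may relate to the instability the abstract describes.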

Robust inference on effects attributable to mediators: A controlled-direct-effect-based approach for causal effect decomposition with multiple mediators

An-Shun Tai et al. — Thu, 13 Aug 2020 08:13:27 PDT

Effect decomposition is a critical technique for mechanism investigation in settings with multiple causally ordered mediators. Causal mediation analysis is a standard method for effect decomposition, but the assumptions required for the identification process are extremely strong. By extending the framework of controlled direct effects, this study proposes the effect attributable to mediators (EAM) as a novel measure for effect decomposition. For policy making, EAM represents how much an effect can be eliminated by setting mediators to certain values. From the perspective of mechanism investigation, EAM contains information about how much a particular mediator or set of mediators is involved in the causal mechanism through mediation, interaction, or both. The assumptions of EAM for identification are considerably weaker than those of causal mediation analysis. We develop a semiparametric estimator of EAM with robustness to model misspecification. The asymptotic property is fully realized. We applied EAM to assess the magnitude of the effect of hepatitis C virus infection on mortality, which was eliminated by controlling alanine aminotransferase and treating hepatocellular carcinoma.

Integrated multiple mediation analysis: A robustness–specificity trade-off in causal structure

An-Shun Tai et al. — Tue, 26 May 2020 06:47:57 PDT

Recent methodological developments in causal mediation analysis have addressed several issues regarding multiple mediators. However, these developed methods differ in their definitions of causal parameters, assumptions for identification, and interpretations of causal effects, making it unclear which method ought to be selected when investigating a given causal effect. Thus, in this study, we construct an integrated framework, which unifies all existing methodologies, as a standard for mediation analysis with multiple mediators. To clarify the relationship between existing methods, we propose four strategies for effect decomposition: two-way, partially forward, partially backward, and complete decompositions. This study reveals how the direct and indirect effects of each strategy are explicitly and correctly interpreted as path-specific effects under different causal mediation structures. In the integrated framework, we further verify the utility of the interventional analogues of direct and indirect effects, especially when natural direct and indirect effects cannot be identified or when cross-world exchangeability is invalid. Consequently, this study yields a robustness–specificity trade-off in the choice of strategies. Inverse probability weighting is considered for estimation. The four strategies are further applied to a simulation study for performance evaluation and for analyzing the Risk Evaluation of Viral Load Elevation and Associated Liver Disease/Cancer data set from Taiwan to investigate the causal effect of hepatitis C virus infection on mortality.

Survival mediation analysis with the death-truncated mediator: The completeness of the survival mediation parameter

An-Shun Tai et al. — Wed, 01 Apr 2020 07:02:44 PDT

In medical research, the development of mediation analysis with a survival outcome has facilitated investigation into causal mechanisms. However, studies have not discussed the death-truncation problem for mediators, the problem being that conventional mediation parameters cannot be well-defined in the presence of a truncated mediator. In the present study, we systematically defined the completeness of causal effects to uncover the gap, in conventional causal definitions, between the survival and nonsurvival settings. We proposed three approaches to redefining the natural direct and indirect effects, which are generalized forms of the conventional causal effects for survival outcomes. Furthermore, we developed three statistical methods for the binary outcome of the survival status and formulated a Cox model for survival time. We performed simulations to demonstrate that the proposed methods are unbiased and robust. We also applied the proposed method to explore the effect of hepatitis C virus infection on mortality, as mediated through hepatitis B viral load.

Randomization-Based Confidence Intervals for Cluster Randomized Trials

Dustin J. Rabideau et al. — Tue, 28 Jan 2020 12:21:47 PST

In a cluster randomized trial (CRT), groups of people are randomly assigned to different interventions. Existing parametric and semiparametric methods for CRTs rely on distributional assumptions or a large number of clusters to maintain nominal confidence interval (CI) coverage. Randomization-based inference is an alternative approach that is distribution-free and does not require a large number of clusters to be valid. Although it is well-known that a CI can be obtained by inverting a randomization test, this requires randomization testing a non-zero null hypothesis, which is challenging with non-continuous and survival outcomes. In this paper, we propose a general method for randomization-based CIs using individual-level data from a CRT. This fast and flexible approach accommodates various outcome types, can account for design features such as matching or stratification, and employs a computationally efficient algorithm. We evaluate this method's performance through simulations and apply it to the Botswana Combination Prevention Project, a large HIV prevention trial with an interval-censored time-to-event outcome.
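The building block behind randomization-based inference can be sketched as an exact randomization test on cluster-level summaries: the observed difference is compared with the differences obtained under every possible reassignment of clusters to arms. This tests the zero null only and is not the authors' CI method, which works with individual-level data. The cluster labels, the event proportions, and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def randomization_p_value(cluster_means, treated):
    """Exact two-sided randomization p-value for a difference in cluster-level means."""
    clusters = list(cluster_means)

    def diff(treated_set):
        t = [cluster_means[c] for c in clusters if c in treated_set]
        u = [cluster_means[c] for c in clusters if c not in treated_set]
        return mean(t) - mean(u)

    observed = abs(diff(set(treated)))
    # Enumerate every way the same number of clusters could have been treated.
    assignments = list(combinations(clusters, len(treated)))
    extreme = sum(abs(diff(set(a))) >= observed - 1e-12 for a in assignments)
    return extreme / len(assignments)

# Hypothetical event proportions in six clusters; A, B, C received the intervention.
cluster_means = {"A": 0.10, "B": 0.12, "C": 0.11, "D": 0.20, "E": 0.22, "F": 0.21}
p_value = randomization_p_value(cluster_means, ["A", "B", "C"])
print(p_value)
```

With only six clusters there are 20 possible assignments, so the smallest attainable two-sided p-value is 2/20. The test stays valid with few clusters, but its resolution is limited by their number.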

Estimating Marginal Hazard Ratios by Simultaneously Using A Set of Propensity Score Models: A Multiply Robust Approach

Di Shu et al. — Fri, 24 Jan 2020 13:53:04 PST

The inverse probability weighted Cox model is frequently used to estimate marginal hazard ratios. Its validity requires a crucial condition that the propensity score model is correctly specified. To provide protection against misspecification of the propensity score model, we propose a weighted estimation method rooted in empirical likelihood theory. The proposed estimator is multiply robust in that it is guaranteed to be consistent when a set of postulated propensity score models contains a correctly specified model. Our simulation studies demonstrate satisfactory finite sample performance of the proposed method in terms of consistency and efficiency. We apply the proposed method to compare the risk of postoperative hospitalization between sleeve gastrectomy and Roux-en-Y gastric bypass using data from a large medical claims and billing database. We further extend the development to multi-site studies to enable each site to postulate multiple site-specific propensity score models.

Statistical Inference for Networks of High-Dimensional Point Processes

Xu Wang et al. — Wed, 15 Jan 2020 15:29:55 PST

Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. We apply this concentration inequality, combined with a recent result on martingale central limit theory, to give an upper bound for the convergence rate of the test statistics. We verify our theoretical results with extensive simulation and an application to a neuron spike train data set.

Generalized Matrix Decomposition Regression: Estimation and Inference for Two-way Structured Data

Yue Wang et al. — Wed, 15 Jan 2020 15:29:49 PST

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. In this article, we first propose the GMD regression (GMDR) as an estimation/prediction tool that seamlessly incorporates two-way structures into high-dimensional linear models. The proposed GMDR directly regresses the outcome on a set of GMD components, selected by a novel procedure that guarantees the best prediction performance. We then propose the GMD inference (GMDI) framework to identify variables that are associated with the outcome for any model in a large family of regression models that includes GMDR. As opposed to most existing tools for high-dimensional inference, GMDI efficiently accounts for pre-specified two-way structures and can provide asymptotically valid inference even for non-sparse coefficient vectors. We study the theoretical properties of GMDI in terms of both the type-I error rate and power. We demonstrate the effectiveness of GMDR and GMDI on simulated data and an application to microbiome data.

Estimation of Conditional Power for Cluster-Randomized Trials with Interval-Censored Endpoints

Kaitlyn Cook et al. — Fri, 10 Jan 2020 08:53:28 PST

Cluster-randomized trials (CRTs) of infectious disease preventions often yield correlated, interval-censored data: dependencies may exist between observations from the same cluster, and event occurrence may be assessed only at intermittent clinic visits. This data structure must be accounted for when conducting interim monitoring and futility assessment for CRTs. In this article, we propose a flexible framework for conditional power estimation when outcomes are correlated and interval-censored. Under the assumption that the survival times follow a shared frailty model, we first characterize the correspondence between the marginal and cluster-conditional survival functions, and then use this relationship to semiparametrically estimate the cluster-specific survival distributions from the available interim data. We incorporate assumptions about changes to the event process over the remainder of the trial, as well as estimates of the dependency among observations in the same cluster, to extend these survival curves through the end of the study. Based on these projected survival functions we generate correlated interval-censored observations, and then calculate the conditional power as the proportion of times (across multiple full-data generation steps) that the null hypothesis of no treatment effect is rejected. We evaluate the performance of the proposed method through extensive simulation studies, and illustrate its use on a large cluster-randomized HIV prevention trial.
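The final step described here, conditional power as the proportion of simulated full data sets in which the null is rejected, can be illustrated with a deliberately simplified toy. The sketch assumes independent, normally distributed outcomes with known standard deviation and a two-sided z-test. It does not model clustering, frailty, or interval censoring, so it is not the authors' procedure. All names and numbers are hypothetical.

```python
import random
from statistics import NormalDist, mean

def conditional_power(interim_treat, interim_ctrl, n_remaining,
                      assumed_effect, sd, n_sims, rng, alpha=0.05):
    """Monte Carlo conditional power: the interim data are kept fixed and only
    the remaining observations are simulated under the assumed effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Complete the trial by appending simulated future outcomes.
        treat = interim_treat + [rng.gauss(assumed_effect, sd) for _ in range(n_remaining)]
        ctrl = interim_ctrl + [rng.gauss(0.0, sd) for _ in range(n_remaining)]
        se = sd * (1 / len(treat) + 1 / len(ctrl)) ** 0.5
        z = (mean(treat) - mean(ctrl)) / se
        rejections += abs(z) > z_crit
    return rejections / n_sims

# Hypothetical interim outcomes in each arm.
interim_treat = [0.4, 1.1, 0.3, 0.9]
interim_ctrl = [0.1, -0.2, 0.5, 0.0]

rng = random.Random(7)
cp_effect = conditional_power(interim_treat, interim_ctrl, 50, 1.0, 1.0, 500, rng)
cp_null = conditional_power(interim_treat, interim_ctrl, 50, 0.0, 1.0, 500, rng)
print(cp_effect, cp_null)
```

Comparing the result under an optimistic assumed effect with the result under no further effect shows how strongly conditional power depends on the assumptions about the remainder of the trial, which is why the abstract makes those assumptions explicit inputs.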