<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>U.C. Berkeley Division of Biostatistics Working Paper Series</title>
<copyright>Copyright (c) 2013 University of California, Berkeley All rights reserved.</copyright>
<link>http://biostats.bepress.com/ucbbiostat</link>
<description>Recent documents in U.C. Berkeley Division of Biostatistics Working Paper Series</description>
<language>en-us</language>
<lastBuildDate>Sat, 25 May 2013 01:49:43 PDT</lastBuildDate>
<ttl>3600</ttl>
<item>
<title>Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits</title>
<link>http://biostats.bepress.com/ucbbiostat/paper313</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper313</guid>
<pubDate>Thu, 23 May 2013 09:17:02 PDT</pubDate>
<description>
	<![CDATA[
	<p>Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets.  We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets.  Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits.  We give an oracle result that provides a theoretical performance guarantee for Subsemble.  Through simulations, we demonstrate that Subsemble can be a beneficial tool for small to moderate sized datasets, and often has better prediction performance than the underlying algorithm fit just once on the full dataset.  We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full dataset.</p>

	]]>
</description>

<author>Stephanie Sapp et al.</author>


</item>






<item>
<title>Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models</title>
<link>http://biostats.bepress.com/ucbbiostat/paper312</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper312</guid>
<pubDate>Wed, 22 May 2013 09:43:49 PDT</pubDate>
<description>
	<![CDATA[
	<p><blockquote>This paper presents a novel targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention and time point. Because the true shape of this function is rarely known, the marginal structural model is used as a working model. The causal quantity of interest is defined as the projection of the true function onto this working model. We introduce a new pooled TMLE for the parameters of such marginal structural working models, and compare this estimator to a recently proposed stratified TMLE that is based on estimating the intervention-specific mean separately for each intervention of interest. The performance of the pooled TMLE is compared to the performance of the stratified TMLE and the performance of inverse probability weighted (IPW) estimators using simulations. Concepts are illustrated using an example in which the aim is to estimate the causal effect of delayed switch following immunological failure of first-line antiretroviral therapy among HIV-infected patients. Data from the International epidemiological Databases to Evaluate AIDS, Southern Africa are analyzed to investigate this question using both TMLE and IPW estimators. Our results demonstrate practical advantages over an IPW estimator for working marginal structural models for survival, as well as cases in which the pooled TMLE is superior to its stratified counterpart.</blockquote></p>

	]]>
</description>

<author>Maya L. Petersen et al.</author>


</item>






<item>
<title>Balancing Score Adjusted Targeted Minimum Loss-based Estimation</title>
<link>http://biostats.bepress.com/ucbbiostat/paper311</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper311</guid>
<pubDate>Thu, 16 May 2013 15:20:03 PDT</pubDate>
<description>
	<![CDATA[
	<p>Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects, including the average treatment effect and the effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with the balancing score property that is additionally locally efficient and doubly robust. We investigate the new estimator's performance relative to other estimators, including another TMLE, a propensity score matching estimator, an inverse probability of treatment weighted estimator, and a regression-based estimator, in simulation studies.</p>

	]]>
</description>

<author>Samuel D. Lendle et al.</author>


</item>






<item>
<title>Estimating Effects on Rare Outcomes: Knowledge is Power</title>
<link>http://biostats.bepress.com/ucbbiostat/paper310</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper310</guid>
<pubDate>Mon, 13 May 2013 11:23:25 PDT</pubDate>
<description>
	<![CDATA[
	<p>Understanding the etiology of rare cancers, perinatal mortality, international conflicts or natural disasters can have profound impacts on population health and well-being. However, when the outcome of interest occurs in 5% or less of the population, effect estimation can be particularly challenging. To increase statistical power and the stability of results, researchers commonly oversample cases or events. However, the study of rare outcomes need not be limited to case-control settings. Building on the work of Gruber and van der Laan (2010), we construct a new targeted minimum loss-based estimator (TMLE) for estimating the effect of an exposure or treatment on a rare outcome. We focus on the average treatment effect and statistical models incorporating known bounds on the conditional probability of the outcome. The proposed TMLE improves upon existing methods in several ways. First, the substitution estimator respects the global knowledge in the statistical model. Second, the proposed TMLE achieves high rates of convergence in that it solves the efficient influence curve equation despite the paucity of events. Third, the new TMLE achieves lower bias and variance than the standard TMLE in finite samples. Fourth, the proposed estimator is powered to detect effect sizes on the order of the prevalence of the outcome, even in small samples. Fifth, the new TMLE allows for more aggressive estimation of the conditional mean outcome, while guaranteeing the fitted probabilities are within the model bounds and maintaining valid Type I error rates. Finally, the methodology permits the examination of rare outcomes in prospective studies, randomized trials or case-control designs. These advantages are illustrated with Monte Carlo simulations.</p>

	]]>
</description>

<author>Laura B. Balzer et al.</author>


</item>






<item>
<title>An Application Of Machine Learning Methods To The Derivation Of Exposure-Response Curves For Respiratory Outcomes</title>
<link>http://biostats.bepress.com/ucbbiostat/paper309</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper309</guid>
<pubDate>Wed, 01 May 2013 10:18:22 PDT</pubDate>
<description>
	<![CDATA[
	<p>Analyses of epidemiological studies of the association between short-term changes in air pollution and health outcomes have not sufficiently discussed the degree to which the statistical models chosen for these analyses reflect what is actually known about the true data-generating distribution. We present a method to estimate population-level ambient air pollution (NO2) exposure-health (wheeze in children with asthma) response functions that is not dependent on assumptions about the data-generating function that underlies the observed data and which focuses on a specific scientific parameter of interest (the marginal adjusted association of exposure on probability of wheeze, over a grid of possible exposure values). We show that this approach provides a more nuanced summary of the data than more typical statistical methods used in air pollution epidemiology and epidemiological studies in general.</p>

	]]>
</description>

<author>Ekaterina Eliseeva et al.</author>


</item>






<item>
<title>Vertically Shifted Mixture Models for Clustering Longitudinal Data by Shape</title>
<link>http://biostats.bepress.com/ucbbiostat/paper308</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper308</guid>
<pubDate>Mon, 25 Mar 2013 08:31:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>Longitudinal studies play a prominent role in health, social and behavioral sciences as well as in the biological sciences, economics, and marketing. By following subjects over time, temporal changes in an outcome of interest can be directly observed and studied. An important question concerns the existence of distinct trajectory patterns. One way to determine these distinct patterns is through cluster analysis, which seeks to separate objects (subjects, patients, observational units) into homogeneous groups. Many methods have been adapted for longitudinal data, but almost all of them fail to explicitly group trajectories according to distinct pattern shapes. To fulfill the need for clustering based explicitly on shape, we propose vertically shifting the data by subtracting the subject-specific mean, which directly removes the level, prior to fitting a mixture model. This non-invertible transformation can result in singular covariance matrices, which makes mixture model estimation difficult. Despite the challenges, this method outperforms existing clustering methods in a simulation study.</p>

	]]>
</description>

<author>Brianna C. Heggeseth et al.</author>


</item>






<item>
<title>Targeted Estimation of Variable Importance Measures with Interval-Censored Outcomes</title>
<link>http://biostats.bepress.com/ucbbiostat/paper307</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper307</guid>
<pubDate>Fri, 22 Feb 2013 09:25:19 PST</pubDate>
<description>
	<![CDATA[
	<p>In most experimental and observational studies, participants are not followed in continuous time. Instead, data is collected about participants only at certain monitoring times. These monitoring times are random, and often participant specific. As a result, outcomes are only known up to random time intervals, resulting in interval-censored data. In contrast, when estimating variable importance measures on interval-censored outcomes, practitioners often ignore the presence of interval-censoring, and instead treat the data as continuous or right-censored, applying ad-hoc approaches to mask the true interval-censoring. In this paper, we describe Targeted Minimum Loss-based Estimation methods tailored for estimation of variable importance measures with interval-censored outcomes. We demonstrate the performance of the interval-censored TMLE procedure through simulation studies, and apply the method to analyze the effects of a variety of variables on spontaneous hepatitis C virus clearance among injection drug users, using data from the “International Collaboration of Incident HIV and HCV in Injecting Cohorts” project.</p>

	]]>
</description>

<author>Stephanie Sapp et al.</author>


</item>






<item>
<title>Targeted Data Adaptive Estimation of the Causal Dose Response Curve</title>
<link>http://biostats.bepress.com/ucbbiostat/paper306</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper306</guid>
<pubDate>Thu, 24 Jan 2013 15:24:15 PST</pubDate>
<description>
	<![CDATA[
	<p>Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model, if the treatment is continuous, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimation of the dose-response curve is a pathwise differentiable parameter, whose consistent and efficient estimation is possible. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV-A-IPTW) of the risk, and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent and efficient under certain consistency and regularity conditions on the initial estimators of the outcome and treatment mechanism. We also present a methodology that uses these estimated risks to select among a library of candidate algorithms. These selectors are proven optimal in the sense that they are asymptotically equivalent to the oracle selector under certain consistency conditions on the estimators of the treatment and outcome mechanisms. Because the CV-TMLE is a substitution estimator, it is more robust than the CV-A-IPTW against empirical violations of the positivity assumption. This and other small-sample differences between the CV-TMLE and the CV-A-IPTW are explored in a simulation study.</p>

	]]>
</description>

<author>Iván Díaz et al.</author>


</item>






<item>
<title>Optimal Spatial Prediction Using Ensemble Machine Learning</title>
<link>http://biostats.bepress.com/ucbbiostat/paper305</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper305</guid>
<pubDate>Thu, 20 Dec 2012 14:32:01 PST</pubDate>
<description>
	<![CDATA[
	<p>Spatial prediction is an important problem in many scientific disciplines. Super Learner is an ensemble prediction approach related to stacked generalization that uses cross-validation to search for the optimal predictor amongst all convex combinations of a heterogeneous candidate set. It has been applied to non-spatial data, where theoretical results demonstrate it will perform asymptotically at least as well as the best candidate under consideration. We review these optimality properties and discuss the assumptions required in order for them to hold for spatial prediction problems. We present results of a simulation study confirming Super Learner works well in practice under a variety of sample sizes, sampling designs, and data-generating functions. We also apply Super Learner to a real world dataset.</p>

	]]>
</description>

<author>Molly M. Davies et al.</author>


</item>






<item>
<title>Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates</title>
<link>http://biostats.bepress.com/ucbbiostat/paper304</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper304</guid>
<pubDate>Wed, 05 Dec 2012 20:21:00 PST</pubDate>
<description>
	<![CDATA[
	<p>In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of a model. Most often, cross-validation is also used in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we must obtain an estimate for its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a large amount of computation time. Thus, when the process of obtaining a single estimate for cross-validated AUC requires significant computation, the bootstrap, as a means of variance estimation, can be computationally intractable. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.</p>

	]]>
</description>

<author>Erin LeDell et al.</author>


</item>






<item>
<title>Sensitivity Analysis for Causal Inference Under Unmeasured  Confounding and Measurement Error Problems</title>
<link>http://biostats.bepress.com/ucbbiostat/paper303</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper303</guid>
<pubDate>Wed, 05 Dec 2012 14:49:04 PST</pubDate>
<description>
	<![CDATA[
	<p>In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation to the identifiability assumption, and require the development of new estimators and inference for every new model. The method we present can be used in conjunction with any existing asymptotically linear estimator of an observed data parameter that approximates the unidentifiable full data parameter, and does not require the study of additional models.</p>

	]]>
</description>

<author>Iván Díaz et al.</author>


</item>






<item>
<title>Statistical Inference when using Data Adaptive Estimators of Nuisance Parameters</title>
<link>http://biostats.bepress.com/ucbbiostat/paper302</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper302</guid>
<pubDate>Mon, 26 Nov 2012 09:57:02 PST</pubDate>
<description>
	<![CDATA[
	<p>In order to be concrete we focus on estimation of the treatment-specific mean, controlling for all measured baseline covariates, based on observing n independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment-specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order, and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment-specific mean under regularity conditions. These insights result in novel targeted maximum likelihood estimators (TMLE) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative targeted maximum likelihood estimators (CTMLE) with known influence curve allowing for statistical inference, even though these CTMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.</p>

	]]>
</description>

<author>Mark J. van der Laan</author>


</item>






<item>
<title>The Impact of Covariance Misspecification in Multivariate Gaussian Mixtures on Estimation and Inference: An Application to Longitudinal Modeling</title>
<link>http://biostats.bepress.com/ucbbiostat/paper301</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper301</guid>
<pubDate>Fri, 12 Oct 2012 09:31:34 PDT</pubDate>
<description>
	<![CDATA[
	<p>Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence---that is, given the mixture component label, the outcomes for subjects are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little asymptotic bias when mixture components are well-separated or if the assumed correlation is close to the truth even when the covariance is misspecified. We also present a robust standard error estimator and show that it outperforms conventional estimators in simulations and can indicate the model is misspecified. Body mass index data from a national longitudinal study is used to demonstrate the effects of misspecification on potential inferences made in practice.</p>

	]]>
</description>

<author>Brianna C. Heggeseth et al.</author>


</item>






<item>
<title>Causal Inference for Networks</title>
<link>http://biostats.bepress.com/ucbbiostat/paper300</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper300</guid>
<pubDate>Wed, 10 Oct 2012 10:50:16 PDT</pubDate>
<description>
	<![CDATA[
	<p>Suppose that we observe a population of causally connected units according to a network. On each unit we observe a set of potentially connected units that contains the true connections, and a longitudinal data structure, which includes time-dependent exposure or treatment, time-dependent covariates, and a final outcome of interest. The target quantity of interest is defined as the mean outcome for this group of units if the exposures of the units would be probabilistically assigned according to a known specified mechanism, where the latter is called a stochastic intervention. Causal effects of interest are defined as contrasts of the mean of the unit-specific outcomes under different stochastic interventions one wishes to evaluate. By varying the network structure, this covers a large range of estimation problems, ranging from independent units, to independent clusters of units, to a single cluster of units in which each unit has a limited number of connections to other units. We present a few motivating classes of examples, propose a structural causal model, define the desired causal quantities, address the identification of these quantities from the observed data, and define maximum likelihood based estimators based on cross-validation.</p>
<p>Such smoothed/regularized maximum likelihood estimators are not targeted and will thereby be overly biased w.r.t. the target parameter, and, as a consequence, generally not result in asymptotically normally distributed estimators of the statistical target parameter. Therefore, we formulated targeted maximum likelihood estimators of this estimand, and showed that the robustness of the efficient influence curve implies that the bias of the TMLE will be a second order term involving squared differences of two nuisance parameters. In order to deal with the curse of dimensionality, we present super-learning based on cross-validation, and we develop targeted maximum likelihood estimators, which are less biased than maximum likelihood estimators due to a targeted bias reduction step. Due to the causal dependencies between units, the data set may correspond with the realization of a single experiment, so that establishing a (e.g., normal) limit distribution for the estimators, and corresponding statistical inference, is a challenging topic. In order to establish a formal theorem, we focus on the point-treatment longitudinal data structure, thereby also putting down a foundation for its generalization to the general longitudinal data structure, which we reserve for future research. We conclude with a discussion.</p>

	]]>
</description>

<author>Mark J. van der Laan</author>


</item>






<item>
<title>Targeted Learning of The Probability of Success of An In Vitro Fertilization Program Controlling for Time-dependent Confounders</title>
<link>http://biostats.bepress.com/ucbbiostat/paper299</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper299</guid>
<pubDate>Wed, 10 Oct 2012 10:31:12 PDT</pubDate>
<description>
	<![CDATA[
	<p><blockquote>Infertility is a global public health issue and various treatments are available. In vitro fertilization (IVF) is an increasingly common treatment method, but accurately assessing the success of IVF programs has proven challenging since they consist of multiple cycles. We present a double robust semiparametric method that incorporates machine learning to estimate the probability of success (i.e., delivery resulting from embryo transfer) of a program of at most four IVF cycles in the French Devenir Après Interruption de la FIV (DAIFI) study and several simulation studies, controlling for time-dependent confounders. We find that the probability of success in the DAIFI study is 50% (95% confidence interval [0.48, 0.53]), therefore approximately half of future participants in a program of at most four IVF cycles can expect a delivery resulting from embryo transfer.</blockquote></p>

	]]>
</description>

<author>Antoine Chambaz et al.</author>


</item>






<item>
<title>Assessing the Causal Effect of Policies: An Approach Based on Stochastic Interventions</title>
<link>http://biostats.bepress.com/ucbbiostat/paper298</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper298</guid>
<pubDate>Mon, 08 Oct 2012 10:06:33 PDT</pubDate>
<description>
	<![CDATA[
	<p>Stochastic interventions are a powerful tool to define parameters that measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW and targeted minimum loss based estimators (TMLE) are proposed, and their consistency and efficiency properties are determined. An extension to longitudinal data structures is presented and its use is demonstrated with a real data example.</p>

	]]>
</description>

<author>Iván Díaz et al.</author>


</item>






<item>
<title>Targeted Learning for Causality and Statistical Analysis in Medical Research</title>
<link>http://biostats.bepress.com/ucbbiostat/paper297</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper297</guid>
<pubDate>Thu, 23 Aug 2012 09:24:21 PDT</pubDate>
<description>
	<![CDATA[
	<p>The authors present the use of targeted learning methods for medical research, prepared as a chapter for the upcoming book "Statistics: Discovering Your Future Power." The targeted learning framework involves the explicit specification of the data, model, and parameter. The estimators are double robust and efficient, and can incorporate machine learning procedures such as the super learner.</p>

	]]>
</description>

<author>Sherri Rose et al.</author>


</item>






<item>
<title>Adaptive Matching in Randomized Trials and Observational Studies</title>
<link>http://biostats.bepress.com/ucbbiostat/paper296</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper296</guid>
<pubDate>Fri, 27 Jul 2012 15:55:10 PDT</pubDate>
<description>
	<![CDATA[
	<p>In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample communities from some target population, construct matched pairs of communities from those included in the sample based on some metric of similarity in baseline community characteristics, and then randomly allocate a treatment and a control intervention within each matched pair. In this case, the observed data can neither be represented as the realization of <em>n</em> independent random variables, nor, contrary to current practice, as the realization of <em>n/2</em> independent random variables (treating the matched pair as the independent sampling unit).</p>
<p>In this paper we study estimation of the average causal effect of a treatment under experimental designs in which treatment allocation potentially depends on the pre-intervention covariates of all units included in the sample. We define efficient targeted minimum loss based estimators for this general design, present a theorem that establishes the desired asymptotic normality of these estimators and allows for asymptotically valid statistical inference, and discuss implementation of these estimators.   We further investigate the relative asymptotic efficiency of this design compared with a design in which unit-specific treatment assignment depends only on the units' covariates. Our findings have practical implications for the optimal design and analysis of pair matched cluster randomized trials, as well as for observational studies in which treatment decisions may depend on characteristics of the entire sample.</p>

	]]>
</description>

<author>Mark J. van der Laan et al.</author>


</item>






<item>
<title>Causal Mediation in a Survival Setting with Time-Dependent Mediators</title>
<link>http://biostats.bepress.com/ucbbiostat/paper295</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper295</guid>
<pubDate>Tue, 17 Jul 2012 16:10:45 PDT</pubDate>
<description>
	<![CDATA[
	<p>The effect of an exposure on an outcome of interest is often mediated by intermediate variables. The goal of causal mediation analysis is to evaluate the role of these intermediate variables (mediators) in the causal effect of the exposure on the outcome. In this paper, we consider causal mediation of a baseline exposure on a survival (or time-to-event) outcome, when the mediator is time-dependent. The challenge in this setting lies in that the event process takes place jointly with the mediator process; in particular, the length of the mediator history depends on the survival time. As a result, we argue that the definition of natural effects in this setting should be based on only blocking those paths from treatment to mediators that are not through the survival history. We propose to use a stochastic interventions (SI) perspective, introduced by Didelez, Dawid, and Geneletti (2006), to formulate the causal mediation analysis problem in this setting. Under this formulation, the mediators are regarded as intervention variables, onto which a given counterfactual distribution is enforced. The natural direct and indirect effects can be defined analogously to the ideas in Pearl (2001). In particular, they also allow for a total effect decomposition and an interpretation of the natural direct effect as a weighted average of controlled direct effects. The statistical parameters that should arise are defined nonparametrically; therefore, they have meaningful interpretations, independent of the causal formulations and assumptions. We present a general semiparametric inference framework for these parameters. Using their efficient influence functions, we develop semiparametric efficient and robust targeted substitution-based (TMLE) and estimating-equation-based (A-IPTW) estimators. An IPTW estimator and g-computation estimator will also be presented.</p>

	]]>
</description>

<author>Wenjing Zheng et al.</author>


</item>






<item>
<title>Why Match in Individually and Cluster Randomized Trials?</title>
<link>http://biostats.bepress.com/ucbbiostat/paper294</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper294</guid>
<pubDate>Thu, 17 May 2012 15:12:48 PDT</pubDate>
<description>
	<![CDATA[
	<p>The decision to match individuals or clusters in randomized trials is motivated by both practical and statistical concerns. Matching protects against chance imbalances in baseline covariate distributions and is thought to improve study credibility. Matching is also implemented to increase study power. This article compares the asymptotic efficiency of the pair-matched design, where units are matched on baseline covariates and the treatment randomized within pairs, to the independent design, where units are randomly paired and the treatment randomized within pairs. We focus on estimating the average treatment effect and use the efficient influence curve to understand the information provided by each design for estimation of this causal parameter. Our theoretical results indicate that the pair-matched design is asymptotically less efficient than its non-matched counterpart. Our simulations confirm these results asymptotically and in finite samples. Our approach is estimator-independent, avoids all parametric modeling assumptions, and applies equally to individually randomized and cluster randomized trials.</p>

	]]>
</description>

<author>Laura B. Balzer et al.</author>


</item>





</channel>
</rss>
