Collection of Biostatistics Research Archive
Copyright (c) 2015 COBRA. All rights reserved.
http://biostats.bepress.com
Recent documents in Collection of Biostatistics Research Archive (en-us), Fri, 27 Feb 2015 01:37:19 PST

Simulation of Semicompeting Risk Survival Data and Estimation Based on Multistate Frailty Model
http://biostats.bepress.com/harvardbiostat/paper188
Wed, 04 Feb 2015 11:57:56 PST
We develop a procedure to simulate semicompeting risks survival data. In addition, we introduce an EM algorithm and a B-spline based estimation procedure to evaluate and implement the nonparametric likelihood estimation approach of Xu et al. (2010). The simulation procedure provides a route to generate samples from the likelihood introduced in Xu et al. (2010). Further, the EM algorithm and the B-spline methods stabilize the estimation and give accurate results. We illustrate the simulation and estimation procedures with simulation examples and a real data analysis.
Fei Jiang et al.

Optimal Dynamic Treatments in Resource-Limited Settings
http://biostats.bepress.com/ucbbiostat/paper333
Fri, 30 Jan 2015 14:19:03 PST
A dynamic treatment rule (DTR) assigns treatments to individuals based on (a subset of) their measured covariates. An optimal DTR is one that maximizes the population mean outcome. Previous work in this area has assumed that treatment is an unlimited resource, so that the entire population can be treated if this strategy maximizes the population mean outcome. We consider optimal DTRs in settings where the treatment resource is limited, so that there is a maximum proportion of the population which can be treated. We give a general closed-form expression for an optimal stochastic DTR in this resource-limited setting, and a closed-form expression for the optimal deterministic DTR under an additional assumption. We also present an estimator of the mean outcome under the optimal stochastic DTR in a large semiparametric model that at most places restrictions on the probability of treatment assignment given covariates. We give conditions under which our estimator is efficient among all regular and asymptotically linear estimators. All of our results are supported by simulations.
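The paper's closed-form expression is not reproduced here, but the idea of a capacity-constrained stochastic rule can be sketched. The following is a minimal illustration, not the authors' estimator: it assumes a binary treatment and an already-estimated conditional average treatment effect for each individual, treats those with the largest positive effects up to the capacity limit, and gives individuals tied at the rationing threshold a fractional treatment probability so the constraint binds exactly.

```python
import numpy as np

def resource_limited_rule(cate, kappa):
    """Stochastic treatment rule under a capacity constraint.

    cate  : estimated conditional average treatment effects, one per person
    kappa : maximum proportion of the population that may be treated

    Treat everyone whose estimated effect exceeds a threshold chosen so that
    at most a proportion kappa is treated; rationing only kicks in when
    treating everyone with a positive effect would exceed capacity.
    """
    cate = np.asarray(cate, dtype=float)
    n = len(cate)
    benefit = cate > 0
    # If everyone who benefits fits within capacity, no rationing is needed.
    if benefit.mean() <= kappa:
        return benefit.astype(float)
    # Otherwise ration: threshold at the (1 - kappa) quantile of the effects.
    eta = np.quantile(cate, 1 - kappa)
    probs = np.zeros(n)
    probs[cate > eta] = 1.0
    # Individuals exactly at the threshold share the remaining capacity.
    at_threshold = np.isclose(cate, eta)
    deficit = kappa * n - (cate > eta).sum()
    if at_threshold.any() and deficit > 0:
        probs[at_threshold] = min(1.0, deficit / at_threshold.sum())
    return probs
```

With `kappa = 0.5` and effects `[0.9, 0.5, 0.1, -0.2]`, the two individuals with the largest effects are treated with probability one and the rest not at all; with four identical effects, each is treated with probability 0.5.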
Alexander R. Luedtke et al.

FACETS: Fraction and Allele-Specific Copy Number Estimates from Tumor Sequencing
http://biostats.bepress.com/mskccbiostat/paper29
Tue, 06 Jan 2015 21:20:29 PST
Intratumor heterogeneity is characterized by the presence of genetically and phenotypically distinct subclones of tumor cells. Such genetic diversity within a tumor is increasingly recognized as a driver of rapid disease progression, resistance to targeted therapies, and poor survival outcome. It also has important implications in defining "actionable" driver genes in the cancer genome. To facilitate intratumor heterogeneity analysis, we developed a unified analysis pipeline called FACETS for DNA sequencing of tumor-normal pairs (including whole-exome, whole-genome, and targeted capture sequencing), to 1) perform joint segmentation of total and allelic copy ratio, and 2) estimate tumor purity, ploidy, allele-specific copy number, and the associated cell fraction profile. The output can be used to precisely identify copy number states including gains, losses, copy number-neutral regions, and loss of heterozygosity (LOH). In addition, a cell fraction profile is generated to determine whether each aberration is clonal (present in 100% of cancer cells) versus subclonal (present in a fraction of cancer cells). We demonstrate its application using The Cancer Genome Atlas (TCGA) dataset.
Ronglai Shen et al.

Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy
http://biostats.bepress.com/ucbbiostat/paper332
Thu, 18 Dec 2014 11:31:22 PST
We consider challenges that arise in the estimation of the value of an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular asymptotically linear (RAL) estimator of this parameter. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-n rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. In particular, we develop an estimator that, when properly standardized, converges to a normal limiting distribution. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We outline an extension of our approach to a multiple time point problem in the appendix. All of our results are supported by simulations.
Alexander R. Luedtke et al.

Confidence intervals for the treatment effect on the treated
http://biostats.bepress.com/cobra/art111
Thu, 18 Dec 2014 09:36:58 PST
The average effect of the treatment on the treated is a quantity of interest in observational studies in which no definite parameter can be used to quantify the treatment effect, such as those where only a random subset of the data obtained by stratification can be used for analysis. Nonparametric confidence intervals for this quantity appear to be known only in the case where the responses to the treatment are binary and the data fall into a single stratum. We propose nonparametric confidence intervals for the average effect of the treatment on the treated in studies involving one or more strata and general numerical responses.
José A. Ferreira

A Marginalized Zero-Inflated Negative Binomial Regression Model with Overall Exposure Effects
http://biostats.bepress.com/uncbiostat/art43
Thu, 18 Dec 2014 06:58:32 PST
The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine, to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the performance of MZINB with respect to test size is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren.
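The distinction the paper draws can be seen numerically: in a zero-inflated negative binomial mixture with structural-zero probability ψ and negative binomial mean μ, the latent-class (susceptible-subpopulation) mean is μ, while the overall population mean is (1 − ψ)μ, and a marginalized model such as MZINB parameterizes the latter directly. A small simulation sketch (parameter values are arbitrary illustrations, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, psi, mu, size = 200_000, 0.3, 4.0, 2.0   # size = NB dispersion parameter

# Zero-inflated negative binomial draws: structural zeros with probability
# psi, otherwise NB with mean mu (numpy parameterizes NB by (size, p)).
p = size / (size + mu)
counts = np.where(rng.random(n) < psi, 0, rng.negative_binomial(size, p, n))

# The overall (marginal) mean is (1 - psi) * mu -- the quantity an MZINB-type
# model regresses on covariates directly, unlike the latent-class mean mu.
print(counts.mean())   # ~ 2.8 = (1 - 0.3) * 4.0
```

Note that observed zeros exceed ψ, since the negative binomial component contributes zeros of its own; this excess is exactly what complicates overall exposure-effect interpretation in the standard ZINB parameterization.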
John S. Preisser et al.

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING
http://biostats.bepress.com/jhubiostat/paper273
Mon, 15 Dec 2014 11:44:33 PST
Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. Such designs have been proposed, for example, when the population of interest consists of biomarker positive and biomarker negative individuals. The goal is to learn which populations benefit from an experimental treatment. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment and the multiple testing procedure. We provide the first general method for simultaneously optimizing both of these components for two-stage adaptive enrichment designs. We minimize expected sample size under constraints on power and the familywise Type I error rate. It is computationally infeasible to directly solve this optimization problem since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization methods to solve this problem to high accuracy, revealing new, approximately optimal designs.
Michael Rosenblum et al.

Higher-order Targeted Minimum Loss-based Estimation
http://biostats.bepress.com/ucbbiostat/paper331
Thu, 11 Dec 2014 11:31:37 PST
Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available -- in many cases, this requirement is prohibitively difficult to satisfy. In this article, we propose a generalization of TMLE utilizing a higher-order approximation of the target parameter. This approach yields asymptotically linear and efficient estimators when a higher-order remainder term is asymptotically negligible. The latter condition is often much less stringent than that arising in a regular first-order TMLE. Beyond relaxing regularity conditions, use of a higher-order TMLE can improve inference accuracy in finite samples due to its explicit reliance on a higher-order approximation. We provide the theoretical foundations of higher-order TMLE and study its use for estimating a counterfactual mean when all potential confounders have been measured. We show, in particular, that the implementation of a higher-order TMLE is nearly identical to that of a regular first-order TMLE. Since higher-order TMLE requires higher-order differentiability of the target parameter, a requirement that often fails to hold, we also discuss and study practicable approximation strategies that allow us to circumvent this failure in applications.
Marco Carone et al.

Doubly Robust Learning for Estimating Individualized Treatment with Censored Data
http://biostats.bepress.com/uncbiostat/art42
Wed, 10 Dec 2014 07:08:44 PST
Individualized treatment rules recommend treatments based on individual patient characteristics in order to maximize clinical benefit. When the clinical outcome of interest is survival time, estimation is often complicated by censoring. We develop nonparametric methods for estimating an optimal individualized treatment rule in the presence of censored data. To adjust for censoring, we propose a doubly robust estimator which requires correct specification of either the censoring model or the survival model, but not both; the method is shown to be Fisher consistent when either model is correct. Furthermore, we establish the convergence rate of the expected survival under the estimated optimal individualized treatment rule to the expected survival under the optimal individualized treatment rule. We illustrate the proposed methods using a simulation study and data from a Phase III clinical trial on non-small cell lung cancer.
Ying-Qi Zhao et al.

On the Restricted Mean Survival Time Curve in Survival Analysis
http://biostats.bepress.com/harvardbiostat/paper187
Mon, 24 Nov 2014 07:52:46 PST
Lihui Zhao et al.

Quantifying an Adherence Path-Specific Effect of Antiretroviral Therapy in the Nigeria PEPFAR Program
http://biostats.bepress.com/harvardbiostat/paper186
Mon, 24 Nov 2014 07:38:34 PST
Caleb Miles et al.

Constrained Bayesian Estimation of Inverse Probability Weights for Nonmonotone Missing Data
http://biostats.bepress.com/harvardbiostat/paper185
Wed, 19 Nov 2014 09:45:07 PST
BaoLuo Sun et al.

Testing Gene-Environment Interactions in the Presence of Measurement Error
http://biostats.bepress.com/uwbiostat/paper405
Tue, 18 Nov 2014 09:18:35 PST
Complex diseases result from an interplay between genetic and environmental risk factors, and it is of great interest to study gene-environment interaction (GxE) to understand the etiology of complex diseases. Recent developments in genetics allow one to study GxE systematically. However, one difficulty with GxE arises from the fact that environmental exposures are often measured with error. In this paper, we focus on testing GxE when the environmental exposure E is subject to measurement error. Surprisingly, in contrast to the well-established result that the naive test ignoring measurement error is valid for testing main effects, we find that the naive test for GxE leads to inflated type I error under the null hypothesis of no interaction. The naive test also leads to biased estimates of the GxE effect. The analytic form of the bias term for general linear models is obtained, and is shown to be closely related to regression calibration. We then propose a regression calibration based approach to correct for measurement error in testing GxE when either validation data or replicates are available. Extensive simulation studies are conducted to illustrate the performance of the various tests with moderate sample sizes. Based on both theoretical and empirical results, we recommend the proposed test, as its type I error is properly controlled and it has at least comparable power to the naive test even when the naive test is valid. The proposed methods are applied to study the gene-blood pressure interaction for cardiovascular diseases in an ancillary study of the Women's Health Initiative.
Chongzhi Di et al.

Personalized Evaluation of Biomarker Value: A Cost-benefit Perspective
http://biostats.bepress.com/uwbiostat/paper404
Thu, 13 Nov 2014 15:05:39 PST
For a patient who is facing a treatment decision, the added value of information provided by a biomarker depends on the individual patient’s expected response to treatment with and without the biomarker, as well as his/her tolerance of disease and treatment harm. However, individualized estimators of the value of a biomarker are lacking. We propose a new graphical tool named the subject-specific expected benefit curve for quantifying the personalized value of a biomarker in aiding a treatment decision. We develop semiparametric estimators for two general settings: i) when biomarker data are available from a randomized trial; and ii) when biomarker data are available from a cohort or a cross-sectional study, together with external information about a multiplicative treatment effect. We also develop adaptive bootstrap confidence intervals for consistent inference in the presence of non-regularity. The proposed method is used to evaluate the individualized value of the serum creatinine marker in informing treatment decisions for the prevention of renal artery stenosis.
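The adaptive bootstrap construction for non-regular settings is the paper's contribution; for orientation, the ordinary percentile bootstrap that it adapts can be sketched in a few lines (a generic illustration, not the authors' procedure):

```python
import numpy as np

def percentile_bootstrap_ci(data, stat, level=0.95, n_boot=2000, seed=0):
    """Plain percentile bootstrap confidence interval for stat(data).

    (The paper uses an *adaptive* bootstrap to obtain valid inference under
    non-regularity; this is the ordinary version, shown only to fix ideas.)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    alpha = 1 - level
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

lo, hi = percentile_bootstrap_ci(np.arange(100.0), np.mean)
# The interval straddles the sample mean of 49.5.
```

The ordinary version fails when the estimator's limiting distribution changes discontinuously with the underlying parameter, which is precisely the non-regularity the adaptive construction is designed to handle.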
Ying Huang et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART II. APPLICATION AND EXTERNAL VALIDATION
http://biostats.bepress.com/jhubiostat/paper272
Wed, 05 Nov 2014 10:51:24 PST
Randomized controlled trials (RCTs) generally provide the most reliable evidence. When participants in RCTs are selected with respect to characteristics that are potential treatment effect modifiers, the average treatment effect from the trials may not be applicable to a specific target population. We present a new method to project the treatment effect from an RCT to a target group that is inadequately represented in the trial when there is heterogeneity in the treatment effect (HTE). The method integrates RCT and observational data through cross-design synthesis. An essential component is to identify HTE and a calibration factor for unmeasured confounding for the observational study relative to the RCT. The estimate of treatment effect adjusted for unmeasured confounding is projected onto the target sample using G-computation with standardization weights. We call the method Calibrated Risk-Adjusted Modeling (CRAM) and apply it to estimate the effect of angiotensin converting enzyme inhibition to prevent heart failure hospitalization or death. External validation shows that when there is adequate overlap between the RCT and the target sample, risk-based standardization is less biased than CRAM. However, when there is poor overlap between the trial and the target sample, CRAM provides superior estimates of treatment effect.
Carlos Weiss et al.

CROSS-DESIGN SYNTHESIS FOR EXTENDING THE APPLICABILITY OF TRIAL EVIDENCE WHEN TREATMENT EFFECT IS HETEROGENEOUS. PART I. METHODOLOGY
http://biostats.bepress.com/jhubiostat/paper271
Wed, 05 Nov 2014 10:41:00 PST
Randomized controlled trials (RCTs) provide reliable evidence for approval of new treatments, informing clinical practice, and coverage decisions. The participants in RCTs are often not a representative sample of the larger at-risk population. Hence it is argued that the average treatment effect from the trial is not generalizable to the larger at-risk population. An essential premise of this argument is that there is significant heterogeneity in the treatment effect (HTE). We present a new method to extrapolate the treatment effect from a trial to a target group that is inadequately represented in the trial, when HTE is present. Our method integrates trial and observational data (cross-design synthesis). The target group is assumed to be well-represented in the observational database. An essential component of the methodology is the estimation of calibration adjustments for unmeasured confounding in the observational sample. The estimate of treatment effect, adjusted for unmeasured confounding, is projected onto the target sample using a weighted G-computation approach. We present simulation studies to demonstrate the methodology for estimating the marginal treatment effect in a target sample that differs from the trial sample to varying degrees. In a companion paper, we demonstrate and validate the methodology in a clinical application.
Ravi Varadhan et al.

sanon: An R Package for Stratified Analysis with Nonparametric Covariable Adjustment
http://biostats.bepress.com/uncbiostat/art41
Wed, 29 Oct 2014 07:15:38 PDT
Kawaguchi et al. (2011) provided methodology and applications for a stratified Mann-Whitney estimator that addresses the same comparison between two randomized groups for a strictly ordinal response variable as the van Elteren test statistic for randomized clinical trials with strata. The sanon package implements the method within the R programming environment (R Core Team, 2012). The usage of sanon is illustrated with five examples. The first example is a randomized clinical trial with eight strata and a univariate ordinal response variable. The second example is a randomized clinical trial with four strata, two covariables, and four ordinal response variables. The third example is a crossover-design randomized clinical trial with two strata, one covariable, and two ordinal response variables. The fourth example is a randomized clinical trial with seven strata (which are managed as a categorical covariable), three ordinal covariables with missing values, and three ordinal response variables with missing values. The fifth example is a randomized clinical trial with six strata, a categorical covariable with three levels, and three ordinal response variables with missing values.
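The estimand behind the package, the stratified Mann-Whitney "win" probability, is easy to state in code. The sketch below is in Python rather than R and uses simple sample-size weights for illustration; sanon itself implements the specific weighting schemes and covariable adjustment of Kawaguchi et al. (2011).

```python
import numpy as np

def mann_whitney_prob(treat, ctrl):
    """P(Y_treat > Y_ctrl) + 0.5 * P(Y_treat = Y_ctrl) over all pairs:
    the probability a random treated response beats a random control one,
    with ties counted as half."""
    t = np.asarray(treat)[:, None]
    c = np.asarray(ctrl)[None, :]
    return (t > c).mean() + 0.5 * (t == c).mean()

def stratified_mann_whitney(strata):
    """strata: list of (treat_values, ctrl_values) pairs, one per stratum.
    Combines per-stratum estimates with sample-size weights (a simplified
    choice made here for illustration)."""
    ests, wts = [], []
    for treat, ctrl in strata:
        ests.append(mann_whitney_prob(treat, ctrl))
        wts.append(len(treat) + len(ctrl))
    return np.average(ests, weights=wts)
```

For example, a stratum where every treated response exceeds every control response contributes an estimate of 1.0, and a stratum with a single tied pair contributes 0.5.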
Atsushi Kawaguchi et al.

ENHANCED PRECISION IN THE ANALYSIS OF RANDOMIZED TRIALS WITH ORDINAL OUTCOMES
http://biostats.bepress.com/jhubiostat/paper270
Wed, 22 Oct 2014 10:36:19 PDT
We present a general method for estimating the effect of a treatment on an ordinal outcome in randomized trials. The method is robust in that it does not rely on the proportional odds assumption. Our estimator leverages information in prognostic baseline variables, and has all of the following properties: (i) it is consistent; (ii) it is locally efficient; (iii) it is guaranteed to match or improve the precision of the standard, unadjusted estimator. To the best of our knowledge, this is the first estimator of the causal relation between a treatment and an ordinal outcome to satisfy these properties. We demonstrate the estimator in simulations based on resampling from a completed randomized clinical trial of a new treatment for stroke; we show potential gains of up to 39% in relative efficiency compared to the unadjusted estimator. The proposed estimator could be a useful tool for analyzing randomized trials with ordinal outcomes, since existing methods either rely on model assumptions that are untenable in many practical applications, or lack the efficiency properties of the proposed estimator. We provide R code implementing the estimator.
Iván Díaz et al.

Nonparametric Adjustment for Measurement Error in Time to Event Data
http://biostats.bepress.com/harvardbiostat/paper184
Wed, 22 Oct 2014 09:37:50 PDT
Measurement error in time to event data used as a predictor will lead to inaccurate predictions. This arises in the context of self-reported family history, a time to event predictor often measured with error, used in Mendelian risk prediction models. Using a validation data set, we propose a method to adjust for this type of measurement error. We estimate the measurement error process using a nonparametric smoothed Kaplan-Meier estimator, and use Monte Carlo integration to implement the adjustment. We apply our method to simulated data in the context of both Mendelian risk prediction models and multivariate survival prediction models, as well as illustrate our method using a data application for Mendelian risk prediction models. Results from simulations are evaluated using measures of mean squared error of prediction (MSEP), area under the receiver operating characteristic curve (ROC-AUC), and the ratio of observed to expected number of events. These results show that our adjusted method mitigates the effects of measurement error mainly by improving calibration and total accuracy. In some scenarios discrimination is also improved.
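For reference, the Kaplan-Meier estimator at the heart of the adjustment can be sketched as follows. The paper smooths this estimator; the sketch is the plain, unsmoothed product-limit form.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    Returns the distinct event times and the survival curve S(t) at them,
    where each event time multiplies S by (1 - deaths / number at risk).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = (times >= t).sum()
        d = ((times == t) & (events == 1)).sum()
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

For instance, with follow-up times 1, 2, 3, 4 and the third subject censored, the curve steps through 0.75, 0.5, and 0 at the three observed event times.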
Danielle Braun et al.

Extending Mendelian Risk Prediction Models to Handle Misreported Family History
http://biostats.bepress.com/harvardbiostat/paper183
Wed, 22 Oct 2014 09:37:48 PDT
Mendelian risk prediction models calculate the probability of a proband being a mutation carrier based on family history and known mutation prevalence and penetrance. Family history in this setting is self-reported and is often reported with error. Various studies in the literature have evaluated misreporting of family history. Using a validation data set which includes both error-prone self-reported family history and error-free validated family history, we propose a method to adjust for misreporting. We estimate the measurement error process in a validation data set from the University of California, Irvine (UCI) using nonparametric smoothed Kaplan-Meier estimators, and use Monte Carlo integration to implement the adjustment. In this paper, we extend BRCAPRO, a Mendelian risk prediction model for breast and ovarian cancers, to adjust for misreporting in family history. We apply the extended model to data from the Cancer Genetics Network (CGN).
Danielle Braun et al.