<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Collection of Biostatistics Research Archive</title>
<copyright>Copyright (c) 2013 COBRA All rights reserved.</copyright>
<link>http://biostats.bepress.com</link>
<description>Recent documents in Collection of Biostatistics Research Archive</description>
<language>en-us</language>
<lastBuildDate>Sat, 25 May 2013 01:45:26 PDT</lastBuildDate>
<ttl>3600</ttl>


	
		
	

	
		
	

	
		
	

	
		
	

	
		
	







<item>
<title>A versatile test for equality of two survival functions based on weighted differences of Kaplan-Meier curves</title>
<link>http://biostats.bepress.com/harvardbiostat/paper159</link>
<guid isPermaLink="true">http://biostats.bepress.com/harvardbiostat/paper159</guid>
<pubDate>Fri, 24 May 2013 12:58:11 PDT</pubDate>
<description>
	<![CDATA[
	<p>With censored event time observations, the logrank test is the most popular tool for testing the equality of two underlying survival distributions. Although this test is asymptotically distribution-free, it may not be powerful when the proportional hazards assumption is violated. Various other novel testing procedures have been proposed, which generally are derived by assuming a class of specific alternative hypotheses with respect to the hazard functions. The test considered by Pepe and Fleming (1989) is based on a linear combination of weighted differences of two Kaplan-Meier curves over time and is a natural tool to assess the difference of two survival functions directly. In this article, we take a similar approach, but choose weights which are proportional to the observed standardized difference of the estimated survival curves at each time point. The new proposal automatically makes weighting adjustments empirically. The new test statistic is aimed at a one-sided general alternative hypothesis, and is distributed with a short right tail under the null hypothesis, but with a heavy tail under the alternative. The results from extensive numerical studies demonstrate that the new procedure performs well under various general alternatives. The survival data from a recent cancer comparative study are utilized for illustrating the implementation of the process.</p>

	]]>
</description>

<author>Hajime Uno et al.</author>


</item>






<item>
<title>Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits</title>
<link>http://biostats.bepress.com/ucbbiostat/paper313</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper313</guid>
<pubDate>Thu, 23 May 2013 09:17:02 PDT</pubDate>
<description>
	<![CDATA[
	<p>Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small to moderate-sized datasets, and often has better prediction performance than the underlying algorithm fit just once on the full dataset. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full dataset.</p>

	]]>
</description>

<author>Stephanie Sapp et al.</author>


</item>






<item>
<title>Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models</title>
<link>http://biostats.bepress.com/ucbbiostat/paper312</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper312</guid>
<pubDate>Wed, 22 May 2013 09:43:49 PDT</pubDate>
<description>
	<![CDATA[
	<p><blockquote>This paper presents a novel targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention and time point. Because the true shape of this function is rarely known, the marginal structural model is used as a working model. The causal quantity of interest is defined as the projection of the true function onto this working model. We introduce a new pooled TMLE for the parameters of such marginal structural working models, and compare this estimator to a recently proposed stratified TMLE that is based on estimating the intervention-specific mean separately for each intervention of interest. The performance of the pooled TMLE is compared to the performance of the stratified TMLE and the performance of inverse probability weighted (IPW) estimators using simulations. Concepts are illustrated using an example in which the aim is to estimate the causal effect of delayed switch following immunological failure of first-line antiretroviral therapy among HIV-infected patients. Data from the International epidemiological Databases to Evaluate AIDS, Southern Africa are analyzed to investigate this question using both TMLE and IPW estimators. Our results demonstrate practical advantages over an IPW estimator for working marginal structural models for survival, as well as cases in which the pooled TMLE is superior to its stratified counterpart.</blockquote></p>

	]]>
</description>

<author>Maya L. Petersen et al.</author>


</item>






<item>
<title>VARYING INDEX COEFFICIENT MODELS</title>
<link>http://biostats.bepress.com/umichbiostat/paper100</link>
<guid isPermaLink="true">http://biostats.bepress.com/umichbiostat/paper100</guid>
<pubDate>Tue, 21 May 2013 14:59:50 PDT</pubDate>
<description>
	<![CDATA[
	<p>There is a long history of using interactions in regression analysis to investigate interactive effects of covariates on response variables. In this paper we aim to address two kinds of new challenges resulting from the inclusion of such high-order effects in the regression model for complex data. The first kind arises from a situation where interaction effects of individual covariates are weak but those of combined covariates are strong, and the other kind pertains to the presence of nonlinear interactive effects. Generalizing the single index coefficient regression model (Xia and Li, 1999), we propose a new class of semiparametric models with varying index coefficients, which enables us to model and assess nonlinear interaction effects between grouped covariates on the response variable. As a result, most of the existing semiparametric regression models are special cases of our proposed models. We develop a numerically stable and computationally fast estimation procedure utilizing both the profile least squares method and local fitting. We establish both estimation consistency and asymptotic normality for the proposed estimators of index coefficients as well as the oracle property for the nonparametric function estimator. In addition, a generalized likelihood ratio test is provided to test for the existence of interaction effects or the existence of nonlinear interaction effects. Our models and estimation methods are illustrated by both simulation studies and an analysis of a body fat dataset.</p>

	]]>
</description>

<author>Shujie Ma et al.</author>


</item>






<item>
<title>Surrogacy Assessment Using Principal Stratification When Surrogate and Outcome Measures are Multivariate Normal</title>
<link>http://biostats.bepress.com/umichbiostat/paper99</link>
<guid isPermaLink="true">http://biostats.bepress.com/umichbiostat/paper99</guid>
<pubDate>Tue, 21 May 2013 14:59:44 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Anna Conlon et al.</author>


</item>






<item>
<title>Balancing Score Adjusted Targeted Minimum Loss-based Estimation</title>
<link>http://biostats.bepress.com/ucbbiostat/paper311</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper311</guid>
<pubDate>Thu, 16 May 2013 15:20:03 PDT</pubDate>
<description>
	<![CDATA[
	<p>Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects, including the average treatment effect and the effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with the balancing score property that is additionally locally efficient and doubly robust. We investigate the new estimator's performance relative to other estimators, including another TMLE, a propensity score matching estimator, an inverse probability of treatment weighted estimator, and a regression-based estimator, in simulation studies.</p>

	]]>
</description>

<author>Samuel D. Lendle et al.</author>


</item>






<item>
<title>Estimating Effects on Rare Outcomes: Knowledge is Power</title>
<link>http://biostats.bepress.com/ucbbiostat/paper310</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper310</guid>
<pubDate>Mon, 13 May 2013 11:23:25 PDT</pubDate>
<description>
	<![CDATA[
	<p>Understanding the etiology of rare cancers, perinatal mortality, international conflicts or natural disasters can have profound impacts on population health and well-being. However, when the outcome of interest occurs in 5% or less of the population, effect estimation can be particularly challenging. To increase statistical power and the stability of results, researchers commonly oversample cases or events. However, the study of rare outcomes need not be limited to case-control settings. Building on the work of Gruber and van der Laan (2010), we construct a new targeted minimum loss-based estimator (TMLE) for estimating the effect of an exposure or treatment on a rare outcome. We focus on the average treatment effect and statistical models incorporating known bounds on the conditional probability of the outcome. The proposed TMLE improves upon existing methods in several ways. First, the substitution estimator respects the global knowledge in the statistical model. Second, the proposed TMLE achieves high rates of convergence in that it solves the efficient influence curve equation despite the paucity of events. Third, the new TMLE achieves lower bias and variance than the standard TMLE in finite samples. Fourth, the proposed estimator is powered to detect effect sizes on the order of the prevalence of the outcome, even in small samples. Fifth, the new TMLE allows for more aggressive estimation of the conditional mean outcome, while guaranteeing the fitted probabilities are within the model bounds and maintaining valid Type I error rates. Finally, the methodology permits the examination of rare outcomes in prospective studies, randomized trials or case-control designs. These advantages are illustrated with Monte Carlo simulations.</p>

	]]>
</description>

<author>Laura B. Balzer et al.</author>


</item>






<item>
<title>Mixtures of Receiver Operating Characteristic Curves</title>
<link>http://biostats.bepress.com/mskccbiostat/paper27</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper27</guid>
<pubDate>Tue, 07 May 2013 07:10:44 PDT</pubDate>
<description>
	<![CDATA[
	<p><strong>Rationale and Objectives:</strong> ROC curves are ubiquitous in the analysis of imaging metrics as markers of both diagnosis and prognosis. While empirical estimation of ROC curves remains the most popular method, there are several reasons to consider smooth estimates based on a parametric model.</p>
<p><strong>Materials and Methods:</strong> A mixture model is considered for modeling the distribution of the marker in the diseased population motivated by the biological observation that there is more heterogeneity in the diseased population than there is in the normal one. It is shown that this model results in an analytically tractable ROC curve which is itself a mixture of ROC curves.</p>
<p><strong>Results:</strong> The use of CK-BB isoenzyme in diagnosis of severe head trauma is used as an example. ROC curves are fit using the direct binormal method, ROCKIT and the Box-Cox transformation as well as the proposed mixture model. The mixture model generates an ROC curve that is much closer to the empirical one than the other methods considered.</p>
<p><strong>Conclusions:</strong> Mixtures of ROC curves can be helpful in fitting smooth ROC curves in datasets where the diseased population has higher variability than can be explained by a single distribution.</p>

	]]>
</description>

<author>Mithat Gonen</author>


</item>






<item>
<title>Visualizing Longitudinal Data with Dropouts</title>
<link>http://biostats.bepress.com/mskccbiostat/paper26</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper26</guid>
<pubDate>Tue, 07 May 2013 07:10:42 PDT</pubDate>
<description>
	<![CDATA[
	<p>A triangle plot is proposed to display longitudinal data with dropouts. The triangle plot is a data visualization tool that can also serve as a graphical check for informativeness of the dropout process. There are similarities between the lasagna plot and the triangle plot, but the explicit use of dropout time as an axis is an advantage of the triangle plot over the more commonly used graphical strategies for longitudinal data. It is possible to interpret the triangle plot as a trellis plot, which gives rise to several extensions such as the triangle histogram and the triangle boxplot. R code is available to streamline the use of the triangle plot in practice.</p>

	]]>
</description>

<author>Mithat Gonen</author>


</item>






<item>
<title>An Application Of Machine Learning Methods To The Derivation Of Exposure-Response Curves For Respiratory Outcomes</title>
<link>http://biostats.bepress.com/ucbbiostat/paper309</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper309</guid>
<pubDate>Wed, 01 May 2013 10:18:22 PDT</pubDate>
<description>
	<![CDATA[
	<p>Analyses of epidemiological studies of the association between short-term changes in air pollution and health outcomes have not sufficiently discussed the degree to which the statistical models chosen for these analyses reflect what is actually known about the true data-generating distribution. We present a method to estimate population-level ambient air pollution (NO2) exposure-health (wheeze in children with asthma) response functions that is not dependent on assumptions about the data-generating function that underlies the observed data and which focuses on a specific scientific parameter of interest (the marginal adjusted association of exposure on probability of wheeze, over a grid of possible exposure values). We show that this approach provides a more nuanced summary of the data than more typical statistical methods used in air pollution epidemiology and epidemiological studies in general.</p>

	]]>
</description>

<author>Ekaterina Eliseeva et al.</author>


</item>






<item>
<title>Más-o-menos: A Simple Sign Averaging Method for Discrimination in Genomic Data Analysis</title>
<link>http://biostats.bepress.com/harvardbiostat/paper158</link>
<guid isPermaLink="true">http://biostats.bepress.com/harvardbiostat/paper158</guid>
<pubDate>Wed, 01 May 2013 09:27:17 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Sihai Dave Zhao et al.</author>


</item>






<item>
<title>Structured Functional Principal Component Analysis</title>
<link>http://biostats.bepress.com/jhubiostat/paper255</link>
<guid isPermaLink="true">http://biostats.bepress.com/jhubiostat/paper255</guid>
<pubDate>Tue, 30 Apr 2013 09:35:21 PDT</pubDate>
<description>
	<![CDATA[
	<p>Motivated by modern observational studies, we introduce a class of functional models that expands nested and crossed designs. These models account for the natural inheritance of correlation structure from sampling design in studies where the fundamental sampling unit is a function or image. Inference is based on functional quadratics and their relationship with the underlying covariance structure of the latent processes. A computationally fast and scalable estimation procedure is developed for ultra-high dimensional data. Methods are illustrated in three examples: high-frequency accelerometer data for daily activity, pitch linguistic data for phonetic analysis, and EEG data for studying electrical brain activity during sleep.</p>

	]]>
</description>

<author>Haochang Shou et al.</author>


</item>






<item>
<title>PENALIZED FUNCTION-ON-FUNCTION REGRESSION</title>
<link>http://biostats.bepress.com/jhubiostat/paper254</link>
<guid isPermaLink="true">http://biostats.bepress.com/jhubiostat/paper254</guid>
<pubDate>Tue, 23 Apr 2013 09:50:30 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose a general framework for smooth regression of a functional response on one or multiple functional predictors. Using the mixed model representation of penalized regression expands the scope of function-on-function regression to many realistic scenarios. In particular, the approach can accommodate a densely or sparsely sampled functional response as well as multiple functional predictors that are observed: 1) on the same domain as the functional response or on a different one; 2) on a dense or sparse grid; and 3) with or without noise. It also allows for seamless integration of continuous or categorical covariates and provides approximate confidence intervals as a by-product of the mixed model inference. The proposed methods are accompanied by easy-to-use and robust software implemented in the pffr function of the R package refund. Methodological developments are general, but were inspired by and applied to a Diffusion Tensor Imaging (DTI) brain tractography dataset.</p>

	]]>
</description>

<author>Andrada E. Ivanescu et al.</author>


</item>






<item>
<title>OPTIMAL TESTS OF TREATMENT EFFECTS FOR THE OVERALL POPULATION AND TWO SUBPOPULATIONS IN RANDOMIZED TRIALS, USING SPARSE LINEAR PROGRAMMING</title>
<link>http://biostats.bepress.com/jhubiostat/paper253</link>
<guid isPermaLink="true">http://biostats.bepress.com/jhubiostat/paper253</guid>
<pubDate>Tue, 23 Apr 2013 06:49:10 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose new, optimal methods for analyzing randomized trials when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population versus power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.</p>

	]]>
</description>

<author>Michael Rosenblum et al.</author>


</item>






<item>
<title>Feature Elimination in Empirical Risk Minimization and Support Vector Machines</title>
<link>http://biostats.bepress.com/uncbiostat/art37</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/art37</guid>
<pubDate>Thu, 18 Apr 2013 13:16:51 PDT</pubDate>
<description>
	<![CDATA[
	<p>We develop an approach for feature elimination in empirical risk minimization and support vector machines, based on recursive elimination of features. We present theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present case studies to show that the assumptions are met in most practical situations and also present simulation studies to demonstrate the performance of the proposed approach.</p>

	]]>
</description>

<author>Sayan Dasgupta et al.</author>


</item>






<item>
<title>Vertically Shifted Mixture Models for Clustering Longitudinal Data by Shape</title>
<link>http://biostats.bepress.com/ucbbiostat/paper308</link>
<guid isPermaLink="true">http://biostats.bepress.com/ucbbiostat/paper308</guid>
<pubDate>Mon, 25 Mar 2013 08:31:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>Longitudinal studies play a prominent role in health, social and behavioral sciences as well as in the biological sciences, economics, and marketing. By following subjects over time, temporal changes in an outcome of interest can be directly observed and studied. An important question concerns the existence of distinct trajectory patterns. One way to determine these distinct patterns is through cluster analysis, which seeks to separate objects (subjects, patients, observational units) into homogeneous groups. Many methods have been adapted for longitudinal data, but almost all of them fail to explicitly group trajectories according to distinct pattern shapes. To fulfill the need for clustering based explicitly on shape, we propose vertically shifting the data by subtracting the subject-specific mean, which directly removes the level, prior to fitting a mixture model. This non-invertible transformation can result in singular covariance matrices, which makes mixture model estimation difficult. Despite the challenges, this method outperforms existing clustering methods in a simulation study.</p>

	]]>
</description>

<author>Brianna C. Heggeseth et al.</author>


</item>






<item>
<title>Efficient Estimation of Risk Ratios From Clustered Binary Data</title>
<link>http://biostats.bepress.com/harvardbiostat/paper157</link>
<guid isPermaLink="true">http://biostats.bepress.com/harvardbiostat/paper157</guid>
<pubDate>Thu, 21 Mar 2013 07:15:25 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Matthew Cefalu et al.</author>


</item>






<item>
<title>The Net Reclassification Index (NRI): a Misleading Measure of Prediction Improvement with Miscalibrated or Overfit Models</title>
<link>http://biostats.bepress.com/uwbiostat/paper392</link>
<guid isPermaLink="true">http://biostats.bepress.com/uwbiostat/paper392</guid>
<pubDate>Tue, 19 Mar 2013 09:25:13 PDT</pubDate>
<description>
	<![CDATA[
	<p>The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large test dataset using risk models derived from a training set is likely to be positive even when the new marker has no predictive information. A related theoretical example is provided in which a miscalibrated risk model that includes an uninformative marker is proven to erroneously yield a positive NRI. Some insight into this phenomenon is derived from Hilden and Gerds (2013) who noted that the NRI statistic does not function as a proper scoring rule. Since large values for the NRI statistic may simply be due to use of miscalibrated risk models we suggest caution in using the NRI as the basis for marker evaluation. Other measures of prediction performance improvement, such as measures derived from the ROC curve, the net benefit function and the Brier score, cannot be large due to model miscalibration and may be preferred for that reason.</p>

	]]>
</description>

<author>Margaret Pepe et al.</author>


</item>






<item>
<title>Latent Supervised Learning</title>
<link>http://biostats.bepress.com/uncbiostat/art36</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/art36</guid>
<pubDate>Mon, 18 Mar 2013 05:13:29 PDT</pubDate>
<description>
	<![CDATA[
	<p>A new machine learning task is introduced, called latent supervised learning, where the goal is to learn a binary classifier from <em>continuous</em> training labels which serve as surrogates for the unobserved class labels. A specific model is investigated where the surrogate variable arises from a two-component Gaussian mixture with unknown means and variances, and the component membership is determined by a hyperplane in the covariate space. The estimation of the separating hyperplane and the Gaussian mixture parameters forms what shall be referred to as the change-line classification problem. A data-driven sieve maximum likelihood estimator for the hyperplane is proposed, which in turn can be used to estimate the parameters of the Gaussian mixture. The estimator is shown to be consistent. Simulations as well as empirical data show the estimator has high classification accuracy.</p>

	]]>
</description>

<author>Susan Wei et al.</author>


</item>






<item>
<title>Cross-Validation for Nonlinear Mixed Effects Models</title>
<link>http://biostats.bepress.com/uncbiostat/art35</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/art35</guid>
<pubDate>Thu, 14 Mar 2013 06:15:36 PDT</pubDate>
<description>
	<![CDATA[
	<p>Cross-validation is frequently used for model selection in a variety of applications. However, it is difficult to apply cross-validation to mixed effects models (including nonlinear mixed effects models) because cross-validation requires “out-of-sample” predictions of the outcome variable, which cannot be easily calculated when random effects are present. We describe two novel variants of cross-validation that can be applied to nonlinear mixed effects models. One variant, where out-of-sample predictions are based on post hoc estimates of the random effects, can be used to select the overall structural model. Another variant, where cross-validation seeks to minimize the estimated random effects rather than the estimated residuals, can be used to select covariates to include in the model. We show that these methods produce accurate results in a variety of simulated data sets and apply them to two publicly available population pharmacokinetic data sets.</p>

	]]>
</description>

<author>Emily Colby et al.</author>


</item>





</channel>
</rss>
