<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Collection of Biostatistics Research Archive</title>
<copyright>Copyright (c) 2009  All rights reserved.</copyright>
<link>http://biostats.bepress.com</link>
<description>Recent documents in Collection of Biostatistics Research Archive</description>
<language>en-us</language>
<lastBuildDate>Sat, 07 Nov 2009 05:23:25 PST</lastBuildDate>
<ttl>3600</ttl>





<item>
<title>Quasi-Least Squares with Mixed Linear Correlation Structures</title>
<link>http://biostats.bepress.com/upennbiostat/papers/art33</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/papers/art33</guid>
<pubDate>Thu, 08 Oct 2009 12:56:48 PDT</pubDate>
<description>Quasi-least squares (QLS) is a two-stage computational approach for estimation of the correlation parameters in the framework of generalized estimating equations (GEE). We prove two general results for the class of mixed linear correlation structures: namely, that the stage one QLS estimate of the correlation parameter always exists and is feasible (yields a positive definite estimated correlation matrix) for any correlation structure, while the stage two estimator exists and is unique (and therefore consistent) with probability one, for the class of mixed linear correlation structures. Our general results justify the implementation of QLS for particular members of the class of mixed linear correlation structures that are appropriate for the analysis of familial data, with families that vary in size and composition. We describe the familial structures and implement them in an analysis of optical spherical values in the Old Order Amish (OOA). For the OOA analysis, we show that we would suffer a substantial loss in efficiency, if the familial structures were the true structures, but were misspecified as simpler approximate structures. We also provide software for implementation of the familial structures in R.  Key-Words: Quasi-least squares; linear correlation structure; mixed correlation structure; familial data.</description>

<author>Jichun Xie</author>


<category>Statistical Theory and Methods</category>

</item>


<item>
<title>Composite Likelihood EM Algorithm with Applications to Multivariate Hidden Markov Model</title>
<link>http://biostats.bepress.com/cobra/ps/art61</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art61</guid>
<pubDate>Thu, 01 Oct 2009 19:28:26 PDT</pubDate>
<description>The method of composite likelihood is useful for estimation and inference in parametric models with high-dimensional data, where the full likelihood approach leads to intractable computational complexity. We develop an extension of the EM algorithm in the framework of composite likelihood estimation in the presence of missing data or latent variables. We establish three key theoretical properties of the composite likelihood EM (CLEM) algorithm: the ascent property, algorithmic convergence, and the convergence rate. The proposed method is applied to estimate the transition probabilities in a multivariate hidden Markov model. Simulation studies are presented to demonstrate the empirical performance of the method. A time-course microarray data set is analyzed using the proposed CLEM method to dissect the underlying gene regulatory network.</description>

<author>Xin Gao</author>


<category>General Biostatistics</category>

</item>


<item>
<title>A Permutation-Based Multiple Testing Method for Time-Course Microarray Experiments</title>
<link>http://biostats.bepress.com/dukebiostat/papers/art6</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/papers/art6</guid>
<pubDate>Mon, 24 Aug 2009 07:33:05 PDT</pubDate>
<description>Background: Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness-of-fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complicated ways. Accurate type I error control adjusting for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation. Results: In this paper, we propose a permutation-based multiple testing procedure based on the test statistic used by Storey et al. (2005). We also propose an efficient computation algorithm. Extensive simulations are conducted to investigate the performance of the permutation-based multiple testing procedure. The application of the proposed method is illustrated using the Caenorhabditis elegans dauer developmental data. Conclusions: Our method is computationally efficient and applicable for identifying genes whose expression levels are time-dependent in a single biological group and for identifying the genes for which the time-profile depends on the group in a multi-group setting.</description>

<author>Insuk Sohn</author>


<category>Computational Biology/Bioinformatics</category>

<category>Longitudinal Data Analysis and Time Series</category>

<category>Microarrays</category>

<category>Multivariate Analysis</category>

<category>Survival Analysis</category>

</item>


<item>
<title>Shrinkage Estimation of Expression Fold Change As an Alternative to Testing Hypotheses of Equivalent Expression</title>
<link>http://biostats.bepress.com/cobra/ps/art60</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art60</guid>
<pubDate>Sat, 22 Aug 2009 15:07:34 PDT</pubDate>
<description>Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, the sum of estimator variance and squared estimator bias. On the basis of two distinct simulation studies and data from different microarray studies, we systematically compared the performance of several estimators representing both current practice and shrinkage. We find that the threshold-based estimators usually perform worse than the maximum-likelihood estimator (MLE) and they often perform far worse as quantified by estimated mean-squared risk. By contrast, the shrinkage estimators tend to perform as well as or better than the MLE and never much worse than the MLE, as expected from what is known about shrinkage. However, a Bayesian measure of performance based on the prior information that few genes are differentially expressed indicates that hard-threshold estimators perform about as well as the local false discovery rate (FDR), the best of the shrinkage estimators studied. 
Based on the ability of the latter to leverage information across genes, we conclude that the use of the local-FDR estimator of the fold change instead of informal or threshold-based combinations of statistical tests and non-shrinkage estimators can be expected to substantially improve the reliability of gene prioritization, with very little risk of reducing that reliability.</description>

<author>Zahra Montazeri</author>


<category>Computation</category>

<category>Microarrays</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

</item>


<item>
<title>Bringing Game Theory to Hypothesis Testing: Establishing Finite Sample Bounds on Inference</title>
<link>http://biostats.bepress.com/cobra/ps/art59</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art59</guid>
<pubDate>Fri, 21 Aug 2009 12:43:16 PDT</pubDate>
<description>Small sample properties are of fundamental interest when only limited data are available. Exact inference is limited by constraints imposed by specific nonrandomized tests and, of course, also by the lack of more data. These effects can be separated: we propose to evaluate a test by comparing its type II error to the minimal type II error among all tests for the given sample. Game theory is used to establish this minimal type II error; the associated randomized test is characterized as part of a Nash equilibrium of a fictitious game against nature. We use this method to investigate sequential tests for the difference between two means when outcomes are constrained to belong to a given bounded set. Tests of inequality and of noninferiority are included. We find that inference in terms of type II error based on a balanced sample cannot be improved by sequential sampling or even by observing counterfactual evidence, provided there is a reasonable gap between the hypotheses.</description>

<author>Karl H. Schlag</author>


<category>Statistical Theory and Methods</category>

</item>


<item>
<title>A New Method for Constructing Exact Tests without Making any Assumptions</title>
<link>http://biostats.bepress.com/cobra/ps/art58</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art58</guid>
<pubDate>Fri, 21 Aug 2009 12:43:15 PDT</pubDate>
<description>We present a new method for constructing exact distribution-free tests (and confidence intervals) for variables that can generate more than two possible outcomes. This method separates the search for an exact test from the goal to create a nonrandomized test. Randomization is used to extend any exact test relating to means of variables with finitely many outcomes to variables with outcomes belonging to a given bounded set. Tests in terms of variance and covariance are reduced to tests relating to means. Randomness is then eliminated in a separate step. This method is used to create confidence intervals for the difference between two means (or variances) and tests of stochastic inequality and correlation.</description>

<author>Karl H. Schlag</author>


<category>Statistical Theory and Methods</category>

</item>


<item>
<title>Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer</title>
<link>http://biostats.bepress.com/uncbiostat/papers/art13</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/papers/art13</guid>
<pubDate>Mon, 03 Aug 2009 08:53:17 PDT</pubDate>
<description>Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a "clinical reinforcement trial") of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary scientific goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal policy from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal strategies for two lines of treatment directly from clinical data without relying on the identification of any accurate mathematical models. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients.</description>

<author>Yufan Zhao</author>


<category>Clinical Trials</category>

</item>


<item>
<title>Reliability of the Model for Clustering of Longitudinal datasets of Infant Mortality Rate in India</title>
<link>http://biostats.bepress.com/cobra/ps/art57</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art57</guid>
<pubDate>Wed, 15 Jul 2009 10:27:29 PDT</pubDate>
<description>Because of the natural tendency of human beings and heavenly bodies to form groups, the technique of cluster analysis, or segmentation analysis, finds importance and applications in many fields of study. The authors previously proposed a model for clustering time trends whose appeal is that it considers two dimensions, namely the horizontal flow of the trend and the vertical distance of the trend from a common base, to obtain the natural clusters. In the present paper, the reliability of this model is studied in two steps: (i) by repeating the analysis using different interval distance measures and (ii) by repeating the analysis using different hierarchical clustering techniques. Dissimilarity coefficients were calculated for the time trends of infant mortality rates in India using this model. In SPSS v17.0, four different clustering methods were applied using a generalized power function. Agglomeration schedules were obtained and elbow criterion diagrams were made for each trend. Five stable clusters were suggested by these methods. The K-means clustering technique was applied to obtain the actual members of these five clusters.</description>

<author>Ajay Kumar Bansal</author>


<category>Longitudinal Data Analysis and Time Series</category>

</item>


<item>
<title>Simple, Defensible Sample Sizes Based on Cost Efficiency -- With Discussion and Rejoinder</title>
<link>http://biostats.bepress.com/cobra/ps/art55</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art55</guid>
<pubDate>Wed, 17 Jun 2009 16:12:32 PDT</pubDate>
<description>The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices.  Costs, however, are often impossible for investigators and funders to ignore in actual practice.  Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost.  By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size.  The first is to choose the sample size that minimizes the average cost per subject.  The second is to choose sample size to minimize total cost divided by the square root of sample size.  This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases.   For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does.  These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.</description>

<author>Peter Bacchetti</author>


<category>General Biostatistics</category>

</item>


<item>
<title>Implementation of quasi-least squares With the R package qlspack</title>
<link>http://biostats.bepress.com/upennbiostat/papers/art32</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/papers/art32</guid>
<pubDate>Wed, 17 Jun 2009 08:25:14 PDT</pubDate>
<description>Quasi-least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of generalized estimating equations (GEE) that has two main advantages over the moment estimates that are typically applied for GEE: (1) it guarantees a consistent estimate of the correlation parameter and a positive definite estimated correlation matrix, for several correlation structures; and (2) it allows for easier implementation of some correlation structures that have not yet been implemented in the framework of GEE. Furthermore, because QLS is a method in the framework of GEE, existing software can be employed within the QLS algorithm for estimation of the correlation and regression parameters. In this manuscript we describe and demonstrate the user-written package qlspack, which allows for implementation of QLS in R. Our package qlspack calls the geepack package (Yan, 2002; Halekoh et al., 2006) to update the estimate of the regression parameter at the current QLS estimate of the correlation parameter; hence, geepack-related functions for standard error estimation can be used after implementing qlspack.</description>

<author>Jichun Xie</author>


<category>General Biostatistics</category>

</item>



</channel>
</rss>
