<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Collection of Biostatistics Research Archive</title>
<copyright>Copyright (c) 2010  All rights reserved.</copyright>
<link>http://biostats.bepress.com</link>
<description>Recent documents in Collection of Biostatistics Research Archive</description>
<language>en-us</language>
<lastBuildDate>Mon, 08 Feb 2010 01:29:52 PST</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Mean Survival Time from Right Censored Data</title>
<link>http://biostats.bepress.com/cobra/ps/art66</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art66</guid>
<pubDate>Wed, 13 Jan 2010 21:21:28 PST</pubDate>
<description>A nonparametric estimate of the mean survival time can be obtained as the area under the
Kaplan-Meier estimate of the survival curve. A common modification is to change the
largest observation to a death time if it is censored. We conducted a simulation study to
assess the behavior of this estimator of the mean survival time in the presence of right
censoring.
We simulated data from seven distributions: exponential, normal, uniform, lognormal,
gamma, log-logistic, and Weibull. This allowed us to compare the results of the estimates
to the known true values and to quantify the bias and the variance. Our simulations cover
proportions of random censoring from 0% to 90%.
The bias of the modified Kaplan-Meier mean estimator increases with the proportion of
censoring. The rate of increase varied substantially from distribution to distribution.
For distributions with long right tails (log-logistic, lognormal, exponential), the bias increased the
quickest (i.e., at lower censoring proportions). For the other distributions, the estimator is relatively
unbiased until around 60% censoring. For the normal distribution, it remains unbiased up to
90% censoring.
Thus, the behavior of the modified Kaplan-Meier mean estimator depends heavily on the
nature of the distribution being estimated. Since we rarely have knowledge of the
underlying true distribution, care must be taken when estimating the mean from censored
data. With modest censoring, estimates are relatively unbiased, but as censoring increases
so does the bias. With 30% or more censoring the bias may be too high. This is in
contrast to the Kaplan-Meier estimator of the median which is relatively unbiased.</description>

<author>Ming Zhong</author>


<category>Survival Analysis</category>

</item>






<item>
<title>Bayesian Methods for Network-Structured Genomics Data</title>
<link>http://biostats.bepress.com/upennbiostat/papers/art34</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/papers/art34</guid>
<pubDate>Tue, 05 Jan 2010 09:18:05 PST</pubDate>
<description>Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key to our approach is the introduction of a Markov random field prior for the indicator variables that describe which covariates should be included in the model and the use of the Wolff algorithm for Markov Chain Monte Carlo inference. We illustrate the proposed procedure with simulations and with an analysis of genomic data. Finally, we present some other areas of genomics research where novel Bayesian approaches may play important roles.</description>

<author>Stefano Monni</author>


<category>Computational Biology/Bioinformatics</category>

</item>






<item>
<title>New Statistical Paradigms Leading to Web-Based Tools for Clinical/Translational Science</title>
<link>http://biostats.bepress.com/cobra/ps/art65</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art65</guid>
<pubDate>Tue, 08 Dec 2009 19:57:40 PST</pubDate>
<description>As the field of functional genetics and genomics is beginning to mature, we are confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics. Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological systems, we often need to integrate information from several of these sources. Thus, as the field of clinical research moves toward complex diseases, the demand for modern database systems and advanced statistical methods increases. The traditional statistical methods implemented in most of the bioinformatics tools currently used in the novel field of genetics and functional genomics are based on the linear model and, thus, have shortcomings when applied to nonlinear biological systems. The previous work on partially ordered data (Wittkowski 1988; 1992), when combined with theoretical results (Hoeffding 1948) and computational strategies (Deuchler 1914), has opened a new field of nonparametric statistics. With grid technology, new tools are now feasible when screening for interactions between genetics (Wittkowski, Liu 2002) and functional genomics (Wittkowski, Lee 2004). Having more complex study designs and more specific methods available increases the demand for decision support when selecting appropriate bioinformatics tools. With the advent of rapid prototyping systems for Web-based database applications, we have recently begun to complement previous work on knowledge-based systems with graphical Web-based tools for acquisition of DESIGN and MODEL knowledge.</description>

<author>Knut M. Wittkowski</author>


<category>Clinical Trials</category>

<category>Computational Biology/Bioinformatics</category>

<category>Design of Experiments and Sample Surveys</category>

<category>General Biostatistics</category>

<category>Genetics</category>

<category>Microarrays</category>

<category>Multivariate Analysis</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

<category>Survival Analysis</category>

</item>






<item>
<title>Modeling Multilevel Sleep Transitional Data Via Poisson Log-Linear Multilevel Models</title>
<link>http://biostats.bepress.com/cobra/ps/art64</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art64</guid>
<pubDate>Tue, 08 Dec 2009 19:56:29 PST</pubDate>
<description>This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, scalable to larger studies, and easily fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, as comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.</description>

<author>Bruce J. Swihart</author>


<category>Longitudinal Data Analysis and Time Series</category>

</item>






<item>
<title>Losses To Follow-Up In WARSS Collaboration Datasets: A Detailed Statistical Presentation of the Imputation Procedures</title>
<link>http://biostats.bepress.com/columbiabiostat/papers/art16</link>
<guid isPermaLink="true">http://biostats.bepress.com/columbiabiostat/papers/art16</guid>
<pubDate>Sat, 05 Dec 2009 16:06:54 PST</pubDate>
<description>This document prospectively records the procedures which will be used for handling losses to follow-up (LTF) in statistical analyses of WARSS data.  They were developed by B Levin Ph.D. (WARSS senior statistical consultant) and JLP Thompson Ph.D. (WARSS statistician), and have been approved by the SOCC (Statistical Oversight and Coordinating Committee of the WARSS collaboration).  They have been accepted by the WARSS Principal Investigator (JP Mohr M.D.), and also by the Principal Investigators of APASS, PICSS, HAS, and GENESIS for use in these collaborating studies.</description>

<author>John L.P. Thompson</author>


<category>Clinical Trials</category>

</item>






<item>
<title>Targeted Genomic signature profiling with Quasi-alignment statistics</title>
<link>http://biostats.bepress.com/cobra/ps/art63</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art63</guid>
<pubDate>Thu, 03 Dec 2009 11:38:17 PST</pubDate>
<description>Genome databases continue to expand with no change in the basic format of sequence data. The prevalent use of classic alignment-based search tools like BLAST has significantly pushed the limits of genome isolate research. The relatively new frontier of metagenomic research deals with thousands of diverse genomes, with newer demands beyond the current homologue search and analysis. Compressing sequence data into a complex form could facilitate a broader range of sequence analyses. To this end, this research explores reorganizing sequence data as complex Markov signatures, also known as Extensible Markov Models. Markov models have found successful application in biological sequence analysis through small but important extensions to the original theory of Markov chains. The Extensible Markov Model (EMM) offers a novel quasi-alignment complement to classic alignment-based homologous sequence search methods like BLAST. EMM-based BioInformatic analysis (EMMBA) incorporates automatic learning, which allows the Markov chain to be created dynamically. Oligonucleotide or genomic word frequencies form the core sequence data in alignment-free methods. EMMBA extends the Karlin-Altschul statistics to bring an analogous E-score measure of statistical significance to the quasi-alignment domain. By consolidating a community of sequences into a single searchable profile, the EMM methodology further reduces the search space for classification. Through dynamic generation of the score matrix for each community profile, EMMBA fine-tunes the score assignments. Each evaluation iteratively adjusts the profile score matrix to account for point probabilities of the query, to ensure that the Karlin-Altschul assumptions are satisfied and meaningful statistical significance can be derived. The presence of multiple quasi-alignments resembles the multiple local alignments of BLAST. Quasi-alignments are scored based on a difference distribution of Gumbel scores.
Species signature profiles allow for statistical validation of novel species identification. Working in EMM transformation space speeds up classification and generates a distance matrix for differentiation. The techniques and metrics presented are validated using microbial 16S rRNA sequence data from NCBI.</description>

<author>Rao Mallik Kotamarti</author>


<category>Computational Biology/Bioinformatics</category>

</item>






<item>
<title>Two-stage Decompositions for the Analysis of Functional Connectivity for fMRI With Application to Alzheimer&apos;s Disease Risk</title>
<link>http://biostats.bepress.com/cobra/ps/art62</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art62</guid>
<pubDate>Thu, 03 Dec 2009 11:37:03 PST</pubDate>
<description>Functional connectivity is the study of correlations in measured neurophysiological signals. Altered functional connectivity has been shown to be associated with numerous diseases including Alzheimer's disease and mild cognitive impairment. In this manuscript we use a two-stage application of the singular value decomposition to obtain data driven population-level measures of functional connectivity in functional magnetic resonance imaging (fMRI). The method is computationally simple and amenable to high dimensional fMRI data with large numbers of subjects. Simulation studies suggest the ability of the decomposition methods to recover population brain networks and their associated loadings. We further demonstrate the utility of these decompositions in a case-control functional logistic regression model. The method is applied to a novel fMRI study of Alzheimer's disease risk under a verbal paired associates task. We found empirical evidence of alternative connectivity in clinically asymptomatic at-risk subjects when compared to controls. The relevant brain network loads primarily on the temporal lobe and overlaps significantly with the olfactory areas and temporal poles.</description>

<author>Brian S. Caffo</author>


<category>General Biostatistics</category>

</item>






<item>
<title>Inverse Regression Estimation for Censored Data</title>
<link>http://biostats.bepress.com/uncbiostat/papers/art14</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/papers/art14</guid>
<pubDate>Tue, 10 Nov 2009 09:46:41 PST</pubDate>
<description>An inverse regression methodology for assessing predictor performance in the censored data setup is developed along with inference procedures and a computational algorithm. The technique developed here allows for conditioning on the unobserved failure time along with a weighting mechanism that accounts for the censoring. The implementation is nonparametric and computationally fast. This provides an efficient methodological tool that can be used especially in cases where usual modeling assumptions are not applicable to the data under consideration. It can also be a good diagnostic tool that can be used in a model selection process. We have provided theoretical justification of consistency and asymptotic normality of the methodology. Simulation studies and two data analyses are provided to illustrate the practical utility of the procedure. Keywords: right censored data, accelerated failure time, sufficient dimension reduction</description>

<author>Nivedita V. Nadkarni</author>


<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Quasi-Least Squares with Mixed Linear Correlation Structures</title>
<link>http://biostats.bepress.com/upennbiostat/papers/art33</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/papers/art33</guid>
<pubDate>Thu, 08 Oct 2009 12:56:48 PDT</pubDate>
<description>Quasi-least squares (QLS) is a two-stage computational approach for estimation of the correlation parameters in the framework of generalized estimating equations (GEE). We prove two general results for the class of mixed linear correlation structures: namely, that the stage one QLS estimate of the correlation parameter always exists and is feasible (yields a positive definite estimated correlation matrix) for any correlation structure, while the stage two estimator exists and is unique (and therefore consistent) with probability one, for the class of mixed linear correlation structures. Our general results justify the implementation of QLS for particular members of the class of mixed linear correlation structures that are appropriate for the analysis of familial data, with families that vary in size and composition. We describe the familial structures and implement them in an analysis of optical spherical values in the Old Order Amish (OOA). For the OOA analysis, we show that we would suffer a substantial loss in efficiency, if the familial structures were the true structures, but were misspecified as simpler approximate structures. We also provide software for implementation of the familial structures in R.  Key-Words: Quasi-least squares; linear correlation structure; mixed correlation structure; familial data.</description>

<author>Jichun Xie</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Composite Likelihood EM Algorithm with Applications to Multivariate Hidden Markov Model</title>
<link>http://biostats.bepress.com/cobra/ps/art61</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art61</guid>
<pubDate>Thu, 01 Oct 2009 19:28:26 PDT</pubDate>
<description>The method of composite likelihood is useful for estimation and inference in parametric models with high-dimensional data, where the full likelihood approach leads to intractable computational complexity. We develop an extension of the EM algorithm in the framework of composite likelihood estimation in the presence of missing data or latent variables. We establish three key theoretical properties of the composite likelihood EM (CLEM) algorithm: the ascent property, the algorithmic convergence and the convergence rate. The proposed method is applied to estimate the transition probabilities in a multivariate hidden Markov model. Simulation studies are presented to demonstrate the empirical performance of the method. A time-course microarray data set is analyzed using the proposed CLEM method to dissect the underlying gene regulatory network.</description>

<author>Xin Gao</author>


<category>General Biostatistics</category>

</item>






<item>
<title>A Permutation-Based Multiple Testing Method for Time-Course Microarray Experiments</title>
<link>http://biostats.bepress.com/dukebiostat/papers/art6</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/papers/art6</guid>
<pubDate>Mon, 24 Aug 2009 07:33:05 PDT</pubDate>
<description>Background: Time-course microarray experiments are widely used to study the temporal profiles of gene
expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be
applied to discovering genes whose expression trajectories change over time within a single biological group, or
those that follow different time trajectories among multiple groups. They estimated the expression trajectories
of each gene using natural cubic splines under the null (no time-course) and alternative (time-course)
hypotheses, and used a goodness of fit test statistic to quantify the discrepancy. The null distribution of the
statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often
correlated in complicated ways. Accurate type I error control adjusting for multiple testing requires the joint null
distribution of test statistics for a large number of genes. For this purpose, permutation methods have been
widely used because of computational ease and their intuitive interpretation.
Results: In this paper, we propose a permutation-based multiple testing procedure based on the test statistic
used by Storey et al. (2005). We also propose an efficient computation algorithm. Extensive simulations are
conducted to investigate the performance of the permutation-based multiple testing procedure. The application
of the proposed method is illustrated using the Caenorhabditis elegans dauer developmental data.
Conclusions: Our method is computationally efficient and applicable for identifying genes whose expression
levels are time-dependent in a single biological group and for identifying the genes for which the time-profile
depends on the group in a multi-group setting.</description>

<author>Insuk Sohn</author>


<category>Computational Biology/Bioinformatics</category>

<category>Longitudinal Data Analysis and Time Series</category>

<category>Microarrays</category>

<category>Multivariate Analysis</category>

<category>Survival Analysis</category>

</item>






<item>
<title>Shrinkage Estimation of Expression Fold Change As an Alternative to Testing Hypotheses of Equivalent Expression</title>
<link>http://biostats.bepress.com/cobra/ps/art60</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art60</guid>
<pubDate>Sat, 22 Aug 2009 15:07:34 PDT</pubDate>
<description>Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, the sum of estimator variance and squared estimator bias. On the basis of two distinct simulation studies and data from different microarray studies, we systematically compared the performance of several estimators representing both current practice and shrinkage. We find that the threshold-based estimators usually perform worse than the maximum-likelihood estimator (MLE) and they often perform far worse as quantified by estimated mean-squared risk. By contrast, the shrinkage estimators tend to perform as well as or better than the MLE and never much worse than the MLE, as expected from what is known about shrinkage. However, a Bayesian measure of performance based on the prior information that few genes are differentially expressed indicates that hard-threshold estimators perform about as well as the local false discovery rate (FDR), the best of the shrinkage estimators studied. 
Based on the ability of the latter to leverage information across genes, we conclude that the use of the local-FDR estimator of the fold change instead of informal or threshold-based combinations of statistical tests and non-shrinkage estimators can be expected to substantially improve the reliability of gene prioritization at very little risk of doing so less reliably.</description>

<author>Zahra Montazeri</author>


<category>Computation</category>

<category>Microarrays</category>

<category>Statistical Models</category>

<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Bringing Game Theory to Hypothesis Testing: Establishing Finite Sample Bounds on Inference</title>
<link>http://biostats.bepress.com/cobra/ps/art59</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art59</guid>
<pubDate>Fri, 21 Aug 2009 12:43:16 PDT</pubDate>
<description>Small sample properties are of fundamental interest when only limited data is available. Exact inference is limited by constraints imposed by specific nonrandomized tests and of course also by lack of more data. These effects can be separated, as we propose to evaluate a test by comparing its type II error to the minimal type II error among all tests for the given sample. Game theory is used to establish this minimal type II error; the associated randomized test is characterized as part of a Nash equilibrium of a fictitious game against nature. We use this method to investigate sequential tests for the difference between two means when outcomes are constrained to belong to a given bounded set. Tests of inequality and of noninferiority are included. We find that inference in terms of type II error based on a balanced sample cannot be improved by sequential sampling or even by observing counterfactual evidence, provided there is a reasonable gap between the hypotheses.</description>

<author>Karl H. Schlag</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>A New Method for Constructing Exact Tests without Making any Assumptions</title>
<link>http://biostats.bepress.com/cobra/ps/art58</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art58</guid>
<pubDate>Fri, 21 Aug 2009 12:43:15 PDT</pubDate>
<description>We present a new method for constructing exact distribution-free tests (and confidence intervals) for variables that can generate more than two possible outcomes. This method separates the search for an exact test from the goal to create a nonrandomized test. Randomization is used to extend any exact test relating to means of variables with finitely many outcomes to variables with outcomes belonging to a given bounded set. Tests in terms of variance and covariance are reduced to tests relating to means. Randomness is then eliminated in a separate step. This method is used to create confidence intervals for the difference between two means (or variances) and tests of stochastic inequality and correlation.</description>

<author>Karl H. Schlag</author>


<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Reinforcement Learning Strategies for Clinical Trials in Non-small Cell Lung Cancer</title>
<link>http://biostats.bepress.com/uncbiostat/papers/art13</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/papers/art13</guid>
<pubDate>Mon, 03 Aug 2009 08:53:17 PDT</pubDate>
<description>Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a "clinical reinforcement trial") of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first and second-line treatments based on prognostic factors, another primary scientific goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized which involves learning an optimal policy from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression which can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal strategies for two lines of treatment directly from clinical data without relying on the identification of any accurate mathematical models. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients.</description>

<author>Yufan Zhao</author>


<category>Clinical Trials</category>

</item>






<item>
<title>Reliability of the Model for Clustering of Longitudinal datasets of Infant Mortality Rate in India</title>
<link>http://biostats.bepress.com/cobra/ps/art57</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art57</guid>
<pubDate>Wed, 15 Jul 2009 10:27:29 PDT</pubDate>
<description>Because of the natural tendency of human beings and heavenly bodies to form groups, the technique of cluster analysis or segmentation analysis finds importance and applications in many fields of study. The authors previously proposed a model for clustering of time trends; its appeal is that two dimensions, namely the horizontal flow of the trend and the vertical distance of the trend from a common base, are considered to obtain the natural clusters. In the present paper, the reliability of this model is studied in two steps, namely (i) by repeating the analysis using different interval distance measures and (ii) by repeating the analysis using different hierarchical clustering techniques. Dissimilarity coefficients were calculated for the time trends of infant mortality rates in India using this model. In SPSS v17.0, four different clustering methods were applied using the generalized power function. Agglomeration schedules were obtained and elbow criterion diagrams were made for each trend. Five stable clusters were suggested by these methods. The K-means clustering technique was applied to obtain the actual members of these five clusters.</description>

<author>Ajay Kumar Bansal</author>


<category>Longitudinal Data Analysis and Time Series</category>

</item>






<item>
<title>Simple, Defensible Sample Sizes Based on Cost Efficiency -- With Discussion and Rejoinder</title>
<link>http://biostats.bepress.com/cobra/ps/art55</link>
<guid isPermaLink="true">http://biostats.bepress.com/cobra/ps/art55</guid>
<pubDate>Wed, 17 Jun 2009 16:12:32 PDT</pubDate>
<description>The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices.  Costs, however, are often impossible for investigators and funders to ignore in actual practice.  Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost.  By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size.  The first is to choose the sample size that minimizes the average cost per subject.  The second is to choose sample size to minimize total cost divided by the square root of sample size.  This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases.   For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does.  These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.</description>

<author>Peter Bacchetti</author>


<category>General Biostatistics</category>

</item>






<item>
<title>Implementation of quasi-least squares With the R package qlspack</title>
<link>http://biostats.bepress.com/upennbiostat/papers/art32</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/papers/art32</guid>
<pubDate>Wed, 17 Jun 2009 08:25:14 PDT</pubDate>
<description>Quasi-least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of generalized estimating equations (GEE) that has two main advantages over the moment estimates that are typically applied for GEE: (1) it guarantees a consistent estimate of the correlation parameter and a positive definite estimated correlation matrix, for several correlation structures; and (2) it allows for easier implementation of some correlation structures that have not yet been implemented in the framework of GEE. Furthermore, because QLS is a method in the framework of GEE, existing software can be employed within the QLS algorithm for estimation of the correlation and regression parameters. In this manuscript we describe and demonstrate the user-written package qlspack, which allows for implementation of QLS in R. Our package qlspack calls the geepack package (Yan 2002; Halekoh et al. 2006) to update the estimate of the regression parameter at the current QLS estimate of the correlation parameter; hence, geepack-related functions for standard error estimation can be used after implementing qlspack.</description>

<author>Jichun Xie</author>


<category>General Biostatistics</category>

</item>






<item>
<title>Graphical Displays for Clarifying How Allocation Ratio Affects Total Sample Size for the Two Sample Logrank Test</title>
<link>http://biostats.bepress.com/uncbiostat/papers/art12</link>
<guid isPermaLink="true">http://biostats.bepress.com/uncbiostat/papers/art12</guid>
<pubDate>Tue, 16 Jun 2009 05:46:13 PDT</pubDate>
<description>For time-to-event data, the power of the two sample logrank test for the comparison of two treatment groups can be greatly influenced by the ratio of the number of patients in each of the treatment groups. Despite the possible loss of power, unequal allocations may be of interest due to a need to collect more data on one of the groups or to considerations related to the acceptability of the treatments to patients. Investigators pursuing such designs may be interested in the cost of the unbalanced design relative to a balanced design with respect to the total number of patients required for the study. We present graphical displays to illustrate the sample size adjustment factor, or ratio of the sample size required by an unequal allocation compared to the sample size required by a balanced allocation, for various survival rates, treatment hazard ratios, and sample size allocation ratios. These graphical displays conveniently summarize information in the literature and provide a useful tool for planning sample sizes for the two sample logrank test.</description>

<author>Benjamin R. Saville</author>


<category>Clinical Trials</category>

<category>Statistical Theory and Methods</category>

</item>






<item>
<title>Two-Stage Phase II Clinical Trials with Heterogeneous Patient Populations</title>
<link>http://biostats.bepress.com/dukebiostat/papers/art5</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/papers/art5</guid>
<pubDate>Fri, 12 Jun 2009 09:13:36 PDT</pubDate>
<description>The patient population for a phase II trial often consists of multiple subgroups with different prognoses. In this case, a popular design approach is to specify the response rate and the prevalence of each subgroup, to calculate the response rate of the whole population by the weighted average of the response rates across subgroups, and to choose a standard phase II design such as Simon's optimal or minimax design to test on the response rate for the whole population. Even if the prevalence of each subgroup is accurately specified, the observed prevalence among the patients accrued to the study may be quite different from the estimated one because of the small sample size, which is typical in most phase II trials. In this case, the fixed rejection value for a chosen standard phase II design may be either too conservative (i.e., increasing the false rejection probability of the experimental therapy) if the trial accrues more high-risk patients than expected or too anti-conservative (i.e., increasing the false acceptance probability of the experimental therapy) if the trial accrues more low-risk patients than expected. We can avoid this problem by adjusting the rejection value depending on the observed prevalence from the trial. In this paper, we investigate two flexible design approaches that choose rejection values depending on the observed prevalence, and compare them under various</description>

<author>Sin-Ho Jung</author>


<category>Clinical Trials</category>

</item>





</channel>
</rss>
