<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Duke Biostatistics and Bioinformatics (B&amp;B) Working Paper Series</title>
<copyright>Copyright (c) 2013 Duke University All rights reserved.</copyright>
<link>http://biostats.bepress.com/dukebiostat</link>
<description>Recent documents in Duke Biostatistics and Bioinformatics (B&amp;B) Working Paper Series</description>
<language>en-us</language>
<lastBuildDate>Wed, 23 Jan 2013 21:52:28 PST</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Dynamic Thresholds and a Summary ROC Curve: Assessing the Prognostic Accuracy of Longitudinal Markers</title>
<link>http://biostats.bepress.com/dukebiostat/art19</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art19</guid>
<pubDate>Tue, 27 Nov 2012 17:25:18 PST</pubDate>
<description>
	<![CDATA[
	<p>Cancer patients, chronic kidney disease (CKD) patients, and subjects infected with HIV are commonly monitored over time using biomarkers that represent key health status indicators. Furthermore, biomarkers are frequently used to guide initiation of new treatments or to inform changes in intervention strategies. Since key medical decisions can be made on the basis of a longitudinal biomarker, it is important to evaluate the potential accuracy associated with longitudinal monitoring. We introduce a summary receiver operating characteristic (ROC) curve that displays the overall sensitivity associated with a time-dependent threshold that controls specificity. The proposed statistical methods are similar to concepts considered in disease screening, yet our methods are novel in choosing a potentially time-dependent threshold to define a positive test, and our methods allow time-specific control of the false-positive rate. Finally, the proposed summary ROC curve is a natural averaging of time-dependent incident/dynamic ROC curves proposed by Heagerty and Zheng (2005) and therefore provides a single summary of net error rates that can be achieved in the longitudinal setting.</p>

	]]>
</description>

<author>Paramita Saha Chaudhuri et al.</author>


</item>






<item>
<title>Power and Sample Size Calculations for SNP Association Studies with Censored Time-to-Event Outcomes</title>
<link>http://biostats.bepress.com/dukebiostat/art18</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art18</guid>
<pubDate>Sat, 17 Mar 2012 09:13:02 PDT</pubDate>
<description>
	<![CDATA[
	<p>For many clinical studies in cancer, germline DNA is prospectively collected for the purpose of discovering or validating Single Nucleotide Polymorphisms (SNPs) associated with clinical outcomes. The primary clinical endpoints for many of these studies are time-to-event outcomes, such as time of death or disease progression, which are subject to censoring mechanisms. The Cox score test can be readily employed to test the association between a SNP and the outcome of interest. In addition to the effect size, sample size, and censoring distribution, the power of the test will depend on the underlying genetic risk model and the distribution of the risk allele. We propose a rigorous account for power and sample size calculations under a variety of genetic risk models without resorting to the commonly used contiguous alternative assumption. Practical advice, along with an open-source software package to design SNP association studies with survival outcomes, is provided.</p>

	]]>
</description>

<author>Kouros Owzar et al.</author>


<category>Genetics</category>

</item>






<item>
<title>A Bayesian Hierarchical Model for Adaptive Biomarker Strategies in Randomized Phase II Studies</title>
<link>http://biostats.bepress.com/dukebiostat/art17</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art17</guid>
<pubDate>Sat, 17 Mar 2012 08:00:18 PDT</pubDate>
<description>
	<![CDATA[
	<p>The role of biomarkers has increased dramatically in cancer therapeutic trials, and with molecular markers now integrated into phase II and III studies, there is a need for novel clinical trial designs to efficiently answer questions of both drug effects and biomarker performance. Further, trials with integral markers need greater flexibility for the wider variety of potential outcomes. To address these needs, we propose a Bayesian hierarchical model for use in response-adaptive, randomized phase II studies integrating multiple agents and multiple biomarker sub-populations. This allows for a gradual and seamless approach to transition from a randomized block design to a biomarker-enrichment design, such that a greater fraction of participants are randomized to optimal therapy when therapeutics are effective. Compared to the use of traditional staged designs within biomarker sub-populations, our method is more robust against misspecification of marker prevalence, and has improved performance in identifying the subgroups where therapeutics are effective.</p>

	]]>
</description>

<author>William T. Barry et al.</author>


</item>






<item>
<title>Statistical Considerations for Analysis of Microarray Experiments</title>
<link>http://biostats.bepress.com/dukebiostat/art16</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art16</guid>
<pubDate>Wed, 05 Oct 2011 12:41:41 PDT</pubDate>
<description>
	<![CDATA[
	<p>Microarray technologies enable the simultaneous interrogation of expressions from thousands of genes from a biospecimen sample taken from a patient. This large set of expressions generates a genetic profile of the patient that may be used to identify potential prognostic or predictive genes or genetic models for clinical outcomes. The aim of this article is to provide a broad overview of some of the major statistical considerations for the design and analysis of microarray experiments conducted as correlative science studies to clinical trials. An emphasis will be placed on how the lack of understanding and improper use of statistical concepts and methods will lead to noise discovery and misinterpretation of experimental results.</p>

	]]>
</description>

<author>Kouros Owzar et al.</author>


</item>






<item>
<title>Multiple Testing for Gene Sets from Microarray Experiments</title>
<link>http://biostats.bepress.com/dukebiostat/art15</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art15</guid>
<pubDate>Mon, 11 Apr 2011 08:22:16 PDT</pubDate>
<description>
	<![CDATA[
	<p>Background: A key objective in many microarray association studies is the identification of individual genes associated with clinical outcome. It is often of additional interest to identify sets of genes, known a priori to have similar biologic function, associated with the outcome.</p>
<p>Results: In this paper, we propose a general permutation-based framework for gene set testing that controls the false discovery rate (FDR) while accounting for the dependency among the genes within and across each gene set. The application of the proposed method is demonstrated using three public microarray data sets. The performance of our proposed method is contrasted with two existing methods, Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA).</p>
<p>Conclusions: Our simulations show that the proposed method controls the FDR at the desired level. Through simulations and case studies, we observe that our method performs better than GSEA and GSA, especially when the number of prognostic gene sets is large.</p>

	]]>
</description>

<author>Insuk Sohn et al.</author>


<category>Genetics</category>

</item>






<item>
<title>SNPpy - Database Management for SNP Data from GWAS Studies</title>
<link>http://biostats.bepress.com/dukebiostat/art14</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art14</guid>
<pubDate>Mon, 07 Feb 2011 12:28:54 PST</pubDate>
<description>
	<![CDATA[
	<p>Background: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data, and merge across studies for meta-analyses, including data filtering based on the values of phenotype and SNP data.</p>
<p>Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses.</p>
<p>Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It performs low-level, flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.</p>

	]]>
</description>

<author>Faheem Mitha et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Bayesian Approach for Analysis of Binary Responses in Clinical Trials with Protocol Amendments</title>
<link>http://biostats.bepress.com/dukebiostat/art13</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art13</guid>
<pubDate>Mon, 07 Feb 2011 12:28:53 PST</pubDate>
<description>
	<![CDATA[
	<p>In clinical trials, the study protocols are often amended to modify trial procedures and/or statistical methods during the conduct of the clinical trials. Major or significant modification (adaptation) could result in a shift in the target patient population and consequently lead to a totally different trial, which is unable to answer the medical or scientific questions that the trial is intended to answer. Covariate-adjusted models for continuous and binary responses have been proposed. In this paper, we propose a Bayesian approach for analysis of a binary study endpoint when there is a shift in patient population after protocol amendments.</p>

	]]>
</description>

<author>Lan-Yan Yang et al.</author>


</item>






<item>
<title>Statistical Inference for Clinical Trials with Random Shift in Scale Parameter of Target Patient Population</title>
<link>http://biostats.bepress.com/dukebiostat/art11</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art11</guid>
<pubDate>Wed, 23 Jun 2010 08:12:17 PDT</pubDate>
<description>
	<![CDATA[
	<p>In clinical research and development, major or significant protocol amendments of on-going trials may result in a totally different trial, which is unable to address the scientific/medical questions that the original trial intends to answer. Chow, Chang and Pong [1] examined the impact of protocol amendments on statistical inference assuming a random location shift and a fixed scale shift in the target patient population. In this article, we focus on the case where there is a fixed location shift but a random scale shift in the target patient population. We study the shift in the target patient population, statistical inference, and power analysis for sample size adjustment after changes or modifications are made to an on-going clinical trial via protocol amendments. Assuming that there is a fixed location shift, we propose an approach that takes into consideration a potential random shift in the scale parameter as a result of protocol amendment. A simulation study shows that the proposed method is superior to the classical method, which ignores the shift in patient population, in terms of accuracy and reliability for assessment of the treatment effect under study.</p>

	]]>
</description>

<author>Ying Lu et al.</author>


</item>






<item>
<title>Considerations to Sample Size in Random Shift among Adjusted Clinical Trials</title>
<link>http://biostats.bepress.com/dukebiostat/art12</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art12</guid>
<pubDate>Wed, 23 Jun 2010 08:12:17 PDT</pubDate>
<description>
	<![CDATA[
	<p>Adaptive methods in clinical research have received increasing attention in recent years, since protocol amendments of on-going clinical trials are becoming more and more common. In a series of studies of random shifts in clinical research, Chow et al. (2005) considered the case of a shift in the target population mean, and Lu et al. focused on the shift in variance when the population mean is fixed. In this article, we discuss the shift in both mean and variance when amendments occur during the conduct of the clinical trial. The EM algorithm is an efficient method for obtaining the maximum likelihood estimates of the corresponding parameters, and the complete data information matrix can be obtained by the method of Louis (1982). Through a simulation study, the proposed method is compared with the classical method, which ignores the shift in both mean and variance, and the superiority of the proposed method is demonstrated.</p>

	]]>
</description>

<author>Ying Lu et al.</author>


</item>






<item>
<title>Nonparametric Estimation of AUC and Partial AUC under Test-Result-Dependent Sampling</title>
<link>http://biostats.bepress.com/dukebiostat/art10</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art10</guid>
<pubDate>Wed, 05 May 2010 13:50:33 PDT</pubDate>
<description>
	<![CDATA[
	<p>The area under a ROC curve (AUC) and partial area under a ROC curve (pAUC) are important summary measures useful in assessing the accuracy of a diagnostic test or a biomarker in discriminating true disease status. We consider nonparametric estimation of AUC and pAUC under a test-result-dependent sampling (TDS) design, which consists of a simple random component and a test-result-dependent component. A TDS design can yield better efficiency and reduced study cost by oversampling or undersampling subjects falling into specified ranges of test results. We obtain a nonparametric empirical likelihood estimate of the test-result distribution under the TDS design. The estimated test-result distribution is then used to construct consistent estimators for AUC and pAUC. We establish asymptotic properties of the proposed estimators. Simulation shows that the proposed estimators have good finite sample properties and that the TDS design is more efficient than a simple random sampling (SRS) design. A data example based on an ongoing lung cancer trial is provided to illustrate the TDS design and the proposed estimators.</p>

	]]>
</description>

<author>Xiaofei Wang et al.</author>


</item>






<item>
<title>Semiparametric Estimation of ROC Curve Under Test-Result-Dependent Sampling</title>
<link>http://biostats.bepress.com/dukebiostat/art9</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art9</guid>
<pubDate>Wed, 05 May 2010 13:50:32 PDT</pubDate>
<description>
	<![CDATA[
	<p>The receiver operating characteristic (ROC) curve may be used to evaluate the performance of a biomarker measured on a continuous scale to predict disease status or clinical condition. Motivated by the need for novel study designs with better estimation efficiency and reduced study cost, we consider in this article a biased sampling scheme that consists of a simple random component and a supplemented test-result-dependent component. Using this approach, investigators can oversample or undersample subjects falling into certain ranges of the biomarker score, allowing an improved precision for the estimation of the ROC curve with a fixed size of subjects. Of course, this sampling scheme will introduce bias in the assessment of the predictive accuracy of the biomarker under standard ROC estimation methods. We develop a semiparametric empirical likelihood method to estimate a covariate-specific ROC curve and a marginal ROC curve, where the latter is an average of the covariate-specific ROC curves over the covariate distribution. We establish the asymptotic properties of the proposed estimators and give their corresponding variance estimators. Simulation studies show that the proposed estimation method yields good small sample properties and is more efficient than alternative methods. The proposed method is illustrated with an example based on the design of an ongoing lung cancer clinical trial.</p>

	]]>
</description>

<author>Junling Ma et al.</author>


</item>






<item>
<title>permGPU: Using Graphics Processing Units in RNA Microarray Association Studies</title>
<link>http://biostats.bepress.com/dukebiostat/art8</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art8</guid>
<pubDate>Sat, 06 Mar 2010 10:55:53 PST</pubDate>
<description>
	<![CDATA[
	<p>Background: Many analyses of microarray association studies involve permutation and bootstrap resampling, and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed.</p>
<p>Results: We have developed a CUDA-based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C solution running on a conventional Linux server.</p>
<p>Conclusions: permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.</p>
<p>The homepage for permGPU: http://code.google.com/p/permgpu/</p>

	]]>
</description>

<author>Ivo D. Shterev et al.</author>


</item>






<item>
<title>Randomized Phase II Clinical Trials using Fisher&apos;s Exact Test</title>
<link>http://biostats.bepress.com/dukebiostat/art7</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art7</guid>
<pubDate>Sat, 06 Mar 2010 10:55:51 PST</pubDate>
<description>
	<![CDATA[
	<p>A typical phase II trial is conducted as a single-arm trial to compare the response probabilities between an experimental therapy and a historical control. Historical control data, however, often have a small sample size, are collected from a different patient population, or use a different response assessment method, so that a direct comparison between a historical control and an experimental therapy may be severely biased. Randomized phase II trials entering patients prospectively to both experimental and control arms have been proposed to avoid any bias in such cases. In this paper, we propose two-stage randomized phase II trials based on Fisher's exact test. Through numerical studies, we observe that the proposed method controls the type I error accurately and maintains a high power. If we can specify the response probabilities of the two arms under the alternative hypothesis accurately, we can identify good randomized phase II trial designs by adopting Simon's minimax and optimal design concepts that were developed for single-arm phase II trials.</p>

	]]>
</description>

<author>Sin-Ho Jung</author>


</item>






<item>
<title>A Permutation-Based Multiple Testing Method for Time-Course Microarray Experiments</title>
<link>http://biostats.bepress.com/dukebiostat/art6</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art6</guid>
<pubDate>Mon, 24 Aug 2009 07:33:05 PDT</pubDate>
<description>
	<![CDATA[
	<p>Background: Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness of fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complicated ways. An accurate type I error control adjusting for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of computational ease and their intuitive interpretation.</p>
<p>Results: In this paper, we propose a permutation-based multiple testing procedure based on the test statistic used by Storey et al. (2005). We also propose an efficient computation algorithm. Extensive simulations are conducted to investigate the performance of the permutation-based multiple testing procedure. The application of the proposed method is illustrated using the Caenorhabditis elegans dauer developmental data.</p>
<p>Conclusions: Our method is computationally efficient and applicable for identifying genes whose expression levels are time-dependent in a single biological group and for identifying the genes for which the time-profile depends on the group in a multi-group setting.</p>

	]]>
</description>

<author>Insuk Sohn et al.</author>


</item>






<item>
<title>Two-Stage Phase II Clinical Trials with Heterogeneous Patient Populations</title>
<link>http://biostats.bepress.com/dukebiostat/art5</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art5</guid>
<pubDate>Fri, 12 Jun 2009 09:13:36 PDT</pubDate>
<description>
	<![CDATA[
	<p>The patient population for a phase II trial often consists of multiple subgroups with different prognoses. In this case, a popular design approach is to specify the response rate and the prevalence of each subgroup, to calculate the response rate of the whole population by the weighted average of the response rates across subgroups, and to choose a standard phase II design such as Simon's optimal or minimax design to test on the response rate for the whole population. Even if the prevalence of each subgroup is accurately specified, the observed prevalence among the patients accrued to the study may be quite different from the estimated one because of the small sample size, which is typical in most phase II trials. In this case, the fixed rejection value for a chosen standard phase II design may be either too conservative (i.e., increasing the false rejection probability of the experimental therapy) if the trial accrues more high-risk patients than expected or too anti-conservative (i.e., increasing the false acceptance probability of the experimental therapy) if the trial accrues more low-risk patients than expected. We can avoid this problem by adjusting the rejection value depending on the observed prevalence from the trial. In this paper, we investigate two flexible design approaches that choose rejection values depending on the observed prevalence, and compare them under various</p>

	]]>
</description>

<author>Sin-Ho Jung</author>


</item>






<item>
<title>Phase I Clinical Trials With Non-Binary Toxicity Response</title>
<link>http://biostats.bepress.com/dukebiostat/art3</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art3</guid>
<pubDate>Wed, 30 May 2007 10:49:18 PDT</pubDate>
<description>
	<![CDATA[
	<p>Phase I clinical trials are often subject to severe limitations. The most important one is that they usually allow only for a binary response, toxic (1) or non-toxic (0), rather than a range of responses from 0 to 1. They also may not allow a new patient to be treated until results for all previous patients are available. They may assign patients to doses in groups of two or more, rather than individually. They may require the selected dose to be one of a few pre-specified doses. The method proposed here addresses these limitations. The model uses a logistic dose-response curve with two parameters for the mean response. The response at any dose follows a beta distribution, which entails a third parameter. The choice of dose for a patient is based on a utility function that reflects the latest estimates of toxicity and of the variance of the estimate of the maximum tolerated dose (MTD). Simulations show that the method works well, and that a non-binary toxicity measure leads to a far more accurate MTD estimate than does a binary one.</p>

	]]>
</description>

<author>Richard F. Potthoff et al.</author>


</item>






<item>
<title>Bayesian Weibull Tree Models for Clinico-Genomic Prediction of Survival</title>
<link>http://biostats.bepress.com/dukebiostat/art2</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art2</guid>
<pubDate>Wed, 30 May 2007 10:40:07 PDT</pubDate>
<description>
	<![CDATA[
	<p>An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low and high risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study.</p>

	]]>
</description>

<author>Jennifer Clarke et al.</author>


</item>






<item>
<title>Assessing Individual Agreement</title>
<link>http://biostats.bepress.com/dukebiostat/art1</link>
<guid isPermaLink="true">http://biostats.bepress.com/dukebiostat/art1</guid>
<pubDate>Thu, 24 May 2007 15:28:10 PDT</pubDate>
<description>
	<![CDATA[
	<p>Evaluating agreement between measurement methods or between observers is important in method comparison studies and in reliability studies. Often we are interested in whether a new method can replace an existing invasive or expensive method, or whether multiple methods or multiple observers can be used interchangeably. Ideally, interchangeability is established only if individual measurements from different methods are similar to replicated measurements from the same method. This is the concept of individual equivalence. Interchangeability between methods is similar to bioequivalence between drugs in bioequivalence studies. Following the FDA guidelines on individual bioequivalence, we propose to assess individual agreement among multiple methods via individual equivalence using the moment criteria. In the case where there is a reference method, we extend the individual bioequivalence criteria to individual equivalence criteria and propose to use an individual equivalence coefficient (IEC) to compare multiple methods to one or multiple references. In the case where there is no reference method available, we propose a new IEC to assess individual agreement between multiple methods. Furthermore, we propose a coefficient of individual agreement (CIA) that links the IEC with two recent agreement indices. A method of moments is used for estimation, where one can utilize output from ANOVA models. The bootstrap approach is used to construct one-sided 95% confidence bounds for the IEC and CIA. Five examples are used for illustration.</p>

	]]>
</description>

<author>Huiman Barnhart et al.</author>


</item>





</channel>
</rss>
