<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology &amp; Biostatistics Working Paper Series</title>
<copyright>Copyright (c) 2013 Memorial Sloan-Kettering Cancer Center All rights reserved.</copyright>
<link>http://biostats.bepress.com/mskccbiostat</link>
<description>Recent documents in Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology &amp; Biostatistics Working Paper Series</description>
<language>en-us</language>
<lastBuildDate>Thu, 09 May 2013 01:48:14 PDT</lastBuildDate>
<ttl>3600</ttl>
<item>
<title>Mixtures of Receiver Operating Characteristic Curves</title>
<link>http://biostats.bepress.com/mskccbiostat/paper27</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper27</guid>
<pubDate>Tue, 07 May 2013 07:10:44 PDT</pubDate>
<description>
	<![CDATA[
	<p><strong>Rationale and Objectives:</strong> ROC curves are ubiquitous in the analysis of imaging metrics as markers of both diagnosis and prognosis. While empirical estimation of ROC curves remains the most popular method, there are several reasons to consider smooth estimates based on a parametric model.</p>
<p><strong>Materials and Methods:</strong> A mixture model is considered for modeling the distribution of the marker in the diseased population motivated by the biological observation that there is more heterogeneity in the diseased population than there is in the normal one. It is shown that this model results in an analytically tractable ROC curve which is itself a mixture of ROC curves.</p>
<p><strong>Results:</strong> The use of CK-BB isoenzyme in diagnosis of severe head trauma is used as an example. ROC curves are fit using the direct binormal method, ROCKIT and the Box-Cox transformation as well as the proposed mixture model. The mixture model generates an ROC curve that is much closer to the empirical one than the other methods considered.</p>
<p><strong>Conclusions:</strong> Mixtures of ROC curves can be helpful in fitting smooth ROC curves in datasets where the diseased population has higher variability than can be explained by a single distribution.</p>

	]]>
</description>

<author>Mithat Gonen</author>


</item>






<item>
<title>Visualizing Longitudinal Data with Dropouts</title>
<link>http://biostats.bepress.com/mskccbiostat/paper26</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper26</guid>
<pubDate>Tue, 07 May 2013 07:10:42 PDT</pubDate>
<description>
	<![CDATA[
	<p>A triangle plot is proposed to display longitudinal data with dropouts. The triangle plot is a tool of data visualization that can also serve as a graphical check for informativeness of the dropout process. There are similarities between the lasagna plot and the triangle plot, but the explicit use of dropout time as an axis is an advantage of the triangle plot over the more commonly used graphical strategies for longitudinal data. It is possible to interpret the triangle plot as a trellis plot, which gives rise to several extensions such as the triangle histogram and the triangle boxplot. R code is available to streamline the use of the triangle plot in practice.</p>

	]]>
</description>

<author>Mithat Gonen</author>


</item>






<item>
<title>A Systematic Selection Method for the Development of Cancer Staging Systems</title>
<link>http://biostats.bepress.com/mskccbiostat/paper25</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper25</guid>
<pubDate>Tue, 15 May 2012 10:14:23 PDT</pubDate>
<description>
	<![CDATA[
	<p>The tumor-node-metastasis (TNM) staging system has been the anchor of cancer diagnosis, treatment, and prognosis for many years. For meaningful clinical use, an orderly, progressive condensation of the T and N categories into an overall staging system needs to be defined, usually with respect to a time-to-event outcome. This can be considered as a cutpoint selection problem for a censored response partitioned with respect to two ordered categorical covariates and their interaction. The aim is to select the best grouping of the TN categories. A novel bootstrap cutpoint/model selection method is proposed for this task by maximizing bootstrap estimates of the chosen statistical criteria. The criteria are based on prognostic ability including a landmark measure of the explained variation, the area under the ROC curve, and a concordance probability generalized from Harrell's c-index. We illustrate the utility of our method by applying it to the staging of colorectal cancer.</p>

	]]>
</description>

<author>Yunzhi Lin et al.</author>


</item>






<item>
<title>Sparse Integrative Clustering of Multiple Omics Data Sets</title>
<link>http://biostats.bepress.com/mskccbiostat/paper24</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper24</guid>
<pubDate>Mon, 13 Feb 2012 07:36:30 PST</pubDate>
<description>
	<![CDATA[
	<p>High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation, and gene expression associated with a disease. An integrated genomic profiling approach measuring multiple omics data types simultaneously in the same set of biological samples would render an integrated data resolution that would not be available with any single data type. In a previous publication (Shen et al., 2009), we proposed a latent variable regression with a lasso constraint (Tibshirani, 1996) for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. The resulting sparse coefficient vectors (with many zero elements) can be used to reveal important genomic features that have significant contributions to the latent variables. In this study, we consider a combination of lasso, fused lasso (Tibshirani et al., 2005) and elastic net (Zou & Hastie, 2005) penalties and use an iterative ridge regression to compute the sparse coefficient vectors. In model selection, a uniform design (Fang & Wang, 1994) is used to seek “experimental” points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic, and transcriptomic data for subtype analysis in breast and lung cancer data sets.</p>

	]]>
</description>

<author>Ronglai Shen et al.</author>


</item>






<item>
<title>Building a Nomogram for Survey-Weighted Cox Models Using R</title>
<link>http://biostats.bepress.com/mskccbiostat/paper23</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper23</guid>
<pubDate>Mon, 17 Oct 2011 11:20:34 PDT</pubDate>
<description>
	<![CDATA[
	<p>Nomograms have become a very useful tool among clinicians as they provide individualized predictions based on the characteristics of the patient. For complex design survey data with survival outcome, Binder (1992) proposed methods for fitting survey-weighted Cox models, but to the best of our knowledge there is no available software to build a nomogram based on such models. This paper introduces R software to accomplish this goal and illustrates its use on a gastric cancer dataset. Validation and calibration routines are also included.</p>

	]]>
</description>

<author>Marinela Capanu et al.</author>


</item>






<item>
<title>A Hybrid Bayesian Laplacian Approach for Generalized Linear Mixed Models</title>
<link>http://biostats.bepress.com/mskccbiostat/paper22</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper22</guid>
<pubDate>Mon, 17 Oct 2011 11:20:32 PDT</pubDate>
<description>
	<![CDATA[
	<p>The analytical intractability of generalized linear mixed models (GLMMs) has generated a lot of research in the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods that are currently implemented in commercially available software. This article is motivated by this frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods such as the adaptive Gaussian quadrature approach perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated in standard statistical packages. Similarly, Bayesian methods, though they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article we build on our previous method (Capanu and Begg 2010) and propose a hybrid approach that provides a bridge between the likelihood-based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients with the goal of obtaining good statistical properties, with relatively good computing speed, and using widely available software. The hybrid approach is shown to perform well against the other competitors considered. Another important finding of this research is the surprisingly good performance of the Laplacian approximation in the difficult case of binary clustered data with small cluster sizes. 
We apply the methods to a real study of head and neck squamous cell carcinoma and illustrate their properties using simulations based on a widely-analyzed salamander mating dataset and on another important dataset involving the Guatemalan Child Health survey.</p>

	]]>
</description>

<author>Marinela Capanu et al.</author>


</item>






<item>
<title>Bland-Altman Plots for Evaluating Agreement Between Solid Tumor Measurements</title>
<link>http://biostats.bepress.com/mskccbiostat/paper21</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper21</guid>
<pubDate>Mon, 17 Oct 2011 10:26:13 PDT</pubDate>
<description>
	<![CDATA[
	<p>Rationale and Objectives. Solid tumor measurements are regularly used in clinical trials of anticancer therapeutic agents and in clinical practice managing patients' care. Consequently, studies evaluating the reproducibility of solid tumor measurements are important, as lack of reproducibility may directly affect patient management. The authors propose utilizing a modified Bland-Altman plot with a difference metric that lends itself naturally to this situation and facilitates interpretation. Materials and Methods. The modification to the Bland-Altman plot involves replacing the difference plotted on the vertical axis with the relative percent change (RC) between the two measurements. This quantity is the same one used in assessing tumor response to therapeutic agents and is very familiar to radiologists and clinicians working with cancer patients. The distribution of the RC is explored and revised equations for the limits of agreement (LoA) are presented. These methods are applied to positron emission tomography (PET) data studying two radiotracers. Results. The RC can be calculated separately for each lesion measured or at the patient level by summing over lesions within patient. In both cases, the distribution of the RC is highly skewed and is approximated by a negative shifted lognormal distribution. The standard equations for the 95% LoA assume the differences are approximately normally distributed and are not appropriate for the RC. Conclusions. The modified Bland-Altman plot with correctly calculated LoA can aid in evaluating agreement between solid tumor measurements.</p>

	]]>
</description>

<author>Chaya S. Moskowitz et al.</author>


</item>






<item>
<title>Comparing ROC Curves Derived From Regression Models</title>
<link>http://biostats.bepress.com/mskccbiostat/paper20</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper20</guid>
<pubDate>Fri, 10 Jun 2011 15:07:31 PDT</pubDate>
<description>
	<![CDATA[
	<p>In constructing predictive models, investigators frequently assess the incremental value of a predictive marker by comparing the ROC curve generated from the predictive model including the new marker with the ROC curve from the model excluding the new marker. Many commentators have noticed empirically that a test of the two ROC areas often produces a non-significant result when a corresponding Wald test from the underlying regression model is significant. A recent article showed using simulations that the widely-used ROC area test [1] produces exceptionally conservative test size and extremely low power [2]. In this article we show why the ROC area test is invalid in this context. We demonstrate how a valid test of the ROC areas can be constructed that has comparable statistical properties to the Wald test. We conclude that using the Wald test to assess the incremental contribution of a marker remains the best strategy. We also examine the use of derived markers from non-nested models and the use of validation samples. We show that comparing ROC areas is invalid in these contexts as well.</p>

	]]>
</description>

<author>Venkatraman E. Seshan et al.</author>


</item>






<item>
<title>Assessing noninferiority in a three-arm trial using the Bayesian Approach</title>
<link>http://biostats.bepress.com/mskccbiostat/paper19</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper19</guid>
<pubDate>Sat, 08 May 2010 20:58:23 PDT</pubDate>
<description>
	<![CDATA[
	<p>Non-inferiority trials, which aim to demonstrate that a test product is not worse than a competitor by more than a pre-specified small amount, are of great importance to the pharmaceutical community. As a result, methodology for designing and analyzing such trials is required, and developing new methods for such analysis is an important area of statistical research. The three-arm clinical trial is usually recommended for non-inferiority trials by the Food and Drug Administration (FDA). The three-arm trial consists of a placebo, a reference, and an experimental treatment, and simultaneously tests the superiority of the reference over the placebo along with comparing this reference to an experimental treatment. In this paper, we consider the analysis of noninferiority trials using Bayesian methods which incorporate both parametric as well as semi-parametric models. The resulting testing approach is both flexible and robust. The benefit of the proposed Bayesian methods is assessed via simulation, based on a study examining Home Based Blood Pressure Interventions.</p>

	]]>
</description>

<author>Pulak Ghosh et al.</author>


</item>






<item>
<title>Integrative Clustering of Multiple Genomic Data Types using a Joint Latent Variable Model with Application to Breast and Lung Cancer Subtype Analysis</title>
<link>http://biostats.bepress.com/mskccbiostat/paper18</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper18</guid>
<pubDate>Fri, 11 Sep 2009 12:34:09 PDT</pubDate>
<description>
	<![CDATA[
	<p>The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic, and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment. We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations between different data types and the variance-covariance structure within data types in a single framework, while simultaneously reducing the dimensionality of the data sets. Likelihood-based inference is obtained through the Expectation-Maximization algorithm. We demonstrate the iCluster algorithm using two examples of joint analysis of copy number and gene expression data, one from breast cancer and one from lung cancer. In both cases, we identified subtypes characterized by concordant DNA copy number changes and gene expression as well as unique profiles specific to one or the other in a completely automated fashion. In addition, the algorithm discovers potentially novel subtypes by combining weak yet consistent alteration patterns across data types. R code to implement iCluster can be downloaded at http://www.mskcc.org/mskcc/html/85130.cfm.</p>

	]]>
</description>

<author>Ronglai Shen et al.</author>


</item>






<item>
<title>A classification model for distinguishing copy number variants from cancer-related alterations</title>
<link>http://biostats.bepress.com/mskccbiostat/paper17</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper17</guid>
<pubDate>Fri, 28 Aug 2009 07:28:19 PDT</pubDate>
<description>
	<![CDATA[
	<p>Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (Iafrate et al., 2004) contains a list of all known CNVs, there is no standard methodology to use the database effectively.</p>
<p>We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the Database and several other variables, including potential CNV’s length, height, closeness to a telomere or centromere and occurrence in other patients. The models are fitted on data from glioblastoma and their corresponding normal samples that were collected as part of The Cancer Genome Atlas project and hybridized on Agilent 244K arrays. Using the Database alone CNVs can be correctly identified with about 85% accuracy if the outliers are removed before segmentation and with 72% accuracy if the outliers are included, and additional variables improve the prediction by about 2-3% and 12%, respectively.</p>

	]]>
</description>

<author>Irina Ostrovnaya et al.</author>


</item>






<item>
<title>Optimal Cutpoint Estimation with Censored Data</title>
<link>http://biostats.bepress.com/mskccbiostat/paper16</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper16</guid>
<pubDate>Thu, 13 Nov 2008 13:27:52 PST</pubDate>
<description>
	<![CDATA[
	<p>We consider the problem of selecting an optimal cutpoint for a continuous marker when the outcome of interest is subject to right censoring. Maximal chi-square methods and receiver operating characteristic (ROC) curve-based methods are commonly used when the outcome is binary. In this article we show that selecting the cutpoint that maximizes the concordance, a metric similar to the area under an ROC curve, is equivalent to maximizing the Youden index, a popular criterion when the ROC curve is used to choose a threshold. We use this as a basis for proposing maximal concordance as a metric to use with censored endpoints. Through simulations we evaluate the performance of two concordance estimates and three chi-square statistics under various assumptions. Maximizing the partial likelihood ratio test statistic has the best performance in our simulations.</p>

	]]>
</description>

<author>Mithat Gonen et al.</author>


</item>






<item>
<title>A Metastasis or a Second Independent Cancer? Evaluating the Clonal Origin of Tumors Using Array-CGH Data</title>
<link>http://biostats.bepress.com/mskccbiostat/paper15</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper15</guid>
<pubDate>Thu, 14 Aug 2008 14:02:27 PDT</pubDate>
<description>
	<![CDATA[
	<p>When a cancer patient develops a new tumor it is necessary to determine if this is a recurrence (metastasis) of the original cancer, or an entirely new occurrence of the disease. This is accomplished by assessing the histo-pathology of the lesions, and it is frequently relatively straightforward. However, there are many clinical scenarios in which this pathological diagnosis is difficult. Since each tumor is characterized by a genetic fingerprint of somatic mutations, a more definitive diagnosis is possible in principle in these difficult clinical scenarios by comparing the fingerprints. In this article we develop and evaluate a statistical strategy for this comparison when the data are derived from array comparative genomic hybridization, a technique designed to identify all of the somatic allelic gains and losses across the genome. Our method involves several stages. First a segmentation algorithm is used to estimate the regions of allelic gain and loss. Then the broad correlation in these patterns between the two tumors is assessed, leading to an initial likelihood ratio for the two diagnoses. This is then further refined by comparing in detail each plausibly clonal mutation within individual chromosome arms, and the results are aggregated to determine a final likelihood ratio. The method is employed to diagnose patients from several clinical scenarios, and the results show that in many cases a strong clonal signal emerges, occasionally contradicting the clinical diagnosis. The “quality” of the arrays can be summarized by a parameter that characterizes the clarity with which allelic changes are detected. Sensitivity analyses show that most of the diagnoses are robust when the data are of high quality.</p>

	]]>
</description>

<author>Irina Ostrovnaya et al.</author>


</item>






<item>
<title>On Comparing the Clustering of Regression Models Method with K-means Clustering</title>
<link>http://biostats.bepress.com/mskccbiostat/paper14</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper14</guid>
<pubDate>Mon, 26 Mar 2007 08:55:45 PDT</pubDate>
<description>
	<![CDATA[
	<p>Gene clustering is a common question addressed with microarray data. Previous methods, such as K-means clustering and hierarchical clustering, base gene clustering directly on the observed measurements. A new model-based clustering method, the clustering of regression models (CORM) method, bases the clustering of genes on their relationship to covariates. It explicitly models different sources of variation and bases gene clustering solely on the systematic variation. Because both are partitional clustering methods, CORM is closely related to K-means clustering. In this paper, we discuss the relationship between the two clustering methods in terms of both model formulation and implications for other important aspects of cluster analysis. We show that the two methods can both be considered as solutions to a least squares problem with missing data, but they each concern a different type of least squares. We also show that CORM tends to provide stable clusters across samples and is particularly useful if the cluster averages are used as predictors for sample classification. Finally, we illustrate the application of CORM to a set of time course data measured on four yeast samples, which has a complicated experimental design and is difficult for K-means to handle.</p>

	]]>
</description>

<author>Li-Xuan Qin et al.</author>


</item>






<item>
<title>Statistical Evaluation of Evidence for Clonal Allelic Alterations in array-CGH Experiments</title>
<link>http://biostats.bepress.com/mskccbiostat/paper13</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper13</guid>
<pubDate>Tue, 06 Mar 2007 13:31:12 PST</pubDate>
<description>
	<![CDATA[
	<p>In recent years numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction of a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparison of the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, some investigators have explored the use of array CGH for this purpose. Standard clustering approaches have been used to analyze the data, but these existing statistical methods are not suited to this problem due to the paired nature of the data, and the fact that there exists no “gold standard” diagnosis to provide a definitive determination of which pairs are clonal and which pairs are of independent origin. In this article we propose a new statistical method that focuses on the individual allelic gains or losses that have been identified in both tumors, and a statistical test is developed that assesses the degree of matching of the locations of the markers that indicate the endpoints of the allelic change. The validity and statistical power of the test are evaluated, and it is shown to be a promising approach for establishing clonality in tumor samples.</p>

	]]>
</description>

<author>Colin B. Begg et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Estimating the Empirical Lorenz Curve and Gini Coefficient in the Presence of Error</title>
<link>http://biostats.bepress.com/mskccbiostat/paper12</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper12</guid>
<pubDate>Tue, 16 Jan 2007 12:56:26 PST</pubDate>
<description>
	<![CDATA[
	<p>The Lorenz curve is a graphical tool that is widely used to characterize the concentration of a measure in a population, such as wealth. It is frequently the case that the measure of interest used to rank experimental units when estimating the empirical Lorenz curve, and the corresponding Gini coefficient, is subject to random error. This error can result in an incorrect ranking of experimental units which inevitably leads to a curve that exaggerates the degree of concentration (variation) in the population.   We explore this bias and discuss several widely available statistical methods that have the potential to reduce or remove the bias in the empirical Lorenz curve.  The properties of these methods are examined and compared in a simulation study.  This work is motivated by a health outcomes application which seeks to assess the concentration of black patient visits among primary care physicians.  The methods are illustrated on data from this study.</p>

	]]>
</description>

<author>Chaya S. Moskowitz et al.</author>


</item>






<item>
<title>Lehmann Family of ROC Curves</title>
<link>http://biostats.bepress.com/mskccbiostat/paper11</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper11</guid>
<pubDate>Wed, 20 Dec 2006 11:27:26 PST</pubDate>
<description>
	<![CDATA[
	<p>Receiver operating characteristic (ROC) curves are useful in evaluating the ability of a continuous marker in discriminating between the two states of a binary outcome such as diseased/not diseased. The most popular parametric model for an ROC curve is the binormal model which assumes that the marker is normally distributed conditional on the outcome. Here we present an alternative to the binormal model based on the Lehmann family, also known as the proportional hazards specification. The resulting ROC curve and its functionals (such as the area under the curve) have simple analytic forms. We derive closed-form expressions for the asymptotic variances of the estimators for various quantities of interest. This family easily accommodates comparison of multiple markers, covariate adjustments and clustered data through a regression formulation. Evaluation of the underlying assumptions, model fitting and model selection can all be performed using any off the shelf proportional hazards statistical software package.</p>

	]]>
</description>

<author>Mithat Gonen et al.</author>


</item>






<item>
<title>Smoothed Rank Regression with Censored Data</title>
<link>http://biostats.bepress.com/mskccbiostat/paper10</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper10</guid>
<pubDate>Wed, 08 Nov 2006 20:55:02 PST</pubDate>
<description>
	<![CDATA[
	<p>A weighted rank estimating function is proposed to estimate the regression parameter vector in an accelerated failure time model with right censored data. In general, rank estimating functions are discontinuous in the regression parameter, creating difficulties in determining the asymptotic distribution of the estimator. A local distribution function is used to create a rank based estimating function that is continuous and monotone in the regression parameter vector. A weight is included in the estimating function to produce a bounded influence estimate. The asymptotic distribution of the regression estimator is developed and simulations are performed to examine its finite sample properties. A lung cancer data set is used to illustrate the methodology.</p>

	]]>
</description>

<author>Glenn Heller</author>


</item>






<item>
<title>A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data</title>
<link>http://biostats.bepress.com/mskccbiostat/paper9</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper9</guid>
<pubDate>Wed, 07 Jun 2006 08:54:30 PDT</pubDate>
<description>
	<![CDATA[
	<p>Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen <em>et al.</em>, 2004). The algorithm tests for change-points using a maximal <em>t</em>-statistic with a permutation reference distribution to obtain the corresponding <em>p</em>-value. The number of computations required for the maximal test statistic is O(<em>N</em><sup>2</sup>), where <em>N</em> is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.</p>
<p>Results: We present a hybrid approach to obtain the <em>p</em>-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data.</p>
<p>Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the “DNAcopy” package of the Bioconductor project (Gentleman <em>et al.</em>, 2004). The proposed hybrid method for the <em>p</em>-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.</p>

	]]>
</description>

<author>E. S. Venkatraman et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Semiparametric Bayesian Modeling of Multivariate Average Bioequivalence</title>
<link>http://biostats.bepress.com/mskccbiostat/paper8</link>
<guid isPermaLink="true">http://biostats.bepress.com/mskccbiostat/paper8</guid>
<pubDate>Wed, 03 May 2006 10:57:20 PDT</pubDate>
<description>
	<![CDATA[
	<p>Bioequivalence trials are usually conducted to compare two or more formulations of a drug. Simultaneous assessment of bioequivalence on multiple endpoints is called multivariate bioequivalence. Although some tests for multivariate bioequivalence have been suggested, current practice usually involves univariate bioequivalence assessments that ignore the correlations between endpoints such as AUC and Cmax. In this paper we develop a semiparametric Bayesian test for bioequivalence under multiple endpoints. Specifically, we show how the correlation between the endpoints can be incorporated in the analysis and how this correlation affects the inference. Resulting estimates and posterior probabilities “borrow strength” from one another, where the amount and direction of the strength borrowed are determined by the prior correlations. The method developed is illustrated using a real data set.</p>

	]]>
</description>

<author>Pulak Ghosh et al.</author>


</item>





</channel>
</rss>
