<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>UPenn Biostatistics Working Papers</title>
<copyright>Copyright (c) 2013 University of Pennsylvania All rights reserved.</copyright>
<link>http://biostats.bepress.com/upennbiostat</link>
<description>Recent documents in UPenn Biostatistics Working Papers</description>
<language>en-us</language>
<lastBuildDate>Wed, 23 Jan 2013 22:07:22 PST</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Bayesian Methods for Network-Structured Genomics Data</title>
<link>http://biostats.bepress.com/upennbiostat/art34</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art34</guid>
<pubDate>Tue, 05 Jan 2010 09:18:05 PST</pubDate>
<description>
	<![CDATA[
	<p>Graphs and networks are common ways of depicting information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This information provides a useful supplement to standard numerical genomic data such as microarray gene expression data. Effectively utilizing such information can lead to better identification of biologically relevant genomic features in the context of our prior biological knowledge. In this paper, we present a Bayesian variable selection procedure for network-structured covariates for both Gaussian linear and probit models. The key to our approach is the introduction of a Markov random field prior for the indicator variables that describe which covariates should be included in the model and the use of the Wolff algorithm for Markov chain Monte Carlo inference. We illustrate the proposed procedure with simulations and with an analysis of genomic data. Finally, we present some other areas of genomics research where novel Bayesian approaches may play important roles.</p>

	]]>
</description>

<author>Stefano Monni et al.</author>


</item>






<item>
<title>Quasi-Least Squares with Mixed Linear Correlation Structures</title>
<link>http://biostats.bepress.com/upennbiostat/art33</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art33</guid>
<pubDate>Thu, 08 Oct 2009 12:56:48 PDT</pubDate>
<description>
	<![CDATA[
	<p>Quasi-least squares (QLS) is a two-stage computational approach for estimation of the correlation parameters in the framework of generalized estimating equations (GEE). We prove two general results for the class of mixed linear correlation structures: namely, that the stage one QLS estimate of the correlation parameter always exists and is feasible (yields a positive definite estimated correlation matrix) for any correlation structure, while the stage two estimator exists and is unique (and therefore consistent) with probability one, for the class of mixed linear correlation structures. Our general results justify the implementation of QLS for particular members of the class of mixed linear correlation structures that are appropriate for the analysis of familial data, with families that vary in size and composition. We describe the familial structures and implement them in an analysis of optical spherical values in the Old Order Amish (OOA). For the OOA analysis, we show that we would suffer a substantial loss in efficiency, if the familial structures were the true structures, but were misspecified as simpler approximate structures. We also provide software for implementation of the familial structures in R.  Key-Words: Quasi-least squares; linear correlation structure; mixed correlation structure; familial data.</p>

	]]>
</description>

<author>Jichun Xie et al.</author>


</item>






<item>
<title>&quot;Implementation of quasi-least squares With the R package qlspack&quot;</title>
<link>http://biostats.bepress.com/upennbiostat/art32</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art32</guid>
<pubDate>Wed, 17 Jun 2009 08:25:14 PDT</pubDate>
<description>
	<![CDATA[
	<p>Quasi-least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of generalized estimating equations (GEE) that has two main advantages over the moment estimates that are typically applied for GEE: (1) it guarantees a consistent estimate of the correlation parameter and a positive definite estimated correlation matrix, for several correlation structures; and (2) it allows for easier implementation of some correlation structures that have not yet been implemented in the framework of GEE. Furthermore, because QLS is a method in the framework of GEE, existing software can be employed within the QLS algorithm for estimation of the correlation and regression parameters. In this manuscript we describe and demonstrate the user-written package qlspack, which allows for implementation of QLS in R. Our package qlspack calls the geepack package (Yan (2002) and Halekoh et al. (2006)) to update the estimate of the regression parameter at the current QLS estimate of the correlation parameter; hence, geepack-related functions for standard error estimation can be used after implementing qlspack.</p>

	]]>
</description>

<author>Jichun Xie et al.</author>


</item>






<item>
<title>A Hidden Markov Random Field Model for Genome-wide Association Studies</title>
<link>http://biostats.bepress.com/upennbiostat/art31</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art31</guid>
<pubDate>Mon, 05 Jan 2009 06:49:54 PST</pubDate>
<description>
	<![CDATA[
	<p>Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction for multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior LD information among the SNPs and an efficient iterative conditional mode algorithm for estimating the model parameters. This model effectively utilizes the LD information in calculating the posterior probability that a SNP is associated with the disease. These posterior probabilities can then be used to define a false discovery controlling procedure in order to select the disease-associated SNPs. Simulation studies demonstrated the potential gain in power over single SNP analysis. The proposed method is especially effective in identifying SNPs with borderline significance at the single-marker level that nonetheless are in high LD with significant SNPs. In addition, by simultaneously considering the SNPs in LD, the proposed method can also help to reduce the number of false identifications of disease-associated SNPs. We demonstrate the application of the proposed HMRF model using data from a case-control genome-wide association study of neuroblastoma and identify one new SNP that is potentially associated with neuroblastoma.</p>

	]]>
</description>

<author>HongZhe Li et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Analysis of Adverse Events in Drug Safety: A Multivariate Approach Using Stratified Quasi-least Squares</title>
<link>http://biostats.bepress.com/upennbiostat/art29</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art29</guid>
<pubDate>Sun, 28 Dec 2008 17:06:27 PST</pubDate>
<description>
	<![CDATA[
	<p>Safety assessment in drug development involves numerous statistical challenges, and yet statistical methodologies and their applications to safety data have not been fully developed, despite a recent increase of interest in this area. In practice, a conventional univariate approach for analysis of safety data involves application of Fisher's exact test to compare the proportion of subjects who experience adverse events (AEs) between treatment groups. This approach ignores several common features of safety data, including the presence of multiple endpoints, longitudinal follow-up, and a possible relationship between the AEs within body systems. In this article, we propose various regression modeling strategies to model multiple longitudinal AEs that are biologically classified into different body systems via the stratified quasi-least squares (SQLS) method. We then use SQLS, which could be a superior alternative to Fisher's exact test, to analyze safety data from a clinical drug development program at Wyeth Research that compared an experimental drug with a standard treatment.</p>

	]]>
</description>

<author>Hanjoo Kim et al.</author>


</item>






<item>
<title>A Network-constrained Empirical Bayes Method for Analysis of Genomic Data</title>
<link>http://biostats.bepress.com/upennbiostat/art28</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art28</guid>
<pubDate>Wed, 29 Oct 2008 07:07:06 PDT</pubDate>
<description>
	<![CDATA[
	<p>Empirical Bayes methods are widely used in the analysis of microarray gene expression data in order to identify the differentially expressed genes or genes that are associated with other general phenotypes.  Available methods often assume that genes are independent. However, genes are expected to function interactively and to form  molecular modules to affect the phenotypes. In order to account for regulatory dependency among genes, we propose in this paper a network-constrained empirical Bayes method for analyzing genomic data in the framework of general linear models, where the dependency of genes is modeled by a discrete Markov random field model defined on a pre-defined biological network. This method provides a statistical framework for integrating the known biological network information into the analysis of genomic data. We present an iterated conditional mode algorithm for parameter estimation and for estimating the posterior probabilities using Gibbs sampling. We demonstrate the application of the proposed methods using simulations and analysis of a human brain aging  microarray gene expression data set.</p>

	]]>
</description>

<author>Caiyan Li et al.</author>


</item>






<item>
<title>&quot;%QLS SAS Macro: A SAS macro for Analysis of Longitudinal Data Using Quasi-Least Squares&quot;.</title>
<link>http://biostats.bepress.com/upennbiostat/art27</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art27</guid>
<pubDate>Tue, 05 Aug 2008 09:05:29 PDT</pubDate>
<description>
	<![CDATA[
	<p>Quasi-least squares (QLS) is an alternative computational approach for estimation of the correlation parameter in the framework of generalized estimating equations (GEE). QLS overcomes some limitations of GEE that were discussed in Crowder (Biometrika 82 (1995) 407-410). In addition, it allows for easier implementation of some correlation structures that are not available for GEE. We describe a user-written SAS macro called %QLS, and demonstrate application of our macro using a clinical trial example for the comparison of two treatments for a common toenail infection. %QLS also computes the lower and upper boundaries of the correlation parameter for analysis of longitudinal binary data that were described by Prentice (Biometrics 44 (1988), 1033-1048). Furthermore, it displays a warning message if the Prentice constraints are violated; this warning is not provided in existing GEE software packages or in other packages that were recently developed for application of QLS (in Stata, Matlab, and R). %QLS allows for analysis of normal, binary, or Poisson data with one of the following working correlation structures: the first-order autoregressive (AR(1)), equicorrelated, Markov, or tri-diagonal structures. Keywords: longitudinal data, generalized estimating equations, quasi-least squares, SAS.</p>

	]]>
</description>

<author>Hanjoo Kim et al.</author>


</item>






<item>
<title>On the designation of the patterned associations for longitudinal Bernoulli data:  weight matrix versus true correlation structure?</title>
<link>http://biostats.bepress.com/upennbiostat/art26</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art26</guid>
<pubDate>Wed, 02 Jul 2008 09:33:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>Due to potential violation of standard constraints for the correlation for binary data, it has been argued recently that the working correlation matrix  should be viewed as a weight matrix that should not be confused with the true correlation structure. We propose two arguments to support our view to the contrary for the first-order autoregressive AR(1) correlation matrix. First, we prove that the standard constraints are not unduly restrictive for the AR(1) structure that is plausible for longitudinal data; furthermore, for the logit link function the upper boundary value only depends on the regression parameter and the change in covariate values between successive measurements. In addition, for given marginal means and parameter $\alpha$, we provide a general proof that satisfaction of the standard constraints for consecutive marginal means will guarantee the existence of a compatible multivariate distribution with an AR(1) structure. The relative laxity of the standard constraints for the AR(1) structure coupled with the existence of a simple model that yields data with an AR(1) structure bolsters our view that for the AR(1) structure at least, it is appropriate to view this model as a correlation structure versus a weight matrix.</p>

	]]>
</description>

<author>Hanjoo Kim et al.</author>


</item>






<item>
<title>U-Statistics-based Tests for Multiple Genes  in Genetic Association Studies</title>
<link>http://biostats.bepress.com/upennbiostat/art25</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art25</guid>
<pubDate>Fri, 25 Apr 2008 07:42:00 PDT</pubDate>
<description>
	<![CDATA[
	<p>Abstract: As our understanding of biological pathways and the genes that regulate these pathways increases, consideration of these biological pathways has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies often involve genotyping of variants in genes acting in  certain biological pathways. Such pathway-based genetic association studies can potentially capture the highly heterogeneous nature of many complex traits, with multiple causative loci and multiple alleles at some of the causative loci. In this paper, we  develop two nonparametric test statistics that consider simultaneously the effects of multiple markers. Our approach, which is based on data-adaptive U-statistics, can handle both qualitative data such as  case-control data and quantitative continuous phenotype data. Simulations demonstrate that our proposed methods are more powerful than standard methods, especially  when there are multiple risk loci each with small genetic effects. When the number of disease-predisposing genes is small, the data-adaptive weighting of the  U-statistics over all the markers produces similar power to commonly used single marker tests. We further illustrate the potential merits of our proposed tests in the analysis of a data set from a pathway-based candidate gene association study of breast cancer and hormone metabolism pathways. Finally, potential applications of the proposed tests to genome-wide association studies are also discussed.</p>

	]]>
</description>

<author>Zhi Wei et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Incorporation of  Genetic Pathway Information into Analysis of Multivariate  Gene Expression Data</title>
<link>http://biostats.bepress.com/upennbiostat/art24</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art24</guid>
<pubDate>Mon, 14 Apr 2008 09:52:09 PDT</pubDate>
<description>
	<![CDATA[
	<p>Abstract: Multivariate microarray gene expression data are commonly collected to study the genomic responses under ordered conditions such as over increasing/decreasing dose levels or over time during biological processes. One important goal of such multivariate gene expression experiments is to identify genes that show different expression patterns over treatment dosages or over time and pathways that are perturbed during a given biological process. In this paper, we develop a hidden Markov random field model for multivariate expression data in order to identify genes and subnetworks that are related to biological processes, where the dependency of the differential expression patterns of genes on the networks is modeled by a Markov random field. Simulation studies indicated that the method is quite effective in identifying genes and the modified subnetworks and has higher sensitivity than the commonly used procedures that do not use the pathway information, with similar observed false discovery rates. We applied the proposed methods to the analysis of a microarray time course gene expression study of TrkA- and TrkB-transfected neuroblastoma cell lines and identified genes and subnetworks on MAPK, focal adhesion and prion disease pathways that may explain cell differentiation in TrkA-transfected cell lines.</p>

	]]>
</description>

<author>Zhi Wei et al.</author>


</item>






<item>
<title>Network-constrained Regularization and Variable Selection for Analysis of Genomic Data</title>
<link>http://biostats.bepress.com/upennbiostat/art23</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art23</guid>
<pubDate>Mon, 10 Dec 2007 06:49:55 PST</pubDate>
<description>
	<![CDATA[
	<p>Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of <i>a priori</i> information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the $L_1$-norm of the coefficients but encourages smoothness of the coefficients on the network. An efficient algorithm is also proposed for computing the network-constrained regularization paths, much like the Lars algorithm does for the lasso. We illustrate the methods using simulated data and analysis of a microarray gene expression data set of glioblastoma.</p>

	]]>
</description>

<author>Caiyan Li et al.</author>


</item>






<item>
<title>Vertex Clustering in Random Graphs  via Reversible Jump Markov Chain Monte Carlo</title>
<link>http://biostats.bepress.com/upennbiostat/art22</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art22</guid>
<pubDate>Wed, 05 Dec 2007 06:21:34 PST</pubDate>
<description>
	<![CDATA[
	<p>Networks are a natural and effective tool to study relational data, in which observations are collected on pairs of units. The units are represented by nodes and their relations by edges. For example, the nodes and edges of the network may be proteins and their interactions in biology, or people and interpersonal relations in social science. In this paper we address the question of clustering vertices in networks, as a way to uncover homogeneity patterns in data that enjoy a network representation. We use a mixture model for random graphs and propose a reversible jump Markov chain Monte Carlo algorithm to infer its parameters. We apply the algorithm to one simulated data set and three real data sets, which describe friendships among members of a university karate club, social interactions of dolphins, and gap junctions in C. elegans.</p>

	]]>
</description>

<author>Stefano Monni et al.</author>


</item>






<item>
<title>A Hidden Spatial-temporal  Markov Random Field Model  for Network-based Analysis of  Time Course Gene Expression Data </title>
<link>http://biostats.bepress.com/upennbiostat/art21</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art21</guid>
<pubDate>Tue, 02 Oct 2007 12:40:25 PDT</pubDate>
<description>
	<![CDATA[
	<p>Microarray time course (MTC) gene expression data are commonly collected to study the dynamic nature of biological processes. One important problem is to identify genes that show different expression profiles over time and pathways that are perturbed during a given biological process. While methods are available to identify the genes with differential expression levels over time, there is a lack of methods that can incorporate the pathway information in identifying the pathways being modified/activated during a biological process. In this paper, we develop a hidden spatial-temporal Markov random field (hstMRF)-based method for identifying genes and subnetworks that are related to biological diseases, where the dependency of the differential expression patterns of genes on the networks is modeled over time and over the network of pathways. Simulation studies indicated that the method is quite effective in identifying genes and modified subnetworks and has higher sensitivity than the commonly used procedures that do not use the pathway structure or time dependency information, with similar false discovery rates. Application to a microarray gene expression study of systemic inflammation in humans identified a core set of genes on the KEGG pathways that show clear differential expression patterns over time. In addition, the method confirmed that the Toll-like signaling pathway plays an important role in immune response to endotoxins.</p>

	]]>
</description>

<author>Zhi Wei et al.</author>


<category>Genetics</category>

</item>






<item>
<title>Variable Selection for Nonparametric Varying-Coefficient Models for Analysis of Repeated Measurements</title>
<link>http://biostats.bepress.com/upennbiostat/art20</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art20</guid>
<pubDate>Mon, 23 Jul 2007 11:59:35 PDT</pubDate>
<description>
	<![CDATA[
	<p>Nonparametric varying-coefficient models are commonly used for analysis of data measured repeatedly over time, including longitudinal and functional response data. While many procedures have been developed for estimating the varying coefficients, the problem of variable selection for such models has not been addressed. In this article, we present a regularized estimation procedure for variable selection for such nonparametric varying-coefficient models using basis function approximations and a group smoothly clipped absolute deviation penalty (gSCAD). This gSCAD procedure simultaneously selects significant variables with time-varying effects and estimates unknown smooth functions using basis function approximations. With appropriate selection of the tuning parameters, we have established the oracle property of the procedure and the consistency of the function estimation. The methods are illustrated with simulations and an application to analysis of microarray time-course gene expression data in order to identify the transcription factors that are related to the yeast cell cycle process.</p>

	]]>
</description>

<author>Lifeng Wang et al.</author>


</item>






<item>
<title>Methodological Issues in the Study of the Effects of Hemoglobin Variability</title>
<link>http://biostats.bepress.com/upennbiostat/art19</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art19</guid>
<pubDate>Tue, 19 Jun 2007 11:33:33 PDT</pubDate>
<description>
	<![CDATA[
	<p>We consider estimating the effect of hemoglobin variability on mortality in hemodialysis patients.  Causal effects can be defined as comparisons of outcomes under different hypothetical interventions.  Defining measures of the effect of hemoglobin variability and clinical outcomes is complicated by the fact that hypothetical interventions on variability used to define its effect inevitably involve manipulation of related variables.  We propose a model-based definition of the effect of the hemoglobin variability as a parameter for variability in a causal model for the effect of an overall intervention on hemoglobin levels over time.  We consider this problem using history-adjusted marginal structural models, and apply this approach to data from a large observational database.  We consider issues arising when the variable of interest is endogenous, and consider in principle alternate estimands.</p>

	]]>
</description>

<author>Marshall Joffe et al.</author>


</item>






<item>
<title>A Markov Random Field Model for Network-based Analysis of Genomic Data</title>
<link>http://biostats.bepress.com/upennbiostat/art18</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art18</guid>
<pubDate>Thu, 29 Mar 2007 11:59:29 PDT</pubDate>
<description>
	<![CDATA[
	<p>A central problem in genomic research  is the identification of genes and pathways involved in diseases and other biological processes. The genes identified or the univariate test statistics are often linked to known biological pathways through gene set enrichment analysis in order to identify the pathways involved. However, most of the procedures for  identifying  differentially expressed genes do not utilize the known pathway information in the phase of identifying such genes. In this paper, we develop a Markov random field (MRF)-based method for identifying genes and subnetworks that are related to diseases. Such a procedure models the dependency of the differential expression patterns of genes on the networks using a local discrete  MRF model. Simulation studies indicated that the method  is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity and lower false discovery rates than the commonly used procedures that do not use the pathway structure information. Applications to two breast cancer microarray gene expression datasets identified  several subnetworks on several of the  KEGG transcriptional  pathways that are related to breast cancer recurrence or survival due to breast cancer. The proposed MRF-based model efficiently utilizes the known pathway structures in identifying the differentially expressed genes and the subnetworks that might be related to phenotype. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.</p>

	]]>
</description>

<author>Zhi Wei et al.</author>


</item>






<item>
<title>Statistical Methods for Inference of Genetic Networks and Regulatory Modules</title>
<link>http://biostats.bepress.com/upennbiostat/art17</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art17</guid>
<pubDate>Fri, 23 Mar 2007 08:53:41 PDT</pubDate>
<description>
	<![CDATA[
	<p>Large-scale microarray gene expression data, motif data derived from promoter sequences, genome-wide chromatin immunoprecipitation (ChIP-chip) data, DNA polymorphism data and epigenomic data provide the possibility of constructing genetic networks or biological pathways, especially regulatory networks. In this paper, we review some new statistical methods for inference of genetic networks and regulatory modules, including a threshold gradient descent procedure for inference of Gaussian graphical models, a sparse regression mixture modeling approach for inference of regulatory modules, and the varying coefficient model for identifying regulatory subnetworks by integrating microarray time-course gene expression data and motif or ChIP-chip data. We present the statistical formulations of the problems, statistical methods, and results from analysis of real data sets. Areas of future research are also discussed.</p>

	]]>
</description>

<author>HongZhe Li</author>


</item>






<item>
<title>Group SCAD Regression Analysis for Microarray Time Course Gene Expression Data</title>
<link>http://biostats.bepress.com/upennbiostat/art16</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art16</guid>
<pubDate>Thu, 01 Feb 2007 12:04:56 PST</pubDate>
<description>
	<![CDATA[
	<p>Since many important biological systems or processes are dynamic systems, it is important to study the gene expression patterns over time on a genomic scale in order to capture the dynamic behavior of gene expression. Microarray technologies have made it possible to measure the gene expression levels of essentially all the genes during a given biological process. In order to determine the transcriptional factors involved in gene regulation during a given biological process, we propose a functional response model with varying coefficients to model the transcriptional effects on gene expression levels, and a group smoothly clipped absolute deviation (SCAD) regression procedure for selecting the transcriptional factors with varying coefficients that are involved in gene regulation during a biological process. Simulation studies indicated that such a procedure is quite effective in selecting the relevant variables with time-varying coefficients and in estimating the coefficients. Application to the yeast cell cycle microarray time course gene expression data set identified 19 of the 21 known transcriptional factors related to the cell cycle process. In addition, we have identified another 52 TFs that also have periodic transcriptional effects on gene expression during the cell cycle process. Compared to simple linear regression analysis at each time point, our procedure identified more known cell cycle related transcriptional factors. The proposed group SCAD regression procedure is very effective for identifying variables with time-varying coefficients, in particular, for identifying the transcriptional factors that are related to gene expression over time. By identifying the transcriptional factors that are related to gene expression variations over time, the procedure can potentially provide more insight into the gene regulatory networks.</p>

	]]>
</description>

<author>Lifeng Wang et al.</author>


</item>






<item>
<title>Analysis of multi-level correlated data in the framework of generalized estimating equations via xtmultcorr procedures in Stata and qls functions in Matlab </title>
<link>http://biostats.bepress.com/upennbiostat/art15</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art15</guid>
<pubDate>Thu, 11 Jan 2007 10:38:50 PST</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Justine Shults et al.</author>


</item>






<item>
<title>Conditional Likelihood Methods for Haplotype-based Association Analysis Using Matched Case-Control Data</title>
<link>http://biostats.bepress.com/upennbiostat/art14</link>
<guid isPermaLink="true">http://biostats.bepress.com/upennbiostat/art14</guid>
<pubDate>Fri, 08 Sep 2006 06:37:06 PDT</pubDate>
<description>
	<![CDATA[
	<p>Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, i.e., combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using SNP genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.</p>

	]]>
</description>

<author>Jinbo Chen et al.</author>


</item>





</channel>
</rss>
