Located on the Harvard Medical Campus, the Department of Biostatistics was one of the first departments in the newly formed Harvard School of Public Health in 1922. Now in its 80th year, the Department comprises 85 students, 57 faculty members, and 22 research associates and fellows. Our size contributes to our ability to address a broad spectrum of biostatistical and public health issues.
Current departmental research on statistical and computing methods for observational studies and clinical trials includes survival analysis, missing-data problems, and causal inference. Other areas of investigation are environmental research (methods for longitudinal studies, analyses with incomplete data, and meta-analysis); statistical aspects of the study of AIDS and cancer; quantitative problems in health-risk analysis, technology assessment, and clinical decision making; statistical methodology in psychiatric research and in genetic studies; Bayesian statistics; statistical computing; statistical genetics and computational biology; and collaborative research activities with biomedical scientists in other Harvard-affiliated institutions.
The Harvard University Biostatistics Working Paper Series presents contributions by our faculty and researchers that rely on the theory and application of statistical science to analyze public health problems.
Papers from 2011
On Causal Mediation Analysis with a Survival Outcome, Eric J. Tchetgen Tchetgen
Semiparametric Estimation of Models for Natural Direct and Indirect Effects, Eric J. Tchetgen Tchetgen and Ilya Shpitser
Semiparametric Theory for Causal Mediation Analysis: efficiency bounds, multiple robustness, and sensitivity analysis, Eric J. Tchetgen Tchetgen and Ilya Shpitser
On the Covariate-adjusted Estimation for an Overall Treatment Difference with Data from a Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, and L. J. Wei
Bayesian Effect Estimation Accounting for Adjustment Uncertainty, Chi Wang, Giovanni Parmigiani, and Francesca Dominici
Effectively Selecting a Target Population for a Future Comparative Study, Lihui Zhao, Lu Tian, Tianxi Cai, Brian Claggett, and L. J. Wei
A Regularization Corrected Score Method for Nonlinear Regression Models with Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, and Donna Spiegelman
Papers from 2010
A New Class of Dantzig Selectors for Censored Linear Regression Models, Yi Li, Lee Dicker, and Sihai Dave Zhao
Estimating Causal Effects in Trials Involving Multi-treatment Arms Subject to Non-compliance: A Bayesian Framework, Qi Long, Roderick J. Little, and Xihong Lin
Improving the Power of Chronic Disease Surveillance by Incorporating Residential History, Justin Manjourides and Marcello Pagano
A Perturbation Method for Inference on Regularized Regression Estimates, Jessica Minnier, Lu Tian, and Tianxi Cai
Landmark Prediction of Survival, Layla Parast and Tianxi Cai
Modeling Dependent Gene Expression, Donatello Telesca, Peter Muller, Giovanni Parmigiani, and Ralph S. Freedman
Graphical Procedures for Evaluating Overall and Subject-Specific Incremental Values from New Predictors with Censored Event Time Data, Hajime Uno, Tianxi Cai, Lu Tian, and L. J. Wei
Nonparametric Regression with Missing Outcomes Using Weighted Kernel Estimating Equations, Lu Wang, Andrea Rotnitzky, and Xihong Lin
Powerful SNP Set Analysis for Case-Control Genome Wide Association Studies, Michael C. Wu, Peter Kraft, Michael P. Epstein, Deanne M. Taylor, Stephen J. Chanock, David J. Hunter, and Xihong Lin
Stratifying Subjects for Treatment Selection with Censored Event Time Data from a Comparative Study, Lihui Zhao, Tianxi Cai, Lu Tian, Hajime Uno, Scott D. Solomon, and L. J. Wei
Utilizing the Integrated Difference of Two Survival Functions to Quantify the Treatment Contrast for Designing, Monitoring and Analyzing a Comparative Clinical Study, Lihui Zhao, Lu Tian, Hajime Uno, Scott D. Solomon, Marc A. Pfeffer, J. S. Schindler, and L. J. Wei
Principled Sure Independence Screening for Cox Models with Ultra-high-dimensional Covariates, Sihai Dave Zhao and Yi Li
Papers from 2009
Lot Quality Assurance Sampling (LQAS) and the Mozambique Malaria Indicator Surveys, Caitlin Biedron, Marcello Pagano, Bethany L. Hedt, Albert Kilian, Amy Ratcliffe, Samuel Mabunda, and Joseph J. Valadez
Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment Selections, Tianxi Cai, Lu Tian, Peggy H. Wong, and L. J. Wei
Spatial Cluster Detection for Repeatedly Measured Outcomes while Accounting for Residential History, Andrea J. Cook, Diane Gold, and Yi Li
Spatial Cluster Detection for Weighted Outcomes Using Cumulative Geographic Residuals, Andrea J. Cook, Yi Li, David Arterburn, and Ram C. Tiwari
Survival Analysis with Error-prone Time-varying Covariates: A Risk Set Calibration Approach, Xiaomei Liao, David M. Zucker, Yi Li, and Donna Spiegelman
Estimating Subject-Specific Dependent Competing Risk Profile with Censored Event Time Observations, Yi Li, Lu Tian, and L. J. Wei
A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance, Nirian Martin and Yi Li
Marginalized Frailty Models for Multivariate Survival Data, Megan Othus and Yi Li
A Class of Semiparametric Mixture Cure Survival Models with Dependent Censoring, Megan Othus, Yi Li, and Ram C. Tiwari
The Importance of Scale for Spatial-confounding Bias and Precision of Spatial Regression Estimators, Christopher J. Paciorek
Group Comparison of Eigenvalues and Eigenvectors of Diffusion Tensors, Armin Schwartzman, Robert F. Dougherty, and Jonathan E. Taylor
The Effect of Correlation in False Discovery Rate Estimation, Armin Schwartzman and Xihong Lin
On The C-Statistics For Evaluating Overall Adequacy Of Risk Prediction Procedures With Censored Survival Data, Hajime Uno, Tianxi Cai, Michael J. Pencina, Ralph B. D'Agostino, and L. J. Wei
Comparing Risk Scoring Systems Beyond the ROC Paradigm in Survival Analysis, Hajime Uno, Lu Tian, Tianxi Cai, Isaac S. Kohane, and L. J. Wei
Sparse Linear Discriminant Analysis for Simultaneous Testing for the Significance of a Gene Set/Pathway and Gene Selection, Michael C. Wu, Lingsong Zhang, Zhaoxi Wang, David C. Christiani, and Xihong Lin
Papers from 2008
Evaluating Subject-level Incremental Values of New Markers for Risk Classification Rule, Tianxi Cai, Lu Tian, Donald M. Lloyd-Jones, and L. J. Wei
Calibrating Parametric Subject-specific Risk Estimation, Tianxi Cai, Lu Tian, Hajime Uno, Scott D. Solomon, and L. J. Wei
A Functional Random Effects Model for Flexible Assessment of Susceptibility in Longitudinal Designs, Brent A. Coull
Estimation of Controlled Direct Effects, Sylvie Goetgeluk, Stijn Vansteelandt, and Els Goetghebeur
A New Class of Rank Tests for Interval-censored Data, Guadalupe Gomez and Ramon Oller Pique
Measurement Error Caused by Spatial Misalignment in Environmental Epidemiology, Alexandros Gryparis, Christopher J. Paciorek, Ariana Zeka, Joel Schwartz, and Brent A. Coull
A Matrix Pooling Algorithm for Disease Detection, Bethany L. Hedt and Marcello Pagano
Matrix Pooling: An Accurate and Cost Effective Testing Algorithm for Detection of Acute HIV Infection, Bethany L. Hedt and Marcello Pagano
Model-based Clustering of Methylation Array Data: A Recursive-partitioning Algorithm for High-dimensional Data Arising as a Mixture of Beta Distributions, E. Andres Houseman, Brock C. Christensen, Ru-Fang Yeh, Carmen J. Marsit, Margaret R. Karagas, Margaret Wrensch, Heather H. Nelson, Joseph Wiemels, Shichun Zheng, John K. Wiencke, and Karl T. Kelsey
A Powerful and Flexible Multilocus Association Test for Quantitative Traits, Lydia Coulter Kwee, Dawei Liu, Xihong Lin, Debashis Ghosh, and Michael P. Epstein
A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance, Rod Little, Qi Long, and Xihong Lin
Estimation and Testing for the Effect of a Genetic Pathway on a Disease Outcome Using Logistic Kernel Machine Regression via Logistic Mixed Models, Dawei Liu, Debashis Ghosh, and Xihong Lin
Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data, Yi Li, Ross L. Prentice, and Xihong Lin
Limitations of Remotely-sensed Aerosol as a Spatial Proxy for Fine Particulate Matter, Christopher J. Paciorek and Yang Liu
Expanded Technical Report: Mapping Ancient Forests: Bayesian Inference for Spatio-temporal Trends in Forest Composition Using the Fossil Pollen Proxy Record, Christopher J. Paciorek and Jason S. McLachlan
Practical Large-Scale Spatio-Temporal Modeling of Particulate Matter Concentrations, Christopher J. Paciorek, Jeff D. Yanosky, Robin C. Puett, Francine Laden, and Helen H. Suh
Estimation in Semiparametric Transition Measurement Error Models for Longitudinal Data, Wenqin Pan, Donglin Zeng, and Xihong Lin
Empirical Null and False Discovery Rate Inference for Exponential Families, Armin Schwartzman
The Highest Confidence Density Region and Its Usage for Inferences about the Survival Function with Censored Data, Lu Tian, Rui Wang, Tianxi Cai, and L. J. Wei
Marginal Structural Models for Partial Exposure Regimes, Stijn Vansteelandt, Karl Mertens, Carl Suetens, and Els Goetghebeur
Nonparametric Inference Procedure For Percentiles of the Random Effect Distribution in Meta Analysis, Rui Wang, Lu Tian, Tianxi Cai, and L. J. Wei
Nonparametric Regression Using Local Kernel Estimating Equations for Correlated Failure Time Data, Zhangsheng Yu and Xihong Lin
Papers from 2007
Survival Analysis with Large Dimensional Covariates: An Application in Microarray Studies, David A. Engler and Yi Li
Assessment of a CGH-based Genetic Instability, David A. Engler, Yiping Shen, J. F. Gusella, and Rebecca A. Betensky
Comparing Trends in Cancer Rates Across Overlapping Regions, Yi Li and Ram C. Tiwari
Estimating Time-to-Event From Longitudinal Categorical Data Using Random Effects Markov Models: Application to Multiple Sclerosis Progression, Micha Mandel and Rebecca A. Betensky
Simultaneous Confidence Intervals Based on the Percentile Bootstrap Approach, Micha Mandel and Rebecca A. Betensky
Assessing Population Level Genetic Instability via Moving Average, Samuel McDaniel, Rebecca Betensky, and Tianxi Cai
Spatio-temporal Associations Between GOES Aerosol Optical Depth Retrievals and Ground-Level PM2.5, Christopher J. Paciorek, Yang Liu, Hortensia Moreno-Macias, and Shobha Kondragunta
Conservative Estimation of Optimal Multiple Testing Procedures, James E. Signorovitch
Effectively Combining Independent 2 x 2 Tables for Valid Inferences in Meta Analysis with all Available Data but no Artificial Continuity Corrections for Studies with Zero Events and its Application to the Analysis of Rosiglitazone's Cardiovascular Disease Related Event Data, Lu Tian, Tianxi Cai, Nikita Piankov, Pierre-Yves Cremieux, and L. J. Wei
Identifying patients who need additional biomarkers for better prediction of health outcome or diagnosis of clinical phenotype, Lu Tian, Tianxi Cai, and L. J. Wei
Correcting Instrumental Variables Estimators for Systematic Measurement Error, Stijn Vansteelandt, Manoochehr Babanezhad, and Els Goetghebeur
Papers from 2006
Regression Analysis for the Partial Area Under the ROC Curve, Tianxi Cai and Lori E. Dodd
Predicting Future Responses Based on Possibly Misspecified Working Models, Tianxi Cai, Lu Tian, Scott D. Solomon, and L. J. Wei
Spatial Cluster Detection for Censored Outcome Data, Andrea J. Cook, Diane Gold, and Yi Li
A Computationally Tractable Multivariate Random Effects Model for Clustered Binary Data, Brent A. Coull, E. Andres Houseman, and Rebecca A. Betensky
A Likelihood Based Method for Real Time Estimation of the Serial Interval and Reproductive Number of an Epidemic, Laura Forsberg White and Marcello Pagano
Survival Analysis with Change Point Hazard Functions, Melody S. Goodman, Yi Li, and Ram C. Tiwari
Semiparametric Latent Variable Regression Models for Spatio-temporal Modeling of Mobile Source Particles in the Greater Boston Area, Alexandros Gryparis, Brent A. Coull, Joel Schwartz, and Helen H. Suh
Posterior Simulation in the Generalized Linear Model with Semiparametric Random Effects, Subharup Guha
Bayesian Hidden Markov Modeling of Array CGH Data, Subharup Guha, Yi Li, and Donna Neuberg
Spatio-Temporal Analysis of Areal Data and Discovery of Neighborhood Relationships in Conditionally Autoregressive Models, Subharup Guha and Louise Ryan
PLASQ: A Generalized Linear Model-Based Procedure to Determine Allelic Dosage in Cancer Cells from SNP Array Data, Thomas LaFramboise, David P. Harrington, and Barbara A. Weir
A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance, Rod Little, Qi Long, and Xihong Lin
Semiparametric Regression of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines and Linear Mixed Models, Dawei Liu, Xihong Lin, and Debashis Ghosh
Causal Inference in Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, and Xihong Lin
Selecting 'Significant' Differentially Expressed Genes from the Combined Perspective of the Null and the Alternative, Beatrijs Moerkerke and Els Goetghebeur
An Informative Bayesian Structural Equation Model to Assess Source-Specific Health Effects of Air Pollution, Margaret C. Nikolov, Brent A. Coull, Paul J. Catalano, and John J. Godleski
Mixed Multiplicative Factor Analysis Model for Air Pollution Exposure Assessment, Margaret C. Nikolov, Brent A. Coull, Paul J. Catalano, and John J. Godleski
Bayesian Smoothing of Irregularly-spaced Data Using Fourier Basis Functions, Christopher J. Paciorek
Structural Inference in Transition Measurement Error Models for Longitudinal Data, Wenqin Pan, Xihong Lin, and Donglin Zeng
Estimation in Semiparametric Transition Measurement Error Models for Longitudinal Data, Wenqin Pan, Donglin Zeng, and Xihong Lin
Multiple Testing With an Empirical Alternative Hypothesis, James E. Signorovitch
A Diagnostic Test for the Mixing Distribution in a Generalised Linear Mixed Model, Eric J. Tchetgen and Brent A. Coull
Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models, Hajime Uno, Tianxi Cai, Lu Tian, and L. J. Wei
Using Profile Likelihood for Semiparametric Model Selection with Application to Proportional Hazards Mixed Models, Ronghui Xu, Anthony Gamst, Michael Donohue, Florin Vaida, and David P. Harrington
Nonparametric Regression Using Local Kernel Estimating Equations for Correlated Failure Time Data, Zhangsheng Yu and Xihong Lin
Papers from 2005
The Sensitivity and Specificity of Markers for Event Times, Tianxi Cai, Margaret S. Pepe, Thomas Lumley, Yingye Zheng, and Nancy Swords Jenny
Model Checking for ROC Regression Analysis, Tianxi Cai and Yingye Zheng
A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (aCGH), David A. Engler, Gayatry Mohapatra, David N. Louis, and Rebecca Betensky
Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates, Subharup Guha and Louise Ryan
Feature-Specific Penalized Latent Class Analysis for Genomic Data, E. Andres Houseman, Brent A. Coull, and Rebecca A. Betensky
A Nonstationary Negative Binomial Time Series with Time-Dependent Covariates: Enterococcus Counts in Boston Harbor, E. Andres Houseman, Brent Coull, and James P. Shine
Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models, Larry Leon, Tianxi Cai, and L. J. Wei
Semiparametric Estimation in General Repeated Measures Problems, Xihong Lin and Raymond J. Carroll