Biostatistics creates and applies methods for quantitative research in the health sciences. Our faculty conduct research across the spectrum of statistical science, from the foundations of inference to the discovery of new methodology to health applications. Our designs and analytic methods enable health scientists and professionals in academia, government, pharmaceutical companies, medical research organizations and elsewhere to efficiently acquire knowledge and draw valid conclusions from their ever-expanding sources of information.
A collection of working papers and related research documents from the department faculty may be found here.
Further information about the department may be found at www.biostat.jhsph.edu.
Papers from 2007
RANDOM EFFECTS MODELS IN A META-ANALYSIS OF THE ACCURACY OF DIAGNOSTIC TESTS WITHIN A GOLD STANDARD IN THE PRESENCE OF MISSING DATA, Haitao Chu, Sining Chen, and Thomas A. Louis
The Integrative Correlation Coefficient: a Measure of Cross-study Reproducibility for Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, and Giovanni Parmigiani
Bayesian Analysis for Penalized Spline Regression Using WinBUGS, Ciprian M. Crainiceanu, David Ruppert, and M.P. Wand
IDENTIFYING EFFECT MODIFIERS IN AIR POLLUTION TIME-SERIES STUDIES USING A TWO-STAGE ANALYSIS, Sandrah P. Eckel and Thomas A. Louis
ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE", Steven Goodman and Sander Greenland
MULTIPLE MODEL EVALUATION ABSENT THE GOLD STANDARD VIA MODEL COMBINATION, Edwin S. Iversen, Jr.; Giovanni Parmigiani; and Sining Chen
TRENDS IN PARTICULATE MATTER AND MORTALITY: AN APPROACH TO THE ASSESSMENT OF UNMEASURED CONFOUNDING, Holly Janes, Francesca Dominici, and Scott Zeger
MULTIPLE DISEASES IN CARRIER PROBABILITY ESTIMATION: ACCOUNTING FOR SURVIVING ALL CANCERS OTHER THAN BREAST AND OVARY IN BRCAPRO, Hormuzd A. Katki, Amanda Blackford, Sining Chen, and Giovanni Parmigiani
FAST ADAPTIVE PENALIZED SPLINES, Tatyana Krivobokova, Ciprian M. Crainiceanu, and Goran Kauermann
EFFECTIVE COMMUNICATION OF STANDARD ERRORS AND CONFIDENCE INTERVALS, Thomas A. Louis and Scott L. Zeger
DECOMPOSITION OF REGRESSION ESTIMATORS TO EXPLORE THE INFLUENCE OF "UNMEASURED" TIME-VARYING CONFOUNDERS, Yun Lu and Scott L. Zeger
OPTIMAL PROPENSITY SCORE STRATIFICATION, Jessica A. Myers and Thomas A. Louis
TRAB: TESTING WHETHER MUTATION FREQUENCIES ARE ABOVE AN UNKNOWN BACKGROUND, Giovanni Parmigiani, Sining Chen, and Victor E. Velculescu
STATISTICAL METHODS FOR THE ANALYSIS OF CANCER GENOME SEQUENCING DATA, Giovanni Parmigiani, J. Lin, Simina Boca, T. Sjoblom, K.W. Kinzler, V.E. Velculescu, and B. Vogelstein
A REPRODUCIBLE RESEARCH TOOLKIT FOR R, Roger Peng
A BAYESIAN HIERARCHICAL MODEL FOR CONSTRAINED DISTRIBUTED LAG FUNCTIONS: ESTIMATING THE TIME COURSE OF HOSPITALIZATION ASSOCIATED WITH AIR POLLUTION EXPOSURE, Roger Peng, Francesca Dominici, and Leah J. Welty
DISTRIBUTED REPRODUCIBLE RESEARCH USING CACHED COMPUTATIONS, Roger Peng and Sandrah P. Eckel
SEMIPARAMETRIC BIVARIATE QUANTILE-QUANTILE REGRESSION FOR ANALYZING SEMI-COMPETING RISKS DATA, Daniel O. Scharfstein, James M. Robins, and Mark van der Laan
A HIDDEN MARKOV MODEL FOR JOINT ESTIMATION OF GENOTYPE AND COPY NUMBER IN HIGH-THROUGHPUT SNP CHIPS, Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, and Ingo Ruczinski
A BAYESIAN MODEL FOR CROSS-STUDY DIFFERENTIAL GENE EXPRESSION, Robert B. Scharpf, Hakon Tjelmeland, Giovanni Parmigiani, and Andrew B. Nobel
INFERENCE FOR SURVIVAL CURVES WITH INFORMATIVELY COARSENED DISCRETE EVENT-TIME DATA: APPLICATION TO ALIVE, Michelle Shardell, Daniel O. Scharfstein, David Vlahov, and Noya Galai
MODIFIED TEST STATISTICS BY INTER-VOXEL VARIANCE SHRINKAGE WITH AN APPLICATION TO fMRI, Shu-chih Su, Brian Caffo, Elizabeth Garrett-Mayer, and Susan Bassett
MORTALITY IN THE MEDICARE POPULATION AND CHRONIC EXPOSURE TO FINE PARTICULATE AIR POLLUTION, Scott L. Zeger, Francesca Dominici, Aidan McDermott, and Jonathan M. Samet
OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS, Xiaogang Zhong, Luigi Marchionni, Leslie Cope, Edwin S. Iversen, Elizabeth S. Garrett-Mayer, Edward Gabrielson, and Giovanni Parmigiani
A SMOOTHING APPROACH TO DATA MASKING, Yijie Zhou, Francesca Dominici, and Thomas A. Louis
RACIAL DISPARITIES IN MORTALITY RISKS IN A SAMPLE OF THE U.S. MEDICARE POPULATION, Yijie Zhou, Francesca Dominici, and Thomas A. Louis
Papers from 2006
USE OF HIDDEN MARKOV MODELS FOR QTL MAPPING, Karl W. Broman
A FLEXIBLE GENERAL CLASS OF MARGINAL AND CONDITIONAL RANDOM INTERCEPT MODELS FOR BINARY OUTCOMES USING MIXTURES OF NORMALS, Brian Caffo, Ming-Wen An, and Charles A. Rohde
EXPLORATION, NORMALIZATION, AND GENOTYPE CALLS OF HIGH DENSITY OLIGONUCLEOTIDE SNP ARRAY DATA, Benilton Carvalho, Terence P. Speed, and Rafael A. Irizarry
BIVARIATE BINOMIAL SPATIAL MODELLING OF LOA LOA PREVALENCE IN TROPICAL AFRICA, Ciprian M. Crainiceanu, Peter J. Diggle, and Barry Rowlingson
Adjustment Uncertainty in Effect Estimation, Ciprian M. Crainiceanu, Francesca Dominici, and Giovanni Parmigiani
COX MODELS WITH NONLINEAR EFFECT OF COVARIATES MEASURED WITH ERROR: A CASE STUDY OF CHRONIC KIDNEY DISEASE INCIDENCE, Ciprian M. Crainiceanu, David Ruppert, and Josef Coresh
PENALIZED LIKELIHOOD AND BAYESIAN METHODS FOR SPARSE CONTINGENCY TABLES: AN ANALYSIS OF ALTERNATIVE SPLICING IN FULL-LENGTH cDNA LIBRARIES, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, and Peter Buhlmann
INTERACTING WITH LOCAL AND REMOTE DATA REPOSITORIES USING THE stashR PACKAGE, Sandrah P. Eckel and Roger Peng
A Comparative Analysis of the Chronic Effects of Fine Particulate Matter, Sorina E. Eftim, Holly Janes, Aidan McDermott, Jonathan M. Samet, and Francesca Dominici
INVESTIGATING MEDIATION WHEN COUNTERFACTUALS ARE NOT METAPHYSICAL: DOES SUNLIGHT UVB EXPOSURE MEDIATE THE EFFECT OF EYEGLASSES ON CATARACTS?, Brian Egleston, Daniel O. Scharfstein, Beatriz Munoz, and Sheila West
MULTIVARIATE ANALYSIS AND VISUALIZATION OF SPLICING CORRELATIONS IN SINGLE-GENE TRANSCRIPTOMES, Mark C. Emerick, Giovanni Parmigiani, and William S. Agnew
PRINCIPAL STRATIFICATION DESIGNS TO ESTIMATE INPUT DATA MISSING DUE TO DEATH, Constantine E. Frangakis, Donald B. Rubin, Ming-Wen An, and Ellen MacKenzie
FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET, Rafael A. Irizarry, Leslie Cope, and Zhijin Wu
ON THE POTENTIAL FOR ILL-LOGIC WITH LOGICALLY DEFINED OUTCOMES, Xianbin Li, Brian S. Caffo, and Daniel O. Scharfstein
RECURRENT EVENT MODELS IN THE PRESENCE OF A TERMINAL EVENT: COMPARISON, INFERENCE AND DATA ANALYSIS, Xianghua Luo and Mei-Cheng Wang
ON THE EQUIVALENCE OF CASE-CROSSOVER AND TIME SERIES METHODS IN ENVIRONMENTAL EPIDEMIOLOGY, Yun Lu and Scott L. Zeger
POOR PERFORMANCE OF BOOTSTRAP CONFIDENCE INTERVALS FOR THE LOCATION OF A QUANTITATIVE TRAIT LOCUS, Ani Manichaikul, Josee Dupuis, Saunak Sen, and Karl W. Broman
FDR and Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, and Kenneth Rice
INTERACTING WITH DATA USING THE FILEHASH PACKAGE FOR R, Roger Peng
GAMMA SHAPE MIXTURES FOR HEAVY-TAILED DISTRIBUTIONS, Sergio Venturini, Francesca Dominici, and Giovanni Parmigiani
ESTIMATING GENOME-WIDE COPY NUMBER USING ALLELE SPECIFIC MIXTURE MODELS, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, and Rafael A. Irizarry
Papers from 2005
NONPARAMETRIC ESTIMATION OF BIVARIATE FAILURE TIME ASSOCIATIONS IN THE PRESENCE OF A COMPETING RISK, Karen Bandeen-Roche and Jing Ning
A User-Friendly Introduction to Link-Probit-Normal Models, Brian S. Caffo and Michael Griswold
Additive Hazards Models with Latent Treatment Effectiveness Lag Time, Ying Qing Chen, Charles A. Rohde, and Mei-Cheng Wang
A Mechanistic Latent Variable Model for Estimating Drug Concentrations in the Male Genital Tract, Leena Choi, Brian Caffo, Charles A. Rohde, Themba T. Ndovi, and Craig W. Hendrix
Analysis of Affymetrix GeneChip Data Using Amplified RNA, Leslie Cope, Scott M. Hartman, Hinrich W.H. Gohlmann, Jay P. Tiesman, and Rafael A. Irizarry
ON THE USE OF NON-EUCLIDEAN ISOTROPY IN GEOSTATISTICS, Frank C. Curriero
Searching for Differentially Expressed Gene Combinations, Marcel Dettling, Edward Gabrielson, and Giovanni Parmigiani
A Partial Likelihood for Spatio-temporal Point Processes, Peter J. Diggle
Spatio-temporal Point Processes: Methods and Applications, Peter J. Diggle
Does the Effect of Micronutrient Supplementation on Neonatal Survival Vary with Respect to the Percentiles of the Birth Weight Distribution?, Francesca Dominici, Scott L. Zeger, Giovanni Parmigiani, Joanne Katz, and Parul Christian
THE ROLE OF AN EXPLICIT CAUSAL FRAMEWORK IN AFFECTED SIB PAIR DESIGNS WITH COVARIATES, Constantine E. Frangakis, Fan Li, and Betty Q. Doan
Understanding the Continual Reassessment Method for Dose Finding Studies: An Overview for Non-Statisticians, Elizabeth Garrett-Mayer
MODELING DIFFERENTIATED TREATMENT EFFECTS FOR MULTIPLE OUTCOMES DATA, Hongfei Guo and Karen Bandeen-Roche
Analyzing Panel Count Data with Informative Observation Times, Chiung-Yu Huang, Mei-Cheng Wang, and Ying Zhang
Comparison of Affymetrix GeneChip Expression Measures, Rafael A. Irizarry, Zhijin Wu, and Harris A. Jaffee
Fixed-Width Output Analysis for Markov Chain Monte Carlo, Galin L. Jones, Murali Haran, Brian S. Caffo, and Ronald Neath
Designs in Partially Controlled Studies: Messages from a Review, Fan Li and Constantine E. Frangakis
Polydesigns and Causal Inference, Fan Li and Constantine E. Frangakis
Model Choice in Time Series Studies of Air Pollution and Mortality, Roger D. Peng, Francesca Dominici, and Thomas A. Louis
When Should One Subtract Background Fluorescence in Two Color Microarrays?, Robert B. Scharpf, Christine A. Iacobuzio-Donahue, Julie B. Sneddon, and Giovanni Parmigiani
Estimation and Projection of Incidence and Prevalence Based on Doubly Truncated Data with Application to Pharmacoepidemiological Databases, Henrik Stovring and Mei-Cheng Wang
A Statistical Framework for the Analysis of Microarray Probe-Level Data, Zhijin Wu and Rafael A. Irizarry
Papers from 2004
Quantitative Methods for Tracking Cognitive Change 3 Years After Coronary Artery Bypass Surgery, Sarah Barry; Scott L. Zeger; Ola A. Selnes; Maura A. Grega; Louis M. Borowicz, Jr.; and Guy M. McKhann
Ozone and Mortality: A Meta-Analysis of Time-Series Studies and Comparison to a Multi-City Study (The National Morbidity, Mortality, and Air Pollution Study), Michelle L. Bell, Jonathan M. Samet, and Francesca Dominici
The Genomes of Recombinant Inbred Lines: The Gory Details, Karl W. Broman
A Hypothesis Test for the End of a Common Source Outbreak, Ron Brookmeyer and Xiaojun You
BayesMendel: An R Environment for Mendelian Risk Prediction, Sining Chen, Wenyi Wang, Karl Broman, Hormuzd A. Katki, and Giovanni Parmigiani
Accuracy of MSI Testing in Predicting Germline Mutations of MSH2 and MLH1: A Case Study in Bayesian Meta-Analysis of Diagnostic Tests Without a Gold Standard, Sining Chen, Patrice Watson, and Giovanni Parmigiani
Power and Robustness of Linkage Tests for Quantitative Traits in General Pedigrees, Weimin Chen, Karl Broman, and Kung-Yee Liang
Optimal Sampling Times in Bioequivalence Studies Using a Simulated Annealing Algorithm, Leena Choi, Brian Caffo, and Charles Rohde
MergeMaid: R Tools for Merging and Cross-Study Validation of Gene Expression Data, Leslie Cope, Xiaogang Zhong, Elizabeth S. Garrett-Mayer, and Giovanni Parmigiani
Spatially Adaptive Bayesian P-Splines with Heteroscedastic Errors, Ciprian M. Crainiceanu, David Ruppert, and Raymond J. Carroll
Bayesian Geostatistical Design, Peter J. Diggle and Soren Lophaven
Point Process Methodology for On-line Spatio-temporal Disease Surveillance, Peter J. Diggle, Barry Rowlingson, and Ting-li Su
Estimating Percentile-Specific Causal Effects: A Case Study of Micronutrient Supplementation, Birth Weight, and Infant Mortality, Francesca Dominici, Scott L. Zeger, Giovanni Parmigiani, Joanne Katz, and Parul Christian
The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities, Elizabeth Garrett-Mayer, Steven N. Goodman, and Ralph H. Hruban
Clustering and Classification Methods for Gene Expression Data Analysis, Elizabeth Garrett-Mayer and Giovanni Parmigiani
Cross-study Validation and Combined Analysis of Gene Expression Microarray Data, Elizabeth Garrett-Mayer, Giovanni Parmigiani, Xiaogang Zhong, Leslie Cope, and Edward Gabrielson
Semiparametric Regression in Capture-Recapture Modelling, O. Gimenez, C. Barbraud, Ciprian M. Crainiceanu, S. Jenouvrier, and B.T. Morgan
ON MARGINALIZED MULTILEVEL MODELS AND THEIR COMPUTATION, Michael E. Griswold and Scott L. Zeger
Bayesian Hierarchical Distributed Lag Models for Summer Ozone Exposure and Cardio-Respiratory Mortality, Yi Huang, Francesca Dominici, and Michelle L. Bell
Multiple Lab Comparison of Microarray Platforms, Rafael A. Irizarry et al.
Choosing Smoothness Parameters for Smoothing Splines by Minimizing an Estimate of Risk, Rafael A. Irizarry
Inequity Measures for Evaluations of Environmental Justice: A Case Study of Close Proximity to Highways in NYC, Jerry O. Jacobson, Nicolas W. Hengartner, and Thomas A. Louis
Effect of Misreported Family History on Mendelian Mutation Prediction Models, Hormuzd A. Katki
Ranking USRDS Provider-Specific SMRs from 1998-2001, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, and Greg Ridgeway
Screening for Differentially Expressed Genes: Are Multilevel Models Helpful?, Dongmei Liu, Giovanni Parmigiani, and Brian Caffo
Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays, Peter Muller, Giovanni Parmigiani, Christian Robert, and Judith Rousseau
Seasonal Analyses of Air Pollution and Mortality in 100 U.S. Cities, Roger D. Peng, Francesca Dominici, Roberto Pastor-Barriuso, Scott L. Zeger, and Jonathan M. Samet
The National Morbidity, Mortality, and Air Pollution Study Database in R, Roger D. Peng, Leah J. Welty, and Aidan McDermott
A Hierarchical Multivariate Two-Part Model for Profiling Providers' Effects on Healthcare Charges, John W. Robinson, Scott L. Zeger, and Christopher B. Forrest
Studying Effects of Primary Care Physicians and Patients on the Trade-Off Between Charges for Primary Care and Specialty Care Using a Hierarchical Multivariate Two-Part Model, John W. Robinson, Scott L. Zeger, and Christopher B. Forrest
Self-Reported Memory Symptoms with Coronary Artery Disease: A Prospective Study of CABG Patients and Nonsurgical Controls, Ola A. Selnes; Maura A. Grega; Louis M. Borowicz, Jr.; Sarah Barry; Scott L. Zeger; and Guy M. McKhann
