Biostatistics creates and applies methods for quantitative research in the health sciences. Our faculty conduct research across the spectrum of statistical science, from the foundations of inference to the discovery of new methodology to health applications. Our designs and analytic methods enable health scientists and professionals in academia, government, pharmaceutical companies, medical research organizations and elsewhere to efficiently acquire knowledge and draw valid conclusions from their ever-expanding sources of information.
A collection of working papers and related research documents from the department faculty is listed below.
Further information about the department may be found at www.biostat.jhsph.edu.
Papers from 2009
A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER ESTIMATION USING SNP ARRAYS, Robert B. Scharpf, Ingo Ruczinski, Benilton Carvalho, Betty Doan, Aravinda Chakravarti, and Rafael A. Irizarry
A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER USING SNP ARRAYS, Robert B. Scharpf, Ingo Ruczinski, Benilton Carvalho, Betty Doan, Aravinda Chakravarti, and Rafael A. Irizarry
Estimating effects by combining instrumental variables with case-control designs: the role of principal stratification, Russell T. Shinohara, Constantine E. Frangakis, Elizabeth Platz, and Konstantinos Tsilidis
LASAGNA PLOTS: A SAUCY ALTERNATIVE TO SPAGHETTI PLOTS, Bruce Swihart, Brian Caffo, Bryan D. James, Matthew Strand, Brian S. Schwartz, and Naresh M. Punjabi
Modeling multilevel sleep transitional data via Poisson log-linear multilevel models, Bruce J. Swihart, Brian Caffo, Ciprian Crainiceanu, and Naresh M. Punjabi
A BAYESIAN SHRINKAGE MODEL FOR INCOMPLETE LONGITUDINAL BINARY DATA WITH APPLICATION TO THE BREAST CANCER PREVENTION TRIAL, C. Wang, M.J. Daniels, Daniel O. Scharfstein, and S. Land
REDEFINING CpG ISLANDS USING A HIDDEN MARKOV MODEL, Hao Wu, Brian Caffo, Harris A. Jaffee, Andrew P. Feinberg, and Rafael A. Irizarry
Subset Quantile Normalization using Negative Control Features, Zhijin Wu
Analyzing Bivariate Survival Data with Interval Sampling and Application to Cancer Epidemiology, Hong Zhu and Mei-Cheng Wang
Papers from 2008
LIKELIHOOD ESTIMATION OF CONJUGACY RELATIONSHIPS IN LINEAR MODELS WITH APPLICATIONS TO HIGH-THROUGHPUT GENOMICS, Brian S. Caffo, Dongmei Liu, Robert Scharpf, and Giovanni Parmigiani
AN OVERVIEW OF OBSERVATIONAL SLEEP RESEARCH WITH APPLICATION TO SLEEP STAGE TRANSITIONING, Brian S. Caffo, B. Swihart, A. Laffan, C. Crainiceanu, and N. Punjabi
Bayesian Model Averaging for Clustered Data: Imputing Missing Daily Air Pollution Concentration, Howard H. Chang, Francesca Dominici, and Roger D. Peng
GENERALIZED MULTILEVEL FUNCTIONAL REGRESSION, Ciprian M. Crainiceanu, Ana-Maria Staicu, and Chongzhi Di
Multilevel Latent Class Models with Dirichlet Mixing Distribution, Chongzhi Di and Karen Bandeen-Roche
GEOSTATISTICAL INFERENCE UNDER PREFERENTIAL SAMPLING, Peter J. Diggle, Raquel Menezes, and Ting-li Su
MODEL SELECTION AND HEALTH EFFECT ESTIMATION IN ENVIRONMENTAL EPIDEMIOLOGY, Francesca Dominici, Chi Wang, Ciprian Crainiceanu, and Giovanni Parmigiani
A NOVEL AND SIMPLE RULE OF THUMB FOR MULTIPLICITY CONTROL IN EQUIVALENCE TESTING USING TWO ONE-SIDED TESTS, Carolyn Lauzon and Brian S. Caffo
JOINTLY MODELING CONTINUOUS AND BINARY OUTCOMES FOR BOOLEAN OUTCOMES: AN APPLICATION TO MODELING HYPERTENSION, Xianbin Li, Brian S. Caffo, and Elizabeth Stuart
BAYESIAN INFERENCE FOR SMOKING CESSATION WITH A LATENT CURE STATE, Sheng Luo, Ciprian M. Crainiceanu, Thomas A. Louis, and Nilanjan Chatterjee
LEARNING FROM NEAR MISSES IN MEDICATION ERRORS: A BAYESIAN APPROACH, Jessica A. Myers, Francesca Dominici, and Laura Morlock
DESIGN AND ANALYSIS ISSUES IN GENOME-WIDE SOMATIC MUTATION STUDIES OF CANCER, Giovanni Parmigiani, Simina Boca, Jimmy Lin, Kenneth W. Kinzler, Victor E. Velculescu, and Bert Vogelstein
A Method for Visualizing Multivariate Time Series Data, Roger D. Peng
Caching and Distributing Statistical Analyses in R, Roger D. Peng
Spatial Misalignment in time series studies of air pollution and health data, Roger D. Peng and Michelle L. Bell
ANALYSIS OF SUBGROUP EFFECTS IN RANDOMIZED TRIALS WHEN SUBGROUP MEMBERSHIP IS INFORMATIVELY MISSING: APPLICATION TO THE MADIT II STUDY, Daniel O. Scharfstein, Georgiana Onicescu, and Steven Goodman
ON THE MERITS OF VOXEL-BASED MORPHOMETRIC PATH-ANALYSIS FOR INVESTIGATING VOLUMETRIC MEDIATION OF A TOXICANT'S INFLUENCE ON COGNITIVE FUNCTION, Shu-chih Su, Brian S. Caffo, Lynn E. Eberly, Elizabeth Garrett-Mayer, Walter F. Stewart, Sining Chen, David Yousem, Christos Davatzikos, and Brian Schwartz
A BAYESIAN APPROACH TO EFFECT ESTIMATION ACCOUNTING FOR ADJUSTMENT UNCERTAINTY, Chi Wang, Giovanni Parmigiani, Ciprian Crainiceanu, and Francesca Dominici
Estimating the Causal Effect of Lower Tidal Volume Ventilation on Survival in Patients with Acute Lung Injury, Weiwei Wang, Daniel Scharfstein, Roy Brower, and Dale Needham
Causal Inference in Observational Studies with Outcome-Dependent Sampling, Weiwei Wang, Daniel Scharfstein, Zhiqiang Tan, and Ellen J. MacKenzie
STATISTICAL METHODS FOR AUTOMATED DRUG SUSCEPTIBILITY TESTING: BAYESIAN MINIMUM INHIBITORY CONCENTRATION PREDICTION FROM GROWTH CURVES, Xi Zhou, Merlise A. Clyde, James Garrett, Viridiana Lourdes, Michael O'Connell, Giovanni Parmigiani, David J. Turner, and Tim Wiles
Papers from 2007
A BAYESIAN HIERARCHICAL FRAMEWORK FOR SPATIAL MODELING OF fMRI DATA, F. DuBois Bowman, Brian S. Caffo, Susan Spear Bassett, and Clinton Kilts
FORECASTING THE GLOBAL BURDEN OF ALZHEIMER'S DISEASE, Ron Brookmeyer, Elizabeth Johnson, Kathryn Ziegler-Graham, and H. Michael Arrighi
IS MRI-BASED VOLUME A MEDIATOR OF THE ASSOCIATION OF CUMULATIVE LEAD DOSE WITH COGNITIVE FUNCTION?, Brian S. Caffo, Sining Chen, Walter Stewart, Karen Bolla, David Yousem, Christos Davatzikos, and Brian S. Schwartz
A CASE STUDY IN PHARMACOLOGIC IMAGING USING PRINCIPAL CURVES IN SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY, Brian S. Caffo, Ciprian M. Crainiceanu, Lijuan Deng, and Craig W. Hendrix
A SURVEY OF THE LIKELIHOOD APPROACH TO BIOEQUIVALENCE TRIALS, Leena Choi, Brian S. Caffo, and Charles Rohde
RANDOM EFFECTS MODELS IN A META-ANALYSIS OF THE ACCURACY OF DIAGNOSTIC TESTS WITHOUT A GOLD STANDARD IN THE PRESENCE OF MISSING DATA, Haitao Chu, Sining Chen, and Thomas A. Louis
The Integrative Correlation Coefficient: a Measure of Cross-study Reproducibility for Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, and Giovanni Parmigiani
Bayesian Analysis for Penalized Spline Regression Using WinBUGS, Ciprian M. Crainiceanu, David Ruppert, and M.P. Wand
IDENTIFYING EFFECT MODIFIERS IN AIR POLLUTION TIME-SERIES STUDIES USING A TWO-STAGE ANALYSIS, Sandrah P. Eckel and Thomas A. Louis
ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE", Steven Goodman and Sander Greenland
MULTIPLE MODEL EVALUATION ABSENT THE GOLD STANDARD VIA MODEL COMBINATION, Edwin S. Iversen, Jr., Giovanni Parmigiani, and Sining Chen
TRENDS IN PARTICULATE MATTER AND MORTALITY: AN APPROACH TO THE ASSESSMENT OF UNMEASURED CONFOUNDING, Holly Janes, Francesca Dominici, and Scott Zeger
MULTIPLE DISEASES IN CARRIER PROBABILITY ESTIMATION: ACCOUNTING FOR SURVIVING ALL CANCERS OTHER THAN BREAST AND OVARY IN BRCAPRO, Hormuzd A. Katki, Amanda Blackford, Sining Chen, and Giovanni Parmigiani
FAST ADAPTIVE PENALIZED SPLINES, Tatyana Krivobokova, Ciprian M. Crainiceanu, and Goran Kauermann
EFFECTIVE COMMUNICATION OF STANDARD ERRORS AND CONFIDENCE INTERVALS, Thomas A. Louis and Scott L. Zeger
DECOMPOSITION OF REGRESSION ESTIMATORS TO EXPLORE THE INFLUENCE OF "UNMEASURED" TIME-VARYING CONFOUNDERS, Yun Lu and Scott L. Zeger
OPTIMAL PROPENSITY SCORE STRATIFICATION, Jessica A. Myers and Thomas A. Louis
TRAB: TESTING WHETHER MUTATION FREQUENCIES ARE ABOVE AN UNKNOWN BACKGROUND, Giovanni Parmigiani, Sining Chen, and Victor E. Velculescu
STATISTICAL METHODS FOR THE ANALYSIS OF CANCER GENOME SEQUENCING DATA, Giovanni Parmigiani, J. Lin, Simina Boca, T. Sjoblom, K.W. Kinzler, V.E. Velculescu, and B. Vogelstein
A REPRODUCIBLE RESEARCH TOOLKIT FOR R, Roger Peng
A BAYESIAN HIERARCHICAL MODEL FOR CONSTRAINED DISTRIBUTED LAG FUNCTIONS: ESTIMATING THE TIME COURSE OF HOSPITALIZATION ASSOCIATED WITH AIR POLLUTION EXPOSURE, Roger Peng, Francesca Dominici, and Leah J. Welty
DISTRIBUTED REPRODUCIBLE RESEARCH USING CACHED COMPUTATIONS, Roger Peng and Sandrah P. Eckel
SEMIPARAMETRIC BIVARIATE QUANTILE-QUANTILE REGRESSION FOR ANALYZING SEMI-COMPETING RISKS DATA, Daniel O. Scharfstein, James M. Robins, and Mark van der Laan
A HIDDEN MARKOV MODEL FOR JOINT ESTIMATION OF GENOTYPE AND COPY NUMBER IN HIGH-THROUGHPUT SNP CHIPS, Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, and Ingo Ruczinski
A BAYESIAN MODEL FOR CROSS-STUDY DIFFERENTIAL GENE EXPRESSION, Robert B. Scharpf, Hakon Tjelmeland, Giovanni Parmigiani, and Andrew B. Nobel
INFERENCE FOR SURVIVAL CURVES WITH INFORMATIVELY COARSENED DISCRETE EVENT-TIME DATA: APPLICATION TO ALIVE, Michelle Shardell, Daniel O. Scharfstein, David Vlahov, and Noya Galai
MODIFIED TEST STATISTICS BY INTER-VOXEL VARIANCE SHRINKAGE WITH AN APPLICATION TO fMRI, Shu-chih Su, Brian Caffo, Elizabeth Garrett-Mayer, and Susan Bassett
MORTALITY IN THE MEDICARE POPULATION AND CHRONIC EXPOSURE TO FINE PARTICULATE AIR POLLUTION, Scott L. Zeger, Francesca Dominici, Aidan McDermott, and Jonathan M. Samet
OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS, Xiaogang Zhong, Luigi Marchionni, Leslie Cope, Edwin S. Iversen, Elizabeth S. Garrett-Mayer, Edward Gabrielson, and Giovanni Parmigiani
A SMOOTHING APPROACH TO DATA MASKING, Yijie Zhou, Francesca Dominici, and Thomas A. Louis
RACIAL DISPARITIES IN MORTALITY RISKS IN A SAMPLE OF THE U.S. MEDICARE POPULATION, Yijie Zhou, Francesca Dominici, and Thomas A. Louis
Papers from 2006
USE OF HIDDEN MARKOV MODELS FOR QTL MAPPING, Karl W. Broman
A FLEXIBLE GENERAL CLASS OF MARGINAL AND CONDITIONAL RANDOM INTERCEPT MODELS FOR BINARY OUTCOMES USING MIXTURES OF NORMALS, Brian Caffo, Ming-Wen An, and Charles A. Rohde
EXPLORATION, NORMALIZATION, AND GENOTYPE CALLS OF HIGH DENSITY OLIGONUCLEOTIDE SNP ARRAY DATA, Benilton Carvalho, Terence P. Speed, and Rafael A. Irizarry
BIVARIATE BINOMIAL SPATIAL MODELLING OF Loa loa PREVALENCE IN TROPICAL AFRICA, Ciprian M. Crainiceanu, Peter J. Diggle, and Barry Rowlingson
Adjustment Uncertainty in Effect Estimation, Ciprian M. Crainiceanu, Francesca Dominici, and Giovanni Parmigiani
COX MODELS WITH NONLINEAR EFFECT OF COVARIATES MEASURED WITH ERROR: A CASE STUDY OF CHRONIC KIDNEY DISEASE INCIDENCE, Ciprian M. Crainiceanu, David Ruppert, and Josef Coresh
PENALIZED LIKELIHOOD AND BAYESIAN METHODS FOR SPARSE CONTINGENCY TABLES: AN ANALYSIS OF ALTERNATIVE SPLICING IN FULL-LENGTH cDNA LIBRARIES, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, and Peter Buhlmann
INTERACTING WITH LOCAL AND REMOTE DATA REPOSITORIES USING THE stashR PACKAGE, Sandrah P. Eckel and Roger Peng
A Comparative Analysis of the Chronic Effects of Fine Particulate Matter, Sorina E. Eftim, Holly Janes, Aidan McDermott, Jonathan M. Samet, and Francesca Dominici
INVESTIGATING MEDIATION WHEN COUNTERFACTUALS ARE NOT METAPHYSICAL: DOES SUNLIGHT UVB EXPOSURE MEDIATE THE EFFECT OF EYEGLASSES ON CATARACTS?, Brian Egleston, Daniel O. Scharfstein, Beatriz Munoz, and Sheila West
MULTIVARIATE ANALYSIS AND VISUALIZATION OF SPLICING CORRELATIONS IN SINGLE-GENE TRANSCRIPTOMES, Mark C. Emerick, Giovanni Parmigiani, and William S. Agnew
PRINCIPAL STRATIFICATION DESIGNS TO ESTIMATE INPUT DATA MISSING DUE TO DEATH, Constantine E. Frangakis, Donald B. Rubin, Ming-Wen An, and Ellen MacKenzie
FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET, Rafael A. Irizarry, Leslie Cope, and Zhijin Wu
ON THE POTENTIAL FOR ILL-LOGIC WITH LOGICALLY DEFINED OUTCOMES, Xianbin Li, Brian S. Caffo, and Daniel O. Scharfstein
RECURRENT EVENT MODELS IN THE PRESENCE OF A TERMINAL EVENT: COMPARISON, INFERENCE AND DATA ANALYSIS, Xianghua Luo and Mei-Cheng Wang
ON THE EQUIVALENCE OF CASE-CROSSOVER AND TIME SERIES METHODS IN ENVIRONMENTAL EPIDEMIOLOGY, Yun Lu and Scott L. Zeger
POOR PERFORMANCE OF BOOTSTRAP CONFIDENCE INTERVALS FOR THE LOCATION OF A QUANTITATIVE TRAIT LOCUS, Ani Manichaikul, Josee Dupuis, Saunak Sen, and Karl W. Broman
FDR and Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, and Kenneth Rice
INTERACTING WITH DATA USING THE FILEHASH PACKAGE FOR R, Roger Peng
GAMMA SHAPE MIXTURES FOR HEAVY-TAILED DISTRIBUTIONS, Sergio Venturini, Francesca Dominici, and Giovanni Parmigiani
ESTIMATING GENOME-WIDE COPY NUMBER USING ALLELE SPECIFIC MIXTURE MODELS, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, and Rafael A. Irizarry
Papers from 2005
NONPARAMETRIC ESTIMATION OF BIVARIATE FAILURE TIME ASSOCIATIONS IN THE PRESENCE OF A COMPETING RISK, Karen Bandeen-Roche and Jing Ning
A User-Friendly Introduction to Link-Probit-Normal Models, Brian S. Caffo and Michael Griswold
Additive Hazards Models with Latent Treatment Effectiveness Lag Time, Ying Qing Chen, Charles A. Rohde, and Mei-Cheng Wang
A Mechanistic Latent Variable Model for Estimating Drug Concentrations in the Male Genital Tract, Leena Choi, Brian Caffo, Charles A. Rohde, Themba T. Ndovi, and Craig W. Hendrix
Analysis of Affymetrix GeneChip Data Using Amplified RNA, Leslie Cope, Scott M. Hartman, Hinrich W.H. Gohlmann, Jay P. Tiesman, and Rafael A. Irizarry
ON THE USE OF NON-EUCLIDEAN ISOTROPY IN GEOSTATISTICS, Frank C. Curriero
Searching for Differentially Expressed Gene Combinations, Marcel Dettling, Edward Gabrielson, and Giovanni Parmigiani
A Partial Likelihood for Spatio-temporal Point Processes, Peter J. Diggle
Spatio-temporal Point Processes: Methods and Applications, Peter J. Diggle
Does the Effect of Micronutrient Supplementation on Neonatal Survival Vary with Respect to the Percentiles of the Birth Weight Distribution?, Francesca Dominici, Scott L. Zeger, Giovanni Parmigiani, Joanne Katz, and Parul Christian
THE ROLE OF AN EXPLICIT CAUSAL FRAMEWORK IN AFFECTED SIB PAIR DESIGNS WITH COVARIATES, Constantine E. Frangakis, Fan Li, and Betty Q. Doan
Understanding the Continual Reassessment Method for Dose Finding Studies: An Overview for Non-Statisticians, Elizabeth Garrett-Mayer
MODELING DIFFERENTIATED TREATMENT EFFECTS FOR MULTIPLE OUTCOMES DATA, Hongfei Guo and Karen Bandeen-Roche
Analyzing Panel Count Data with Informative Observation Times, Chiung-Yu Huang, Mei-Cheng Wang, and Ying Zhang
Comparison of Affymetrix GeneChip Expression Measures, Rafael A. Irizarry, Zhijin Wu, and Harris A. Jaffee
Fixed-Width Output Analysis for Markov Chain Monte Carlo, Galin L. Jones, Murali Haran, Brian S. Caffo, and Ronald Neath
Designs in Partially Controlled Studies: Messages from a Review, Fan Li and Constantine E. Frangakis
Polydesigns and Causal Inference, Fan Li and Constantine E. Frangakis