Biostatistics creates and applies methods for quantitative research in the health sciences. Our faculty conduct research across the spectrum of statistical science from foundations of inference to the discovery of new methodology to health applications. Our designs and analytic methods enable health scientists and professionals in academia, government, pharmaceutical companies, medical research organizations and elsewhere to efficiently acquire knowledge and draw valid conclusions from their ever-expanding sources of information.
A collection of working papers and related research documents from the department faculty is listed below.
Further information about the department may be found at www.biostat.jhsph.edu.
Papers from 2012
BOOTSTRAP-BASED INFERENCE ON THE DIFFERENCE IN THE MEANS OF TWO CORRELATED FUNCTIONAL PROCESSES, Ciprian M. Crainiceanu, Ana-Maria Staicu, Shubankar Ray, and Naresh Punjabi
AUTOMATED DIAGNOSES OF ATTENTION DEFICIT HYPERACTIVE DISORDER USING MAGNETIC RESONANCE IMAGING, Ani Eloyan, John Muschelli, Mary Beth Nebel, Han Liu, Fang Han, Tuo Zhao, Anita Barber, Suresh Joel, James J. Pekar, Stewart Mostofsky, and Brian Caffo
PENALIZED FUNCTION-ON-FUNCTION REGRESSION, A.E. Ivanescu, A.M. Staicu, S. Greven, F. Scheipl, and Ciprian M. Crainiceanu
CONFIDENCE INTERVALS FOR THE SELECTED POPULATION IN RANDOMIZED TRIALS THAT ADAPT THE POPULATION ENROLLED, Michael Rosenblum
GLOBAL LIKELIHOOD RATIO TESTS FOR THE MEAN STRUCTURE OF CORRELATED FUNCTIONAL PROCESSES, Ana-Maria Staicu, Yingxing Li, Ciprian Crainiceanu, and David M. Ruppert
Modeling populations of sleep hypnograms, Bruce J. Swihart, Naresh M. Punjabi, and Ciprian M. Crainiceanu
Papers from 2011
MOVELETS: A DICTIONARY OF MOVEMENT, Jiawei Bai, Jeff Goldsmith, Brian Caffo, Thomas A. Glass, and Ciprian M. Crainiceanu
Reduced Bayesian Hierarchical Models: Estimating Health Effects of Simultaneous Exposure to Multiple Pollutants, Jennifer F. Bobb, Francesca Dominici, and Roger D. Peng
MODIFICATION BY FRAILTY STATUS OF AMBIENT AIR POLLUTION EFFECTS ON LUNG FUNCTION IN OLDER ADULTS IN THE CARDIOVASCULAR HEALTH STUDY, Sandrah P. Eckel, Thomas A. Louis, Paulo H.M. Chaves, Linda P. Fried, and Helene G. Margolis
LIKELIHOOD BASED POPULATION INDEPENDENT COMPONENT ANALYSIS, Ani Eloyan, Ciprian M. Crainiceanu, and Brian S. Caffo
CORRECTED CONFIDENCE BANDS FOR FUNCTIONAL DATA USING PRINCIPAL COMPONENTS, Jeff Goldsmith, Sonja Greven, and Ciprian M. Crainiceanu
REMOVING TECHNICAL VARIABILITY IN RNA-SEQ DATA USING CONDITIONAL QUANTILE NORMALIZATION, Kasper D. Hansen, Rafael A. Irizarry, and Zhijin Wu
POPULATION FUNCTIONAL DATA ANALYSIS OF GROUP ICA-BASED CONNECTIVITY MEASURES FROM fMRI, Shanshan Li, Brian S. Caffo, Suresh Joel, Stewart Mostofsky, James Pekar, and Susan Spear Bassett
Flexible Distributed Lag Models using Random Functions with Application to Estimating Mortality Displacement from Heat-Related Deaths, Roger D. Peng
CONSONANT MULTIPLE TESTING PROCEDURES FOR OVERALL AND SUBPOPULATION TREATMENT EFFECTS IN RANDOMIZED TRIALS, Michael Rosenblum
SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION, Michael Rosenblum and Mark J. van der Laan
POPULATION-WIDE MODEL-FREE QUANTIFICATION OF BLOOD-BRAIN-BARRIER DYNAMICS IN MULTIPLE SCLEROSIS, Russell T. Shinohara, Ciprian Crainiceanu, Brian Caffo, María Inés Gaitán, and Daniel Reich
LONGITUDINAL ANALYSIS OF SPATIOTEMPORAL PROCESSES: A CASE STUDY OF DYNAMIC CONTRAST-ENHANCED MAGNETIC RESONANCE IMAGING IN MULTIPLE SCLEROSIS, Russell T. Shinohara, Ciprian M. Crainiceanu, Brian S. Caffo, and Daniel S. Reich
A BROAD SYMMETRY CRITERION FOR NONPARAMETRIC VALIDITY OF PARAMETRICALLY-BASED TESTS IN RANDOMIZED TRIALS, Russell T. Shinohara, Constantine E. Frangakis, and Constantine G. Lyketsos
Assessing Association for Bivariate Survival Data with Interval Sampling: A Copula Model Approach with Application to AIDS Study, Hong Zhu and Mei-Cheng Wang
FUNCTIONAL PRINCIPAL COMPONENTS MODEL FOR HIGH-DIMENSIONAL BRAIN IMAGING, Vadim Zipunnikov, Brian S. Caffo, David M. Yousem, Christos Davatzikos, Brian S. Schwartz, and Ciprian Crainiceanu
LONGITUDINAL HIGH-DIMENSIONAL DATA ANALYSIS, Vadim Zipunnikov, Sonja Greven, Brian Caffo, Daniel S. Reich, and Ciprian Crainiceanu
Papers from 2010
ACCURATE GENOME-SCALE PERCENTAGE DNA METHYLATION ESTIMATES FROM MICROARRAY DATA, Martin J. Aryee, Zhijin Wu, Christine Ladd-Acosta, Brian Herb, Andrew P. Feinberg, Srinivasan Yegnasubramanian, and Rafael A. Irizarry
A DECISION-THEORY APPROACH TO INTERPRETABLE SET ANALYSIS FOR HIGH-DIMENSIONAL DATA, Simina Maria Boca, Hector C. Bravo, Brian Caffo, Jeffrey T. Leek, and Giovanni Parmigiani
WAVELET BASED FUNCTIONAL MODELS FOR TRANSCRIPTOME ANALYSIS WITH TILING ARRAYS, Lieven Clement, Kristof DeBeuf, Ciprian Crainiceanu, Olivier Thas, Marnik Vuylsteke, and Rafael Irizarry
POPULATION VALUE DECOMPOSITION, A FRAMEWORK FOR THE ANALYSIS OF IMAGE POPULATIONS, Ciprian M. Crainiceanu, Brian S. Caffo, Sheng Luo, and Vadim Zipunnikov
MULTILEVEL SPARSE FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS, Chong-Zhi Di and Ciprian M. Crainiceanu
Likelihood Ratio Testing for Admixture Models with Application to Genetic Linkage Analysis, Chong-Zhi Di and Kung-Yee Liang
SURROGATE SCREENING MODELS FOR THE LOW PHYSICAL ACTIVITY CRITERION OF FRAILTY, Sandrah P. Eckel, Karen Bandeen-Roche, Paulo H.M. Chaves, Linda P. Fried, and Thomas A. Louis
LONGITUDINAL PENALIZED FUNCTIONAL REGRESSION, Jeff Goldsmith, Ciprian M. Crainiceanu, Brian Caffo, and Daniel Reich
PENALIZED FUNCTIONAL REGRESSION, Jeff Goldsmith, Jennifer Feder, Ciprian M. Crainiceanu, Brian Caffo, and Daniel Reich
ESTIMATING TEMPORAL ASSOCIATIONS IN ELECTROCORTICOGRAPHIC (ECoG) TIME SERIES WITH FIRST ORDER PRUNING, Haley Hedlin, Dana Boatman, and Brian Caffo
REGRESSION ADJUSTMENT AND STRATIFICATION BY PROPENSITY SCORE IN TREATMENT EFFECT ESTIMATION, Jessica A. Myers and Thomas A. Louis
USING THE R PACKAGE crlmm FOR GENOTYPING AND COPY NUMBER ESTIMATION, Robert B. Scharpf, Rafael Irizarry, Walter Ritchie, Benilton Carvalho, and Ingo Ruczinski
MODELING FUNCTIONAL DATA WITH SPATIALLY HETEROGENEOUS SHAPE CHARACTERISTICS, Ana-Maria Staicu, Ciprian M. Crainiceanu, Daniel S. Reich, and David Ruppert
THE USE OF PROPENSITY SCORES TO ASSESS THE GENERALIZABILITY OF RESULTS FROM RANDOMIZED TRIALS, Elizabeth A. Stuart, Stephen R. Cole, Catherine P. Bradshaw, and Philip J. Leaf
A unified approach to modeling multivariate binary data using copulas over partitions, Bruce J. Swihart, Brian Caffo, and Ciprian Crainiceanu
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data, Bruce J. Swihart, Brian S. Caffo, Ciprian Crainiceanu, and Naresh M. Punjabi
MULTILEVEL FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR HIGH-DIMENSIONAL DATA, Vadim Zipunnikov, Brian Caffo, Ciprian Crainiceanu, David M. Yousem, Christos Davatzikos, and Brian S. Schwartz
Papers from 2009
QUANTIFYING UNCERTAINTY IN GENOTYPE CALLS, Benilton Carvalho, Thomas A. Louis, and Rafael A. Irizarry
BAYESIAN FUNCTIONAL DATA ANALYSIS USING WinBUGS, Ciprian M. Crainiceanu and A. Jeffrey Goldsmith
COMBINATIONAL MIXTURES OF MULTIPARAMETER DISTRIBUTIONS, Valeria Edefonti and Giovanni Parmigiani
NONLINEAR TUBE-FITTING FOR THE ANALYSIS OF ANATOMICAL AND FUNCTIONAL STRUCTURES, Jeff Goldsmith, Brian S. Caffo, Ciprian Crainiceanu, Daniel Reich, Yong Du, and Craig Hendrix
A Spatio-Temporal Approach for Estimating Chronic Effects of Air Pollution, Sonja Greven, Francesca Dominici, and Scott L. Zeger
On the Behaviour of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models, Sonja Greven and Thomas Kneib
COVARIATE-ADJUSTED NONPARAMETRIC ANALYSIS OF MAGNETIC RESONANCE IMAGES USING MARKOV CHAIN MONTE CARLO, Haley Hedlin, Brian S. Caffo, Ziyad Mahfoud, and Susan Spear Bassett
GENERALIZED LIQUID ASSOCIATION, Yen-Yi Ho, Leslie Cope, Thomas A. Louis, and Giovanni Parmigiani
MODEL-BASED QUALITY ASSESSMENT AND BASE-CALLING FOR SECOND-GENERATION SEQUENCING DATA, Rafael A. Irizarry and Hector Corrada Bravo
GENE SET ENRICHMENT ANALYSIS MADE SIMPLE, Rafael A. Irizarry, Chi Wang, Yun Zhou, and Terence P. Speed
TRIO LOGIC REGRESSION: DETECTION OF SNP-SNP INTERACTIONS IN CASE-PARENT TRIOS, Qing Li, Thomas A. Louis, M. Daniele Fallin, and Ingo Ruczinski
EFFICIENT EVALUATION OF RANKING PROCEDURES WHEN THE NUMBER OF UNITS IS LARGE WITH APPLICATION TO SNP IDENTIFICATION, Thomas A. Louis and Ingo Ruczinski
FROZEN ROBUST MULTI-ARRAY ANALYSIS (fRMA), Matthew N. McCall, Benjamin M. Bolstad, and Rafael A. Irizarry
Caching and Visualizing Statistical Analyses, Roger D. Peng and Duncan Temple Lang
ASSOCIATION TESTS THAT ACCOMMODATE GENOTYPING ERRORS, Ingo Ruczinski, Qing Li, Benilton Carvalho, M. Daniele Fallin, Rafael A. Irizarry, and Thomas A. Louis
A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER ESTIMATION USING SNP ARRAYS, Robert B. Scharpf, Ingo Ruczinski, Benilton Carvalho, Betty Doan, Aravinda Chakravarti, and Rafael A. Irizarry
A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER USING SNP ARRAYS, Robert B. Scharpf, Ingo Ruczinski, Benilton Carvalho, Betty Doan, Aravinda Chakravarti, and Rafael A. Irizarry
Estimating effects by combining instrumental variables with case-control designs: the role of principal stratification, Russell T. Shinohara, Constantine E. Frangakis, Elizabeth Platz, and Konstantinos Tsilidis
LASAGNA PLOTS: A SAUCY ALTERNATIVE TO SPAGHETTI PLOTS, Bruce Swihart, Brian Caffo, Bryan D. James, Matthew Strand, Brian S. Schwartz, and Naresh M. Punjabi
Modeling multilevel sleep transitional data via Poisson log-linear multilevel models, Bruce J. Swihart, Brian Caffo, Ciprian Crainiceanu, and Naresh M. Punjabi
A BAYESIAN SHRINKAGE MODEL FOR INCOMPLETE LONGITUDINAL BINARY DATA WITH APPLICATION TO THE BREAST CANCER PREVENTION TRIAL, C. Wang, M.J. Daniels, Daniel O. Scharfstein, and S. Land
REDEFINING CpG ISLANDS USING A HIDDEN MARKOV MODEL, Hao Wu, Brian Caffo, Harris A. Jaffee, Andrew P. Feinberg, and Rafael A. Irizarry
Subset Quantile Normalization using Negative Control Features, Zhijin Wu
Analyzing Bivariate Survival Data with Interval Sampling and Application to Cancer Epidemiology, Hong Zhu and Mei-Cheng Wang
Papers from 2008
LIKELIHOOD ESTIMATION OF CONJUGACY RELATIONSHIPS IN LINEAR MODELS WITH APPLICATIONS TO HIGH-THROUGHPUT GENOMICS, Brian S. Caffo, Dongmei Liu, Robert Scharpf, and Giovanni Parmigiani
AN OVERVIEW OF OBSERVATIONAL SLEEP RESEARCH WITH APPLICATION TO SLEEP STAGE TRANSITIONING, Brian S. Caffo, B. Swihart, A. Laffan, C. Crainiceanu, and N. Punjabi
Bayesian Model Averaging for Clustered Data: Imputing Missing Daily Air Pollution Concentration, Howard H. Chang, Francesca Dominici, and Roger D. Peng
GENERALIZED MULTILEVEL FUNCTIONAL REGRESSION, Ciprian M. Crainiceanu, Ana-Maria Staicu, and Chongzhi Di
Multilevel Latent Class Models with Dirichlet Mixing Distribution, Chongzhi Di and Karen Bandeen-Roche
GEOSTATISTICAL INFERENCE UNDER PREFERENTIAL SAMPLING, Peter J. Diggle, Raquel Menezes, and Ting-li Su
MODEL SELECTION AND HEALTH EFFECT ESTIMATION IN ENVIRONMENTAL EPIDEMIOLOGY, Francesca Dominici, Chi Wang, Ciprian Crainiceanu, and Giovanni Parmigiani
A NOVEL AND SIMPLE RULE OF THUMB FOR MULTIPLICITY CONTROL IN EQUIVALENCE TESTING USING TWO ONE-SIDED TESTS, Carolyn Lauzon and Brian S. Caffo
JOINTLY MODELING CONTINUOUS AND BINARY OUTCOMES FOR BOOLEAN OUTCOMES: AN APPLICATION TO MODELING HYPERTENSION, Xianbin Li, Brian S. Caffo, and Elizabeth Stuart
BAYESIAN INFERENCE FOR SMOKING CESSATION WITH A LATENT CURE STATE, Sheng Luo, Ciprian M. Crainiceanu, Thomas A. Louis, and Nilanjan Chatterjee
LEARNING FROM NEAR MISSES IN MEDICATION ERRORS: A BAYESIAN APPROACH, Jessica A. Myers, Francesca Dominici, and Laura Morlock
DESIGN AND ANALYSIS ISSUES IN GENOME-WIDE SOMATIC MUTATION STUDIES OF CANCER, Giovanni Parmigiani, Simina Boca, Jimmy Lin, Kenneth W. Kinzler, Victor E. Velculescu, and Bert Vogelstein
A Method for Visualizing Multivariate Time Series Data, Roger D. Peng
Caching and Distributing Statistical Analyses in R, Roger D. Peng
Spatial Misalignment in time series studies of air pollution and health data, Roger D. Peng and Michelle L. Bell
ANALYSIS OF SUBGROUP EFFECTS IN RANDOMIZED TRIALS WHEN SUBGROUP MEMBERSHIP IS INFORMATIVELY MISSING: APPLICATION TO THE MADIT II STUDY, Daniel O. Scharfstein, Georgiana Onicescu, and Steven Goodman
ON THE MERITS OF VOXEL-BASED MORPHOMETRIC PATH-ANALYSIS FOR INVESTIGATING VOLUMETRIC MEDIATION OF A TOXICANT'S INFLUENCE ON COGNITIVE FUNCTION, Shu-chih Su, Brian S. Caffo, Lynn E. Eberly, Elizabeth Garrett-Mayer, Walter F. Stewart, Sining Chen, David Yousem, Christos Davatzikos, and Brian Schwartz
A BAYESIAN APPROACH TO EFFECT ESTIMATION ACCOUNTING FOR ADJUSTMENT UNCERTAINTY, Chi Wang, Giovanni Parmigiani, Ciprian Crainiceanu, and Francesca Dominici
Estimating the Causal Effect of Lower Tidal Volume Ventilation on Survival in Patients with Acute Lung Injury, Weiwei Wang, Daniel Scharfstein, Roy Brower, and Dale Needham
Causal Inference in Observational Studies with Outcome-Dependent Sampling, Weiwei Wang, Daniel Scharfstein, Zhiqiang Tan, and Ellen J. MacKenzie
STATISTICAL METHODS FOR AUTOMATED DRUG SUSCEPTIBILITY TESTING: BAYESIAN MINIMUM INHIBITORY CONCENTRATION PREDICTION FROM GROWTH CURVES, Xi Zhou, Merlise A. Clyde, James Garrett, Viridiana Lourdes, Michael O'Connell, Giovanni Parmigiani, David J. Turner, and Tim Wiles
Papers from 2007
A BAYESIAN HIERARCHICAL FRAMEWORK FOR SPATIAL MODELING OF fMRI DATA, F. DuBois Bowman, Brian S. Caffo, Susan Spear Bassett, and Clinton Kilts
FORECASTING THE GLOBAL BURDEN OF ALZHEIMER'S DISEASE, Ron Brookmeyer, Elizabeth Johnson, Kathryn Ziegler-Graham, and H. Michael Arrighi
IS MRI-BASED VOLUME A MEDIATOR OF THE ASSOCIATION OF CUMULATIVE LEAD DOSE WITH COGNITIVE FUNCTION?, Brian S. Caffo, Sining Chen, Walter Stewart, Karen Bolla, David Yousem, Christos Davatzikos, and Brian S. Schwartz
A CASE STUDY IN PHARMACOLOGIC IMAGING USING PRINCIPAL CURVES IN SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY, Brian S. Caffo, Ciprian M. Crainiceanu, Lijuan Deng, and Craig W. Hendrix
A SURVEY OF THE LIKELIHOOD APPROACH TO BIOEQUIVALENCE TRIALS, Leena Choi, Brian S. Caffo, and Charles Rohde
RANDOM EFFECTS MODELS IN A META-ANALYSIS OF THE ACCURACY OF DIAGNOSTIC TESTS WITHIN A GOLD STANDARD IN THE PRESENCE OF MISSING DATA, Haitao Chu, Sining Chen, and Thomas A. Louis
The Integrative Correlation Coefficient: a Measure of Cross-study Reproducibility for Gene Expression Array Data, Leslie M. Cope, Liz Garrett-Mayer, Edward Gabrielson, and Giovanni Parmigiani
Bayesian Analysis for Penalized Spline Regression Using WinBUGS, Ciprian M. Crainiceanu, David Ruppert, and M.P. Wand
IDENTIFYING EFFECT MODIFIERS IN AIR POLLUTION TIME-SERIES STUDIES USING A TWO-STAGE ANALYSIS, Sandrah P. Eckel and Thomas A. Louis
ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE", Steven Goodman and Sander Greenland
MULTIPLE MODEL EVALUATION ABSENT THE GOLD STANDARD VIA MODEL COMBINATION, Edwin J. Iversen, Jr.; Giovanni Parmigiani; and Sining Chen
TRENDS IN PARTICULATE MATTER AND MORTALITY: AN APPROACH TO THE ASSESSMENT OF UNMEASURED CONFOUNDING, Holly Janes, Francesca Dominici, and Scott Zeger
MULTIPLE DISEASES IN CARRIER PROBABILITY ESTIMATION: ACCOUNTING FOR SURVIVING ALL CANCERS OTHER THAN BREAST AND OVARY IN BRCAPRO, Hormuzd A. Katki, Amanda Blackford, Sining Chen, and Giovanni Parmigiani
FAST ADAPTIVE PENALIZED SPLINES, Tatyana Krivobokova, Ciprian M. Crainiceanu, and Goran Kauermann
EFFECTIVE COMMUNICATION OF STANDARD ERRORS AND CONFIDENCE INTERVALS, Thomas A. Louis and Scott L. Zeger
DECOMPOSITION OF REGRESSION ESTIMATORS TO EXPLORE THE INFLUENCE OF "UNMEASURED" TIME-VARYING CONFOUNDERS, Yun Lu and Scott L. Zeger
