Johns Hopkins University, Dept. of Biostatistics Working Papers
Copyright (c) 2014 Johns Hopkins University. All rights reserved.
http://biostats.bepress.com/jhubiostat
Recent documents in Johns Hopkins University, Dept. of Biostatistics Working Papers
Mon, 24 Mar 2014 11:31:39 PDT

INTER-ADAPT - AN INTERACTIVE TOOL FOR DESIGNING AND EVALUATING RANDOMIZED TRIALS WITH ADAPTIVE ENROLLMENT CRITERIA
http://biostats.bepress.com/jhubiostat/paper262
Fri, 14 Mar 2014 12:09:58 PDT
We consider the problem of designing a randomized trial when there is prior evidence that the experimental treatment may be more effective for certain groups of participants, such as those with a certain biomarker or risk score at baseline. Randomized trial designs have been proposed that dynamically adapt enrollment criteria based on accrued data, with the goal of learning if the treatment benefits the overall population, only a certain subpopulation, or neither. We introduce the interAdapt software tool, a Shiny application which provides a user-friendly interface for constructing and evaluating certain adaptive trial designs. These designs are automatically compared to standard (non-adaptive) designs in terms of the following performance criteria: power, sample size, and trial duration. interAdapt is open-source and cross-platform, and is the first to implement the group sequential, adaptive enrichment designs of Rosenblum et al. (2013).
Aaron Joel Fisher et al.

VARIABLE-DOMAIN FUNCTIONAL REGRESSION
http://biostats.bepress.com/jhubiostat/paper261
Wed, 05 Feb 2014 09:18:44 PST
We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record continuous variables over unequal domains.
Jonathan E. Gellar et al.

ADAPTIVE RANDOMIZED TRIAL DESIGNS THAT CANNOT BE DOMINATED BY ANY STANDARD DESIGN AT THE SAME TOTAL SAMPLE SIZE
http://biostats.bepress.com/jhubiostat/paper260
Fri, 31 Jan 2014 13:12:05 PST
Prior work has shown that certain types of adaptive designs can always be dominated by a suitably chosen, standard, group sequential design. This applies to adaptive designs with rules for modifying the total sample size. A natural question is whether analogous results hold for other types of adaptive designs. We focus on adaptive enrichment designs, which involve preplanned rules for modifying enrollment criteria based on accrued data in a randomized trial. Such designs often involve multiple hypotheses, e.g., one for the total population and one for a predefined subpopulation, such as those with high disease severity at baseline. We fix the total sample size, and consider overall power, defined as the probability of rejecting at least one false null hypothesis. We present adaptive enrichment designs whose overall power at two alternatives cannot simultaneously be matched by any standard design. In some scenarios there is a substantial gap between the overall power achieved by these adaptive designs and that of any standard design. We also prove that such gains in overall power come at a cost. To attain overall power above what is achievable by certain standard designs, it is necessary to increase power to reject some hypotheses and reduce power to reject others. We conclude by showing that the class of adaptive enrichment designs allows certain power tradeoffs that are not available when restricting to standard designs. We illustrate our results in the context of planning a hypothetical, randomized trial of a new antidepressant, using data distributions from Kirsch et al. (2008).
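The definition of overall power used above (the probability of rejecting at least one false null hypothesis) can be made concrete with a small Monte Carlo sketch. This is an illustration only, not the designs or computations from the paper: it assumes independent one-sided z-statistics with unit variance and a Bonferroni correction, and the function name `overall_power` and its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def overall_power(effects, null_is_false, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of P(reject at least one false null hypothesis).

    effects       : mean of each z-statistic (0 when its null is true)
    null_is_false : booleans marking which null hypotheses are actually false
    Assumes independent, unit-variance z-statistics (a simplification).
    """
    rng = np.random.default_rng(seed)
    effects = np.asarray(effects, dtype=float)
    null_is_false = np.asarray(null_is_false, dtype=bool)
    k = effects.size

    # Bonferroni-adjusted critical value for k one-sided tests.
    crit = NormalDist().inv_cdf(1 - alpha / k)

    # Simulate the test statistics and record which hypotheses are rejected.
    z = rng.standard_normal((n_sim, k)) + effects
    rejected = z > crit

    # Only rejections of false nulls count toward overall power.
    return float(np.mean((rejected & null_is_false).any(axis=1)))


# Example: effect in the total population only, versus in both populations.
power_one = overall_power([3.0, 0.0], [True, False])
power_both = overall_power([3.0, 3.0], [True, True])
```

In a real enrichment trial the statistics for the total population and the subpopulation are correlated, so the independence assumption here would need to be replaced by the trial's actual joint distribution.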
Michael Rosenblum

Joint Estimation of Multiple Graphical Models from High Dimensional Time Series
http://biostats.bepress.com/jhubiostat/paper259
Thu, 26 Dec 2013 07:09:18 PST
In this manuscript the problem of jointly estimating multiple graphical models in high dimensions is considered. It is assumed that the data are collected from n subjects, each of which consists of m non-independent observations. The graphical models of subjects vary, but are assumed to change smoothly corresponding to a measure of the closeness between subjects. A kernel based method for jointly estimating all graphical models is proposed. Theoretically, under a double asymptotic framework, where both (m,n) and the dimension d can increase, the explicit rate of convergence in parameter estimation is provided, thus characterizing the strength one can borrow across different individuals and impact of data dependence on parameter estimation. Empirically, experiments on both synthetic and real resting state functional magnetic resonance imaging (rs-fMRI) data illustrate the effectiveness of the proposed method.
Huitong Qiu et al.

Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
http://biostats.bepress.com/jhubiostat/paper258
Thu, 26 Dec 2013 07:05:22 PST
In this manuscript a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Utilizing the concept of median graphs in summarizing the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.
Fang Han et al.

Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions
http://biostats.bepress.com/jhubiostat/paper257
Wed, 26 Jun 2013 13:01:11 PDT
This work is motivated by a study of a population of multiple sclerosis (MS) patients using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to identify active brain lesions. At each visit, a contrast agent is administered intravenously to a subject and a series of images is acquired to reveal the location and activity of MS lesions within the brain. Our goal is to identify and quantify lesion enhancement location at the subject level and lesion enhancement patterns at the population level. With this example, we aim to address the difficult problem of transforming a qualitative scientific null hypothesis, such as "this voxel does not enhance", to a well-defined and numerically testable null hypothesis based on existing data. We call the procedure "soft null hypothesis" testing as opposed to the standard "hard null hypothesis" testing. This problem is fundamentally different from: 1) testing when a quantitative null hypothesis is given; 2) clustering using a mixture distribution; or 3) identifying a reasonable threshold with a parametric null assumption. We analyze a total of 20 subjects scanned at 63 visits (~30Gb), the largest population of such clinical brain images.
Haochang Shou et al.

TRIAL DESIGNS THAT SIMULTANEOUSLY OPTIMIZE THE POPULATION ENROLLED AND THE TREATMENT ALLOCATION PROBABILITIES
http://biostats.bepress.com/jhubiostat/paper256
Tue, 18 Jun 2013 09:04:25 PDT
Standard randomized trials may have lower than desired power when the treatment effect is only strong in certain subpopulations. This may occur, for example, in populations with varying disease severities or when subpopulations carry distinct biomarkers and only those who are biomarker positive respond to treatment. To address such situations, we develop a new trial design that combines two types of preplanned rules for updating how the trial is conducted based on data accrued during the trial. The aim is a design that has greater overall power and can better determine subpopulation-specific treatment effects, while maintaining strong control of the familywise Type I error rate. The first component of our design involves response-adaptive randomization, in which the probability of being assigned to the treatment or control arm is updated during the trial to target an optimal allocation. The second component of our design involves enrichment, where the criteria for patient enrollment may be modified to help learn which subpopulations benefit from the treatment. We conduct a simulation study to compare the power of our design, which we call a response-adaptive enrichment design, to three simpler designs: a standard randomized trial design, a response-adaptive design, and an enrichment design. Our simulation study compares these designs in scenarios that arise from the problem of testing the effectiveness of a hypothetical new antidepressant.
Brandon S. Luber et al.

Structured Functional Principal Component Analysis
http://biostats.bepress.com/jhubiostat/paper255
Tue, 30 Apr 2013 09:35:21 PDT
Motivated by modern observational studies, we introduce a class of functional models that expands nested and crossed designs. These models account for the natural inheritance of correlation structure from sampling design in studies where the fundamental sampling unit is a function or image. Inference is based on functional quadratics and their relationship with the underlying covariance structure of the latent processes. A computationally fast and scalable estimation procedure is developed for ultra-high dimensional data. Methods are illustrated in three examples: high-frequency accelerometer data for daily activity, pitch linguistic data for phonetic analysis, and EEG data for studying electrical brain activity during sleep.
Haochang Shou et al.

PENALIZED FUNCTION-ON-FUNCTION REGRESSION
http://biostats.bepress.com/jhubiostat/paper254
Tue, 23 Apr 2013 09:50:30 PDT
We propose a general framework for smooth regression of a functional response on one or multiple functional predictors. Using the mixed model representation of penalized regression expands the scope of function-on-function regression to many realistic scenarios. In particular, the approach can accommodate a densely or sparsely sampled functional response as well as multiple functional predictors that are observed: 1) on the same or different domains than the functional response; 2) on a dense or sparse grid; and 3) with or without noise. It also allows for seamless integration of continuous or categorical covariates and provides approximate confidence intervals as a by-product of the mixed model inference. The proposed methods are accompanied by easy-to-use, robust software implemented in the pffr function of the R package refund. Methodological developments are general, but were inspired by and applied to a Diffusion Tensor Imaging (DTI) brain tractography dataset.
Andrada E. Ivanescu et al.

OPTIMAL TESTS OF TREATMENT EFFECTS FOR THE OVERALL POPULATION AND TWO SUBPOPULATIONS IN RANDOMIZED TRIALS, USING SPARSE LINEAR PROGRAMMING
http://biostats.bepress.com/jhubiostat/paper253
Tue, 23 Apr 2013 06:49:10 PDT
We propose new, optimal methods for analyzing randomized trials when it is suspected that treatment effects may differ in two predefined subpopulations. Such subpopulations could be defined by a biomarker or risk factor measured at baseline. The goal is to simultaneously learn which subpopulations benefit from an experimental treatment, while providing strong control of the familywise Type I error rate. We formalize this as a multiple testing problem and show it is computationally infeasible to solve using existing techniques. Our solution involves a novel approach, in which we first transform the original multiple testing problem into a large, sparse linear program. We then solve this problem using advanced optimization techniques. This general method can solve a variety of multiple testing problems and decision theory problems related to optimal trial design, for which no solution was previously available. In particular, we construct new multiple testing procedures that satisfy minimax and Bayes optimality criteria. For a given optimality criterion, our new approach yields the optimal tradeoff between power to detect an effect in the overall population versus power to detect effects in subpopulations. We demonstrate our approach in examples motivated by two randomized trials of new treatments for HIV.
Michael Rosenblum et al.

Homotopic Group ICA for Multi-Subject Brain Imaging Data
http://biostats.bepress.com/jhubiostat/paper252
Thu, 07 Mar 2013 11:53:29 PST
Independent Component Analysis (ICA) is a computational technique for revealing latent factors that underlie sets of measurements or signals. It has become a standard technique in functional neuroimaging, where so-called group ICA (gICA) seeks to identify and quantify networks of correlated regions across subjects. This paper reports on the development of a new group ICA approach, Homotopic Group ICA (H-gICA), for blind source separation of resting state functional magnetic resonance imaging (fMRI) data. Resting state brain functional homotopy is the similarity of spontaneous fluctuations between bilaterally symmetrically opposing regions (i.e., those symmetric with respect to the mid-sagittal plane) (Zuo et al., 2010). The proposed approach improves network estimates by leveraging this known brain functional homotopy. H-gICA increases the potential for network discovery, effectively by averaging information across hemispheres. It is theoretically proven to be identical to standard group ICA when the true sources are both perfectly homotopic and noise-free, while simulation studies and data explorations demonstrate its benefits in the presence of noise. Moreover, compared to commonly applied group ICA algorithms, the structure of the H-gICA input data leads to significant improvement in computational efficiency. A simulation study confirms its effectiveness in homotopic, non-homotopic and mixed settings, as does an application to the landmark ADHD-200 dataset. From a relatively small subset of data, several brain networks were found, including the visual, default mode and auditory networks, as well as others. These were shown to be more contiguous and clearly delineated than those from the corresponding ordinary group ICA. Finally, in addition to improving network estimation, H-gICA facilitates the investigation of functional homotopy via ICA-based networks.
Juemin Yang et al.

PREDICTING HUMAN MOVEMENT TYPE BASED ON MULTIPLE ACCELEROMETERS USING MOVELETS
http://biostats.bepress.com/jhubiostat/paper251
Thu, 07 Mar 2013 11:52:56 PST
We introduce statistical methods for prediction of types of human movement based on three tri-axial accelerometers worn simultaneously at the hip, left wrist, and right wrist. We compare the individual performance of the three accelerometers using movelets and propose a new prediction algorithm that integrates the information from all three accelerometers. The development is motivated by a study of 20 older subjects who were instructed to perform 15 different types of activities during in-laboratory sessions. The differences in the prediction performance for different activity types among the three accelerometers reveal subtle yet important insights into how the intrinsic physical features of human movements could be effectively utilized in prediction. The proposed integrative movelet method takes into account those findings to augment the prediction accuracy and improve our understanding of human movement measurements.
Bing He et al.

Adaptive, Group Sequential Designs that Balance the Benefits and Risks of Wider Inclusion Criteria
http://biostats.bepress.com/jhubiostat/paper250
Mon, 04 Feb 2013 10:03:35 PST
In designing a Phase III randomized trial, care must be taken in selecting the target population. Advantages of enrolling from a larger population include wider generalizability of results and faster recruitment. However, earlier trials (e.g., Phase II trials) and medical knowledge may provide stronger evidence of a treatment effect for certain subpopulations. This makes a Phase III trial that targets the overall population more risky, since if the treatment only benefits a subpopulation, there may be low power to detect this. We propose new adaptive, group sequential designs aimed at gaining the advantages of wider generalizability and faster recruitment, while mitigating the risks of including a population for which there is greater a priori uncertainty. These designs use preplanned rules for changing the enrollment criteria if the participants from predefined subpopulations are not benefiting from the new treatment. We demonstrate these adaptive designs in the context of a Phase III trial of a new treatment for stroke, and compare them to standard, group sequential designs in terms of expected sample size.
Michael Rosenblum et al.

FAST COVARIANCE ESTIMATION FOR HIGH-DIMENSIONAL FUNCTIONAL DATA
http://biostats.bepress.com/jhubiostat/paper249
Wed, 09 Jan 2013 09:11:16 PST
For smoothing covariance functions, we propose two fast algorithms that scale linearly with the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension J x J with J > 500; the recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions such as J ≥ 10,000. Covariance matrices of order J = 10,000, and even J = 100,000, are becoming increasingly common, e.g., in 2- and 3-dimensional medical imaging and high-density wearable sensor data. We introduce two new algorithms that can handle very large covariance matrices: 1) FACE: a fast implementation of the sandwich smoother and 2) SVDS: a two-step procedure that first applies singular value decomposition to the data matrix and then smooths the eigenvectors. Compared to existing techniques, these new algorithms are at least an order of magnitude faster in high dimensions and drastically reduce memory requirements. The new algorithms provide near-instantaneous (a few seconds) smoothing for matrices of dimension J = 10,000 and very fast (< 10 minutes) smoothing for J = 100,000. Although SVDS is simpler than FACE, we provide ready-to-use, scalable R software for FACE. When incorporated into the R package refund, FACE improves the speed of penalized functional regression by an order of magnitude, even for data of normal size (J < 500). We recommend that FACE be used in practice for the analysis of noisy and high-dimensional functional data.
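The two-step idea behind SVDS (a singular value decomposition of the data matrix, followed by smoothing of the eigenvectors) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the moving-average smoother is a stand-in for whatever smoother the paper uses, and the names `moving_average` and `svds_covariance` and the parameters `n_components` and `window` are hypothetical.

```python
import numpy as np


def moving_average(v, window=5):
    """Simple stand-in smoother: centered moving average with edge padding.

    Assumes an odd window so the output has the same length as the input.
    """
    kernel = np.ones(window) / window
    padded = np.pad(v, window // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def svds_covariance(Y, n_components=3, window=5):
    """Smoothed covariance estimate from an (n_subjects, J) data matrix Y."""
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)  # center each column (grid point)

    # Step 1: SVD of the centered data matrix. The right singular vectors
    # are the eigenvectors of the J x J sample covariance matrix.
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    s, Vt = s[:n_components], Vt[:n_components]

    # Step 2: smooth each retained eigenvector along the functional domain.
    V_smooth = np.vstack([moving_average(v, window) for v in Vt])

    # Rebuild the covariance from the smoothed vectors and the eigenvalues.
    # For very large J one would keep the factors instead of forming J x J.
    eigenvalues = s**2 / (n - 1)
    return (V_smooth.T * eigenvalues) @ V_smooth


# Example on synthetic data: 20 curves observed at J = 50 grid points.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 50))
C = svds_covariance(Y)
```

Because only an n x J matrix is decomposed and only a few length-J vectors are smoothed, the cost of this sketch grows with J rather than with J squared, provided the full J x J matrix is not formed.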
Luo Xiao et al.

LONGITUDINAL FUNCTIONAL MODELS WITH STRUCTURED PENALTIES
http://biostats.bepress.com/jhubiostat/paper248
Fri, 02 Nov 2012 11:29:56 PDT
The collection of functional data, including longitudinal observations, is becoming increasingly common in many studies. For example, we use magnetic resonance (MR) spectra collected over a period of time from late stage HIV patients. MR spectroscopy (MRS) produces a spectrum which is a mixture of metabolite spectra, instrument noise and baseline profile. Analysis of such data typically proceeds in two separate steps: feature extraction and regression modeling. In contrast, a recently proposed approach for functional linear models, called partially empirical eigenvectors for regression (PEER) (Randolph, Harezlak and Feng, 2012), incorporates a priori knowledge via a scientifically informed penalty operator in the regression function estimation process. We extend the scope of PEER to the longitudinal setting with continuous outcomes and longitudinal functional covariates. The method presented in this paper: 1) takes into account external information; and 2) allows for a time-varying regression function. In the proposed approach, we express the time-varying regression function as a linear combination of several time-invariant component functions; the time dependence enters into the regression function through their coefficients. The estimation procedure is easy to implement due to its mixed model equivalence. We derive the precision and accuracy of the estimates and discuss their connection with the generalized singular value decomposition. Real MRS data and simulations are used to illustrate the concepts.
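The representation of a time-varying regression function as a linear combination of time-invariant components can be illustrated with a minimal sketch. The abstract does not say how the coefficients depend on time, so the polynomial weights used here (1, t, t^2, ...) are an assumption made purely for illustration, and the function name `time_varying_gamma` is hypothetical.

```python
import numpy as np


def time_varying_gamma(t, components):
    """Evaluate gamma(t, s) = sum_d w_d(t) * gamma_d(s) on a grid of s.

    components : array of shape (D + 1, S); row d holds the time-invariant
                 component function gamma_d evaluated at S grid points.
    The weights w_d(t) = t**d are an illustrative assumption.
    """
    components = np.asarray(components, dtype=float)
    weights = np.array([t**d for d in range(components.shape[0])], dtype=float)
    return weights @ components


# Example: a baseline component plus a component that grows linearly in time.
components = [[1.0, 1.0], [2.0, 0.0]]
gamma_at_0 = time_varying_gamma(0.0, components)
gamma_at_2 = time_varying_gamma(2.0, components)
```

The component functions themselves do not change with time; only their weights do, which is what allows the time-varying model to be estimated through a fixed set of functions.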
Madan G. Kundu et al.

RESTRICTED LIKELIHOOD RATIO TESTS FOR FUNCTIONAL EFFECTS IN THE FUNCTIONAL LINEAR MODEL
http://biostats.bepress.com/jhubiostat/paper247
Tue, 07 Aug 2012 13:38:17 PDT
The goal of our article is to provide a transparent, robust, and computationally feasible statistical approach for testing in the context of scalar-on-function linear regression models. In particular, we are interested in testing for the necessity of functional effects against standard linear models. Our methods are motivated by and applied to a large longitudinal study involving diffusion tensor imaging of intracranial white matter tracts in a susceptible cohort. In the context of this study, we conduct hypothesis tests that are motivated by anatomical knowledge and which support recent findings regarding the relationship between cognitive impairment and white matter demyelination. R-code and data are provided to reproduce the application.
Bruce J. Swihart et al.

Component extraction of Complex Biomedical signal and performance analysis based on different algorithm
http://biostats.bepress.com/jhubiostat/paper246
Sat, 14 Jul 2012 12:07:46 PDT
Biomedical signals can arise from one or many sources, including the heart, brain, and endocrine systems. Signals from multiple sources pose a challenge to researchers because they may be contaminated with artifacts and noise. Examples of biomedical time series signals include the electroencephalogram (EEG) and the electrocardiogram (ECG), which are the most common bioelectric signals. The morphology of the cardiac signal is very important in most ECG-based diagnostics. A diagnosis based on visual observation of recorded ECG or EEG signals may not be accurate. To achieve better understanding, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) algorithms help in analyzing ECG signals. ICA is gaining momentum in the field of biomedical signal processing because of the large databases required for quality testing. This paper briefly describes some ICA algorithms, such as Fast-ICA, Kernel-ICA, MS-ICA, JADE, EGLD-ICA, and Robust ICA. The quality and performance of some of these ICA algorithms are tested, and each is analyzed with respect to noise and artifacts, Signal Interference Ratio (SIR), and Performance Index (PI). The experimental results presented in the paper indicate which of these algorithms identifies the various components with higher accuracy when classifying biomedical data.
hemant pasusangai kasturiwale

ANALYTIC PROGRAMMING WITH fMRI DATA: A QUICK-START GUIDE FOR STATISTICIANS USING R
http://biostats.bepress.com/jhubiostat/paper245
Sat, 14 Jul 2012 12:07:18 PDT
Functional magnetic resonance imaging (fMRI) is a thriving field that plays an important role in medical imaging analysis, biological and neuroscience research and practice. This manuscript gives a didactic introduction to the statistical analysis of fMRI data using the R project along with the relevant R code. The goal is to give statisticians who would like to pursue research in this area a quick start for programming with fMRI data along with the available data visualization tools.
Ani Eloyan et al.

MODELING SLEEP FRAGMENTATION IN POPULATIONS OF SLEEP HYPNOGRAMS
http://biostats.bepress.com/jhubiostat/paper243
Tue, 05 Jun 2012 09:21:01 PDT
We introduce methods for the analysis of large populations of sleep architectures (hypnograms) that respect the 5-state 20-transition-type structure defined by the American Academy of Sleep Medicine. By applying these methods to the hypnograms of 5598 subjects from the Sleep Heart Health Study we: 1) provide the first analysis of sleep hypnogram data of such size and complexity in a community cohort with a 4-level comorbidity; 2) compare 5-state 20-transition-type sleep to 3-state 6-transition-type sleep for a check of feasibility and information loss; 3) extend current approaches to multivariate survival data analysis to populations of time-to-transition processes; and 4) provide scalable solutions for data analyses required by the case study. This allows us to provide detailed new insights into the association between sleep apnea and sleep architecture. Supporting R and SAS code and data are included in the online supplementary materials.
Bruce J. Swihart et al.

LIKELIHOOD RATIO TESTS FOR THE MEAN STRUCTURE OF CORRELATED FUNCTIONAL PROCESSES
http://biostats.bepress.com/jhubiostat/paper242
Wed, 02 May 2012 09:29:29 PDT
The paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are: 1) testing the null hypothesis that the mean of a functional process is parametric against a nonparametric alternative; and 2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo likelihood ratio test is proposed and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized δ-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls.
Ana-Maria Staicu et al.