Linear discriminant analysis (LDA), a classical method in pattern recognition and machine learning, has been widely used to characterize or separate multiple classes via linear combinations of features. However, the high dimensionality of the features produced by modern high-throughput biological experiments, for example microarray or proteomics studies, defies traditional discriminant analysis techniques. Possible inter-feature correlations present additional challenges and are often underutilized in modeling. In this paper, by incorporating possible inter-feature correlations, we propose a Covariance-Enhanced Discriminant Analysis (CEDA) method that simultaneously and consistently selects informative features and identifies the corresponding discriminable classes. We show that, under mild regularity conditions, the proposed method achieves consistency in both parameter estimation and model selection, and attains an asymptotically optimal misclassification rate. Extensive simulations demonstrate the utility of the method. We have applied the method to a renal transplantation trial designed to identify genomic signatures that distinguish kidneys of various functional types, a crucial step in drug development.
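To make concrete why high dimensionality defies traditional discriminant analysis, the sketch below implements classical (Fisher/Gaussian) LDA with NumPy, not the CEDA method, whose details are not given here. Classical LDA scores each class with a linear function of the features that depends on the inverse of a pooled within-class covariance matrix. With n samples in K classes, that matrix has rank at most n - K, so when the number of features p exceeds n - K it cannot be inverted. The function names (`pooled_covariance`, `lda_fit`, `lda_predict`) and the simulated Gaussian data are illustrative assumptions only.

```python
import numpy as np


def pooled_covariance(X, y):
    """Pooled within-class covariance estimate shared by all classes in LDA."""
    classes = np.unique(y)
    n, p = X.shape
    S = np.zeros((p, p))
    for k in classes:
        centered = X[y == k] - X[y == k].mean(axis=0)
        S += centered.T @ centered
    return S / (n - len(classes))


def lda_fit(X, y):
    """Fit classical LDA; requires an invertible pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    S = pooled_covariance(X, y)
    # The pooled covariance has rank at most n - K, so it is singular when p > n - K.
    if np.linalg.matrix_rank(S) < S.shape[0]:
        raise ValueError("pooled covariance is singular (p too large relative to n)")
    S_inv = np.linalg.inv(S)
    W = means @ S_inv  # row k holds mu_k^T S^{-1}, the linear coefficients
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)  # intercepts
    return classes, W, b


def lda_predict(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, W, b = model
    return classes[np.argmax(X @ W.T + b, axis=1)]


rng = np.random.default_rng(0)

# Low-dimensional case (p = 5, n = 200): classical LDA works.
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(3, 1, (100, 5))])
y = np.repeat([0, 1], 100)
model = lda_fit(X, y)
accuracy = np.mean(lda_predict(model, X) == y)
print(f"training accuracy: {accuracy:.3f}")

# High-dimensional case (p = 200, n = 20), as in microarray-type data.
X_hd = np.vstack([rng.normal(0, 1, (10, 200)), rng.normal(1, 1, (10, 200))])
y_hd = np.repeat([0, 1], 10)
print("rank of pooled covariance:", np.linalg.matrix_rank(pooled_covariance(X_hd, y_hd)))
try:
    lda_fit(X_hd, y_hd)
except ValueError as err:
    print("LDA failed:", err)
```

In the second case the pooled covariance has rank at most 18 while p = 200, so the classical discriminant rule is undefined. This gap is what regularized or structured covariance approaches aim to address.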


