Background: New health status instruments are described by psychometric properties, such as Reliability, Effect Size, and Responsiveness. For cluster-randomized trials, another important statistic is the Intraclass Correlation for the instrument within clusters. Studies using better instruments can be performed with smaller sample sizes, but better instruments may be more expensive in terms of dollars, lost opportunities, or poorer data quality due to the response burden of longer instruments. Investigators often need to estimate the psychometric properties of a new instrument, or of an established instrument in a new setting. Optimal sample sizes for estimating these properties have not been studied in detail.
Methods: We examined the power of a two-sample test as a function of the Reliability, Effect Size, Responsiveness, and Intraclass Correlation of the instrument. We calculated the “cost-effectiveness” of using a 1-item versus a 5-item measure of mental health status. We also used simulation to determine formulas for the sample size needed to estimate the psychometric statistics accurately.
Findings: Under the usual model for measurement error, the psychometric statistics are all functions of the same error term. In randomized trials, a poorer instrument can achieve the desired power if the number of persons per treatment group is increased. In cluster-randomized trials, adequate power may be obtained by increasing the number of clusters per treatment group (and often the number of persons per cluster), as well as by choosing a better instrument. The 1-item measure of mental health status may be more cost-effective than the 5-item measure in some settings. Most published psychometric values are situation-specific. Very large samples are required to estimate Responsiveness and the Intraclass Correlation accurately.
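Illustration: The trade-off between instrument quality and sample size described above can be sketched numerically. The following is a minimal sketch and not the authors' code. It assumes the classical measurement-error model (observed score = true score + independent error), under which the observed effect size equals the true effect size times the square root of the Reliability. It also assumes a normal-approximation two-sample test and the standard design effect 1 + (m - 1) x ICC for clusters of equal size m. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(true_effect_size, reliability, alpha=0.05, power=0.80):
    """Persons per treatment group for a two-sided two-sample z-test.

    Assumption: measurement error attenuates the effect size, so
    observed ES = true ES * sqrt(reliability).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    observed_es = true_effect_size * math.sqrt(reliability)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / observed_es ** 2)


def design_effect(persons_per_cluster, icc):
    """Variance inflation from randomizing equal-sized clusters."""
    return 1 + (persons_per_cluster - 1) * icc


def clusters_per_group(n_individual, persons_per_cluster, icc):
    """Clusters per treatment group, given the individually randomized n."""
    inflated_n = n_individual * design_effect(persons_per_cluster, icc)
    return math.ceil(inflated_n / persons_per_cluster)
```

Under these assumptions, a true effect size of 0.5 requires 63 persons per group with a perfectly reliable instrument and 126 per group when the Reliability is 0.5, so halving the Reliability doubles the required sample. With 20 persons per cluster and an Intraclass Correlation of 0.05, the 63 persons per group become 7 clusters per group.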
Conclusion: If the goal is to diagnose or refer individual patients, an instrument with high Validity and Reliability is needed. In settings where the sample sizes can be increased easily, less reliable instruments may be cost-effective. It is likely that many published psychometric values were derived from samples too small to estimate them accurately, or are highly specific to the setting in which they were derived.
Note: A paper based on some of the material in this technical report has been published (Diehr P, Chen L, Patrick D, Feng Z, Yasui Y. Reliability, effect size, and responsiveness of health status measures in the design of randomized and cluster-randomized trials. Contemporary Clinical Trials. 2005;26:45-58). That paper does not include the material on estimating the sample size required to provide an accurate estimate of the reliability of a new instrument; that material is included in this technical report.
Biostatistics | Health Services Research
Diehr, Paula; Chen, Lu; Patrick, Donald L.; Feng, Ziding; and Yasui, Yutaka, "Reliability, Effect Size, and Responsiveness and Intraclass Correlation of Health Status Measures Used in Randomized and Cluster-Randomized Trials" (March 2006). UW Biostatistics Working Paper Series. Working Paper 284.