## Calculating Reliability

Estimates of reliability are based on statistical calculations, such as split-half, alternative-form, and test-retest correlations, that attempt to correlate a test with itself. Split-half estimates of reliability randomly divide the test questions into two halves and then correlate the halves with each other. The Spearman-Brown prediction formula is then used to adjust for the fact that the reliability of the test is reduced by cutting the number of items in half.
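The split-half procedure and the Spearman-Brown adjustment can be sketched in a few lines of Python. This is only an illustration: the function names, the fixed odd/even split, and the response data are made up for the example, and a real analysis would normally use a random split and a much larger sample.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(scores, half_a, half_b):
    """scores: one list of item scores per respondent.
    half_a, half_b: item indices assigned to each half."""
    totals_a = [sum(row[i] for i in half_a) for row in scores]
    totals_b = [sum(row[i] for i in half_b) for row in scores]
    return spearman_brown(pearson(totals_a, totals_b))

# Made-up responses: 5 respondents x 4 items, each item scored 0-4.
scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
reliability = split_half_reliability(scores, half_a=[0, 2], half_b=[1, 3])
print(round(reliability, 3))
```

Because the correction is 2r / (1 + r), a half-test correlation of .60 corresponds to an estimated full-test reliability of .75.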

Test-retest reliability compares test scores at multiple points in time. It is a valuable measure of reliability when the construct being measured is assumed to be stable over time (e.g., IQ). It is less useful, however, when measuring a construct that is expected to change over time (such as mood).

The most common method of estimating the reliability of questionnaires used to measure treatment outcomes is Cronbach's coefficient alpha. This is usually described as the mean of all possible split-half correlations. The statistic is a measure of the internal consistency of the questionnaire and of the degree to which the test can be seen as measuring a single underlying latent variable.
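In practice, coefficient alpha is usually computed from the item variances and the variance of the total score rather than by averaging split halves. A minimal sketch of that standard variance formula follows; the function name and the response data are invented for illustration, and missing responses are assumed not to occur.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for complete data.
    scores: one list of item scores per respondent."""
    k = len(scores[0])                                   # number of items
    item_var_sum = sum(variance(item) for item in zip(*scores))
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up responses: 5 respondents x 4 items, each item scored 0-4.
scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.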

Item analyses employing coefficient alpha permit the researcher to test the reliability of sets of items in order to examine the effects of dropping or adding individual items. Ideally, a test used to assess treatment outcomes should have a reliability of .9 or higher.
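One common form of this item analysis is to recompute alpha with each item left out in turn. The sketch below assumes complete data; the helper names and the responses are hypothetical, and the last item was deliberately constructed to be unrelated to the other three.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha; scores holds one list of item scores per respondent."""
    k = len(scores[0])
    item_var_sum = sum(variance(item) for item in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def alpha_if_item_deleted(scores):
    """Return {item_index: alpha of the scale with that item removed}."""
    k = len(scores[0])
    result = {}
    for drop in range(k):
        reduced = [[v for i, v in enumerate(row) if i != drop] for row in scores]
        result[drop] = cronbach_alpha(reduced)
    return result

# Made-up responses: items 0-2 move together, item 3 does not.
scores = [
    [0, 1, 0, 3],
    [1, 1, 2, 0],
    [2, 3, 2, 4],
    [3, 3, 4, 1],
    [4, 4, 3, 2],
]
full_alpha = cronbach_alpha(scores)
deleted = alpha_if_item_deleted(scores)
for item, value in deleted.items():
    print(item, round(value, 3), "raises alpha" if value > full_alpha else "lowers alpha")
```

In this toy data set, removing item 3 is the only deletion that raises alpha, which would mark it as a candidate for dropping or rewording.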

The use of coefficient alpha is closely related to factor analysis. In psychometric theory, items that correlate strongly with one another are said to load on the same factor and are assumed to measure a shared construct. A questionnaire with items loading on two or more separate and unrelated factors will have lower overall reliability because it is in fact measuring more than one latent variable. In this case, the test should be scored using subscales made up of items that load on the same factor. Each of the subscales may have higher reliability than the whole test would have if it were scored as a single scale.
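The effect of mixing unrelated factors in one score can be seen in a deliberately idealized example. The data below are invented: items 0-2 are exact copies of one factor, items 3-5 are exact copies of a second factor, and the two factors are uncorrelated across respondents. The function names are also assumptions made for the sketch.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha; scores holds one list of item scores per respondent."""
    k = len(scores[0])
    item_var_sum = sum(variance(item) for item in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def subscale(scores, item_indices):
    """Keep only the listed items for every respondent."""
    return [[row[i] for i in item_indices] for row in scores]

# Idealized, made-up data: two uncorrelated factors, three items each.
scores = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
alpha_whole = cronbach_alpha(scores)
alpha_a = cronbach_alpha(subscale(scores, [0, 1, 2]))
alpha_b = cronbach_alpha(subscale(scores, [3, 4, 5]))
print(round(alpha_whole, 2), round(alpha_a, 2), round(alpha_b, 2))
```

Here the whole six-item scale has an alpha of .80, while each three-item subscale reaches 1.0. The perfect subscale values arise only because the toy items are exact copies of one another; real subscales would be lower, but the ordering illustrates the point made above.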

Factor analyses of outcome questionnaires have identified numerous factors, such as symptoms of depression or anxiety, interpersonal difficulties, and health concerns. These factors, however, also tend to correlate with one another. This finding has led many researchers to conclude that patient self-report outcome questionnaires are largely measuring a shared latent variable, referred to as the global distress factor.

The reliability of a test increases with the number of items, but only up to a point. Adding items beyond 10 to 15 well-chosen items that load on a single factor does not tend to produce significant gains in reliability. This fact supports the use of relatively brief outcome measures composed of items loading on the global distress factor, especially if the questionnaire will be administered at frequent intervals during treatment.
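The diminishing return from adding items can be illustrated with the general form of the Spearman-Brown formula, which projects reliability for a scale of k items from the average correlation between items. The average inter-item correlation of .40 used below is an assumed value chosen for illustration, not a figure from any particular questionnaire, and the function name is hypothetical.

```python
def projected_reliability(n_items, mean_inter_item_r):
    """Spearman-Brown projection for a scale of n_items roughly parallel items."""
    r = mean_inter_item_r
    return n_items * r / (1 + (n_items - 1) * r)

ASSUMED_MEAN_R = 0.40  # assumption for illustration only

for n_items in (5, 10, 15, 20, 30):
    estimate = projected_reliability(n_items, ASSUMED_MEAN_R)
    print(n_items, round(estimate, 2))
# Projected values are roughly 0.77, 0.87, 0.91, 0.93, 0.95.
```

Under this assumption, going from 5 to 15 items raises the projected reliability from about .77 to about .91, while doubling the length again to 30 items adds only about .04.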

-- JebBrown - 06 Jan 2007