Reliability refers to the consistency of a test, or the degree to which the test produces approximately the same results over time under similar conditions. Ultimately, reliability can be seen as a measure of a test's precision.
A number of different methods for estimating reliability can be used, depending on the types of items on the test, the characteristic(s) a test is intended to measure, and the test user's needs. The most commonly used methods to assess reliability are the test-retest, alternate form, and split-half methods. Each of these methods attempts to isolate particular sources and types of error.
Error is defined as variation due to extraneous factors. Such factors may relate to the test-taker: for instance, fatigue or illness on the day of the test may affect the score. Error may also arise from environmental factors in the testing situation, such as an uncomfortable room temperature or distracting noise.
Test-retest methods look at the stability of test scores over time by giving the same test to the same people after a reasonable time interval. These methods try to separate out the amount of error in a score related to the passing of time. In test-retest studies, scores from the first administration of a test are correlated with scores from one or more later administrations.
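As a minimal sketch of the test-retest computation, the two sets of scores can be compared with a Pearson correlation. The score lists below are hypothetical, and the function name is illustrative rather than standard:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for five test-takers at two administrations
first = [85, 90, 78, 92, 88]
second = [83, 91, 80, 90, 86]

print(round(pearson_r(first, second), 3))
```

A coefficient near 1.0 would indicate that the ordering of scores was highly stable across the two administrations; lower values indicate more time-related error.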
Test-retest methods have some serious limitations, one of the most important being that the first test-taking experience may affect performance on the second administration. For instance, the individual may perform better at the second testing, having learned from the first experience. Moreover, tests rarely show perfect test-retest reliability because many factors unrelated to the tested characteristic may affect the score. In addition, test-retest methods are suitable only for tests of characteristics that are assumed to be stable over time, such as intelligence; they are unsuitable for tests of unstable characteristics, such as emotional states like anger or anxiety.
The alternate-form method of assessing reliability is similar to the test-retest method except that a different form of the test in question is administered the second time. The two forms are constructed to be as similar as possible, so that individual items cover the same material at the same level of difficulty. The tests are administered to a sample, and the scores on the two forms are correlated to yield a coefficient of equivalence. A high coefficient of equivalence indicates the test is reliable in that most or all of the items seem to be assessing the same characteristic. A low coefficient of equivalence indicates the two forms are not assessing the same characteristic.
Alternate form administration may be varied by the time interval between testing. Alternate form with immediate testing tries to assess error variance in scores due to various errors in content sampling. Alternate form with delayed administration tries to separate out error variance due to both the passage of time and to content sampling. Alternate-form reliability methods have many of the same limitations as test-retest methods.
Split-half methods assess a test's internal consistency, that is, the degree to which all of the items are assessing the same characteristic. In split-half methods a test is divided into two half-tests, and scores on the two halves are correlated with each other. This correlation coefficient is called the coefficient of reliability. The most common way to split the items is to correlate even-numbered items with odd-numbered items.