Variance
Variance is a mathematical expression of how data points are spread across a data set. Such expressions are known as measures of dispersion because they indicate how values are dispersed throughout a population. The variance is the average, or mean, of the squared distances between each data point in a set and the mean of all the data points in the set. Mathematically, variance is represented as $\sigma^2$ and defined by the equation $\sigma^2 = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + (x_3-\mu)^2 + \cdots + (x_n-\mu)^2}{n}$, where $x_1, x_2, x_3, \ldots, x_n$ are the individual values; $\mu$ is the mean, or average, of all the values; and $n$ is the total number of values. In applications, variance is commonly replaced by its square root, which is known as the standard deviation, or $\sigma$.
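The formula above translates almost line for line into code. The following is a minimal Python sketch, assuming the data are held in a plain list of numbers; the function names `population_variance` and `population_std_dev` are illustrative, and the small data set is chosen only because its arithmetic is easy to follow (it is not taken from this article).

```python
import math

def population_variance(values):
    """Return sigma^2: the mean of the squared distances from the mean mu.

    Divides by n, the number of values, as in the formula above.
    """
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mu = sum(values) / n  # the mean of all the values
    return sum((x - mu) ** 2 for x in values) / n

def population_std_dev(values):
    """Return sigma, the square root of the variance."""
    return math.sqrt(population_variance(values))

# Illustrative data only: the mean is 5, the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_std_dev(data))   # 2.0
```

Dividing by n matches the definition given here; Python's standard `statistics` module performs the same population calculation in `statistics.pvariance` and `statistics.pstdev`.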
Variance is one of several measures of dispersion which are used to evaluate the spread of a distribution of numbers. Such measures are important because they provide ways of obtaining information about data sets without considering all of the elements of the data individually.
To understand variance, it helps first to understand something about other measures of dispersion. One measure of dispersion is the "average of deviations": for a set of numbers, the average of the differences between each number and the set's mean. The mean (also known as the average) is simply the sum of the numbers in a given set divided by the number of entries in the set. For the set of eight test scores 7, 25, 36, 44, 59, 71, 85, and 97, the sum is 424 and the mean is 424/8 = 53. The deviation from the mean for any given value is that value minus the mean. For example, the first score in the set, 7, has a deviation from the mean of -46; the second score, 25, has a deviation from the mean of -28; and so on. However, the sum of these deviations across the entire data set is always 0, because the mean is the balance point of the data: the deviations of the values above the mean exactly cancel those of the values below it. A measure that shows how much deviation is present, without the deviations adding up to zero, is more useful in evaluating data. Such a nonzero sum can be obtained by adding the absolute values of the deviations; dividing that sum by the number of values gives the mean absolute deviation, which for these scores is 200/8 = 25. However, for reasons that will not be dealt with here, even this measure has limited application.
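The two calculations in this paragraph, the deviations that sum to zero and the mean absolute deviation, can be checked with a short Python sketch using the eight test scores; the variable names are illustrative.

```python
# The eight test scores from the example above.
scores = [7, 25, 36, 44, 59, 71, 85, 97]

mean = sum(scores) / len(scores)         # 424 / 8 = 53.0
deviations = [x - mean for x in scores]  # [-46.0, -28.0, -17.0, -9.0, 6.0, 18.0, 32.0, 44.0]

# Deviations above and below the mean cancel, so their sum is zero.
print(sum(deviations))  # 0.0

# Averaging the absolute values of the deviations avoids the cancellation.
mean_abs_dev = sum(abs(d) for d in deviations) / len(scores)
print(mean_abs_dev)  # 25.0
```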
A still more informative measure of dispersion can be obtained by squaring the deviations from the mean, adding them, and dividing by the number of scores; this value is known as the average squared deviation, or variance. For example, for the series of test scores cited above, the variance is calculated as follows:
$$\sigma^2 = \frac{(-46)^2 + (-28)^2 + (-17)^2 + (-9)^2 + 6^2 + 18^2 + 32^2 + 44^2}{8} = \frac{2116 + 784 + 289 + 81 + 36 + 324 + 1024 + 1936}{8} = \frac{6590}{8} = 823.75$$
In theory, the value of $\sigma^2$ conveys valuable information about the spread of the data. In practice, however, its units are awkward (we cannot meaningfully talk about squared test scores), so we may elect to use the square root of the variance instead. This value is called the standard deviation of the scores, and it is expressed in the same units as the scores themselves. For this series of test scores, the standard deviation is the square root of 823.75, or approximately 28.70. In general, a small standard deviation indicates that the data are clustered closely around the mean; a large standard deviation shows that the data are more spread out.
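As a check on the arithmetic, the Python sketch below recomputes the variance and standard deviation of the eight test scores directly from their definitions, then compares the results with the standard library's population functions `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

# The eight test scores used as the running example in this article.
scores = [7, 25, 36, 44, 59, 71, 85, 97]
n = len(scores)
mean = sum(scores) / n  # 53.0

# Square each deviation from the mean, add the squares, divide by the number of scores.
variance = sum((x - mean) ** 2 for x in scores) / n
std_dev = math.sqrt(variance)  # back in the original units (test-score points)

print(variance)          # 823.75
print(f"{std_dev:.2f}")  # 28.70

# Cross-check against the standard library's population (divide-by-n) functions.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(std_dev, statistics.pstdev(scores))
```

Note that `statistics.variance` and `statistics.stdev` divide by n − 1 (the sample formulas) and so would give larger values for this data; the population versions match the definition used in this article.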
While modern computers reduce the need for laborious statistical calculations, it is still necessary to understand and interpret the concept of variance, and of the standard deviation derived from it, in order to judge the statistical significance of data. For example, teachers must be thoroughly familiar with these statistical tools in order to interpret test data properly.
See also Set theory; Statistics.
Randy Schueller