Standard Deviation

Standard deviation is a statistical measure of how far the data points in a set typically deviate from the average, or mean, of that set. A small standard deviation means the data points are clustered close to the mean, while a larger value indicates that they are more spread out. The two main types of standard deviation are population standard deviation and sample standard deviation; which calculation to use depends on whether the data set being analyzed is complete or merely a representative sample of a larger set.
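As a small illustration of the difference between a small and a large standard deviation, the sketch below compares two made-up data sets that share the same mean. The data values are hypothetical, and the calculation relies on `pstdev` from Python's standard `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

for name, data in [("clustered", clustered), ("spread out", spread_out)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

Both sets average 50, but the first has a standard deviation of about 1.41 and the second about 28.28, reflecting how much farther its values lie from the mean.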

The standard deviation of a data set is the square root of the variance, which is the average squared distance of the data points from the mean. While the variance and the standard deviation are closely related, the standard deviation is more useful in a real-world context because it is expressed in the same units as the original data points, whereas the variance is expressed in those units squared. For example, if the data are heights in centimeters, the standard deviation is in centimeters and the variance is in square centimeters. Because the standard deviation is defined in terms of the variance, one must first determine the variance.
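The relationship between the two quantities can be written compactly. The symbols here follow common statistical convention rather than anything in the text itself: N is the number of values in a complete data set, x_i is an individual value, μ is the mean, σ² is the variance, and σ is the standard deviation.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

Because each difference is squared, σ² carries squared units; taking the square root returns σ to the units of the original data.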

The first step in calculating the variance is to calculate the mean, which is done by adding all the values in the set together and then dividing the sum by the number of values in the set. Next, subtract the mean from each individual value in the set and square each resulting difference. Squaring makes every difference non-negative, so that values above and below the mean cannot cancel each other out when they are added together. Finally, calculate the mean of the squared differences by adding them all together and once again dividing by the number of values in the set. The number that results is the variance of the set. To determine the standard deviation, simply take the square root of the variance.
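The steps above can be sketched in a few lines of Python. The function names and the example data are illustrative choices, not part of any standard; the data set was picked so that the arithmetic comes out to whole numbers.

```python
import math

def population_variance(values):
    """Mean of the squared differences from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(values) / n                              # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # step 2: squared differences
    return sum(squared_diffs) / n                       # step 3: mean of the squares

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical values; the mean is 5
print(population_variance(data))     # 4.0
print(population_std(data))          # 2.0
```

Here the squared differences are 9, 1, 1, 1, 0, 0, 4, and 16, which sum to 32; dividing by the 8 values gives a variance of 4, and the square root gives a standard deviation of 2.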

The above calculation is one of two basic formulas for standard deviation. It is often called the “population standard deviation” to differentiate it from the sample standard deviation. The population standard deviation is appropriate when the data points in the set represent the entirety of the data being analyzed. However, sometimes the data set is merely a sample of a larger population, and the results will be used to generalize about that larger population, such as when a fraction of a nation's residents is polled on a political issue and the results are extrapolated to represent the political attitudes of the entire nation. In these cases, the population formula applied to the sample tends to produce a value that is lower than the true standard deviation of the larger population, so the sample standard deviation should be used instead. Though it still does not produce an entirely unbiased result, it is a less biased estimate.
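The tendency to underestimate can be seen in a small simulation. This is only a sketch under stated assumptions: the larger population is taken to be normally distributed with a true standard deviation of 1, samples of five values are drawn repeatedly, and the seed, sample size, and trial count are arbitrary choices. Python's standard `statistics` module supplies both calculations, as `pstdev` (population) and `stdev` (sample).

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_SD = 1.0            # spread of the hypothetical larger population
SAMPLE_SIZE = 5
TRIALS = 10_000

pop_formula = []
sample_formula = []
for _ in range(TRIALS):
    # Draw one small sample from the assumed normal population.
    sample = [rng.gauss(0.0, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    pop_formula.append(statistics.pstdev(sample))    # population formula
    sample_formula.append(statistics.stdev(sample))  # sample formula

avg_pop = statistics.mean(pop_formula)
avg_sample = statistics.mean(sample_formula)
print(f"true standard deviation:          {TRUE_SD}")
print(f"average using population formula: {avg_pop:.3f}")
print(f"average using sample formula:     {avg_sample:.3f}")
```

Averaged over many samples, the population formula falls noticeably short of the true value, while the sample formula lands closer but still slightly below it, consistent with the remark that it is not entirely unbiased.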

To calculate the sample standard deviation, one must first calculate a sample variance. The calculation is the same as for the population variance except for the last step: instead of dividing the sum of the squared differences by the number of values in the set, divide the sum by the number of values in the set minus one. This adjustment, known as Bessel's correction, compensates for the tendency of the population formula to underestimate the spread when it is applied to a sample. Then, as before, simply take the square root of the sample variance to produce the sample standard deviation.
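A minimal sketch of the sample calculation follows. The function names and data are illustrative; the results are compared against `stdev` and `pstdev` from Python's standard `statistics` module, which implement the sample and population formulas respectively.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample of 8 values
print(sample_std(data))           # hand-written sample formula
print(statistics.stdev(data))     # library sample formula, for comparison
print(statistics.pstdev(data))    # population formula gives a smaller value
```

For these eight values the sum of squared differences is 32, so the sample variance is 32/7 rather than 32/8, and the sample standard deviation comes out somewhat larger than the population figure of 2.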
