3. Describing Data Numerically

Standard Deviation

Standard Deviation: Videos & Practice Problems

Topic summary

The standard deviation (s) is a crucial measure of variation that indicates how spread out data values are around the mean. A higher standard deviation signifies greater dispersion. To calculate it, use the formula: $s = \sqrt{\frac{1}{n - 1} \left(\sum x^{2} - \frac{(\sum x)^{2}}{n}\right)}$ . Understanding standard deviation is essential for analyzing data distributions and conducting statistical tests.

concept

Calculating Standard Deviation

Video transcript

So we've spent a lot of time talking about different measures of center, specifically the mean and median as the most important ones, which are numbers that try to tell us where the center of a data set is. But oftentimes, we'll need to know more information, like about how those values are distributed. So in this video, we're going to talk about something a bit different, and I'm going to talk about an extremely important calculation that you need to know, a variable called the standard deviation. By the end of this video, you'll understand in plain English what the standard deviation is. I'll show you how to calculate it using some equations, and we'll go over some examples and practice problems.

So let's just jump right in. The standard deviation is not a measure of center; it's what's called a measure of variation. It's a number just like the mean and the median. It's a number that represents essentially how spread out the data values are. The letter that we use for standard deviation is "s," and basically, s is a number that's greater than or equal to zero. And the higher it is, the more spread out the numbers are. Here's an example:

I've got these two sets of data: thirteen, fourteen, fifteen, sixteen, seventeen, and then five, ten, fifteen, twenty, twenty-five. You can pause the video yourself if you don't believe me, but the means of both of these sets of data are actually both 15. So they're both 15, but there's clearly something different about them. In this example over here, these numbers are a little bit more bunched up. And if you were to calculate the standard deviation, you would find that it's 1.58, which doesn't mean anything by itself.

If you were to calculate the standard deviation for the right numbers—five, ten, fifteen, twenty, and twenty-five—you would find that the standard deviation is much, much higher. Here, the s is low because the numbers are less spread out. They're more bunched up around 15. Here, the s is higher, somewhere like eight, and it's because the data is more spread out around 15.
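As a quick check of the two data sets just described, here is a minimal sketch using Python's standard `statistics` module. The variable names `tight` and `wide` are made up for illustration; the numbers are the ones read out in the video.

```python
import statistics

# The two data sets from the example; both are centered on 15.
tight = [13, 14, 15, 16, 17]
wide = [5, 10, 15, 20, 25]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# statistics.stdev computes the sample standard deviation (divides by n - 1).
s_tight = statistics.stdev(tight)
s_wide = statistics.stdev(wide)

print("means:", mean_tight, mean_wide)
print("s (bunched up):", round(s_tight, 2))
print("s (spread out):", round(s_wide, 2))
```

The bunched-up set comes out at about 1.58 and the spread-out set at about 7.91, which matches the "somewhere like eight" mentioned above.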

That's the basic idea of standard deviation. Okay, I didn't tell you how to calculate the s's because that's actually what we're going to do in this example here. Let's get started with our main example. We're going to calculate and find the mean and standard deviation of the sample of numbers that we find here. So, let's get started with the first part over here because finding the mean is something that we've done a bunch.

To calculate the mean, remember this equation over here. If we want to figure out a mean of a sample, that's x bar, just add up everything and divide by the total number of observations. So in other words, we have five plus ten plus twelve plus fourteen plus three plus four, and then divide by the number of values, which in this case, n is equal to six. If you add all these things up, what you're going to get is 48 divided by six, which is a mean of eight.

One of the things I love to do in these problems when you're given a list of numbers like this, or if you're given horizontal numbers like numbers that are arranged horizontally, I like to put them in a table where the numbers actually are going down the columns so we can add them up much more easily. So we can see here that another way you could have done this is to arrange them like this and then add all these things up, and you would have gotten 48. The mean of this sample over here is eight.

That's basically a measure of where the center of the numbers is. That doesn't tell us how spread out the data is, and that's what we're going to calculate in part b. So how do we calculate the standard deviation? Well, here's where we're going to take a look at the equations. There are basically two different forms of this equation that you're going to see. One looks a lot nastier but actually turns out to be easier to use, and the other is a more compact version that's shorter to write but involves more calculations. Now ultimately, if your professor has a preference or your course requires you to use a certain method, go ahead and stick to that. But otherwise, I always find that the easiest one to use is this equation over here.

So that's the one I'm going to use. Both will get you the right answer. What is this equation telling me? It looks really nasty at first, but, basically, I'm going to take the square roots of a massive set of numbers, and I've got one over n minus one. So that goes out in front. I actually know what n is because n is just equal to six. So, in other words, this just becomes one over six minus one. Then I've got a parenthesis over here, and I've got this giant sigma, which remember means I'm going to add up a bunch of stuff of x squared.

So essentially, that's just going to be a number. I'm going to take a bunch of x squared, whatever those are, and I'm going to add them up. I'm going to add them all up, and I'm going to get a number out of this. And I'm going to subtract another number: that's sigma x, in parentheses, squared, divided by n. So in other words, there will be another number over here that gets squared, and then I'm going to divide by n, which, in this case, is six.
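The transcript does not show the equation itself. Assuming it is the shortcut form quoted in the topic summary, it can be written as follows (a reconstruction, not a copy of the on-screen slide):

```latex
s = \sqrt{\frac{1}{n-1}\left(\sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}\right)}
```

Here $\sum x^{2}$ means square each value and then add them up, while $\left(\sum x\right)^{2}$ means add the values first and then square the total.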

Even though this equation looks somewhat intimidating, it's basically once you know what n is, there are really only two numbers that you have to figure out. Let's start with the easiest one, which is actually going to be the sigma x that's in parentheses because we've already used that. Sigma x is just the formula that we used or the symbol that we used in our mean calculation. It's where we added up all of the numbers. Essentially, it's this 48 over here. Once you take all your data values and add them all up, that's 48.

What we're going to do here is we're going to take this 48 and remember we're going to have to square it. That's one of the missing numbers that we have in the box. The only thing we have to do is just figure out what this other one is. Let's take a look at that. This 48 was sigma x, right, in parentheses squared. That's 48 squared. That's not the same thing as this sigma x squared. In this case, you're actually adding up all of the data values that are already squared. So in these problems, what I usually do is add another column. Again, I always add these columns like this for x squared values. And what you're going to do is just square each one of these things, and I'm just going to write them over here.

So I have five squared, which is 25, ten squared, which is a hundred, then I've got twelve squared, which is 144, fourteen squared, which is 196, then I've got three squared, which is nine, and then four squared, which is 16. Now are any of these my answers over here? Well, no. Because just as we did with the x column where we added everything up to get to the bottom, we're going to do the exact same thing over here. This number over here when you add everything up is going to be 490. And this represents sigma x squared, not sigma x in parentheses squared, which is this 48 squared. So what goes in here is going to be 490.

So in other words, just to color code these things, the 490 actually comes from here and the 48 actually comes from this column over here. Now that we've gotten these numbers, we're done. All we have to do is just simplify. So this actually ends up being the square root of one over five times the quantity 490 minus 48 squared, which turns out to be 2,304, divided by six. That's 490 minus 384, which is 106, divided by five, which is 21.2. And if you take the square root of that, what you're going to get is s equals about 4.6.
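The shortcut calculation above can be sketched in Python. The function name `sample_std_shortcut` is hypothetical and chosen for illustration; the logic follows the two column totals used in the video (the sum of x and the sum of x squared).

```python
import math

def sample_std_shortcut(values):
    """Sample standard deviation via the shortcut (computational) formula."""
    n = len(values)
    sum_x = sum(values)                    # sigma x: add the values
    sum_x_sq = sum(x * x for x in values)  # sigma x^2: square first, then add
    variance = (sum_x_sq - sum_x ** 2 / n) / (n - 1)
    return math.sqrt(variance)

data = [5, 10, 12, 14, 3, 4]
sum_x = sum(data)
sum_x_sq = sum(x * x for x in data)
s = sample_std_shortcut(data)

print(sum_x, sum_x_sq, round(s, 1))
```

Only two totals are needed, which is why this form tends to be quicker by hand even though it looks more complicated.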

So that is your s of 4.6. Cool. So that's the calculation of standard deviation. One last thing I want to talk about here, by the way, is how you would actually use this other equation just in case you really have to know how to use it. Again, I mentioned that it's shorter and involves more calculations. But essentially what you're going to do here is you will take each one of the values, x, subtract the mean from it, and then square that. So what you'll do in these cases is you'll create two other columns over here where you'll take each one of these x's and subtract x bar from it, which, remember, is the eight, the mean that you calculated over here.

So in this case, what you would do is you'd take five minus eight, and you'll get a negative number, which is negative three. It's perfectly fine to have negatives in this column here. Once you do all of that for the rest of the values, you won't have to add those up, but the next thing you'll do is essentially square all of those things. So in this case, I got a negative three when I did five minus eight, a number minus the mean. And I'm just going to take negative three and square that, and that becomes nine. You just repeat this for the rest of the columns down here.

So I've got all these numbers that are already filled out. And, basically, if you add all of these things straight down, what you'll end up getting is 106. So another way you could have calculated this—and I'll write this down—is that s could have been the square root of 106 divided by five, but you have to do all these intermediate calculations to get there.
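The two extra columns described above (each value minus the mean, then that difference squared) can be sketched in Python as follows. The variable names are illustrative only.

```python
import math

data = [5, 10, 12, 14, 3, 4]
n = len(data)
x_bar = sum(data) / n                    # the mean, 8

# Column 1: each value minus the mean (negatives are fine here).
deviations = [x - x_bar for x in data]

# Column 2: square each deviation.
squared = [d ** 2 for d in deviations]

total = sum(squared)                     # sum of the squared deviations
s = math.sqrt(total / (n - 1))

for x, d, sq in zip(data, deviations, squared):
    print(x, d, sq)
print("sum of squares:", total, "s:", round(s, 1))
```

The squared column adds up to 106, so this route reaches the same value under the square root as the shortcut formula, just with more intermediate steps.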

And by the way, you would still get the same answer of 4.6. So that's how to calculate the standard deviation using both of those equations. Again, you may have to use one versus the other. The last point I want to make here is that we used s for the standard deviation. But for populations, you may see other symbols. You may see little sigma, you may see mu, and then you may see big N instead of s, x bar, and little n. These are just formatting things depending on whether you're talking about a sample versus a population. But otherwise, the equations are mostly the same.
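To see what "mostly the same" means in practice, here is a small sketch that assumes the usual convention: the sample version divides by n minus 1 and the population version divides by big N. Python's `statistics` module provides both as `stdev` and `pstdev`.

```python
import statistics

data = [5, 10, 12, 14, 3, 4]

# Treating the six values as a sample: divide by n - 1.
s = statistics.stdev(data)

# Treating the same six values as an entire population: divide by N.
sigma = statistics.pstdev(data)

print("sample s:", round(s, 2))
print("population sigma:", round(sigma, 2))
```

The population value is always a little smaller for the same numbers, because the sum of squared deviations is divided by a larger number.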

Unless it's explicitly stated, you should always just use the s equation. Alright, folks, that's it for this one. Let me know if you have any questions, and I'll see you in the next one. Let's get some practice.

Problem

An economist analyzes the quarterly GDP growth over the past 5 quarters, shown below. Calculate the standard deviation of the data.

3.36%

0.29%

2.50%

2.27%

example

Calculating Standard Deviation Example 1

Video transcript

Everyone, let's go ahead and see if we can figure this out. We have three samples of students taking some kind of quiz, and we're going to create histograms out of all those three samples. These histograms show the number of correct answers, where the higher the bar, the more number of correct answers. Now without calculating standard deviation by hand, which would be incredibly tedious for this problem, we're supposed to just rank the standard deviations of each sample from least to greatest.

Alright? In other words, the whole problem here is I'm just going to have three numbers, and they're going to be sorted from least to greatest. So, I just have to figure out where S1, S2, and S3 sort of go here. Let's take a look.

I've got these three samples, and they're all kind of different. I've got these numbers, which are sort of like I've got these two columns that are kind of off by themselves, and I've got this really bunched-up column. And then, I've got this column that is somewhere in the middle.

It kind of looks like a normal distribution. Remember that basically, S is related to how spread out the numbers are.

Standard deviation is a measure of how spread out the data is. Looking at each one of these groups, which one is the least spread out? In the first sample, I've got these two batches of numbers in which one is really low here, and I've got these two that are kind of high in terms of number of correct answers. The mean is going to be somewhere in the middle.

This is X bar, and all the values are actually pretty spread out and far from that mean. This is the most spread out. In contrast, in sample number two, all of these numbers are basically bunched up together. This actually is the least spread out.

Again, if S is the standard deviation, a measure of how spread out the numbers are, the higher the S, the more spread out. So you can see here that the second sample will have a low S. Ranking them from least to greatest, the second sample is going to have the lowest standard deviation, then S3, and then the first sample, which will have the highest S. This is the correct order without calculating anything.
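The histograms themselves are not reproduced in the transcript, so the sketch below uses made-up scores that only mimic the shapes described (two separated clusters, one tight bunch, and a bell-like middle case). The data are hypothetical and are not the values from the problem; the point is only to show that the visual ranking agrees with a computed one.

```python
import statistics

# Hypothetical scores, invented to match the described shapes.
samples = {
    "S1": [1, 1, 2, 2, 9, 9, 10, 10],  # two clusters far from the mean
    "S2": [5, 5, 5, 5, 6, 6, 6, 6],    # tightly bunched
    "S3": [3, 4, 5, 5, 6, 6, 7, 8],    # roughly bell-shaped
}

# Sample standard deviation for each hypothetical sample.
spreads = {name: statistics.stdev(values) for name, values in samples.items()}

# Rank from least to greatest standard deviation.
ranking = sorted(spreads, key=spreads.get)
print(ranking)
```

With these invented numbers the order comes out as S2, S3, S1, the same order reached by eye above.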

Let me know if you have any questions, and let's move on.

Here’s what students ask on this topic:

What is the standard deviation and why is it important in statistics?

The standard deviation is a measure of variation that indicates how spread out data values are around the mean. It is crucial in statistics because it provides insight into the dispersion of data points within a dataset. A higher standard deviation signifies greater variability, meaning the data points are more spread out from the mean. Conversely, a lower standard deviation indicates that the data points are closer to the mean, suggesting less variability. Understanding standard deviation is essential for analyzing data distributions effectively, as it helps in comparing the consistency of different datasets and assessing the reliability of statistical conclusions.

Created using AI

How do you calculate the standard deviation of a sample?

To calculate the standard deviation of a sample, use the formula: $s = \sqrt{\frac{1}{n - 1} \left(\sum x^{2} - \frac{(\sum x)^{2}}{n}\right)}$ . An equivalent approach works directly from the mean. First, find the mean of the sample data. Then, subtract the mean from each data point and square the result. Sum these squared differences, divide by the number of observations minus one, and take the square root of the result. Both routes give the same standard deviation, which reflects the spread of the sample data around the mean.

What is the difference between standard deviation and variance?

Standard deviation and variance are both measures of dispersion in a dataset, but they differ in their calculation and interpretation. Variance is the average of the squared differences from the mean, providing a measure of how much the data points deviate from the mean. For a population, it is calculated using the formula: $\frac{\sum (x - \text{mean})^{2}}{n}$ ; for a sample, the sum is divided by $n - 1$ instead of $n$. Standard deviation is the square root of the variance, offering a more intuitive measure of dispersion that is in the same units as the data. While variance provides a squared measure, standard deviation gives a direct measure of spread, making it easier to interpret in the context of the original data.
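The relationship can be checked with a short sketch. It reuses the six values from the worked example in the video and treats them as a sample, so it uses the sample versions from Python's `statistics` module (`variance` and `stdev`, which both divide by n minus 1).

```python
import math
import statistics

data = [5, 10, 12, 14, 3, 4]

var = statistics.variance(data)  # sample variance, in squared units
sd = statistics.stdev(data)      # sample standard deviation, in the data's units

print("variance:", round(var, 1))
print("standard deviation:", round(sd, 1))
print("sqrt(variance) matches:", math.isclose(math.sqrt(var), sd))
```

The variance here is 21.2 and its square root is about 4.6, the same s found in the worked example.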

How does standard deviation help in comparing datasets?

Standard deviation is a valuable tool for comparing datasets because it quantifies the amount of variation or dispersion within each dataset. When comparing two or more datasets, the standard deviation can reveal which dataset has more variability. A higher standard deviation indicates that the data points are more spread out, while a lower standard deviation suggests that the data points are closer to the mean. This comparison helps in understanding the consistency and reliability of the datasets. For example, in business, comparing the standard deviation of sales figures across different months can indicate which month had more consistent sales performance.

What are the common symbols used for standard deviation in statistics?

In statistics, the common symbols used for standard deviation vary depending on whether the data represents a sample or a population. For a sample, the standard deviation is typically denoted by the letter 's'. For a population, it is often represented by the Greek letter sigma (σ). Additionally, the mean of a sample is denoted by 'x̄', while the mean of a population is represented by the Greek letter mu (μ). These symbols help differentiate between sample and population statistics, ensuring clarity in statistical analysis and communication.

Your Statistics for Business tutor

Patrick Ford

Physics and Math Lead Instructor