So we've spent a lot of time talking about different measures of center, specifically the mean and median as the most important ones, which are numbers that try to tell us where the center of a data set is. But oftentimes, we'll need to know more information, like how those values are distributed. So in this video, we're going to talk about something a bit different: an extremely important calculation that you need to know, a statistic called the standard deviation. By the end of this video, you'll understand in plain English what the standard deviation is. I'll show you how to calculate it using some equations, and we'll go over some examples and practice problems.
So let's just jump right in. The standard deviation is not a measure of center; it's what's called a measure of variation. It's a number just like the mean and the median. It's a number that represents essentially how spread out the data values are. The letter that we use for standard deviation is "s," and basically, s is a number that's greater than or equal to zero. And the higher it is, the more spread out the numbers are. Here's an example:
I've got these two sets of data: thirteen, fourteen, fifteen, sixteen, seventeen, and then five, ten, fifteen, twenty, twenty-five. You can pause the video yourself if you don't believe me, but the means of both of these sets of data are actually both 15. So they're both 15, but there's clearly something different about them. In this example over here, these numbers are a little bit more bunched up. And if you were to calculate the standard deviation, you would find that it's 1.58, which doesn't mean anything by itself.
If you were to calculate the standard deviation for the right numbers, five, ten, fifteen, twenty, and twenty-five, you would find that the standard deviation is much, much higher. Here, on the left, the s is low because the numbers are less spread out. They're more bunched up around 15. Here, on the right, the s is higher, about 7.9, and it's because the data is more spread out around 15.
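If you'd like to check those two numbers yourself, here's a minimal sketch in Python. It assumes the standard library's `statistics` module, whose `stdev` function computes the sample standard deviation; the variable names `tight` and `spread` are just labels I made up for the two data sets.

```python
import statistics

# The two data sets from the example; both have a mean of 15.
tight = [13, 14, 15, 16, 17]    # bunched up around 15
spread = [5, 10, 15, 20, 25]    # more spread out around 15

mean_tight = statistics.mean(tight)
mean_spread = statistics.mean(spread)

# statistics.stdev is the sample standard deviation (it divides by n - 1).
s_tight = statistics.stdev(tight)
s_spread = statistics.stdev(spread)

print(f"means: {mean_tight} and {mean_spread}")
print(f"s for the bunched-up set: {s_tight:.2f}")
print(f"s for the spread-out set: {s_spread:.2f}")
```

Same center, very different spread, and s is the number that picks up on that difference.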
That's the basic idea of standard deviation. Okay, I didn't tell you how to calculate the s's because that's actually what we're going to do in this example here. Let's get started with our main example. We're going to calculate and find the mean and standard deviation of the sample of numbers that we find here. So, let's get started with the first part over here because finding the mean is something that we've done a bunch.
To calculate the mean, remember this equation over here. If we want to figure out a mean of a sample, that's x bar, just add up everything and divide by the total number of observations. So in other words, we have five plus ten plus twelve plus fourteen plus three plus four, and then divide by the number of values, which in this case, n is equal to six. If you add all these things up, what you're going to get is 48 divided by six, which is a mean of eight.
One of the things I love to do in these problems when you're given a list of numbers like this, or if you're given horizontal numbers like numbers that are arranged horizontally, I like to put them in a table where the numbers actually are going down the columns so we can add them up much more easily. So we can see here that another way you could have done this is to arrange them like this and then add all these things up, and you would have gotten 48. The mean of this sample over here is eight.
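The column layout can be sketched in a few lines of Python if you want to see it outside the video. This is only an illustration; the names `sample`, `total`, and `x_bar` are my own choices, not anything from the course.

```python
# The six sample values from the example, listed as one column.
sample = [5, 10, 12, 14, 3, 4]

n = len(sample)        # number of observations
total = sum(sample)    # sigma x: the total at the bottom of the column
x_bar = total / n      # sample mean

# Print the values going down a column, with the total underneath.
for x in sample:
    print(f"{x:>4}")
print("----")
print(f"{total:>4}  (sum of x)")
print(f"mean = {total} / {n} = {x_bar}")
```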
That's basically a measure of where the center of the numbers is. That doesn't tell us how spread out the data is, and that's what we're going to calculate in part b. So how do we calculate the standard deviation? Well, here's where we're going to take a look at the equations. There are basically two different forms of this equation that you're going to see. One looks a lot nastier but actually turns out to be easier to use, and the other is a more compact version that's shorter to write but involves more calculations. Now ultimately, if your professor has a preference or your course requires you to use a certain method, go ahead and stick to that. But otherwise, I always find that the easiest one to use is this equation over here, the nastier-looking one.
So that's the one I'm going to use. Both will get you the right answer. What is this equation telling me? It looks really nasty at first, but, basically, I'm going to take the square root of a big expression, and I've got one over n minus one. So that goes out in front. I actually know what n is because n is just equal to six. So, in other words, this just becomes one over six minus one. Then I've got a parenthesis over here, and I've got this giant sigma, which remember means I'm going to add up a bunch of stuff, in this case the x squared values.
So essentially, that's just going to be a number. I'm going to take a bunch of x squared values, whatever those are, and I'm going to add them all up, and I'm going to get a number out of this. And I'm going to subtract another number: sigma x, in parentheses, squared, divided by n. So in other words, there will be another number over here that gets squared, and then I'm going to divide by n, which, in this case, is six.
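Since the equation itself only appears on screen, here is a reconstruction of it from the description so far, written in LaTeX. The exact layout is an assumption on my part, but each piece follows the walkthrough: the square root, the one over n minus one out in front, the sum of the squared values, and the squared sum divided by n.

```latex
s = \sqrt{\frac{1}{n-1}\left(\sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}\right)}
\qquad\text{and, with } n = 6:\qquad
s = \sqrt{\frac{1}{6-1}\left(\sum x^{2} - \frac{\left(\sum x\right)^{2}}{6}\right)}
```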
Even though this equation looks somewhat intimidating, it's basically once you know what n is, there are really only two numbers that you have to figure out. Let's start with the easiest one, which is actually going to be the sigma x that's in parentheses because we've already used that. Sigma x is just the formula that we used or the symbol that we used in our mean calculation. It's where we added up all of the numbers. Essentially, it's this 48 over here. Once you take all your data values and add them all up, that's 48.
What we're going to do here is we're going to take this 48 and remember we're going to have to square it. That's one of the missing numbers that we have in the box. The only thing we have to do is just figure out what this other one is. Let's take a look at that. This 48 was sigma x, right, in parentheses squared. That's 48 squared. That's not the same thing as this sigma x squared. In this case, you're actually adding up all of the data values that are already squared. So in these problems, what I usually do is add another column. Again, I always add these columns like this for x squared values. And what you're going to do is just square each one of these things, and I'm just going to write them over here.
So I have five squared, which is 25, ten squared, which is a hundred, then I've got twelve squared, which is 144, fourteen squared, which is 196, then I've got three squared, which is nine, and then four squared, which is 16. Now are any of these my answers over here? Well, no. Because just as we did with the x column where we added everything up to get to the bottom, we're going to do the exact same thing over here. This number over here when you add everything up is going to be 490. And this represents sigma x squared, the sum of the squared values, not sigma x in parentheses squared, which is this 48 squared. So what goes in here is going to be 490.
So in other words, just to color code these things, the 490 actually comes from the x squared column here and the 48 actually comes from the x column over here. Now that we've gotten these numbers, we're basically done. All we have to do is just simplify. So this actually ends up being the square root of one over five times the parentheses, which is 490 minus 48 squared, which turns out to be 2,304, divided by six. That's 490 minus 384, which is 106, so we have the square root of 106 over five, or 21.2. And if you work this out, by the way, what you're going to get is s equals 4.6.
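Here's the same calculation as a short Python sketch, in case you want to check the arithmetic step by step. The variable names (`sum_x`, `sum_x_sq`, `variance`) are my own labels for the pieces of the equation, not standard notation.

```python
import math

sample = [5, 10, 12, 14, 3, 4]
n = len(sample)

sum_x = sum(sample)                     # sigma x: total of the x column
sum_x_sq = sum(x * x for x in sample)   # sigma x squared: total of the x^2 column

# Inside the square root: (sigma x^2 - (sigma x)^2 / n) / (n - 1)
variance = (sum_x_sq - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(f"sigma x = {sum_x}, sigma x^2 = {sum_x_sq}")
print(f"inside the square root: {variance:.1f}")
print(f"s = {s:.1f}")
```

Notice that `sum_x ** 2` squares the total, while `sum_x_sq` adds up values that were squared first. Those are the two numbers that are easy to mix up.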
So that is your s of 4.6. Cool. So that's the calculation of standard deviation. One last thing I want to talk about here, by the way, is how you would actually use the other equation just in case you really have to know how to use it. Again, I mentioned that it's shorter but involves more calculations. But essentially what you're going to do here is you will take each one of the values, x, subtract the mean from it, and then square that. So what you'll do in these cases is you'll create two other columns over here where you'll take each one of these x's and subtract x bar, which, remember, is the eight, the mean that you calculated over here.
So in this case, what you would do is you'd take five minus eight, and you'll get a negative number, which is negative three. It's perfectly fine to have negatives in this column here. Once you do all of that for the rest of the values, you won't have to add those up, but the next thing you'll do is essentially square all of those things. So in this case, I got a negative three when I did five minus eight, a number minus the mean. And I'm just going to take negative three and square that, and that becomes nine. You just repeat this for the rest of the rows down here.
So I've got all these numbers that are already filled out. And, basically, if you add all of these things straight down, what you'll end up getting is 106. So another way you could have calculated this—and I'll write this down—is that s could have been the square root of 106 divided by five, but you have to do all these intermediate calculations to get there.
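If you want to see those two extra columns filled in, here's a rough Python sketch of this second method. The list names `deviations` and `squared` are just my labels for the two columns.

```python
import math

sample = [5, 10, 12, 14, 3, 4]
n = len(sample)
x_bar = sum(sample) / n                      # the mean, 8

deviations = [x - x_bar for x in sample]     # column 1: x minus the mean
squared = [d ** 2 for d in deviations]       # column 2: each deviation squared

# Print the three columns side by side.
print(f"{'x':>4} {'x - mean':>9} {'squared':>8}")
for x, d, sq in zip(sample, deviations, squared):
    print(f"{x:>4} {d:>9.0f} {sq:>8.0f}")

sum_sq_dev = sum(squared)                    # total of the squared column
s = math.sqrt(sum_sq_dev / (n - 1))
print(f"sum of squared deviations = {sum_sq_dev:.0f}")
print(f"s = {s:.1f}")
```

Only the squared column gets added up; the deviations column is just an intermediate step.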
And by the way, you would still get the same answer of 4.6. So that's how to calculate the standard deviation using both of those equations. Again, you may have to use one versus the other. The last point I want to make here is that we used s for the standard deviation of a sample. But for populations, you may see other symbols. You may see little sigma instead of s, mu instead of x bar, and big N instead of little n. These are mostly notation differences depending on whether you're talking about a sample versus a population, although the population equation divides by big N rather than n minus one. But otherwise, the equations are mostly the same.
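To make the sample versus population distinction concrete, here's a small sketch using Python's standard `statistics` module, which has both versions: `stdev` for a sample and `pstdev` for a population. Treating our six numbers as a whole population is purely an assumption for illustration; in the example they are a sample.

```python
import statistics

data = [5, 10, 12, 14, 3, 4]

s = statistics.stdev(data)        # sample version: divides by n - 1
sigma = statistics.pstdev(data)   # population version: divides by N

print(f"s     = {s:.2f}")
print(f"sigma = {sigma:.2f}")
```

The population value comes out a bit smaller because it divides by six instead of five.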
Unless it's explicitly stated, you should always just use the s equation. Alright, folks, that's it for this one. Let me know if you have any questions, and I'll see you in the next one. Let's get some practice.